Metadata Systems, 2000–2001

It is 2002, and by now everyone has read a dozen articles that begin with some variation on the exposition “What is metadata? Basically, it is data about data ...,” but Carl Lagoze has asserted that he has yet to “read a paper that evaluates what metadata is good for” (Graham 2001, 292). Rebecca Graham's article “Metadata Harvesting” may offer one approach to answering that question. She describes the Open Archives Initiative (OAI), a project to illuminate the dark Web by making the contents of databases accessible to Web search engines, and posits that the project has the “potential to provide information about the usefulness of metadata” (290). Her questions will have to be answered through further research:

  • What is the value of Dublin Core (DC) in resource discovery?
  • What are other values of metadata such as those focused on preservation at Cornell?
  • What are the benefits to parallel metadata sets?
  • What is the best approach to mapping between DC and these parallel sets?
  • What are the benefits and costs of the automated use of metadata?
  • What are the OAI's impacts on interlibrary loan (ILL)?
  • How does this use of metadata impinge upon intellectual property issues?
  • How might the OAI influence the development of repositories? (295)
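
The mapping question in the list above can be made concrete with a small crosswalk. The following is a minimal sketch, not anything described in Graham's article: the local field names, the sample record, and the function name are assumptions invented for illustration. Only the fifteen unqualified Dublin Core element names are taken from the standard. The sketch maps a hypothetical preservation record to DC and reports which fields have no DC equivalent, which is one way to see what a parallel metadata set carries that DC alone does not.

```python
# The 15 elements of the unqualified Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical crosswalk from a local preservation record to DC.
# The local field names are invented for illustration only.
CROSSWALK = {
    "object_title": "title",
    "author": "creator",
    "capture_date": "date",
    "file_format": "format",
    "persistent_id": "identifier",
}


def to_dublin_core(record):
    """Return (dc_record, unmapped) for a local record dict."""
    dc_record = {}
    unmapped = []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target in DC_ELEMENTS:
            # DC elements are repeatable, so collect values in lists.
            dc_record.setdefault(target, []).append(value)
        else:
            # No DC equivalent: this information is lost in the mapping.
            unmapped.append(field)
    return dc_record, unmapped


# Invented sample record; the checksum field has no crosswalk entry.
local = {
    "object_title": "Survey map of Ithaca",
    "author": "Unknown",
    "file_format": "image/tiff",
    "checksum_md5": "d41d8cd98f00b204e9800998ecf8427e",
}
dc, lost = to_dublin_core(local)
print(dc)
print("Fields with no DC equivalent:", lost)
```

Here the preservation-oriented checksum field falls outside the crosswalk, a small instance of the information that a DC-only view would drop.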

In her article “A Quantitative Categorical Analysis of Metadata Elements in Image-Applicable Metadata Schemas,” Jane Greenberg (2001) compares the VRA Core, Dublin Core, RLIN REACH, and EAD with regard to their support of image information. She lists the elements of each standard and identifies which use of metadata each supports: discovery, use, authentication, or administration. In addition, she compares the goals of the standards, the schema granularity (application level), and the limitations of each. Citing the “artificial boundaries” created between domains by differing metadata standards, she asks how her results might “contribute to the design of a superior image oriented metadata schema” providing cross-domain image access (921).

The Journal of Internet Cataloging 4, no. 1–2, is a special issue entitled “CORC: New Tools and Possibilities for Cooperative Electronic Resource Description.” While the issue contains many articles with helpful descriptions of how libraries are using CORC and Dublin Core, I have selected a few with research implications. In “Dublin Core and Serials,” his evaluation of how well Dublin Core elements handle information about serials, Wayne Jones poses a larger question worth considering in the context not only of the Dublin Core but also of other metadata standards: “What aspects of a serial have to be recorded in order for the resulting record to be considered complete and useful?” (144).

In the same issue, David Yehling Allen’s case study, “Using the Dublin Core with CORC to Catalog Digital Images of Maps,” contends that map librarians must come to some agreement about the use of the Dublin Core for map cataloging and looks forward to the Map and Geography Round Table (MAGERT) task force conclusions on this issue (166). His assertion that the simplicity of the Dublin Core makes it a good choice for map cataloging because of the complexity of map cataloging and the dearth of professional map catalogers (175) deserves empirical study.

Volume 2 of the Journal of Digital Information (JODI) includes a special issue on metadata from the 2001 Dublin Core conference in Tokyo, which contains many good articles about applications of DC. Again, I have chosen a few here that seem to have research implications. John A. Kunze, in “A Metadata Kernel for Electronic Permanence,” describes the ERC standard, a persistent identifier of electronic materials using a reformulation of the Dublin Core into simple kernels. ERC operates on some interesting premises that would be worthy of further research, namely:

  • That metadata should be readable and comprehensible by humans in real language.
  • That easy-to-use, easy-to-read, and easy-to-machine-parse metadata will increase the percentage of objects with digital permanence.
  • That a resource provider's express agreement to maintain a resource's accessibility is valuable in increasing digital permanence and should be coded in the resource's metadata.
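
To illustrate how one record can be both readable by a person and trivial for a machine to parse, here is a minimal sketch of a plain “label: value” layout and a parser for it. This is an illustration in the spirit of the premises above, not the ERC syntax as specified by Kunze: the field labels, the sample values, the “commitment” line, and the function name are all assumptions made for the example.

```python
def parse_record(text):
    """Parse 'label: value' lines into a dict, skipping blank lines."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'label: value' line: {line!r}")
        record[label.strip()] = value.strip()
    return record


# Invented sample record; labels and values are hypothetical.
SAMPLE = """
who: Example Press
what: An Example Report
when: 2001
where: http://example.org/report
commitment: provider agrees to keep this address working
"""

record = parse_record(SAMPLE)
print(record["where"])
```

The same text that a cataloger can read at a glance is split into fields by a dozen lines of code, and the provider's stated commitment travels with the record as just another field.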

In the same issue of JODI, a team-authored article entitled “Author-generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization” describes a study conducted to answer the question of whether authors are good candidates to provide metadata for their own work. Specifically, the authors set out to determine:

  • Can authors create acceptable Dublin Core quality metadata?
  • What perceptions do authors have about metadata in general and metadata generation activities?
  • What Web form features can assist with author-generated metadata?

Though the article's authors found that the participating authors for the most part produced acceptable metadata, they did point out that their experiment was confined to a particular organizational setting and bears repeating in other settings, and that their results cannot be confirmed without testing the metadata in an experiment that measures user satisfaction.

The Library of Congress Bicentennial Conference on Bibliographic Control for the New Millennium, held in November 2000, gave rise to a number of papers about metadata, many of which consist mainly of general background information. Though all of the authors tend to ask more questions than they answer, I have selected a few that seem to have particular research implications. In her article “Partnerships to Mine Unexploited Sources of Metadata,” Regina Romano Reynolds suggests possible new sources of seed metadata, given the inability of metadata specialists to keep up with the tide of new electronic resources. These include the Committee on Institutional Cooperation, the U.S. Copyright Office records, International Standard Serial Number (ISSN) registrations, and many others. Though Reynolds begins to map elements of these metadata sources to the Dublin Core, the actual research into these partnerships remains to be done.

Sally McCallum (2000), in her paper “Extending MARC for Bibliographic Control in the Web Environment: Challenges and Alternatives,” asks many questions regarding the applicability of the MARC metadata format to electronic resources, including:

  • Are there “complexities in the current content of the [MARC] bibliographic record for which the time may be appropriate to consider whether they are necessary in today's environment?”
  • “Do ‘title pages’ or their analogs in electronic documents have enough stability to make transcription as useful as it is for print or object oriented publications?”
  • “Are special normalized forms of some data still as critical or is research producing information identification and searching tools that require less rigor since the whole document content may theoretically be searched?”
  • “Are display, retrieval, and sorting requirements different for Web resources, indicating less need for specificity?”

Ultimately, McCallum asks, “Should we re-evaluate MARC in light of electronic resources?”

As is often the case with things invented to make life easier (think of the computer), metadata can sometimes become quite complex instead. In his “Keeping Dublin Core Simple: Cross-Domain Discovery or Resource Description?” Carl Lagoze (2001) asks the question that has divided the Dublin Core community since the beginning and comes down on the side of the minimalists or simplifiers. Noting that elaborate extensions to local applications of DC limit the interoperability of the product created with other systems, Lagoze argues that “we should stick to a qualification regime that is easily deployable and generalizable and resist the impulse to introduce greater complexity until its principles are understood and the tools to deploy it are stable.” However, actual empirical data evaluating the minimalist vs. complex structuralist approach is called for.

In the maze of metadata standards, practitioners may find useful the work being done on application profiles by Baker et al. (2001), who attempt to itemize applications of various metadata standards and their divergences from the standard, as well as to explore links among various standards, in order to facilitate sharing and crosswalking. In another article, “Application Profiles: Mixing and Matching Metadata Schemas,” some of the same authors ask:

  • How do we deal with conformance, or lack thereof, to published metadata standards?
  • Can elements from different metadata standards effectively be combined?

Application profiles themselves also promise to be a good resource for research into metadata systems.
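
The two questions about conformance and combining elements can be sketched in a few lines. The following is a minimal illustration, not the machine-readable format proposed by Baker et al. or by Heery and Patel: the “local” vocabulary, the profile contents, the sample record, and both function names are hypothetical, and only a handful of Dublin Core element names are listed.

```python
# Element sets keyed by a short prefix. Only a few elements are listed.
# The "dc" names come from Dublin Core; "local" is a hypothetical
# in-house vocabulary invented for this example.
SCHEMAS = {
    "dc": {"title", "creator", "subject", "date", "identifier", "rights"},
    "local": {"shelf_location", "review_status"},
}

# A hypothetical application profile: the elements one application
# chooses to use, drawn from more than one schema.
PROFILE = [
    ("dc", "title"),
    ("dc", "creator"),
    ("local", "shelf_location"),
]


def check_profile(profile, schemas):
    """Return profile entries not defined by their declared schema."""
    return [
        (prefix, element)
        for prefix, element in profile
        if element not in schemas.get(prefix, set())
    ]


def check_record(record, profile):
    """Return record keys that the profile does not allow."""
    allowed = {f"{prefix}:{element}" for prefix, element in profile}
    return sorted(key for key in record if key not in allowed)


# Invented sample record using prefixed element names.
record = {
    "dc:title": "Annual report",
    "local:shelf_location": "Stack 4",
    "dc:subject": "Finance",
}
print(check_profile(PROFILE, SCHEMAS))  # profile entries unknown to their schema
print(check_record(record, PROFILE))    # record keys outside the profile
```

The first check asks whether the profile itself conforms to the standards it borrows from; the second asks whether a given record conforms to the profile. In this sample, “dc:subject” is a valid Dublin Core element but is not part of the profile, so the record diverges from the profile without diverging from the standard.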

Works Cited

Allen, David Yehling. 2001. Using the Dublin Core with CORC to catalog digital images of maps. Journal of Internet Cataloging 4, no. 1–2: 163–77.

Baker, Thomas, et al. 2001. What terms does your metadata use? Application profiles as machine-understandable narratives. Journal of Digital Information 2. Accessed Jan. 17, 2002

Graham, Rebecca A. 2001. Metadata harvesting. Library Hi Tech 19, no. 3: 290–95.

Greenberg, Jane. 2001. A quantitative categorical analysis of metadata elements in image-applicable metadata schemas. Journal of the American Society for Information Science and Technology 52, no. 11: 917–24.

Greenberg, Jane, et al. 2001. Author-generated Dublin Core metadata for Web resources: A baseline study in an organization. Journal of Digital Information 2. Accessed Jan. 17, 2002

Heery, Rachel, and Manjula Patel. 2000. Application profiles: Mixing and matching metadata schemas. Ariadne 25. Accessed Jan. 17, 2002 /issue25/app-profiles/intro.html.

Jones, Wayne. 2001. Dublin Core and serials. Journal of Internet Cataloging 4, no. 1–2: 143–48.

Kunze, John A. 2001. A metadata kernel for electronic permanence. Journal of Digital Information 2. Accessed Jan. 17, 2002

Lagoze, Carl. 2001. Keeping Dublin Core simple: Cross-domain discovery or resource description? D-Lib Magazine 7, no. 1. Accessed Jan. 17, 2002

McCallum, Sally. 2000. Extending MARC for bibliographic control in the Web environment: Challenges and alternatives. Library of Congress Bicentennial Conference on Bibliographic Control for the New Millennium. Accessed Jan. 17, 2002

Reynolds, Regina Romano. 2000. Partnerships to mine unexploited sources of metadata. Library of Congress Bicentennial Conference on Bibliographic Control for the New Millennium. Accessed Jan. 17, 2002

Prepared by Patricia M. Dragon, Special Projects and Collections, Monograph Cataloging Division, University of Michigan Library.