Library & Information Technology Association

“The Future Is Now: Global Authority Control”

ACIG Program and Business Meeting—25th anniversary celebration

ALA Annual Meeting, Chicago, Illinois, July 13, 2009, 1:30—5:30

 

PROGRAM

 

Mary Mastraccio, ACIG chair, welcomed the audience.  In the interest of time, she did not make extensive introductions of speakers.

 

Tim Spalding (founder, LibraryThing)

Authority Control 2.0?

 

Spalding briefly described LibraryThing, software that allows individuals to catalog their personal libraries, with features that encourage social networking and cooperative cataloging, as well as idiosyncratic projects such as cataloging the libraries of dead people (e.g. Thomas Jefferson, Marilyn Monroe).  A major feature of LibraryThing is the use of “tags,” subject terms formulated and supplied by users.  These are unstructured, non-hierarchical terms, and Spalding sees that as a strength.  Users have added 53 million tags, which the LibraryThing software aggregates.  LibraryThing is not without authority control; the two major means are disambiguation and Common Knowledge.  Users participate in disambiguation through “combine” and “split” functions, which can be applied to authors, to works (Spalding seemed to regard a textual work with different titles in the same language and a textual work translated into different languages as equally eligible for combination), or to tags.  “Authorized forms” are established by usage.  One rule is that combined tags have to be alike in both meaning and usage.  All changes are logged and can be reversed.  Common Knowledge is a set of data stored in a fielded wiki; users can and do use it to record things that neither publishers nor catalogers would routinely provide (character names in a novel, related places and events).  Common Knowledge receives 1.5 million edits each year.  Authority control follows the Wikipedia model; though not enforced by software, rules are applied by “heads” (knowledgeable individuals in a field).  Spalding acknowledged shortcomings: data in LibraryThing is not always systematic or perfect; it is not immune to the “curse of either/or thinking”; and there is no good way to merge authority files.  He finished by admonishing librarians to see LibraryThing users as valuable allies, since they care about the same things that we do.
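The “combine” and “split” functions, with every change logged and reversible, can be illustrated with a minimal sketch.  This is not LibraryThing’s implementation, which was not shown in the talk; the class name TagStore, its methods, and the sample tags are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TagStore:
    """Toy model of user-driven tag combination with a reversible log.

    Hypothetical illustration only; not LibraryThing's actual code.
    """
    # Maps each variant tag to the form it currently resolves to.
    canonical: dict = field(default_factory=dict)
    # History of (action, variant, previous_target) entries.
    log: list = field(default_factory=list)

    def resolve(self, tag):
        # Tags that have not been combined resolve to themselves.
        return self.canonical.get(tag, tag)

    def combine(self, variant, target):
        # Record the previous mapping so the change can be undone later.
        self.log.append(("combine", variant, self.canonical.get(variant)))
        self.canonical[variant] = target

    def split(self, variant):
        # Splitting detaches a variant from whatever it was combined into.
        self.log.append(("split", variant, self.canonical.get(variant)))
        self.canonical.pop(variant, None)

    def undo(self):
        # Reverse the most recent logged change, if there is one.
        if not self.log:
            return
        _action, variant, previous = self.log.pop()
        if previous is None:
            self.canonical.pop(variant, None)
        else:
            self.canonical[variant] = previous


store = TagStore()
store.combine("sci-fi", "science fiction")
store.combine("sf", "science fiction")
store.split("sf")                  # a user decides "sf" is not the same tag
after_split = store.resolve("sf")
store.undo()                       # another user reverses the split
after_undo = store.resolve("sf")
```

Because each log entry stores the mapping that existed before the change, any combine or split can be rolled back without knowing why it was made, which is what lets rules be applied by people rather than enforced by software.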

 

Jeanne Spala (Civica/CMI—ILS vendor)

Global Authorities in the Local Catalog—(presentation on their ILS product)

 

Spala described authority control components in her company’s ILS, Spydus.  The catalog affords extensive use of external links to enriched content and Web 2.0 features (data from Google Books, Facebook, LibraryThing, etc.).  There is robust global-change software that allows for very granular changes.  It is designed for search and display of data in any language (the current list of supported languages is short).  The database supports multiple versions of MARC bibliographic records, though authority records are maintained in MARC21.

 

Michael Kreyche (Kent State U. systems librarian)

Spanish Equivalents for LCSH

 

Kreyche has developed a bilingual database of Spanish equivalents for LCSH (available at http://lcsh-es.org).  While a login is required for searching, use of the database is free.  His purpose was to bring together as many as possible of the disparate lists, some electronic and some not, that have been developed independently.  With the increase in Spanish-speaking library users in the U.S., the need is growing, and Kreyche hopes to form a global collaborative for further development and maintenance.  He began in 2005 with headings in bibliographic records from the San Francisco Public Library OPAC, then added terms from the Queens (N.Y.) Public Library, which then incorporated the SFPL headings in its own catalog.  Other lists added include CSIC (a Spanish list on CD-ROM), MARC authority records from the National Library of Spain, Bilindex from 1984 (the English-Spanish index in the back), and Simon Spero’s LCSH (otherwise known as the Fred 2.0 Project).  In 2007 Kreyche procured NEH funding, which he used to convert his data to MARC, to hire student programmers, and to launch the Connexion Web service, which inserts Spanish-language headings into bib records.  He finished with another plea for collaborators.

 

Karen Smith-Yoshimura (OCLC)

Cooperative Identities Hub

 

Names as identifiers are ambiguous.  They depend on context and can change over time.  Current authority practice, as in the LCNAF, aims to create unique headings, usually by adding dates, which are often insufficient and not always helpful.  We have lots of sources for disambiguation, but they are widely dispersed.  Within the RLG Division of OCLC, the Networking Names Advisory Group was formed to answer the question “In an ideal world, what information would we want for tracking names?”  The group created user scenarios for librarians, students, archivists, institutional repositories, and others.  It developed a prototype Cooperative “Identities Hub,” a framework in which to concatenate or merge authoritative information on creators, using social networking, to provide a gateway to all forms of a creator’s name without preferring any single form.  The objectives were 1) to bring together information about creators from disparate sources and expose a wider community (not just librarians) to the data; 2) to increase efficiency of metadata creation; 3) to make it easier to identify creators and their works regardless of language or discipline; 4) to determine a preferred form within a given context; and 5) generally to extend knowledge of people and corporate bodies beyond the library catalog.  The hub data elements include at least one form of name; life events (e.g. origin, places of output, affiliations); associated entities; some works (though not necessarily all); a short biography; and unique identifiers.  At OCLC, the most visible outcomes of this initiative have been WorldCat Identities and OCLC’s participation in the Virtual International Authority File (VIAF).  Among pending developments in the former is a Merge Identities function, open to anyone.  While the function will operate in the WorldCat.org realm, corresponding changes to underlying Connexion data will be made as well.

 

Thom Hickey (OCLC)

The Virtual International Authority File

 

The Virtual International Authority File in its current form began in 2003 as a joint venture of the Library of Congress and the Deutsche Nationalbibliothek.  The Bibliothèque nationale de France joined about a year later; a large number of libraries began contributing files in 2008.  A salient principle of the VIAF is that rather than attempting to identify a single authorized form of a name, national/regional variations are allowed to co-exist.  Its current scope is limited to personal names and geographic headings, but eventually all sorts of named entities will be included (but no subject headings).  The software mines data from bibliographic records associated with the entity and merges that information with existing authority records.  The database contains 10.4 million name records that form 8.7 million clusters, each of which is assigned an identifier.  “Match points” include titles of works, various sorts of dates, joint authors, and LCCNs, matched either fully or in part.  The goal is 99.5% accuracy in matching.  Next steps are to continue adding participants, expand the scope beyond the current personal/geographic limits, and get data and files from more sources (e.g. rights agencies, ISNI (International Standard Name Identifier), regional and specialized files).  Benefits for OCLC include improved FRBR matching of English to non-English records, aid in authority control, and better regional tailoring of WorldCat.org.
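The idea of grouping name records into clusters by shared match points, while letting every national form co-exist inside a cluster, can be sketched as follows.  This is a toy illustration, not VIAF’s matching algorithm: the rule that two records link when they share at least two match points, the record identifiers, and the sample data are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical records from three contributing files.  Each has a local
# name form and a set of match points (dates, titles) mined from its
# authority and bibliographic data.
records = {
    "a1": {"name": "Example, Anna, 1900-1980",
           "points": {"born:1900", "died:1980", "title:first novel"}},
    "b1": {"name": "Example, Anna (1900-1980)",
           "points": {"born:1900", "died:1980", "title:second novel"}},
    "c1": {"name": "Example, A. (Anna)",
           "points": {"born:1900", "title:first novel", "title:second novel"}},
    "a2": {"name": "Example, Anna, 1955-",
           "points": {"born:1955", "title:a cookbook"}},
}


def cluster(records, min_shared=2):
    """Group record ids whose match points overlap (union-find)."""
    parent = {rid: rid for rid in records}

    def find(rid):
        # Follow parent links to the cluster root, compressing the path.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    # Link every pair of records that shares enough match points.
    for a, b in combinations(records, 2):
        if len(records[a]["points"] & records[b]["points"]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(sorted(members) for members in groups.values())


clusters = cluster(records)
# No single form is preferred: each cluster keeps all of its name forms.
forms = [[records[rid]["name"] for rid in members] for members in clusters]
```

Here the three records for the author born in 1900 fall into one cluster even though no title appears in all three, while the identically named author born in 1955 stays separate.  A production system would need weighted evidence and far more careful matching to approach the accuracy goal mentioned in the talk.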

 

Janis Young (LC)

Authorities and Vocabularies, LC’s New SKOS-Based Service

 

Authorities and Vocabularies came from a desire and a demand that LCSH and allied data be accessible on the Internet for free.  The vehicle of choice was SKOS (Simple Knowledge Organization System), based on the Resource Description Framework.  The current address of the database is http://id.loc.gov.  The database provides human and programmatic access to the LCSH vocabularies, including subject, genre/form, children’s, subdivision, and validation records.  There are also links from LCSH to RAMEAU, the authority file of the Bibliothèque nationale de France.  A primary benefit is the capacity to download the entire file of a controlled vocabulary for free, though a search interface can locate individual records.  Some of the capabilities and limitations can be identified by reading information at the Technical Center link.  Coming soon are the Thesaurus for Graphic Materials (TGM) and the MARC code lists for languages, geography, and relator terms.
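For readers unfamiliar with SKOS, the sketch below shows roughly how a subject heading with a variant form and a broader term can be expressed as RDF statements.  The SKOS and RDF property URIs are the standard ones; the example.org namespace, the identifiers c1 and c2, and the labels are hypothetical and are not taken from the LC service.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/subjects/"  # hypothetical namespace, not id.loc.gov


def uri(value):
    # N-Triples writes URIs between angle brackets.
    return f"<{value}>"


def literal(text, lang):
    # Escape backslashes and quotes, then attach a language tag.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(ident, pref_label, alt_labels=(), broader=()):
    """Return (subject, predicate, object) strings for one SKOS concept."""
    subject = uri(BASE + ident)
    triples = [
        (subject, uri(RDF_TYPE), uri(SKOS + "Concept")),
        (subject, uri(SKOS + "prefLabel"), literal(pref_label, "en")),
    ]
    # Variant ("see from") forms become alternative labels.
    triples += [(subject, uri(SKOS + "altLabel"), literal(alt, "en"))
                for alt in alt_labels]
    # Broader-term references become links to other concepts.
    triples += [(subject, uri(SKOS + "broader"), uri(BASE + b))
                for b in broader]
    return triples


lines = [f"{s} {p} {o} ."
         for s, p, o in concept_triples("c2", "Cats",
                                        alt_labels=["Domestic cats"],
                                        broader=["c1"])]
print("\n".join(lines))
```

Each output line is one statement in N-Triples form, so a whole vocabulary serialized this way is a plain file that can be downloaded and processed by programs, which is the kind of programmatic access described above.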

 

Diane Hillmann (Information Institute of Syracuse)

Registering the RDA Vocabularies

 

Diane asked “Why register vocabularies?”  Her answers were that users benefit from 1) enhanced visibility (especially outside the library community); 2) learning of changes; and 3) broader participation.  Vocabulary owners receive help in 1) managing “versioning” and changes; 2) notifying users and maintainers; and 3) gaining support for vocabulary extension and “community formation.”  Registration was desirable to make the RDA data elements and vocabulary lists more useful to the broader audience at which they are aimed.  As a result of meetings in May 2007 between members of the Joint Steering Committee for the Development of RDA (JSC), the Dublin Core Metadata Initiative (DCMI), and other Semantic Web groups, the DCMI/RDA Task Group was formed.  It was charged to build a formal representation of RDA elements and vocabularies and to set up an Application Profile, with the aim of including the data elements and vocabularies in the National Science Digital Library Metadata Registry (NSDL Registry).

 

Among the challenges the Task Group faced were 1) timing, with the text of RDA still in flux; 2) technology; and 3) standards and conventions, which were not settled for RDA vocabularies.  The Task Group discovered that most data elements and vocabulary concepts had changed at least once, sometimes drastically, and that the Group was at the end of the line to see revised texts.  Registration of the FRBR entities is intended, but has been delayed by problems with the IFLA (International Federation of Library Associations) Web site.

 

Other challenges: 1) including relationships to FRBR entities as part of the element definitions limits the use of RDA outside the library community; 2) the number of techniques for relating RDA elements and FRBR entities adds to the complexity of the schema; 3) decisions made about how to preserve traditional library data linkages limit usefulness for other communities.

 

This summary does not do justice to the detail and breadth of Diane’s presentation.  More information can be found at the NSDL Registry site (http://metadataregistry.org/), and visitors can “play” in the “Sandbox” to see other schemas and perhaps create their own.  While the process has been difficult, doing the work in parallel with RDA development will allow quicker implementation.  This work will also serve as a basis for migration from MARC to broader-use platforms.

 

ACIG 25th anniversary

 

Attendees were invited to join in eating cake in celebration of ACIG’s 25th anniversary, and in honor of the group’s “founding mother,” Barbara Tillett (Library of Congress).  Barbara reminisced briefly about the group’s genesis and its accomplishments.  Early meetings showed an overwhelming interest in issues related to authority control, and this interest persists.  The downside is that the issues themselves have persisted; Barbara expressed hope that today’s talks were emblematic of change to come.

Barbara Tillett recognition and her comments