2007 STS Conference Poster Session


ACRL Science and Technology Section

Issues and Trends in Digital Repositories of Non-Textual Information:
support for research and teaching

Monday, June 25, 2007

Reception and Poster Session: 11:00 - 12:00

Following the STS Program, the Program and Research Committees conducted the STS Reception and Poster Session. The Research Committee invited ten groups of authors to present posters on topics of interest to science and technology librarians.



DSpace as a Repository for Map and GIS resources: A Case Study and Overview of Issues
Kathy Weimer, Texas A&M University Libraries

Map resources are increasingly digitized and made available through the internet. Institutional repositories (IRs), one method for long-term preservation, show signs of rapid growth, particularly for map resources. IRs provide benefits over both a basic web presentation and a digitization clearinghouse or listing because an IR is often searchable through Google Scholar, allowing for searching across repositories instead of just within them. Also, many are configured as OAI-PMH data providers, so their metadata can be freely harvested.
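As a rough illustration of what metadata harvesting involves, the sketch below builds an OAI-PMH ListRecords request URL. The verb and the oai_dc metadata prefix come from the OAI-PMH protocol itself; the endpoint address and the helper name list_records_url are hypothetical and are not taken from any repository described here.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting of a single set (collection).
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, used only for illustration.
url = list_records_url("https://repository.example.edu/oai/request")
print(url)
```

A harvester would fetch this URL, parse the XML response, and follow any resumption token the provider returns until the full record list has been retrieved.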

In 2004, the Texas A&M University Libraries deployed DSpace as its IR and named the local instance TxSpace (http://txspace.tamu.edu). A pilot project is underway to study scientific map and GIS resources in an IR, specifically 1) the use of geographic coordinates in metadata to build a map-based search interface, and 2) the addition of GIS files to an IR environment. The Libraries have digitized and uploaded the complete set of the Geologic Atlas of the United States folios to DSpace. The 227-folio set was published by the USGS between 1894 and 1945 and contains text, photographs, maps and illustrations. The set is an example of a multi-format work for which an institutional repository now provides both access and storage. To take advantage of the geographic nature of the set, coordinates were added to the metadata and a Yahoo! Maps interface was created. Additionally, the digitized maps are currently being converted into GIS files and will be added to the collection.
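One way coordinates in metadata can support a map-based search is a bounding-box lookup: each record stores the west, south, east and north limits of the area it covers, and a click on the map returns every record whose box contains the clicked point. The following is a minimal sketch of that idea, not the TxSpace implementation. The Folio class, the folios_at function and the sample records are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Folio:
    """A record with a bounding box in decimal degrees (hypothetical schema)."""
    title: str
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon, lat):
        # True when the point falls inside (or on the edge of) the box.
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def folios_at(folios, lon, lat):
    """Return the titles of all folios whose bounding box contains the point."""
    return [f.title for f in folios if f.contains(lon, lat)]


# Invented sample records, not actual Geologic Atlas coordinates.
sample = [
    Folio("Sample Folio A", -98.0, 30.0, -97.5, 30.5),
    Folio("Sample Folio B", -97.5, 30.0, -97.0, 30.5),
]
print(folios_at(sample, -97.75, 30.25))
```

This simple comparison assumes that no bounding box crosses the 180th meridian, which holds for maps of the contiguous United States.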

Poster coming soon.


Librarians and the Data Challenge: Exploratory Work of the MIT Libraries Data Initiatives Group
Anna Gold, Massachusetts Institute of Technology

For librarians whose professional experience with scientific research outputs has centered on the textual products of research work, it is no small challenge to understand how data science and data management fit within, alongside, or outside text-oriented constructs and systems for science communication. One challenge is to shift perspective from the research publishing cycle to the research cycle. That broadened perspective suggests that new relationships will develop between data and the textual descriptions of research. Another challenge is that the well-understood infrastructure for managing textual records of research (e.g. selection, metadata, preservation) doesn’t exist yet for much scientific data. At MIT a group of five science and engineering librarians began to meet in 2006 to learn more about these issues. The Data Initiatives Group began by studying major reports on digital data archiving, cyberinfrastructure, and e-science. Members shared findings from relevant conferences, met with local experts, and co-sponsored speakers on data and the semantic web. In order to understand local digital research data management practices and their relevance to MIT’s Institutional Repository, DSpace at MIT, group members worked to deposit a relational database into DSpace at MIT and conducted interviews with faculty regarding their data management needs. Future work will include creating a science and engineering data services web site and a proposed job description for a science data librarian.

Poster (pdf)


Spatial SQL - Querying Spatial Objects Based on Geographic Information: An Open Source Approach
Chieko Maene, Northwestern University

Many items in libraries are geographic objects, or objects with spatial information such as place names or street addresses. Examples include historic photographs, aerial photographs, topographic and fire insurance maps. However, the ways in which most libraries locate information are limited to analog tools, such as catalog records, index cards and index maps. Wouldn’t it be nice if we could find these objects based on spatial information? The popularity of mapping sites such as Google Maps indicates that Internet users are comfortable with finding geographic objects by simply typing a street address. Libraries can implement the same idea.

Our project http://maps.lib.uic.edu/ (search for Northeastern IL Airphoto/Topo maps) introduces the concept of “spatial queries” using the open source software packages PostGIS and Geocoder.US. Combining these tools permits users to find photographs and maps by querying an object-relational database with spatial information such as a street address or intersection. The query returns objects that have a proximity or spatial relationship to the input information. The result is a “spatially enabled” SQL data server which supports such queries as, “find and show me photographs near this location” or “find and show me maps that contain this location.” This poster includes a live demonstration of the database and discusses low-cost alternatives to proprietary spatially enabled SQL data servers for non-profit organizations such as libraries.
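The two example queries above can be sketched with the PostGIS functions ST_DWithin (proximity) and ST_Contains (containment). This is only an illustration under stated assumptions, not the project's actual schema or code: the table library_items, its columns id, title and footprint (a geometry stored in SRID 4326), and the helper spatial_query are hypothetical, and the address is assumed to have already been geocoded to a longitude and latitude. The %(name)s placeholders follow the parameter style used by common Python PostgreSQL drivers such as psycopg2.

```python
# "Find items near this location": compare as geography so the radius is in meters.
NEARBY_SQL = """
SELECT id, title
FROM library_items
WHERE ST_DWithin(
    footprint::geography,
    ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
    %(radius_m)s
)
"""

# "Find maps that contain this location": the footprint must enclose the point.
CONTAINS_SQL = """
SELECT id, title
FROM library_items
WHERE ST_Contains(
    footprint,
    ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)
)
"""


def spatial_query(kind, lon, lat, radius_m=500.0):
    """Return (sql, params) for a 'near' or 'contains' query on a geocoded point."""
    if kind == "near":
        return NEARBY_SQL, {"lon": lon, "lat": lat, "radius_m": radius_m}
    if kind == "contains":
        return CONTAINS_SQL, {"lon": lon, "lat": lat}
    raise ValueError(f"unknown query kind: {kind!r}")


# Illustrative coordinates only; a geocoder would supply these from an address.
sql, params = spatial_query("near", -87.65, 41.87)
print(sql.strip())
print(params)
```

With a database connection in hand, the pair would be passed to the driver's execute call, which keeps user input out of the SQL text itself.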

Poster (pdf)


ComPADRE Digital Library: Repository of Non-Textual Educational Resources in Physics and Astronomy
Elizabeth Bolton, Cecilia Brown, Bruce Mason, University of Oklahoma

ComPADRE, Communities for Physics and Astronomy Digital Resources in Education, is a unique, web-based digital library offering resources that science and technology librarians can use to meet the information needs of a variety of physics and astronomy teachers and learners. Funded by the National Science Foundation’s National Science Digital Library project, ComPADRE is a collaborative effort among several organizations, including the American Association of Physics Teachers, the American Astronomical Society, and the American Institute of Physics, as well as physics and library science researchers and students. Together we have created a well-organized repository of outstanding physics and astronomy educational materials encompassing an array of non-textual objects including animations, simulations, images, teaching applets, and raw data. Web analytics data, user surveys, focus groups, and usability studies indicate the ease of use, relevance, and exemplary quality of ComPADRE’s resources. For example, since August 2005, a link to the “Virtual Physics Laboratory,” a website presenting physics simulations, has been the search result most often chosen by visitors. Additionally, usability studies revealed that 80% of high school teachers found an animation they would use in future classroom instruction. This poster will describe ComPADRE’s non-textual resources in conjunction with use and usability data. The ultimate goal is to demonstrate to science and technology librarians and information professionals the usefulness and relevance of ComPADRE’s resources, which can be freely and easily employed to enhance the physics and astronomy information products and services they provide to their users.

Poster coming soon.


Preserving Soil Survey Data with GIS
Christopher Miller, Purdue University

Purdue University Libraries are looking at a number of ways to store and improve access to non-textual data. One project example is the digitization of the 1906 Tippecanoe County soil survey.
Soil surveys document soil types and locations via prose and maps, typically published as a single document. Modern soil surveys are born digital and used largely in electronic contexts (including GIS), but there are decades of rich comparative data being left to atrophy in the undigitized copies of aging paper surveys. This project digitizes both components of the 1906 survey – the text and map – and extracts both into usable, modern data formats.

Librarians met with a group of soil scientists to help define the parameters of the project and to garner insight into the ways soil survey data are, or could be, engaged. Informed by these conversations, librarians decided that in addition to OCR'd, fully indexed, fully searchable full text, the soil zone data from the map would also be extracted into a usable GIS dataset that could be studied and queried. Search and query will be available for each component of the original document, and each will be able to link to and query the other, allowing the user to jump seamlessly between text and map based on common semantic elements.

This poster highlights the design process, presents a draft of the final product, and discusses the use of GIS as a novel way to present data for research and teaching.

Poster (pdf)


Using an Institutional Repository for Prostate Tissue Microarray Image Data Preservation
William Simpson, University of Delaware

The University of Delaware Library Institutional Repository, launched in April 2005, provides access to original research by faculty and staff. Publicly available, collections in the repository include both textual and non-textual content.

In 2004, scientists at the UD Department of Biological Sciences performed the largest single study yet undertaken to examine MUC1 (a protein produced by the epithelial cells of the gastrointestinal, respiratory, and urogenital tracts) expression in human prostate cancer. Results of the study were stored locally in a laboratory database, and have been published in the public domain. Image data from the local results database were transferred to the repository, making it easier to preserve, track, query and share data with other scientists. Prostate tissue images are from prostatectomy tissue specimens, stripped of patient identifiers, obtained from the Winship Cancer Institute, Emory University. The images of tissue arrays, taken with a digital camera affixed to a fluorescence microscope and stored locally, are typically TIFF files and can be large. All 137 images in the repository are labeled based on a naming convention identifying the tissue microarray block (numbers) and the location of the “spot” (or TIFF image) on the array slide. The results of further image analysis and data tables from this study will eventually be transferred to the repository.

Poster (pdf)


Capturing Visual Scientific Assets: The GSFC Library's IMAGES Database
Kathleen McGlaughlin and Mitzi Cole, Goddard Library, NASA Goddard Space Flight Center

For years now, Goddard Librarians have handled requests from NASA employees as well as the general public for Goddard scientific images and movies. Often these images were difficult and time-consuming to track down because they originate from disparate offices and individuals at the Center. Quite often sources of Goddard images would be lost as websites changed, links died or staff migrated.

To further access to and preservation of Goddard’s scientific work, we created the IMAGES database (http://library.gsfc.nasa.gov/search/img/) to serve as a single repository of images and animations derived from Goddard scientific data. It is intended to be a permanent archive for the images and their associated metadata, and supports the knowledge management initiative at GSFC.

The IMAGES database was developed at the Goddard Space Flight Center Library in 2002 with funding from a grant from NASA’s STI (Scientific and Technical Information) Program. It was built in-house with SQL Server, using ColdFusion and an XML schema that makes these images and metadata easily transferable to the NASA Image Exchange.

Poster (pdf)


A Watershed Moment: Preserving & Improving Access to Water Quality Data
Marianne Bracke and Michael Witt, Purdue University

Librarians at Purdue University are partnering with scientists to help them describe, preserve, manage, and share the data generated by their research. Scientists are interested in publishing their data to meet the requirements of funding agencies as well as to enable their datasets to be more broadly discovered and used. The Distributed Data Curation Center (D2C2) at the Purdue University Libraries has developed a data repository framework to house datasets and furnish services and tools to make this possible. This poster describes an example of such a partnership between librarians and agronomists to create a data collection of water quality samples gathered at Purdue’s Agronomy Center for Research and Education (ACRE). It includes an analysis of the researchers’ workflow and the automation of the description and ingestion of instrument data into a repository using XSLT and programming scripts. A scan of available thesauri and community formats and practices was conducted before creating our methodology and publishing our own descriptive schema. The project has two phases: the first to ingest and archive five years’ worth of past data as a batch process, and the second to integrate our tools into the data collection process so that current and future data flow into the repository. Metadata from the water quality sample data collection is harvested, aggregated with metadata from other repository collections, indexed for searching, and presented on the web in a context with other digital library content such as e-prints and digitized archival collections.
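To give a sense of what automated description of instrument data might look like, the sketch below turns rows of a CSV export into simple Dublin Core records. It uses the Python standard library rather than XSLT, and it is not the D2C2 workflow or schema: the column names, the record wrapper element, the function row_to_record and the sample row are all invented for illustration. Only the Dublin Core element namespace is taken from the published standard.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from Dublin Core elements to instrument CSV columns.
FIELD_MAP = (
    ("identifier", "sample_id"),
    ("date", "collected"),
    ("description", "reading"),
)


def row_to_record(row):
    """Convert one instrument CSV row into a small Dublin Core record element."""
    record = ET.Element("record")
    for element, column in FIELD_MAP:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = row[column]
    return record


# Invented sample data standing in for an instrument export.
SAMPLE_CSV = "sample_id,collected,reading\nS-001,2007-05-01,nitrate 4.2 mg/L\n"

records = [row_to_record(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(ET.tostring(records[0], encoding="unicode"))
```

Run as a batch job, a script of this kind could cover archived files, and the same function could later be called from the data collection process so that new samples are described as they are gathered.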

Poster coming soon.


An Educational Program on Data Curation
Melissa H. Cragin, P. Bryan Heidorn, Carole L. Palmer, and Linda C. Smith, University of Illinois at Urbana-Champaign

Several models of service are emerging in academic and research libraries for the collection and management of scientific data – an important segment of an institution’s total scholarly production. As libraries work to develop services to support the management of locally-generated data, they will require new kinds of expertise for providing appraisal, management, and access to data for long term use. Data curation is the active and ongoing management of data through its lifecycle of interest and usefulness to scholarship, science, and education; curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time. However, recent reports on e-research, cyberinfrastructure, and the stewardship of digital assets acknowledge a significant deficit in the workforce that will be required to manage these increasing data stores. To address this growing need, the Graduate School of Library and Information Science (GSLIS) at the University of Illinois at Urbana-Champaign has initiated a new concentration within our ALA-accredited Master of Science degree. The Data Curation Education Program (DCEP) offers a focus on data collection and management, knowledge representation, digital preservation and archiving, data standards, and policy. In this poster we will introduce key elements of our Data Curation Education Program and some implications for preparing librarians and information specialists to carry out data curation activities in libraries. Based on the work of our Advisory Committee, we will also present emergent best practices for preparing LIS students to work on digital data curation problems in academic and research libraries.

Poster (pdf) - Abstract (pdf)


Selecting a Digital Asset Management Program for a Medical School Department
Margaret Henderson, Virginia Commonwealth University

The Department of Anatomy and Neurobiology at the Medical College of Virginia, Virginia Commonwealth University, received a grant for the implementation of a Digital Asset Management (DAM) program to provide a centralized, searchable database for all research and teaching data (images, spreadsheets, reports, presentations, proprietary data, etc.) generated by department members and the core equipment facilities in the department. In addition to conducting research, the department teaches histology and anatomy courses.

Phase 1 - Needs Assessment and Information Audit: Presentations were made to the Core Facilities Directors and the whole department to explain the project. Surveys were given out to collect information. Interviews were conducted with all Core Directors to compile a list of equipment, software and file types used in the department. Interviews were conducted with PIs/faculty to establish what types of data they collect, and how they want to be able to search and use the data. IT services was also contacted to establish current system configurations and protocols, and to set up IT help with implementation and use of the new database.

Phase 2 - Select Potential Software Companies: Ten scientific digital asset management programs were researched. Two companies gave presentations on their products, and the final choice was based on how well the system fit the needs of the Core Directors and PIs, the cost, and the ease of interfacing with existing systems and equipment. A report listing equipment and software and comparing the programs was prepared. The Department purchased IQbase from Media Cybernetics. The installation started in January 2007.

Poster (pdf)


