Keeping Up With… Research Data Management
This edition of Keeping Up With... was written by Cathryn F. Miller, Rebekah S. Miller, and Gesina A. Phillips.
Cathryn F. Miller is a Visiting Social Sciences Librarian at Duquesne University, email: millerc12@duq.edu; Rebekah S. Miller is a STEM Librarian at Duquesne University, email: miller75@duq.edu; and Gesina A. Phillips is a Digital Scholarship Librarian at Duquesne University, email: phillipsg@duq.edu.
What is Research Data Management?
Research Data Management (RDM) is a broad concept that includes processes undertaken to create organized, documented, accessible, and reusable quality research data. [1] The role of the librarian is to support researchers through the research data lifecycle.
Figure 1: USGS Data Lifecycle Model [2]
The processes involved in RDM are more complex than simply backing up data on a thumb drive and ensuring that sensitive data is kept secure. Managing data includes using file naming conventions, organizing files, creating metadata, controlling access to data, backing up data, citing data, and more. Online checklists outline the considerations and processes involved in RDM (see the UK Data Service checklist [3] and the DCC checklist [4]).
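As one illustration of what a file naming convention and a basic metadata record might look like in practice, the following is a minimal Python sketch. The naming pattern (project, description, collection date, version), the function names, and the metadata fields are all assumptions made for this example; they are not drawn from the checklists cited above, and any real convention should follow the norms of the discipline and the project.

```python
import json
import re
from datetime import date


def build_file_name(project, description, collected, version, extension):
    """Build a file name of the form project_description_YYYYMMDD_vNN.ext.

    The pattern is a hypothetical convention chosen for illustration.
    """
    def slug(text):
        # Lowercase the text and collapse every run of characters that is
        # not a letter or digit into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return "{}_{}_{}_v{:02d}.{}".format(
        slug(project),
        slug(description),
        collected.strftime("%Y%m%d"),  # sortable date, e.g. 20180409
        version,                        # zero-padded so v02 sorts before v10
        extension.lstrip(".").lower(),
    )


def build_metadata(file_name, creator, description, collected):
    """Return a minimal descriptive record to store alongside the data file.

    The field names are illustrative and do not follow a formal standard.
    """
    return {
        "file": file_name,
        "creator": creator,
        "description": description,
        "date_collected": collected.isoformat(),
    }


# Example values are invented for demonstration purposes.
name = build_file_name("Soil Study", "Site A pH", date(2018, 4, 9), 2, "csv")
record = build_metadata(
    name, "Example Researcher", "pH readings from site A", date(2018, 4, 9)
)
print(name)
print(json.dumps(record, indent=2))
```

Generating names from a single function, rather than typing them by hand, is one way a research team might keep names consistent and sortable across collaborators; the metadata record could be saved as a small text file next to the data it describes.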
RDM is a relevant topic to keep up with at a time when researchers are increasingly required to create data management plans, provide methodological transparency, and share data. [5]
Making the Case for RDM
While many researchers are interested in data management, there are some who may not see a need for it at all. If that’s the case, there are both carrots and sticks that can be used to encourage them.
Carrots
- Save Time: Properly managing data is in a researcher’s best interest; being able to locate past or current data and accompanying metadata saves time, frustration, and money.
- Increase Citations: Well-managed data is easy to share, which can lead to the data itself being cited; making the data available may also lead to more citations for the original paper. [6]
- Enhance Reproducibility: Data management enhances reproducibility by making the methodology more transparent.
- Preserve Data: While data management encourages researchers to consider backup and security measures, it also ensures that data is preserved, not just stored. Preservation focuses on the long-term ability to access and use data, and considers interoperability and open file formats.
Sticks
- Required Sharing: Funders and journals have begun to require data sharing as a condition of publication or award acceptance; Nature, PLoS One, and the American Journal of Political Science all have data sharing requirements.
- Required Data Management Plans: A variety of government funders (e.g., NSF, NIH) and private funders (e.g., Bill & Melinda Gates Foundation [7]) require data management plans and/or data sharing.
- Prevent Retraction: Accessible data protects against retraction; the New England Journal of Medicine retracted an article after the underlying data could not be located, [8] as did Cell Cycle. [9]
The Role of the Librarian
Librarians need to provide RDM services that take into account the “interests and needs” of the university community, which includes graduate students, faculty, and research staff. [10] Understanding the current practices, knowledge, and desired support at a university is therefore key in developing and maintaining relevant services.
- Services and Education: How are librarians serving researchers? Common RDM services include helping researchers to deposit data in institutional and disciplinary repositories, assisting with data management plans, and consulting with research teams. [11] Librarians also serve the research community by creating workshops, webinars, and tutorials. Providing RDM services does not require librarians to become experts in statistics, programming, or IRB proposals, but instead to develop a robust understanding of the tools and support mechanisms available on campus. This may prompt collaboration with campus computing or statistical support services.
- Information: Providing background information as well as advanced information about RDM through LibGuides, newsletters, or webpages is also important. When communicating information, it is important to limit jargon and use the terminology that researchers themselves are familiar with. More than 50% of the academic libraries that have an RDM presence online provide information related to creating data management plans, data documentation, metadata standards, storage and preservation. [12]
Understanding the “needs and interests” of researchers can guide the development of services, information objects, and instruction. Communicating the role of the librarian, marketing services, and evaluating services are key to staying relevant.
Learning about RDM
Because RDM can be a complex process with many different considerations, learning about RDM through a series of modules is recommended. For those familiar with basic RDM concepts, reading research journals and engaging in online communities are key.
Training modules (see https://nnlm.gov/data/courses-and-workshops for a complete list):
- ESIP Federation: 35 training videos about very specific topics from “Tracking Data Usage” to “Handling Sensitive Data” (http://commons.esipfed.org/datamanagementshortcourse)
- DataOne Data Management Modules: 10 PowerPoint presentations accompanied by handouts and hands-on exercises (https://www.dataone.org/education)
- NYU RDM Training for Information Professionals: 8 tutorials about RDM in a biomedical context (https://compass.iime.cloud//mix/G3X5E)
Research:
- Journal of eScience Librarianship
- International Journal of Digital Curation
Online communities/websites (see Barbrow, Brush, and Goldman [13] for a complete list):
- NNLM RD3 (https://nnlm.gov/data)
- Digital Curation Centre (http://www.dcc.ac.uk)
Conclusion
What specific processes should researchers engage in throughout the data lifecycle? The answer to this question varies by discipline, by research project, by size of the data collected, and by researcher; the RDM practices involved in an ethnographic study will be very different from those involved in clinical research. RDM is complex, ambiguous, and imperfect because of the complexity of research itself. Supporting research throughout the data lifecycle by consulting with researchers and promoting best practices can be challenging, but will improve data quality, reproducibility, and shareability.
Additional Resources/Tools
Creating a Data Management Plan: DMPTool (dmptool.org), DMPonline (dmponline.dcc.ac.uk)
Workflow and organization: REDCap (project-redcap.org), Open Science Framework (https://osf.io)
Sharing/publishing: Registry of Research Data Repositories (re3data.org), figshare (figshare.com)
Examples of library RDM presence:
- University of Minnesota (https://www.lib.umn.edu/datamanagement)
- Duquesne University (http://guides.library.duq.edu/datamanagement)
- Columbia University (https://scholcomm.columbia.edu/data-management)
ACRL Workshop: RDM (https://acrl.libguides.com/scholcomm/toolkit/RDMWorkshop)
Evaluation tool for RDM tutorials or workshops: DataOne EEVA (https://www.dataone.org/education-evaluation)
Notes
[1] Louise Corti, Veerle Van den Eynden, Libby Bishop, and Matthew Woollard, Managing and Sharing Research Data: A Guide to Good Practice (Los Angeles: Sage, 2014), 2.
[2] “The USGS Data Lifecycle,” US Geological Survey, accessed April 9, 2018, https://www2.usgs.gov/datamanagement/why.php.
[3] “Data Management Checklist,” UK Data Service, accessed April 8, 2018, https://www.ukdataservice.ac.uk/manage-data/plan/checklist.
[4] “Checklist for a Data Management Plan v.4.0,” Digital Curation Centre, last modified 2014, http://www.dcc.ac.uk/resources/data-management-plans/checklist.
[5] Lisa Federer, “Research Data Management in the Age of Big Data: Roles and Opportunities for Librarians,” Information Services & Use 36 (2016): 35.
[6] Heather A. Piwowar and Todd J. Vision, “Data Reuse and the Open Data Citation Advantage,” PeerJ 1 (October 2013): e175, https://dx.doi.org/10.7717/peerj.175.
[7] “Bill & Melinda Gates Foundation Open Access Policy,” Bill & Melinda Gates Foundation, accessed April 9, 2018, https://www.gatesfoundation.org/how-we-work/general-information/open-access-policy.
[8] “Retraction: CPAP for the Metabolic Syndrome in Patients with Obstructive Sleep Apnea. N Engl J Med 2011;365:2277-86,” New England Journal of Medicine 369 (2013): 1770, http://www.nejm.org/doi/10.1056/NEJMc1313105.
[9] “Editorial retraction,” Cell Cycle 16, no. 3 (2017): 296, https://doi.org/10.1080/15384101.2016.1205369.
[10] Travis Weller and Amalia Monroe-Gulick, “Differences in the Data Practices, Challenges, and Future Needs of Graduate Students and Faculty Members,” Journal of eScience Librarianship 4 (2015): e1070, http://dx.doi.org/10.7191/jeslib.2015.1070.
[11] Ayoung Yoon and Teresa Schultz, “Research Data Management Services in Academic Libraries in the US: A Content Analysis of Libraries’ Websites,” College & Research Libraries 78, no. 7 (2017): 925, https://doi.org/10.5860/crl.78.7.920.
[12] Ibid., 926-927.
[13] Sarah Barbrow, Denise Brush, and Julie Goldman, “Research Data Management and Services: Resources for Novice Data Librarians,” C&RL News 78, no. 5 (May 2017), https://doi.org/10.5860/crln.78.5.274.