Data Management and Sharing

This guide gathers overviews and resources for data management and sharing. It follows the research data workflow, from preparing data management and sharing plans for grant proposals, through conducting research, to sharing research data.

Following good data management practices early in your research, such as documenting your data as you go, makes it easier to publish and share your research later. You will need to decide which of your data, code, and documentation to share, and prepare them for use outside your immediate research team.

Ethics and Compliance: You need to be aware of the restrictions and rules that apply to data sharing.

De-identifying human participant data: Please see this section for a list of trainings and articles on removing direct and indirect identifiers from research data.

Selecting data for deposit: Share enough data, code, and documentation to make your research reproducible.

Document Data: Please see the "Document Data" page for information and resources you can use to organize and document your research for sharing.

Ethics and Compliance

If you are unsure what you can share, check with your divisional IRB office, and review applicable government policies and guidance on protecting protected health information (PHI).

Divisional JHU IRBs
Policies on Human Participants and Data Sharing

De-identifying Human Subjects Data

Researchers are increasingly encouraged or required to share their data, and preparing datasets that contain confidential identifiers of people and organizations for sharing is particularly challenging.
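To make the idea of direct and indirect identifiers concrete, here is a minimal sketch in Python. It is an illustration only, not a JHU-endorsed procedure or a substitute for IRB guidance. The column names (`name`, `email`, `age`, `zip`, `score`), the sample records, and the helper functions are all hypothetical. The sketch drops direct identifiers, coarsens two indirect identifiers, and then counts how many records share each combination of indirect identifiers, which is one common way to gauge disclosure risk.

```python
from collections import Counter

# Hypothetical column names that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age, width=10):
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(records):
    """Drop direct identifiers and coarsen indirect ones (age, ZIP code)."""
    cleaned = []
    for row in records:
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        kept["age"] = age_band(kept["age"])
        kept["zip"] = kept["zip"][:3] + "XX"  # keep only the first three digits
        cleaned.append(kept)
    return cleaned


def smallest_group(records, indirect_identifiers):
    """Size of the smallest group of records sharing the same indirect identifiers."""
    counts = Counter(tuple(r[q] for q in indirect_identifiers) for r in records)
    return min(counts.values())


# Made-up sample records for illustration.
records = [
    {"name": "A. Example", "email": "a@example.org", "age": 34, "zip": "21218", "score": 7},
    {"name": "B. Example", "email": "b@example.org", "age": 37, "zip": "21211", "score": 5},
    {"name": "C. Example", "email": "c@example.org", "age": 52, "zip": "21205", "score": 9},
]

shared = deidentify(records)
k = smallest_group(shared, ["age", "zip"])
print(shared[0])
print("smallest group size:", k)
```

In this made-up example the smallest group has a single record, so that participant is still unique on age band and ZIP prefix. In practice that would usually call for further generalization, suppression, or restricted access rather than public release.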

JHU Data Services Resources

Protecting Human Subject Identifiers Guide: A comprehensive guide that introduces concepts and basic techniques for disclosure analysis and for protecting personal and health identifiers in research data intended for public or restricted access, following applicable JHU data governance policies.

Webinars: Go to our calendar to find the next live webinar about common privacy disclosure risks from personal and health identifiers in data, and techniques for de-identifying data for external collaborators and public databases. We also discuss preparing consent forms that facilitate data sharing and keeping identifier data secure during and after projects.

Interactive, online training: JHU Data Services has developed an online training to be taken at your convenience. It provides an overview of the types of identifiers, and how to determine if your data have disclosure risk. You will also learn about available JHU resources to help you with de-identifying data. 

Applications to Assist in De-identification of Human Subjects Research Data: A list of de-identification software tools and applications that researchers can use to de-identify their research data for wider sharing.

Additional Resources

NIH: Protecting Privacy When Sharing Human Research Participant Data: This supplemental information was created to assist researchers in addressing privacy considerations when sharing human research participant data. It provides a set of principles, best practices, and points to consider for creating a robust framework for protecting the privacy of research participants when sharing data.

NIST de-identification tools: The National Institute of Standards and Technology has compiled a list of de-identification tools, with a description of each.

Cancer Imaging Archive: Submission and De-identification Overview, https://wiki.cancerimagingarchive.net/display/Public/Submission+and+De-identification+Overview

National Library of Medicine Scrubber: A freely available clinical text de-identification tool designed and developed at the National Library of Medicine. Watch this presentation to learn more.

Selecting Data for Sharing and Preserving

In general, you should share enough data, code, and documentation so that others can reproduce your work. Your funder likely has its own definition of what counts as data and may provide guidance on what to share (see funder requirements).

Guidance on Selecting Data for Sharing

FAIR Principles

The FAIR Guiding Principles for scientific data management and stewardship, published in 2016, outlined methods for broadening access to shared data, focusing particularly on better discovery and open access through data repositories, and better reuse through documentation and machine-readable metadata standards. FAIR Principles fit within the wider promotion of Open Science and reproducible research. Data sharing policies by funders often cite these principles as a goal for making publicly funded data more widely available. 
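As a rough illustration of what machine-readable metadata can look like, the Python sketch below builds a small descriptive record and serializes it to JSON. The field names, the required-field list, and every value (including the placeholder identifier) are assumptions made for this example. They do not follow any particular repository's schema or the FAIR Principles' formal wording, so check the metadata standard your chosen repository actually uses.

```python
import json

# Hypothetical metadata record; every value is a placeholder.
metadata = {
    "title": "Example survey dataset",
    "creators": ["Example, A."],
    "identifier": "doi:10.0000/placeholder",
    "description": "De-identified responses from a made-up survey.",
    "license": "CC-BY-4.0",
    "keywords": ["survey", "example"],
    "files": [{"name": "responses.csv", "format": "text/csv"}],
}

# Fields this sketch treats as required; an assumption, not a standard.
REQUIRED_FIELDS = ["title", "creators", "identifier", "license"]


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


print(json.dumps(metadata, indent=2))
print("missing fields:", missing_fields(metadata))
```

A structured record like this can be indexed and searched by software, which supports discovery, and stating the license and file formats explicitly helps others judge whether and how they can reuse the data.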

See also the JHU Data Services online training on Open Science and the Open Access Guide by the Sheridan Libraries.