Data Management and Sharing
Following good data management practices early in your research, such as documenting your data as you go, makes it easier to publish and share your research. You will need to determine which of your data, code, and documentation to share and prepare them for use outside of your immediate research team.
Ethics and Compliance: You need to be aware of restrictions and rules for data sharing.
De-identifying human participant data: See the De-identifying Human Subjects Data section below for a list of trainings and articles on removing direct and indirect identifiers from research data.
Selecting data for deposit: Share enough data, code, and documentation to make your research reproducible.
Ethics and Compliance
If you are unsure what you can share, check with your divisional IRB office, and review applicable government policies and guidance on protecting Protected Health Information (PHI).
Divisional JHU IRBs
- Johns Hopkins Medicine Institutional Review Board
- Homewood IRB: for Krieger School of Arts and Sciences, Whiting School of Engineering, School of Education, Carey Business School, Nitze School of Advanced International Studies, and Peabody Institute.
- School of Public Health IRB
Policies on Human Participants and Data Sharing
HIPAA for Professionals by U.S. Health and Human Services: The U.S. Department of Health and Human Services (HHS) provides information on patient privacy, de-identification methods, security, and related topics for people who work with data containing Protected Health Information (PHI) or Personally Identifiable Information (PII). If you plan to work with PHI/PII data, their Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule defines 18 personal identifiers and acceptable methods to de-identify patient information.
FERPA policies: Resource site from the U.S. Department of Education on sharing and accessing student data for educational research.
CITI Training on Human Subjects Research: Core training in human subjects research that covers the historical development of human subject protections, ethical issues, and current regulatory and guidance information.
- HHS Policy for the Protection of Human Subjects ('Common Rule'): This policy governs research involving human subjects and provides a robust set of protections for research subjects.
JHM Data Trust: The Johns Hopkins Medicine Data Trust Council has been established to provide JHM researchers with the technical infrastructure, standards, policies and procedures, and organization needed to bring together patient and member-related data from across the health system to support our mission. The goals of the Data Trust are to ensure the security and privacy of our patients’ data, consolidate teams to address organizational priorities and reduce redundancy, and increase the value of data through better integration and analytics.
- Accessing data from JH Medicine clinical enterprise systems such as EPIC or PMAP requires approval from the Data Trust Research Subcouncil, following their procedures for requesting data.
- Sharing JH Medicine data (patient- or member-related data stored in clinical enterprise systems) with researchers outside of JHM (including other JHU divisions such as the School of Public Health and the Krieger School of Arts and Sciences), or with data repositories such as the Johns Hopkins Research Data Repository, may require Data Trust review, including IRB-approved protocols for data sharing and data de-identification if needed. More information.
De-identifying Human Subjects Data
Researchers are increasingly encouraged or required to share their data, and preparing datasets that contain confidential identifiers of people and organizations for sharing is particularly challenging.
JHU Data Services Resources
Protecting Human Subject Identifiers Guide: A comprehensive guide that introduces concepts and basic techniques for disclosure analysis and protection of personal and health identifiers in research data for public or restricted access, following applicable JHU data governance policies.
Webinars: Go to our calendar to find the next live webinar on common privacy disclosure risks from personal and health identifiers in data and on techniques for de-identifying data for external collaborators and public databases. We also discuss preparing consent forms that facilitate data sharing and keeping identifier data secure during and after projects.
Interactive, online training: JHU Data Services has developed an online training that you can take at your convenience. It provides an overview of the types of identifiers and how to determine whether your data carry disclosure risk. You will also learn about available JHU resources to help you with de-identifying data.
Applications to Assist in De-identification of Human Subjects Research Data: A list of de-identification software tools and applications that researchers can use to de-identify their research data for wider public sharing.
NIH: Protecting Privacy When Sharing Human Research Participant Data: This supplemental information was created to assist researchers in addressing privacy considerations when sharing human research participant data. It provides a set of principles, best practices, and points to consider for creating a robust framework for protecting the privacy of research participants when sharing data.
NIST de-identification tools: The National Institute of Standards and Technology (NIST) has compiled a list of de-identification tools, with a description of each tool.
Cancer Imaging Archive: Submission and De-identification Overview, https://wiki.cancerimagingarchive.net/display/Public/Submission+and+De-identification+Overview
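To make the idea of removing direct identifiers and coarsening indirect ones more concrete, here is a minimal Python sketch. It is an illustration only, not a JHU or HIPAA-approved procedure: the field names (name, email, phone, participant_id, visit_date, age) and the function deidentify are hypothetical, and the rules shown (dropping direct identifiers, replacing IDs with random study codes, keeping only the year of a date, grouping ages of 90 and above) are a simplified subset of what a full disclosure review would cover.

```python
import secrets

# Hypothetical column names for illustration; adapt to your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def deidentify(records, id_field="participant_id"):
    """Return de-identified copies of the records and the ID-to-code key.

    The key links original IDs to random study codes. It should be stored
    securely and separately from the shared data, or destroyed.
    """
    key = {}
    cleaned = []
    for record in records:
        original_id = record[id_field]
        if original_id not in key:
            # Draw a random code and retry on the (unlikely) chance of a repeat.
            code = "P-" + secrets.token_hex(4)
            while code in key.values():
                code = "P-" + secrets.token_hex(4)
            key[original_id] = code

        out = {}
        for field, value in record.items():
            if field in DIRECT_IDENTIFIERS:
                continue  # drop direct identifiers entirely
            if field == id_field:
                out["study_code"] = key[original_id]
            elif field == "visit_date":
                # Assumes ISO "YYYY-MM-DD" strings; keep only the year.
                out["visit_year"] = value[:4]
            elif field == "age":
                # Group the oldest participants into a single category.
                out["age"] = "90+" if value >= 90 else value
            else:
                out[field] = value
        cleaned.append(out)
    return cleaned, key


if __name__ == "__main__":
    sample = [
        {"participant_id": 101, "name": "A. Example", "email": "a@example.org",
         "phone": "555-0100", "visit_date": "2021-03-14", "age": 93, "score": 7},
        {"participant_id": 101, "name": "A. Example", "email": "a@example.org",
         "phone": "555-0100", "visit_date": "2022-06-02", "age": 94, "score": 9},
    ]
    shared, linkage_key = deidentify(sample)
    for row in shared:
        print(row)
```

Even after steps like these, combinations of remaining variables can still identify individuals, so the trainings and tools listed above remain the place to start before sharing any dataset.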
Selecting Data for Sharing and Preserving
In general, you should share enough data, code, and documentation so that others can reproduce your work. Your funder likely also has a definition of what is considered data and may provide guidance on what to share (see funder requirements).
Guidance on Selecting Data for Sharing
- How to Appraise and Select Research Data for Curation: From the Digital Curation Centre, criteria you can use to assess which data are most valuable to share.
- Selecting data for publication: From the CESSDA "Data Management Expert Guide", questions to ask yourself about your data when determining what to share.
The FAIR Guiding Principles for scientific data management and stewardship, published in 2016, outlined methods for broadening access to shared data, focusing particularly on better discovery and open access through data repositories, and better reuse through documentation and machine-readable metadata standards. FAIR Principles fit within the wider promotion of Open Science and reproducible research. Data sharing policies by funders often cite these principles as a goal for making publicly funded data more widely available.
- FAIR Principles: overview provided by the GO FAIR Initiative
- FAIRsharing.org: provides resources and database collections that support FAIR principles for various stakeholders, including:
  - FAIR Sharing Standards: A registry of terminology artefacts, models/formats, reporting guidelines, and identifier schemas.
  - FAIR Data Repositories & Knowledgebases: A registry of knowledgebases and repositories of data and other digital assets.
  - FAIRsharing.org Data Policies database: A registry of data preservation, management and sharing policies from international funding agencies, regulators, journals, and other organisations.
- CARE Principles for Indigenous Data Governance: discusses special considerations for sharing data from Indigenous populations
- FASEB Science Policy and Advocacy: Federation of American Societies for Experimental Biology's collection of policy statements and best practices regarding data management and sharing, including the DataWorks! initiative promoting data sharing and exemplary data management plans.