Protecting Identifiers in Human Subjects Data

Introduction to concepts and basic techniques for disclosure analysis and protection of personal and health identifiers in research data for public or restricted access, following applicable JHU data governance policies. See Overview section for details.

Planning for disclosure protection from the project's start

Removing personal identifiers from data can take significant effort. As early as the proposal stage, decide whether shared data will need substantial de-identification, and how identifiers will be protected from disclosure while data are gathered and used internally. Here are some tips and methods for integrating identifier protection more efficiently throughout the research process.
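
One common way to limit internal exposure is to replace direct identifiers with random study codes and keep the linkage key in a separate, restricted location. The following is a minimal Python sketch of that idea, not a JHU-prescribed procedure: the field names, the `split_identifiers` function, and the sample rows are all hypothetical. Removing direct identifiers alone does not make data de-identified, since combinations of the remaining fields can still identify a subject.

```python
import secrets

# Hypothetical column names; adjust to the identifiers your study collects.
DIRECT_IDENTIFIERS = ("name", "email", "phone")


def split_identifiers(records, id_fields=DIRECT_IDENTIFIERS):
    """Return (linkage_key, coded_records).

    linkage_key maps each random study ID to the subject's direct identifiers
    and is meant to be stored separately under restricted access.
    coded_records carry the study ID and the remaining fields only.
    """
    linkage_key = {}
    coded_records = []
    for record in records:
        # Random ID, so it reveals nothing about the subject or enrollment order.
        study_id = "S-" + secrets.token_hex(4)
        while study_id in linkage_key:  # regenerate on the rare collision
            study_id = "S-" + secrets.token_hex(4)
        linkage_key[study_id] = {f: record[f] for f in id_fields if f in record}
        coded = {"study_id": study_id}
        coded.update({k: v for k, v in record.items() if k not in id_fields})
        coded_records.append(coded)
    return linkage_key, coded_records


# Fabricated example rows, for illustration only.
subjects = [
    {"name": "Example Person A", "email": "a@example.org",
     "age_group": "30-39", "response": 4},
    {"name": "Example Person B", "email": "b@example.org",
     "age_group": "40-49", "response": 2},
]
key, coded = split_identifiers(subjects)
```

Analysts would then work only with `coded`, while `key` stays with the few team members authorized to re-contact subjects.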

Proposal Data Management Plans

Prepare a Data Management Plan (DMP) that covers sharing and/or de-identifying data, storage, and documentation. Many funders have data sharing policies that require DMPs for grant proposals or after awards. Proposal DMPs should state whether de-identified data will be shared or provide justification for not sharing such data. Sharing data with identifiers is not required, but the decision to withhold data should be justified:

Sample justification text: “Subject identifiers cannot be adequately removed from all interview transcripts. Selected de-identified sections will be shared.”

Guided web forms for creating DMPs for all funders are available online. JHU Data Services can provide feedback and direct guidance on preparing plans.

Even if a DMP isn't required for a proposal or awarded project, all projects should consider developing one, especially if data will be accessed by external collaborators or shared publicly through a data repository. A DMP should cover:

  • What data are produced during the research workflow, with awareness of special formats and metadata documentation standards relevant to sharing.
  • How and where data are stored and backed up (e.g., copies to a secure drive and an offsite backup to a cloud server).
  • What data will be shared, where, and under what conditions (including plans for de-identification and depositing to restricted or public data repositories).
  • Procedures for organizing and tracking files and documenting procedural changes during research.
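
As one possible way to track files over the course of a project, a team could periodically record a manifest of file names, sizes, and checksums, then compare manifests to see what changed. The sketch below assumes a local project folder; `build_manifest` and `sha256_of` are hypothetical helper names, and a DMP may call for a different tracking method entirely.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """List every file under root with its relative path, size, and SHA-256."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "file": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return rows
```

Saving the returned rows alongside a dated note on what changed gives a simple record of procedural changes; a checksum that differs between two manifests flags a file that was modified.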

Budgeting for disclosure protection: Some funders allow requests for funds to cover costs of preparing de-identified datasets. Disclosure protection for smaller restricted datasets may only require in-house resources; however, large datasets developed for public access may require hiring statisticians trained in the techniques, and/or external services for disclosure risk assessment and mitigation.

IRB and Disclosure Protection Plans

Plans for protecting identifiers, especially for publicly accessible data, should be specified in IRB forms and subject consent forms.

IRB Research Plans detail collection, protection, sharing, and disposal of data with identifiers and should address:

  • What identifiers will be collected
  • How the rights and privacy of human subjects will be protected both during and after the study
  • Who can use the data and under what conditions
  • How long data will be retained, and whether identifiers will be kept or how they will be disposed of
  • What data will be shared, and how: peer to peer, or via a public or restricted repository

Plans to share medical and health data, even with external collaborators under restricted access, should be clearly stated, as should plans for restricting or de-identifying data. The IRB may ask for additional information, which can include a Data Security Checklist or a Data Security Profile, depending on whether preferred JHU-managed secure data access platforms will be used. The School of Medicine IRB will indicate whether the study involves oversight by the JHM Data Trust Council regarding plans for accessing and sharing data from JHM sources [see next section and section on Secure Storage Options].

Subject consent forms should state intentions to share data, whether for public or restricted access. If research subjects do not know data will be shared, the IRB may not approve releasing the data. Data de-identified to HIPAA's "expert-level" standard may not require direct consent because links to subjects have been removed, but only if the data are fully de-identified and approved for release by the IRB and/or the JHM Data Trust Council. Here is some sample consent form language:

State where de-identified data will be shared: “Selected data from this study will be deposited into [name repository], a publicly-accessible database for research data. We will remove all identifiers linked to you before the data are deposited.”
Inform them of the small risk: “There is a very small chance that someone with access to the research data or results could identify you through external information sources. JHU researchers have policies and practices in place to minimize any risk of indirect disclosure of your personal information.”
Optional, allow opt-out of sharing de-identified data: “I do not wish to have anonymous transcripts shared online or through the [named repository] for further research or educational purposes, even though there is a low risk that I can be identified by the information released.”