The Sheridan Libraries

Protecting Human Subject Identifiers

Introduction to concepts and basic techniques for disclosure analysis and protection of personal and health identifiers in research data for public or restricted access, following applicable JHU data governance policies. See Overview section for details.

Advanced techniques overview

When to consider advanced de-identification techniques

Consider more advanced de-identification techniques when:

preparing a public access dataset
basic de-identification would remove too many key variables of interest
preparing complex quantitative datasets, especially for large samples with multiple variables

Advanced de-identification is not always a "do-it-yourself" activity. If your project requires it, you may need a professional statistician, and datasets may need to be reviewed and approved for disclosure risk before public release. Investigators and researchers should follow the data governance policies and procedures that apply to their data. Medical and health research subject to oversight by the JHM Data Trust Council must follow additional requirements for de-identification and disclosure review. Please refer to the JHM Data Trust Council section of this guide.

Here is an overview of a few of the common advanced techniques:

Collapsing categories: Combine categories with low frequencies (few observations) into broader ones by creating a broader coding scheme, for example wider ranges.

“Collapsing Data across Observations | SPSS Learning Modules.” n.d. Accessed March 5, 2019. https://stats.idre.ucla.edu/spss/modules/collapsing-data-across-observations/.
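As a minimal sketch of collapsing, the Python below merges rare categories into a single label and recodes exact ages into ten-year bands. The function names, the threshold of 5 observations, the "Other" label, and the sample data are all assumptions made for illustration; the appropriate threshold depends on your own disclosure review.

```python
from collections import Counter

def collapse_rare(values, min_count=5, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

def to_age_band(age, width=10):
    """Recode an exact age into a broader band, e.g. 37 becomes '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Hypothetical occupation variable with two rare categories.
occupations = ["nurse"] * 6 + ["teacher"] * 5 + ["astronaut"] + ["beekeeper"] * 2
collapsed = collapse_rare(occupations, min_count=5)

print(Counter(collapsed))
print(to_age_band(37))
```

After collapsing, check the frequency of the merged category as well. In this sample the combined "Other" group still has only three records, so a still broader coding scheme might be needed.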

Date shifting: If complete numeric dates are needed for calculation purposes, rather than truncating them to a year, use date shifting of plus or minus 182 days (the equivalent of truncating to a year; shifts of less than a year may still be considered direct identifiers). Dates may be shifted by a fixed amount, but randomized amounts are generally more secure. For participants with multiple events, a single random shift value may be assigned to each participant and applied to all of that participant's dates, which preserves the intervals between events.
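The per-participant random shift described above could be sketched in Python as follows. The function names, record layout, and sample dates are hypothetical, and the sketch assumes the shift is drawn uniformly from the plus or minus 182 day window.

```python
import random
from datetime import date, timedelta

MAX_SHIFT_DAYS = 182  # window taken from the guidance above

def assign_shifts(participant_ids, rng=None):
    """Draw one random offset (in days) per participant."""
    rng = rng or random.SystemRandom()
    unique_ids = sorted(set(participant_ids))
    return {pid: rng.randint(-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS) for pid in unique_ids}

def shift_dates(records, shifts):
    """Return (participant_id, date) records with each date moved by that participant's offset."""
    return [(pid, d + timedelta(days=shifts[pid])) for pid, d in records]

# Hypothetical event records: participant P01 has two visits 14 days apart.
records = [
    ("P01", date(2018, 3, 1)),
    ("P01", date(2018, 3, 15)),
    ("P02", date(2018, 7, 4)),
]
shifts = assign_shifts(pid for pid, _ in records)
shifted = shift_dates(records, shifts)

# The gap between P01's two visits is unchanged by the shift.
print((shifted[1][1] - shifted[0][1]).days)  # 14
```

Because every date for a participant moves by the same offset, intervals between that participant's events are preserved. The table of offsets would allow the original dates to be recovered, so it should be stored separately from the shared data under appropriate access controls, or destroyed if it is not needed.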

Automated date shifting in REDCap: For data that needs to be shared from a study in a REDCap database, the Data Export Tool includes several de-identification tools, including automatic shifting of date fields by 365 days, meeting HIPAA Safe Harbor requirements. Instructions are in the REDCap Resource Centre's Export instructions [linked here]. Be sure to check the results before sharing. If certain users who access REDCap directly should see only shifted date fields, a calculated shifted-date field can be created manually using these instructions for adding and subtracting dates: first create a new field that randomly assigns + or - 182 days to each participant's record, then calculate the shifted date by referring to that field.

Date shifting: Hripcsak G, Mirhaji P, Low AF, Malin BA. Preserving temporal relations in clinical data while maintaining privacy. J Am Med Inform Assoc. 2016;23(6):1040-1045. doi:10.1093/jamia/ocw001

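Top and bottom coding: Replace extreme values at the top and bottom of a variable's distribution with a threshold value, so that outliers cannot single out individuals. For example, HIPAA Safe Harbor groups all ages over 89 into a single category of 90 or older.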

“Top-Coded.” 2018. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Top-coded&oldid=870775236
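A minimal Python sketch of top and bottom coding is shown below. The function name and the thresholds of 18 and 90 are assumptions chosen for illustration, not recommended cut-offs.

```python
def top_bottom_code(values, lower, upper):
    """Clamp each value into the closed range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

# Hypothetical ages with one low and two high outliers.
ages = [17, 23, 45, 67, 91, 102]
coded = top_bottom_code(ages, lower=18, upper=90)

print(coded)  # [18, 23, 45, 67, 90, 90]
```

When sharing a coded variable like this, the codebook should state that the boundary values stand for "18 or younger" and "90 or older" rather than exact ages.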

Microaggregation: For a problematic (non-range) variable, group 3-5 similar records and replace each original value with the mean of its group.

Domingo-Ferrer, Josep. 2009b. “Microaggregation.” In Encyclopedia of Database Systems, edited by Ling Liu and M. Tamer Özsu, 1736–37. Boston, MA: Springer US. https://doi.org/10.1007/978-0-387-39940-9_1496.
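One simple way to illustrate microaggregation for a single numeric variable is to sort the values, form groups of neighbouring records, and replace each value with its group mean. The Python sketch below does this with a fixed group size; the function name, the choice of k=3, and the sample incomes are assumptions, and published microaggregation methods use more careful grouping than this.

```python
from statistics import mean

def microaggregate(values, k=3):
    """Replace each value with the mean of its group of at least k nearest-ranked values."""
    if len(values) < k:
        raise ValueError("need at least k records")
    # Indices of the records, ordered by value, so similar records sit together.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [None] * len(values)
    start = 0
    while start < len(order):
        end = start + k
        # Fold a short tail into this group so no group has fewer than k records.
        if len(order) - end < k:
            end = len(order)
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean
        start = end
    return result

# Hypothetical incomes, including one extreme value.
incomes = [52000, 31000, 30000, 250000, 54000, 29000, 51000]
masked = microaggregate(incomes, k=3)

print(masked)
```

Replacing values with group means leaves the overall total (and so the overall mean) unchanged, while no released value corresponds to fewer than k records.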

Record swapping: For categorical variables, take a random sample of at-risk records and swap values between closely paired participants. This adds "noise" that increases the anonymity of the overall dataset.

Domingo-Ferrer, Josep. 2009a. “Data Rank/Swapping.” In Encyclopedia of Database Systems, edited by Ling Liu and M. Tamer Özsu, 620–21. Boston, MA: Springer US. https://doi.org/10.1007/978-0-387-39940-9_1497.
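Record swapping can be sketched in Python as below. This version assumes that "closely paired" means two at-risk records that agree on a set of matching variables, and that the list of at-risk record IDs has already been chosen; the function name, field names, and sample records are hypothetical.

```python
import random

def swap_records(records, at_risk_ids, swap_var, match_vars, rng=None):
    """Swap swap_var between pairs of at-risk records that agree on match_vars."""
    rng = rng or random.SystemRandom()
    out = [dict(r) for r in records]  # work on copies; the input is left untouched
    # Bucket the at-risk records by their values on the matching variables.
    buckets = {}
    for idx, rec in enumerate(out):
        if rec["id"] in at_risk_ids:
            key = tuple(rec[v] for v in match_vars)
            buckets.setdefault(key, []).append(idx)
    for indices in buckets.values():
        rng.shuffle(indices)
        # Pair neighbours in shuffled order; an unpaired record is left as-is.
        for a, b in zip(indices[::2], indices[1::2]):
            out[a][swap_var], out[b][swap_var] = out[b][swap_var], out[a][swap_var]
    return out

# Hypothetical records: 1 and 2 share age band and sex, so they form a pair.
records = [
    {"id": 1, "age_band": "30-39", "sex": "F", "county": "A"},
    {"id": 2, "age_band": "30-39", "sex": "F", "county": "B"},
    {"id": 3, "age_band": "60-69", "sex": "M", "county": "C"},
    {"id": 4, "age_band": "30-39", "sex": "M", "county": "A"},
]
swapped = swap_records(records, at_risk_ids={1, 2, 3},
                       swap_var="county", match_vars=["age_band", "sex"])

print([(r["id"], r["county"]) for r in swapped])
```

Swapping only moves values between records, so the overall frequency of each category is unchanged. Records with no match, such as record 3 here, are not altered by this sketch and would need another form of protection.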

Last Updated: Oct 29, 2025 3:52 PM
URL: https://guides.library.jhu.edu/protecting_identifiers