Protecting Human Subject Identifiers
Advanced techniques overview
When to consider advanced de-identification techniques
Consider more advanced de-identification techniques when:
- preparing a public access dataset
- basic de-identification would remove too many key variables of interest
- preparing complex quantitative datasets, especially for large samples with multiple variables
Here is an overview of a few of the common advanced techniques:
Collapsing categories: Combine categories with low frequencies (small cell counts) into broader ranges by creating a broader coding scheme.
“Collapsing Data across Observations | SPSS Learning Modules.” n.d. Accessed March 5, 2019. https://stats.idre.ucla.edu/spss/modules/collapsing-data-across-observations/.
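As a rough illustration of collapsing, the sketch below uses only the Python standard library. The function names, the `min_count` threshold of 5, the "Other" label, and the 10-year age bands are all assumptions chosen for the example, not recommended values.

```python
from collections import Counter


def collapse_rare_categories(values, min_count=5, other_label="Other"):
    """Replace any category seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def age_to_band(age, width=10):
    """Recode an exact age into a broader range such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical data: two occupations appear only once each.
occupations = ["nurse"] * 6 + ["teacher"] * 5 + ["astronaut", "beekeeper"]
collapsed = collapse_rare_categories(occupations, min_count=5)

# Hypothetical ages recoded into 10-year bands.
bands = [age_to_band(a) for a in (34, 37, 52)]
```

The combined "Other" category can itself end up with a small count, so frequencies should be rechecked after collapsing.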
Date shifting: If complete numeric dates are needed for calculations, shift them rather than truncating them to a year. Use a shift of plus or minus 182 days, a window of about one year that is equivalent to truncating to a year; shift windows of less than a year may still be considered direct identifiers. Dates may be shifted by a fixed amount, but randomized amounts are generally more secure. For participants with multiple events, assign one random shift value to each participant so that the intervals between that participant's events are preserved.
Date shifting: Hripcsak G, Mirhaji P, Low AF, Malin BA. Preserving temporal relations in clinical data while maintaining privacy. J Am Med Inform Assoc. 2016;23(6):1040-1045. doi:10.1093/jamia/ocw001
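A minimal sketch of per-participant date shifting follows. The function names, the participant IDs, and the event dates are hypothetical; the 182-day limit comes from the description above. The seed is used here only to make the example repeatable.

```python
import random
from datetime import date, timedelta


def assign_shifts(participant_ids, max_days=182, seed=None):
    """Draw one random shift (in days) per participant, within +/- max_days."""
    rng = random.Random(seed)
    # Sort the IDs so that a given seed always yields the same assignment.
    # Note: randint can return 0 (no shift); a real workflow may want to exclude it.
    return {pid: rng.randint(-max_days, max_days) for pid in sorted(participant_ids)}


def shift_events(events, shifts):
    """Apply each participant's shift to all of that participant's event dates."""
    return [(pid, d + timedelta(days=shifts[pid])) for pid, d in events]


# Hypothetical events: P1 has two visits 14 days apart.
events = [
    ("P1", date(2018, 3, 1)),
    ("P1", date(2018, 3, 15)),
    ("P2", date(2018, 7, 4)),
]
shifts = assign_shifts({pid for pid, _ in events}, seed=42)
shifted = shift_events(events, shifts)
```

Because both of P1's dates move by the same amount, the 14-day interval between them is unchanged. The table of shift values can be used to reverse the shift, so it would need to be stored securely or destroyed.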
Top and bottom coding: Replace extreme values at the top and bottom of a variable's range (outliers) with a threshold value.
“Top-Coded.” 2018. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Top-coded&oldid=870775236
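Top and bottom coding can be sketched as a simple clamp. The function name, the income values, and both thresholds below are assumptions made for illustration; in practice the thresholds depend on the distribution of the variable.

```python
def top_bottom_code(values, lower, upper):
    """Clamp each value so that nothing falls below lower or above upper."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical incomes with one very low and one very high value.
incomes = [12_000, 35_000, 48_000, 51_000, 950_000]
coded = top_bottom_code(incomes, lower=20_000, upper=150_000)
# coded == [20000, 35000, 48000, 51000, 150000]
```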
Microaggregation: Group 3-5 similar records for a problematic (non-range) variable, and replace each original value with the mean of the records in its group.
Domingo-Ferrer, Josep. 2009b. “Microaggregation.” In Encyclopedia of Database Systems, edited by Ling Liu and M. Tamer Özsu, 1736–37. Boston, MA: Springer US. https://doi.org/10.1007/978-0-387-39940-9_1496.
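One simple way to form groups for a single numeric variable is to sort the values and group neighbors. The sketch below takes that approach as an assumption; it is not the only microaggregation method, and the function name and sample weights are hypothetical.

```python
from statistics import mean


def microaggregate(values, k=3):
    """Sort values, form groups of at least k neighbors, and replace each
    value with its group mean. Results are returned in the original order."""
    if len(values) < k:
        raise ValueError("need at least k records to form one group")
    # Indices of the records, ordered by value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [None] * len(values)
    n_groups = len(values) // k
    for g in range(n_groups):
        start = g * k
        # The last group absorbs any leftover records so no group is smaller than k.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        group = order[start:end]
        group_mean = mean([values[i] for i in group])
        for i in group:
            result[i] = group_mean
    return result


# Hypothetical body weights in kg.
weights = [60, 95, 57, 102, 63, 99, 140]
masked = microaggregate(weights, k=3)
# masked == [60, 109, 60, 109, 60, 109, 109]
```

Here the three lowest values (57, 60, 63) are replaced by their mean of 60, and the four highest (95, 99, 102, 140) by their mean of 109, so the overall mean of the variable is preserved.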
Record swapping: For categorical variables, take a random sample of at-risk records and swap values between closely matched participants. This adds "noise" that increases the anonymity of the overall dataset.
Domingo-Ferrer, Josep. 2009a. “Data Rank/Swapping.” In Encyclopedia of Database Systems, edited by Ling Liu and M. Tamer Özsu, 620–21. Boston, MA: Springer US. https://doi.org/10.1007/978-0-387-39940-9_1497.
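The sketch below illustrates one possible reading of record swapping. It assumes that "closely matched" means records that agree on a set of matching variables, and that at-risk records are flagged by a caller-supplied rule. The function name, parameters, field names, and data are all hypothetical.

```python
import random


def swap_records(records, target, match_keys, at_risk, rate=0.5, seed=None):
    """Swap the target variable between sampled at-risk records and a
    randomly chosen partner that agrees on every variable in match_keys."""
    rng = random.Random(seed)
    swapped = [dict(r) for r in records]  # copy so the input stays untouched
    candidates = [i for i, r in enumerate(records) if at_risk(r)]
    sample = rng.sample(candidates, k=round(len(candidates) * rate))
    used = set()  # each record takes part in at most one swap
    for i in sample:
        if i in used:
            continue
        partners = [
            j for j, r in enumerate(records)
            if j != i and j not in used
            and all(r[key] == records[i][key] for key in match_keys)
            and r[target] != records[i][target]  # skip swaps that change nothing
        ]
        if not partners:
            continue
        j = rng.choice(partners)
        swapped[i][target], swapped[j][target] = swapped[j][target], swapped[i][target]
        used.update((i, j))
    return swapped


# Hypothetical records: swap 'county' among people in the same age band and sex.
people = [
    {"id": 1, "age_band": "30-39", "sex": "F", "county": "A", "rare_dx": True},
    {"id": 2, "age_band": "30-39", "sex": "F", "county": "B", "rare_dx": False},
    {"id": 3, "age_band": "30-39", "sex": "F", "county": "C", "rare_dx": False},
    {"id": 4, "age_band": "50-59", "sex": "M", "county": "A", "rare_dx": True},
    {"id": 5, "age_band": "50-59", "sex": "M", "county": "D", "rare_dx": False},
]
result = swap_records(
    people, target="county", match_keys=("age_band", "sex"),
    at_risk=lambda r: r["rare_dx"], rate=1.0, seed=7,
)
```

Swapping within matched groups leaves the county counts in each group unchanged, while the link between an individual record and its county becomes uncertain.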