It seems like everywhere we turn these days, some aspect of data privacy is in the news, with this or that company sharing your data in some form or fashion. Many of these reports involve the use of your de-identified data. What is de-identified data, and how is it different from pseudonymized or anonymous data? And how do any of those relate to the personal data or personal information covered by modern data privacy regulations?
De-identification removes direct identifiers such as your name, address, and date of birth from your data. It can be reversed if those accessing your de-identified data have enough other information that can be tied to the details that remain. Think of it like pixels in an image: with enough pixels, the full picture comes together, even if some pixels are missing.
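To make the pixel analogy concrete, here is a minimal sketch in Python of de-identification followed by a linkage attack, in which the leftover details are joined against an outside data set that still carries names. The records, the field names (`zip_code`, `job_title`, `diagnosis`), and the helpers `de_identify` and `link` are all hypothetical, made up for illustration.

```python
# Hypothetical field names for the direct identifiers that de-identification removes.
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth"}


def de_identify(record: dict) -> dict:
    """Return a copy of the record with the direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def link(released: list[dict], outside: list[dict], keys: tuple[str, ...]) -> list[tuple[str, dict]]:
    """Re-identify released records by matching them to named outside records on `keys`.

    A released record counts as re-identified only when exactly one outside record matches."""
    matches = []
    for rec in released:
        candidates = [o for o in outside if all(o.get(k) == rec.get(k) for k in keys)]
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], rec))
    return matches


# Made-up example data.
health_records = [
    {"name": "Alice", "address": "1 Elm St", "date_of_birth": "1980-02-01",
     "zip_code": "02139", "job_title": "pilot", "diagnosis": "asthma"},
    {"name": "Bob", "address": "9 Oak Ave", "date_of_birth": "1975-07-15",
     "zip_code": "02139", "job_title": "teacher", "diagnosis": "diabetes"},
]
released = [de_identify(r) for r in health_records]

# A hypothetical outside source that still carries names, e.g. public profiles.
public_profiles = [
    {"name": "Alice", "zip_code": "02139", "job_title": "pilot"},
    {"name": "Carol", "zip_code": "94105", "job_title": "pilot"},
]

for name, rec in link(released, public_profiles, keys=("zip_code", "job_title")):
    print(name, "->", rec["diagnosis"])  # prints: Alice -> asthma
```

Matching on just two leftover fields is enough to put Alice's name back on her diagnosis, even though her name, address, and date of birth were removed.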
Pseudonymization replaces certain pieces of information in your data set, for example by associating your data with a unique ID in place of your name or address. This is also reversible if those with access to your data have enough other information, or have access to the key or decoder that links the unique ID back to your name.
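A minimal sketch of pseudonymization, assuming the "key" is simply a lookup table from random IDs back to names. The `Pseudonymizer` class and its methods are hypothetical; real systems might derive IDs with keyed hashing or encryption instead, but the idea of a separate key that can undo the mapping is the same.

```python
import secrets


class Pseudonymizer:
    """Hypothetical helper: swaps a direct identifier for a random ID and keeps the key table."""

    def __init__(self) -> None:
        self._key: dict[str, str] = {}         # pseudonym -> original identifier (the "decoder")
        self._pseudonyms: dict[str, str] = {}  # original identifier -> pseudonym, so repeat records share an ID

    def pseudonymize(self, record: dict, field: str = "name") -> dict:
        """Return a copy of `record` with `field` replaced by a stable random ID."""
        original = record[field]
        if original not in self._pseudonyms:
            pseudonym = secrets.token_hex(8)  # 16 random hex characters; says nothing about the person
            self._pseudonyms[original] = pseudonym
            self._key[pseudonym] = original
        out = {k: v for k, v in record.items() if k != field}
        out["id"] = self._pseudonyms[original]
        return out

    def re_identify(self, record: dict) -> str:
        """Whoever holds the key table can map a pseudonymized record back to the person."""
        return self._key[record["id"]]


p = Pseudonymizer()
visit_1 = p.pseudonymize({"name": "Alice", "diagnosis": "asthma"})
visit_2 = p.pseudonymize({"name": "Alice", "diagnosis": "bronchitis"})
print(visit_1["id"] == visit_2["id"])  # True: the same person gets the same ID
print(p.re_identify(visit_2))          # Alice
```

Because the ID is random, the pseudonymized records alone do not reveal the name; the reversibility lives entirely in the key table, which is why it matters who holds it.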
Anonymization is NOT reversible, which means that, in addition to removing your name, address, date of birth, zip code, and so on, other information such as medical diagnoses, job title, and/or geolocation must also be removed.
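One way to see why details like diagnosis or job title have to go is k-anonymity, a widely used yardstick that this article does not itself name: it counts how many records share each combination of potentially identifying fields. The sketch below is illustrative only. The records and field names are made up, `smallest_group` and `drop_fields` are hypothetical helpers, and a large k is not by itself proof that data is anonymous.

```python
from collections import Counter


def smallest_group(records: list[dict], quasi_identifiers: tuple[str, ...]) -> int:
    """Return k, the size of the smallest group of records sharing identical values on
    `quasi_identifiers`. k == 1 means someone is unique on those fields and can be singled out."""
    groups = Counter(tuple(r.get(q) for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def drop_fields(records: list[dict], fields: set[str]) -> list[dict]:
    """Return copies of the records without the given fields."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]


# Made-up records that already had name, address, and date of birth removed.
records = [
    {"zip_code": "02139", "job_title": "pilot", "diagnosis": "asthma"},
    {"zip_code": "02139", "job_title": "teacher", "diagnosis": "diabetes"},
    {"zip_code": "02139", "job_title": "teacher", "diagnosis": "asthma"},
]

print(smallest_group(records, ("zip_code", "job_title")))  # 1: the only pilot stands out
print(smallest_group(records, ("zip_code", "diagnosis")))  # 1: the only diabetes case stands out

stripped = drop_fields(records, {"zip_code", "job_title", "diagnosis"})
print(smallest_group(stripped, ("zip_code", "job_title", "diagnosis")))  # 3: no one stands out
```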
So, what about DNA data? Everything above suggests that DNA information about you that is extensive enough (e.g., your entire genome sequence) or specific enough (e.g., gene variants that led to a medical diagnosis) can never truly be considered anonymous. This is exactly why DNA can be used in applications ranging from family finder tools to crime scene investigations.
Data privacy regulations vary based on where you live. Some country- or state-level data privacy regulations consider your data to be personal information unless it has been anonymized. Others require only de-identification, or de-identification PLUS defined additional steps (sometimes many such steps!) to help prevent re-identification, before the data is no longer considered your personal data.
Yes, this is all a bit confusing and constantly evolving. So, when you see news articles bandying about a company selling access to "de-identified" data that is no longer under the control of you, the individual the data represents, it should raise red flags. Under the GDPR (General Data Protection Regulation in Europe), the CPRA (California Privacy Rights Act), and similar US and non-US data privacy regulations, de-identified data is likely still considered your personal data/information, and you have the right to know how it is being used and, if you choose, to prevent it from being used for purposes you don't agree with.
About Luna
Luna’s suite of tools and services connects communities with researchers to accelerate health discoveries. With participation from more than 180 countries and communities advancing causes including disease-specific, public health, environmental, and emerging interests, Luna empowers these collectives to gather a wide range of data—health records, lived experience, disease history, genomics, and more—for research.
Luna gives academia and industry everything they need, from engagement with study participants to data analysis across multiple modalities using a common data model. The platform is compliant with clinical regulatory requirements and international consumer data privacy laws.
By giving privacy-protected individuals a way to continually engage, Luna transforms the traditional patient-disconnected database into a dynamic, longitudinal discovery environment where researchers, industry, and community leaders can leverage a range of tools to surface insights and trends, study disease natural history and biomarkers, and enroll in clinical studies and trials.