The profile of genetic testing, and of the resulting genetic data, has risen in public discussion. One reason is the COVID-19 pandemic; others are an increasing focus on data privacy and the growing belief that individuals should have control of their data.

While concerns exist about the collection of consumer transactional data by Big Tech, attention to one’s uniquely identifying genetic data, and to the privacy controls applied to it, has become more focused. Unlike consumer data, which can be expunged or obfuscated, genetic data describes an individual throughout their entire life. A breach of genetic data can have consequences that cannot be undone.

Privacy concerns: consumer data versus genetic data

It is commonplace to encrypt data while it’s being stored, and even to use technologies like homomorphic encryption to control access to genetic information for research purposes. Such techniques have been used to support the most common mode of data use, in which the data is downloaded into a researcher’s own computing environment. Each download creates a separate copy, and each copy carries the liability that the information could be shared or hacked and used for purposes other than those for which it was provided under informed consent.

An alternative solution, and one that is inherently compatible with modern data privacy frameworks such as the European Union’s General Data Protection Regulation (GDPR), is to not make copies of the data at all. Instead, each research team may be given access to a computational environment, also known as a sandbox, that can reach the data and in which the team performs its analyses. The advent of powerful and readily available cloud-based information services has made this latter solution viable.
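
To make the sandbox idea concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any real platform: the `Sandbox` class, its `run` method, the `min_group_size` threshold, and the sample records are all hypothetical. The point it shows is that the research team submits an analysis to the place where the data lives and receives only a numeric result, rather than downloading a copy. A production sandbox would need far stronger isolation than this.

```python
from types import MappingProxyType
from typing import Callable, Sequence


class Sandbox:
    """Hypothetical sandbox: records stay inside, only aggregates leave."""

    def __init__(self, records: Sequence[dict], min_group_size: int = 3):
        self._records = [dict(r) for r in records]  # private working store
        self.min_group_size = min_group_size        # assumed release threshold
        self.audit_log = []                         # who ran which analysis

    def run(self, team: str, analysis: Callable) -> float:
        if len(self._records) < self.min_group_size:
            raise PermissionError("too few records to release an aggregate")
        # Hand the analysis read-only views, never the records themselves.
        views = tuple(MappingProxyType(r) for r in self._records)
        result = analysis(views)
        if not isinstance(result, (int, float)):
            raise TypeError("only numeric aggregates may leave the sandbox")
        self.audit_log.append(f"{team}: ran {analysis.__name__}")
        return float(result)


def carrier_frequency(records) -> float:
    """Example analysis: fraction of participants carrying a variant."""
    carriers = sum(1 for r in records if r["variant_present"])
    return carriers / len(records)


sandbox = Sandbox([
    {"participant": "p1", "variant_present": True},
    {"participant": "p2", "variant_present": False},
    {"participant": "p3", "variant_present": True},
    {"participant": "p4", "variant_present": False},
])
freq = sandbox.run("team-a", carrier_frequency)
print(freq)  # 0.5
```

Because every analysis passes through one entry point, each use of the data can be logged and checked against what participants consented to, which is much harder once copies have been downloaded.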


It is also important to consider that not all genetic information carries a high potential risk to the individual. DNA data on a person’s cancerous mutations differ from the individual’s germline DNA and cannot be used to re-identify that individual. Similarly, data on a particular variant of a virus, such as SARS-CoV-2, cannot be directly traced back to the individual from whom the sample was collected. In both cases, the genetic information is distinct from the individual and does not carry a risk to the person from whom it was collected.

Scott Kahn, PhD, Chief Information and Privacy Officer, Luna

Weighing risks for different types of data

The risks posed by different types of genetic information can differ for individuals, institutions, and governments. Whereas individuals may not be at risk of re-identification from pandemic-related DNA data, institutions and, even more so, governments might experience negative consequences upon disclosure of a novel variant, as was seen with South Africa’s disclosure of the Omicron variant.

While public health efforts everywhere were bolstered by knowledge of Omicron’s existence, the economic consequences that South Africa suffered through travel restrictions and related actions were a far cry from an expression of gratitude by the rest of the world.

About Luna

Luna’s suite of tools and services connects communities with researchers to accelerate health discoveries. With participation from more than 180 countries and communities advancing causes including disease-specific, public health, environmental, and emerging interests, Luna empowers these collectives to gather a wide range of data (health records, lived experience, disease history, genomics, and more) for research.

Luna gives academia and industry everything they need from engagement with study participants to data analysis across multiple modalities using a common data model. The platform is compliant with clinical regulatory requirements and international consumer data privacy laws.

By providing privacy-protected individuals a way to continually engage, Luna transforms the traditional patient-disconnected database into a dynamic, longitudinal discovery environment where researchers, industry, and community leaders can leverage a range of tools to surface insights and trends, study disease natural history and biomarkers, and enroll in clinical studies and trials.

Scott Kahn, Ph.D.

Scott is the former CIO and VP Commercial, Enterprise Informatics at Illumina. At Luna, he’s integrating data privacy and security provisions that keep member data safe, private, and secure.