Bringing Responsible Use of AI to Patient Advocacy Groups


Hardly a day goes by that AI isn’t mentioned in a news report or conversation about how technology is affecting our lives. The same is true of discussions about healthcare advancements, from cancer therapies and precision medicine to simply serving patients better in the limited time doctors have before their next appointment. Here we share how Luna is building partnerships and delivering the benefits of AI to individuals, group leaders, and researchers while protecting people’s data rights. AI already powers Luna applications that can improve health research, including automated information extraction to standardize research data and AI model training with information that reflects the diversity of communities. As with all innovation, companies must proceed cautiously to preserve the advances and guard against the risks.

Information Extraction Using AI

One of the clearest use cases for AI in health research is extracting relevant information from unstructured medical and clinical records and structuring it into a format that is conducive to research and analysis. For example, the Luna platform has AI technology built in to extract information from genetic test reports and store it in a standard format for use in health research. AI removes the need for a researcher to scroll through long reports that often have different formats and test naming conventions. This also supports members’ ability to share their genetic information in the studies they join so that researchers can advance therapeutic development. As with all data shared on Luna, genetic data extracted with the assistance of AI is fully under the control of the member.
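The minimal Python sketch below makes the idea of a “standard format” concrete. It assumes two labs label the same findings differently and maps both onto one hypothetical schema. The field names `gene`, `variant`, and `classification` are ours, not Luna’s. A simple synonym table stands in for the AI model described above, and the sketch skips the text extraction that real reports would need first.

```python
# Hypothetical synonym table: these field names are illustrative only,
# not Luna's schema or any lab's actual report format.
FIELD_SYNONYMS = {
    "gene": {"gene", "gene_symbol", "gene name"},
    "variant": {"variant", "hgvs", "dna change"},
    "classification": {"classification", "interpretation", "result"},
}


def normalize_report(raw: dict) -> dict:
    """Map one lab's report fields onto a single standard record."""
    record = {}
    for raw_key, value in raw.items():
        key = raw_key.strip().lower()
        for standard, synonyms in FIELD_SYNONYMS.items():
            if key in synonyms:
                record[standard] = value.strip()
    # Standardize value conventions too: upper-case gene symbols,
    # lower-case classification labels.
    if "gene" in record:
        record["gene"] = record["gene"].upper()
    if "classification" in record:
        record["classification"] = record["classification"].lower()
    return record


lab_a = {"Gene": "BRCA1", "Variant": "c.68_69del", "Interpretation": "Pathogenic"}
lab_b = {"gene_symbol": "brca1", "HGVS": "c.68_69del", "Result": "PATHOGENIC "}
print(normalize_report(lab_a) == normalize_report(lab_b))  # True
```

Once both reports share one shape, a researcher can query every member’s results the same way, whichever lab produced them.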

Community Data Improves AI Models

A significant hope for AI is that it can be used to predict how individuals will respond to various treatments or procedures, and perhaps even whether they will develop a disease. To realize this promise, researchers must first train models and then test them using outcome data from patients with various diseases, as well as data on future diagnoses among people who do not yet have the disease.
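Here is a minimal sketch of the train-then-test step, using toy patient records. The data are shuffled and split so that the model is evaluated on patients it never saw during training. The 80/20 split and the fixed seed are illustrative choices, not a recommendation.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy records; a real study would hold outcome data for each participant.
patients = [{"id": i, "outcome": i % 2} for i in range(10)]
train, test = train_test_split(patients)
print(len(train), len(test))  # 8 2
# Fit the model on `train` only, then measure its accuracy on `test`.
```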

As Luna has previously written, a critical consideration in AI-powered research is the quality and completeness of the data used to build, or “train,” an AI model. If the training data are more representative of some groups of people than others, the model’s predictions may also be systematically worse for underrepresented groups. It is well documented that existing health databases lack diversity because of poor inclusion of underrepresented populations. Luna addresses this by tackling trust, a major barrier to research participation. Participants control whether their data is used in research or in the development of new AI models.
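This paragraph makes two points, participant control and representativeness, and both can be sketched together in Python. The sketch assumes each record carries a hypothetical `consents_to_ai` flag and a `group` label. It also assumes the expected share of each group in the population is known. Neither assumption reflects Luna’s actual data model. Only consenting records are kept, and each group’s share of that data is then compared with its reference share.

```python
from collections import Counter

# Toy records; "group" and "consents_to_ai" are hypothetical field names.
records = [
    {"id": 1, "group": "A", "consents_to_ai": True},
    {"id": 2, "group": "A", "consents_to_ai": True},
    {"id": 3, "group": "B", "consents_to_ai": False},
    {"id": 4, "group": "B", "consents_to_ai": True},
    {"id": 5, "group": "C", "consents_to_ai": True},
    {"id": 6, "group": "A", "consents_to_ai": True},
]

# Hypothetical share of each group in the population the model will serve.
reference_shares = {"A": 0.4, "B": 0.4, "C": 0.2}


def training_set(rows):
    """Keep only records whose owners allowed use in AI model development."""
    return [r for r in rows if r["consents_to_ai"]]


def representation_ratios(rows, reference):
    """Each group's share in the data divided by its reference share.

    A ratio well below 1.0 flags a group the model may serve poorly.
    """
    counts = Counter(r["group"] for r in rows)
    total = len(rows)
    return {g: (counts[g] / total) / share for g, share in reference.items()}


ratios = representation_ratios(training_set(records), reference_shares)
print({g: round(r, 2) for g, r in ratios.items()})  # {'A': 1.5, 'B': 0.5, 'C': 1.0}
```

A ratio of 0.5 for group B means it appears at half its expected rate in the training data. That is the kind of gap that can make predictions systematically worse for a group.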

Communities organized on Luna have seen strong representation of diverse participants. With inclusive representation, Luna community leaders are already drawing the attention of researchers and industry looking to leverage AI.

Luna’s AI Partnerships

Luna is excited about the potential of AI to improve and accelerate many aspects of health and personal well-being if deployed responsibly. To further develop and promote the coupling of ethical review with AI technology and data control, Luna has become a founding member of a soon-to-be announced AI consortium focused on health data research and health technology development. This consortium will operate with an international perspective concerning health data use. As a founding member, Luna can shape policy developments that will be incorporated into future health research AI usage guidelines and governance.

Stay tuned for this announcement!

The Importance of Ethical Innovation

Society will benefit from innovation that is ethical and aligned with the values of communities. AI innovation is no exception, and it has already revealed its share of challenges and concerns. IRBs and research ethics committees review studies to protect individuals and ensure ethical acceptability. We see that review, together with individuals’ control over the use of their data, as a powerful mechanism to ensure that the AI models developed are consistent with the values of our members and society at large. We also see communities organized on Luna as playing a critical role in accelerating the best of what AI technology can deliver, by ensuring that patients and community leaders take part as partners in innovation.


About Luna

Luna’s suite of tools and services connects communities with researchers to accelerate health discoveries. With participation from more than 180 countries and communities advancing causes including disease-specific, public health, environmental, and emerging interests, Luna empowers these collectives to gather a wide range of data—health records, lived experience, disease history, genomics, and more—for research.

Luna gives academia and industry everything they need from engagement with study participants to data analysis across multiple modalities using a common data model. The platform is compliant with clinical regulatory requirements and international consumer data privacy laws.

By providing privacy-protected individuals a way to continually engage, Luna transforms the traditional patient-disconnected database into a dynamic, longitudinal discovery environment where researchers, industry, and community leaders can leverage a range of tools to surface insights and trends, study disease natural history and biomarkers, and enroll in clinical studies and trials.


Breaking the Mental Model: Individual Data Control Can Deliver Better Research


The majority of individuals on Luna want to accelerate research and ensure their data is used only as they allow. With that in mind, we considered a recent article in Forbes and broke down two recent legal opinion articles on medical data privacy and your rights over how your data is used in research (article links below).

As standard practice in US healthcare, laboratory results, doctor’s conclusions, and any other information collected during your virtual or in-person visit is digitally captured and stored for later reference by the healthcare provider. This information is protected under the Health Insurance Portability and Accountability Act, referred to as HIPAA, to protect your private information from disclosure to parties outside of your care team. There are provisions under HIPAA for the de-identification of health data (which is simply the removal of your name, address, and other information that would clearly link the data back to you) so it can be shared freely for health research purposes – so-called secondary use of health data. Some types of health data, such as DNA information that may be collected to make treatment decisions, are inherently challenging to de-identify, and some argue impossible, despite their significant utility for research.
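For illustration, here is a minimal Python sketch of removing direct identifiers from a single record. The field names are hypothetical, and the identifier list is only a small subset of the 18 categories in HIPAA’s Safe Harbor method. Safe Harbor also allows keeping the first three ZIP code digits only when that area is populous enough, a condition this sketch ignores. It is not a compliant de-identification tool.

```python
# Illustrative subset of direct identifiers; HIPAA's Safe Harbor method
# lists 18 categories, and this sketch is NOT a compliant implementation.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email", "mrn"}


def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen dates and ZIP codes."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in out:  # keep only the year (expects YYYY-MM-DD)
        out["birth_year"] = out.pop("birth_date")[:4]
    if "zip" in out:  # keep only the first three digits
        out["zip3"] = out.pop("zip")[:3]
    return out


patient = {
    "name": "Jane Doe", "street_address": "1 Main St", "phone": "555-0100",
    "birth_date": "1980-06-15", "zip": "92121", "sex": "F",
    "diagnosis": "type 2 diabetes",
}
print(deidentify(patient))
```

The output still contains birth year, partial ZIP code, sex, and diagnosis. As discussed above, details like these are what can make de-identified data linkable back to a person.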

The balance between research benefit (i.e., the advancement of knowledge to guide improvements in the diagnosis and treatment of diseases) and the role that individuals play is evolving. Many contemporary data protection and privacy laws around the world, such as Europe’s General Data Protection Regulation (GDPR) and the California Privacy Rights Act (CPRA), draw on the Fair Information Practice Principles (FIPPs) of the 1970s, which also underpin HIPAA. They use these principles to define a right for individuals to control the use of data collected from them. And while this right to control the use of one’s data is absolute, the secondary use of de-identified data and the control granted by privacy legislation must find common ground. Only then can health data from all peoples be included, so that research represents the widest possible range of backgrounds.


The debate over this intersection of approaches is framed in terms of data ownership and control of data use. Unlike tangible assets such as real estate or a piece of furniture, data can be used by many parties simultaneously without diminishing the value of any one party’s use. This difference has shifted thinking away from data ownership and toward control of data use as the paramount consideration. Moreover, the trend globally, and increasingly at the state level in the US, is that control of data use should rest with the individual on whom the data was collected. This argument is most compelling for an individual’s DNA data, which uniquely characterizes them. As it pertains to the secondary use of health data, a case can be made that shifting control of data use from institutions to individuals provides a direct pathway to greater study participant engagement and more inclusive participation of individuals in future research studies.

Articles Reviewed for this Blog

“The Future Of Personally Identifiable Information And Health Data”
https://www.forbes.com/sites/forbestechcouncil/2023/07/18/the-future-of-personally-identifiable-information-and-health-data/?sh=694704622468

“Data Unlocked: Why Rights Mean More Than ‘Ownership’ in B2B Data Sharing”
https://gowlingwlg.com/en/insights-resources/articles/2023/data-unlocked-rights-over-data/

“Ensuring Data Privacy in Genomic Medicine: Legal Challenges and Opportunities”
https://www.jdsupra.com/legalnews/ensuring-data-privacy-in-genomic-8975727/


De-identified, Pseudonymized, Anonymous Data, Oh My!


It seems that everywhere we turn these days, data privacy is in the news, with one company or another sharing your data in some form. Many of these reports involve the use of your de-identified data. What is de-identified data, and how is it different from pseudonymized or anonymous data? And how do any of these relate to the personal data/information covered by modern data privacy regulations?

De-identification removes features like your name, address, and date of birth from your data. It is reversible if those accessing your de-identified data have enough other information that can be tied to the remaining details in the de-identified data. Think of this like pixels in an image. With enough pixels, the full image comes together, even if some pixels are missing.
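The pixel analogy can be made concrete with a toy linkage, sketched below in Python with hypothetical data. A research extract without names is joined to an outside list, such as a voter roll, on the details both share. When exactly one person matches, the record is re-identified.

```python
# Hypothetical toy data: a "de-identified" research extract and a public
# roster (e.g., a voter list) that shares a few quasi-identifiers.
deidentified = [
    {"zip3": "921", "birth_year": "1980", "sex": "F", "diagnosis": "type 2 diabetes"},
    {"zip3": "921", "birth_year": "1975", "sex": "M", "diagnosis": "asthma"},
]
public_roster = [
    {"name": "Jane Doe", "zip3": "921", "birth_year": "1980", "sex": "F"},
    {"name": "John Roe", "zip3": "940", "birth_year": "1975", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def link(records, roster):
    """Return (name, diagnosis) pairs whose quasi-identifiers match uniquely."""
    matches = []
    for rec in records:
        key = tuple(rec[q] for q in QUASI_IDENTIFIERS)
        candidates = [p for p in roster
                      if tuple(p[q] for q in QUASI_IDENTIFIERS) == key]
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0]["name"], rec["diagnosis"]))
    return matches


print(link(deidentified, public_roster))  # [('Jane Doe', 'type 2 diabetes')]
```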

Pseudonymization replaces certain pieces of information in your data set, for example by associating your data with a unique ID in place of your name or address. This is also reversible if those with access to your data have enough other information (or have access to the key or decoder that connects your name back to that unique ID).
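Here is a minimal sketch of pseudonymization in Python. It assumes each pseudonym is derived with a keyed hash (HMAC-SHA256 from the standard library); that choice is ours, and the text does not specify a method. The lookup table plays the role of the “key or decoder.” Whoever holds it can connect the unique ID back to the name, and so can anyone holding the secret key, by recomputing pseudonyms for a list of candidate names.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class Pseudonymizer:
    """Replace names with keyed pseudonyms that the key holder can reverse."""

    def __init__(self, key: bytes):
        self._key = key
        self._decoder = {}  # pseudonym -> original name

    def pseudonym(self, name: str) -> str:
        digest = hmac.new(self._key, name.encode("utf-8"), hashlib.sha256)
        pid = digest.hexdigest()[:16]  # same name + same key -> same ID
        self._decoder[pid] = name
        return pid

    def reidentify(self, pid: str) -> Optional[str]:
        return self._decoder.get(pid)


p = Pseudonymizer(secrets.token_bytes(32))
record = {"id": p.pseudonym("Jane Doe"), "diagnosis": "asthma"}
print(record["id"] == p.pseudonym("Jane Doe"))  # True: stable across records
print(p.reidentify(record["id"]))  # Jane Doe
```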

Anonymization is NOT reversible. To achieve it, in addition to removing your name, address, date of birth, ZIP code, and so on, other information such as medical diagnoses, job title, and/or geolocation must also be removed.
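One common way to measure re-identification risk is k-anonymity. A data set is k-anonymous when every combination of quasi-identifiers (ZIP code, birth year, sex, and so on) is shared by at least k records. The Python sketch below uses toy data and a hypothetical generalization rule. Reaching k-anonymity alone does not make data anonymous in the sense described above. It addresses only one linkage route, and the diagnoses this sketch keeps may also need to be removed.

```python
from collections import Counter


def min_group_size(records, quasi_identifiers):
    """Size of the smallest group sharing identical quasi-identifier values.

    A data set is k-anonymous when this value is at least k.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize(record):
    """Hypothetical rule: coarsen birth year to a decade and drop ZIP entirely."""
    decade = record["birth_year"][:3] + "0s"
    return {"birth_decade": decade, "sex": record["sex"],
            "diagnosis": record["diagnosis"]}


rows = [
    {"zip3": "921", "birth_year": "1980", "sex": "F", "diagnosis": "asthma"},
    {"zip3": "940", "birth_year": "1984", "sex": "F", "diagnosis": "diabetes"},
    {"zip3": "921", "birth_year": "1972", "sex": "M", "diagnosis": "asthma"},
    {"zip3": "100", "birth_year": "1979", "sex": "M", "diagnosis": "migraine"},
]
print(min_group_size(rows, ("zip3", "birth_year", "sex")))  # 1
print(min_group_size([generalize(r) for r in rows], ("birth_decade", "sex")))  # 2
```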

So, what about DNA data? Everything stated here certainly suggests that DNA information about you that is large enough (e.g., your entire genome sequence) or specific enough (e.g., gene variations that led to a medical diagnosis) could never be considered anonymous. This is why DNA is used in applications ranging from family finder tools to crime scene investigations.
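A rough calculation shows why. Singling out one person among roughly 8 billion takes about 33 bits of information. Assume, in an idealized way, that each genetic variant site (SNP) has three equally likely, independent genotypes. Each site then carries log2(3) ≈ 1.58 bits, so in principle only about 21 sites would suffice. Real variants carry less information than this, so more are needed in practice, but a whole genome contains millions of variant sites.

```python
import math

world_population = 8_000_000_000  # approximate (assumption)
bits_needed = math.log2(world_population)  # bits to single out one person

# Idealized assumption: each SNP has three equally likely, independent
# genotypes, so it carries log2(3) bits. Real SNPs carry less.
bits_per_snp = math.log2(3)
snps_needed = math.ceil(bits_needed / bits_per_snp)

print(round(bits_needed, 1), snps_needed)  # 32.9 21
```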


Data privacy regulations vary based on where you live. Some country or state-level data privacy regulations consider your data as personal information unless it has been anonymized. Others only require de-identification or de-identification PLUS defined additional steps (sometimes many such steps!) to help prevent re-identification so it’s no longer considered your personal data.

Yes, this is all a bit confusing and constantly evolving. So when you see news articles about a company selling access to “de-identified” data that is no longer under your control (you being the individual the data represents), it should raise red flags. Under the GDPR (Europe’s General Data Protection Regulation), the CPRA (California Privacy Rights Act), and similar US and non-US data privacy regulations, de-identified data is likely still considered your personal data/information. You have the right to know how it is being used and to prevent it from being used for purposes you don’t agree with, if you choose.


Demystifying Artificial Intelligence (AI)


By Sanjay John, a software engineer, and Scott Kahn, PhD, the Chief Information and Privacy Officer, at Luna.

With the recent launch of ChatGPT, suddenly every tech company has artificial intelligence (AI) capabilities. News stories everywhere are expounding on the promise and threat of AI and its family of applications including machine learning (ML) and large language models (LLMs). But are these technologies really that new? And what is the truth buried in the confusing technical jargon on which most stories focus? Tune in while we try to demystify AI and related applications.

Large Language Models

The fields of AI and ML are over 70 years old. At their foundation are the mathematics of probability and statistics. LLMs, like ChatGPT, are essentially a large collection of equations that estimate the probability of each possible answer to a given question, and these probabilities are defined by similarity to known answers. Consider a simpler model that identifies a cat in an image. A model like this “learns” from a large number of cat pictures. However, if the set of cat pictures shows only white cats or only wild cats, the model may be faulty and return incorrect answers that do not match its creator’s goal. A model can also be quantitative, representing the likelihood that an answer is present: for example, a model that returns the percent likelihood that there is a cat somewhere in an image. There are other kinds of models, such as one that would predict whether an image would make a cat-lover happy or sad. There are also many different types of machine learning used to train models. These are more technical than we will explore today.
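To make the “probability of an answer” idea concrete, the Python sketch below trains a tiny naive Bayes classifier. Naive Bayes is a classic probabilistic model chosen here for illustration; LLMs and image models work very differently inside. The two features (`color`, `whiskers`) are hypothetical stand-ins for what a real model would derive from pixels. Trained only on white cats, the model doubts that a black cat is a cat, and adding black cats to the training data fixes that.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count labels and how often each feature value appears per label."""
    label_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)
    for features, label in examples:
        for name, value in features.items():
            feature_counts[(name, label)][value] += 1
    return label_counts, feature_counts


def prob_cat(model, features, n_values=2, alpha=1):
    """P(cat | features) via naive Bayes with Laplace smoothing.

    Assumes every feature has `n_values` possible values (two here).
    """
    label_counts, feature_counts = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = count / total  # prior probability of the label
        for name, value in features.items():
            seen = feature_counts[(name, label)][value]
            score *= (seen + alpha) / (count + alpha * n_values)
        scores[label] = score
    return scores["cat"] / sum(scores.values())


white_cat = {"color": "white", "whiskers": True}
black_cat = {"color": "black", "whiskers": True}
others = [  # non-cats, e.g. dogs and other animals
    ({"color": "black", "whiskers": False}, "not cat"),
    ({"color": "black", "whiskers": False}, "not cat"),
    ({"color": "white", "whiskers": False}, "not cat"),
    ({"color": "black", "whiskers": True}, "not cat"),
]

biased = [(white_cat, "cat")] * 4 + others
model = train(biased)
print(round(prob_cat(model, white_cat), 2))  # 0.86
print(round(prob_cat(model, black_cat), 2))  # 0.38: "probably not a cat"

diverse = biased + [(black_cat, "cat")] * 4
print(round(prob_cat(train(diverse), black_cat), 2))  # 0.8
```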

Neural Networks

Now let’s tackle neural networks. As the name implies, neural networks try to emulate the complex reasoning of a human brain. To create one, we must build the instructions and logic that allow this more complex reasoning to occur. First, we build the instructions using a class of algorithms. Algorithms are specific, unambiguous rules that tell the model how to react when presented with external data. These algorithms combine multiple models, or “nodes,” using a weighting scheme to create a neural network. For example, an answer might be derived 50% from one model, 30% from another, and 20% from a third.
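Here is a minimal Python sketch of both ideas, with made-up numbers. The first function is the 50/30/20 blend from the text. The second is a single artificial “neuron,” the way nodes are usually built in practice: a weighted sum of inputs plus a bias, passed through a function such as the sigmoid. Training is the process of adjusting these weights.

```python
import math


def weighted_combination(outputs, weights):
    """Blend several model outputs, e.g. 50% / 30% / 20% as in the text."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights should sum to 1")
    return sum(o * w for o, w in zip(outputs, weights))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))


# Three hypothetical models each estimate the chance an image shows a cat.
print(round(weighted_combination([0.9, 0.6, 0.2], [0.5, 0.3, 0.2]), 2))  # 0.67
print(neuron([1.0, 0.0], [2.0, -1.0], bias=-2.0))  # 0.5
```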

Neural networks often have many nodes (thousands or even billions) and many combinations of these nodes to present an answer to a question. One version of a neural network, known as a generative adversarial network (GAN), pits a generator network (a network focused on creating fake data) against a discriminator network (a network focused on determining if a piece of data is real). These networks have become famous for their ability to create seemingly realistic images, videos, and text. A more complex version of a neural network is a transformer. Transformers learn context and meaning by tracking relationships within data points, like words in a sentence. For example, the sentence, “The cup was poured into the bowl until it was empty,” compared to, “The cup was poured into the bowl until it was full,” shows how our complete understanding of sentences influences how we consider the meaning of the word “it” in these contexts. Transformers can decipher and apply this context, allowing for better prediction. ML and feedback loops help networks learn and adjust the weights of the various nodes accordingly.
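The mechanism transformers use to track these relationships is called attention. The Python sketch below implements scaled dot-product attention for one word. The two-dimensional vectors are hand-picked and hypothetical. We assume earlier layers have already folded “empty” or “full” into the vector for “it”; a real model would learn these vectors from data rather than have them set by hand.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output: the value vectors blended according to the attention weights.
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output


# Hypothetical 2-D vectors for the two nouns "it" might refer to.
names = ["cup", "bowl"]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = keys  # reuse keys as values to keep the toy small

# Assumption: earlier layers have already mixed "empty" or "full" into
# the query vector for "it".
queries_for_it = {"empty": [2.0, 0.0], "full": [0.0, 2.0]}

for context, query in queries_for_it.items():
    weights, _ = attention(query, keys, values)
    best = names[weights.index(max(weights))]
    print(f"...until it was {context}: 'it' attends most to '{best}' ({max(weights):.2f})")
```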

Natural Language Processing

The final piece of the puzzle is natural language processing (NLP), in which a model converts everyday written or spoken language into the meaning of its phrases. Neural networks typically perform this process, using probability models that encode the similarity of words and phrases to predict the words and phrases that come next. We arrive at LLMs (large language models) by combining three things: the processing power of transformer networks, the creative ability of generative networks, and the large datasets available from the internet and/or other databases. LLMs are at the cutting edge in their understanding of natural language. Unfortunately, the data sets drawn from the internet and other databases are often unreliable and incomplete, which, again, can make the output biased, misleading, and sometimes completely wrong. AI, ML, and LLMs are therefore still only as good as the attention their creators pay to two things. The applications must learn from valid and representative data sets, and their learning feedback loops must incorporate novel data over time rather than regurgitate the data they have already consumed. The better creators are at monitoring this, the more useful current and future tools built on these applications will be.
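A bigram model can show the “predict the next word from probabilities” idea. It is far simpler than an LLM and is used here only for illustration. The Python sketch counts which word follows which in a toy training sentence. It also shows the limitation described above: the model can only repeat patterns it has already seen, and has nothing to say about a word outside its training data.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count which word follows which in a training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word_probs(model, word):
    """Estimated probability of each possible next word (empty if unseen)."""
    counts = model.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


model = train_bigrams("the cup was poured into the bowl until the bowl was full")
print(next_word_probs(model, "the"))  # {'cup': 0.3333333333333333, 'bowl': 0.6666666666666666}
print(next_word_probs(model, "plate"))  # {} because "plate" never appeared in training
```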


Let’s take ChatGPT, for example. It is the marriage of a powerful LLM with predictive neural network models that can learn from user input. However, it has limitations rooted in the information used to create, or “train,” the underlying models and in the user feedback used to reinforce them. If the data used to train the models is not comprehensive, the resulting models will reflect those gaps. For example, if a model were trained strictly on health information from men aged 21 or older, you could not use it to characterize women’s health, or even boys’ health. Further, today’s health data sets typically underrepresent people who are not of European descent.
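One practical safeguard is to record who a model was trained on and decline to use it outside that range. The Python sketch below encodes the text’s example, a model trained only on men aged 21 or older. The `TrainingCoverage` class, the upper age bound, and the placeholder model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingCoverage:
    """Hypothetical summary of who was represented in a model's training data."""
    sexes: frozenset
    min_age: int
    max_age: int

    def covers(self, sex: str, age: int) -> bool:
        return sex in self.sexes and self.min_age <= age <= self.max_age


def predict_or_refuse(coverage, sex, age, predict):
    """Run the model only for people who resemble its training population."""
    if not coverage.covers(sex, age):
        return None  # outside the training data: the output would be unsupported
    return predict(sex, age)


def toy_model(sex, age):
    return 0.1  # placeholder risk score, not a real model


# The text's example: trained only on men aged 21 or older (120 is an assumed cap).
coverage = TrainingCoverage(sexes=frozenset({"male"}), min_age=21, max_age=120)
print(predict_or_refuse(coverage, "male", 45, toy_model))  # 0.1
print(predict_or_refuse(coverage, "female", 45, toy_model))  # None
print(predict_or_refuse(coverage, "male", 12, toy_model))  # None
```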

The Takeaway

So, while the headlines are provocative, AI, ML, and LLMs are just tools. Like most tools, they work best when the user knows which jobs they are most suitable for, and where the boundaries and risks lie. At Luna, we focus on using AI to assist researchers with the extraction of clinically relevant information from data that our members share in studies they join. The broader the health experiences of our members, the better these tools become in understanding what is important to help drive research faster and with more successful outcomes. At the end of the day, human intelligence and experience still reign supreme, as we decide where and when to apply these technologies, where they fall short, and when to unplug them.


A Case for Progressive Institutional Review Boards


Innovation is a constant topic in the biomedical research space. The pace of new tools, new techniques, and new discoveries is often hard to keep up with. Researchers and study participants expect innovation to deliver improvements as well as compatibility with new science, new methodologies, and new priorities, all in a constantly changing, complex environment for study participants.

This progress is almost always seen in the insights and outcomes of the research, but not in how the research is designed, begins, progresses, and is managed. Anyone who has done work in the biomedical research space knows that ethical oversight – whether by internal reviewers, external Institutional Review Boards (IRBs), or Ethics Review Boards (ERBs), collectively known as “reviewing bodies” – constitutes a significant part of all study construction and, therefore, is a significant part of all new science and discoveries. Shouldn’t the processes that enable study construction also be innovative and capable of adapting to new science, new methods, and new priorities?

A careful balance must always be maintained between ensuring protection of individuals participating in research and supporting innovative methods and processes to drive more efficient and cost-effective reviews and research.

The often broad, non-specific nature of the regulations that govern “human research protections,” as they are called in the US, leaves reviewing bodies somewhat adrift in what amounts to a vacuum. This vacuum forces them to make their own interpretations and implementations of the exact protections to uphold. It also leaves them without a mechanism to share new review processes, or validations of new research methods, with one another. At their core, reviewing bodies are simply groups of people working together to determine whether a research study meets certain requirements and thresholds as interpreted from these regulations. Much is understood today about the social psychology involved: how people interact on a larger social scale, how they make decisions in isolation and in aggregate, and how the context of their engagements changes their behavior in those different settings. Yet there is a large gap in applying that understanding to how reviewing bodies function. These factors have left the field prone to stagnation rather than innovation, and it has remained relatively static for decades. The same study reviews take place and the same study constructions are expected, with few changes permitted, let alone entirely new methods. How do we inject innovation into such an environment to create what amounts to an adaptive, or progressive, IRB?


“At the Genetic Alliance IRB, we’ve found that an open dialogue between the IRB members, researchers, and technology providers is essential to understand pain points and inefficiencies in our process, and incorporate new techniques that improve both our processes and the research being done. We cannot limit ourselves to past methodologies, given the pace with which new technology is developing worldwide,” says Chris Carter, chair of the Genetic Alliance IRB.

IRBs and ERBs typically do not interact with service providers on the underlying technology by which data is collected for a study. Instead, their focus is on the burden of participation, the ethical considerations of what is being asked, and the privacy, burden, and safety risks to participants. However, we have seen a marked increase in the efficiency of designing and reviewing new studies when the IRB works with the technical service provider to understand and approve the underlying mechanisms and processes first. Standard IRB practices can be used to review and approve new processes and methodologies built on newly established research technology. Doing so speeds up research by reducing the complexity of designing new studies. Working with the Genetic Alliance IRB, we set out to find ways in which an IRB and a technology provider could work together to innovate reviews and research. Here’s what we did.

Case Study #1: Luna Platform – Simplifying the IRB Process for New Studies

“For several decades, standing in the shoes of research participants, I have been outraged at the delays and extra work some IRBs cause in the name of protecting the participants. From my perspective, many IRBs act to protect an institution rather than the people. In the case of advocacy organizations and communities working to accelerate research on their condition, making this as simple as possible and ensuring conduct of the best research for their communities is paramount,” said Sharon Terry, CEO of Genetic Alliance. “Working with LunaPBC, we did a first-of-its-kind submission to the Genetic Alliance IRB to approve an entire platform based on the methods it encompasses for participant engagement and retention, data collection, and analysis. This paves the way for a streamlined review process for any study being executed using the platform. There is no need for redundant approvals or long timelines.”

The Luna platform is a tool for researchers to establish new communities or cohorts, collect data, and perform analyses for their studies in a continuously participant-connected environment. The platform has IRB approval, which includes a consent that individuals e-sign to share de-identified data for research purposes on the platform. This allows groups to deploy Community Driven Innovation (CDI) on Luna for the benefit of their community through a simple Organization-Specific CDI Addendum. Study-specific consents are then needed only for studies that go beyond the platform’s standard methods. Because the underlying mechanics are the same from study to study, the IRB can focus on the specific goals of each study and the populations it includes. Examples of protocols not covered by the approval are (1) the collection of personally identifiable information instead of de-identified data, and (2) the use of bespoke instruments on topics not prioritized by the community through Community Driven Innovation.
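The two exclusions above amount to a simple decision rule. Here is a Python sketch of it, assuming a hypothetical `StudyPlan` description and a set of topics the community prioritized through CDI. It illustrates the logic only; it is not the IRB’s actual review criteria or a Luna API.

```python
from dataclasses import dataclass, field


@dataclass
class StudyPlan:
    """Hypothetical description of a study proposed on the platform."""
    collects_identifiable_data: bool
    instrument_topics: set = field(default_factory=set)


def reasons_for_separate_review(plan: StudyPlan, cdi_priorities: set) -> list:
    """List why platform-level approval would not cover this study (empty if covered)."""
    reasons = []
    if plan.collects_identifiable_data:
        reasons.append("collects personally identifiable information")
    bespoke = plan.instrument_topics - cdi_priorities
    if bespoke:
        reasons.append(f"instruments on topics not prioritized via CDI: {sorted(bespoke)}")
    return reasons


priorities = {"fatigue", "sleep"}  # toy CDI results
print(reasons_for_separate_review(StudyPlan(False, {"sleep"}), priorities))  # []
print(reasons_for_separate_review(StudyPlan(True, {"diet"}), priorities))
```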

Case Study #2: Community Driven Innovation – Innovative Methods for Research
Luna established the Community Driven Innovation™ (CDI) method for uncovering and validating the top priorities of a community or cohort, similar to methods like the Delphi technique, but without the disadvantage of inherent groupthink or expert bias those other techniques often introduce. By working collaboratively with the Genetic Alliance IRB and several researchers using CDI for their research goals, we were able to identify new ways to not only improve the research being done, but also simplify certain aspects of the IRB process itself.

“The Genetic Alliance IRB immediately understood the value of the CDI method, not only in reducing burden on study participants, but also in reducing costs and time for their own reviews for both new studies using CDI and new data collection, typically surveys, being designed based on the insights from CDI,” shared Ian Terry, senior user experience researcher at Luna.

Together, we established two concepts that were integrated into the Luna platform protocol with the Genetic Alliance IRB: (1) A CDI “meta-study” design, and (2) “Related Topics” surveys.

CDI Meta-Study Design

We established a protocol that defines the CDI method, leveraging the Luna platform’s mechanisms for consent, recruitment, data collection, and data analysis. New deployments of CDI can be added via an addendum to this protocol that defines the specific populations and the key personnel involved in recruitment, since nothing else changes from one CDI to the next. This reduces the time and cost of IRB review and enables researchers with less scientific experience to deploy CDI in their communities or cohorts.

Related Topic Surveys

Building on the CDI method protocol, we worked with the Genetic Alliance IRB to design a process that eliminates the need for researchers to submit all research questions during the initial study design. Instead, content can evolve based on the insights generated by the CDI itself. Together, we established thresholds for the specific topics and priorities uncovered by the CDI that can be turned into questions in follow-on surveys without requiring additional IRB review. We’ve now opened up the research so that the involved patient population can select research topics using several well-documented methods from Computational Social Choice theory. By doing this, we don’t have to pre-decide what a specific population may need; we can build the very task of asking that question into the study design. This removes many steps that would otherwise be spent trying to figure out what the population wants, which speeds up the time it takes to establish the study in the first place. It also gives the research population, rather than the experts or researchers, the autonomy to decide where the research should be headed, a power that has long been missing in the medical world.
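The text does not say which Computational Social Choice methods are used. The Python sketch below swaps in one well-known rule, the Borda count, to show the general shape of the process. Participants rank candidate topics, the rankings are aggregated into scores, and topics above a pre-agreed threshold become follow-on survey questions. The topics and the 0.3 threshold are hypothetical.

```python
from collections import Counter


def borda_scores(rankings):
    """Borda count: with n topics, 1st place earns n-1 points and last earns 0."""
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, topic in enumerate(ranking):
            scores[topic] += n - 1 - position
    return scores


def topics_for_follow_up(rankings, threshold):
    """Topics whose share of all Borda points meets the pre-agreed threshold."""
    scores = borda_scores(rankings)
    total = sum(scores.values())
    return sorted(t for t, s in scores.items() if s / total >= threshold)


# Each participant ranks the same candidate topics (toy data).
rankings = [
    ["fatigue", "sleep", "diet"],
    ["sleep", "fatigue", "diet"],
    ["fatigue", "diet", "sleep"],
]
print(borda_scores(rankings))
print(topics_for_follow_up(rankings, threshold=0.3))  # ['fatigue', 'sleep']
```

Because the threshold is fixed in advance, the topics that clear it follow from the community’s own rankings rather than from a researcher’s choice.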

These are only a couple of examples of how collaborative exploration of new processes, methods, and technologies can create an innovative and efficient environment for safe, ethical research. By focusing on technology and method innovation in our external and internal ethical reviews, we can explore new frontiers in research that empower participants to help drive study design. In turn, having participants drive study design helps ensure that new inventions and products meet their needs and address their pain points directly. That speeds up answers, shortens time to market, and ultimately leads to better health. We’ve turned a top-down study process into a dynamic ecosystem of iterative listening, accessible to non-scientists, that supports the privacy and safety of its members.

Learn more about Community Driven Innovation at www.lunadna.com/cdi

