From Blurred to Precise: scRNA-Seq Brings Clarity and Direction to legacy GWAS data

September 10, 2025 | 6 min read
Updated: September 11, 2025

Between 2005 and 2010, genome-wide association studies (GWAS) became a key scientific approach to understanding genetic variability within individual genomes and between populations. By the end of that period, various consortia had identified more than 3,000 unique loci associated with over 250 diseases. By 2020 there were about 55,000 unique loci for 5,000 diseases or traits, with summary statistics publicly available in numerous data portals.

GWAS identify single nucleotide polymorphisms (SNPs) statistically associated with diseases or traits across millions of people. These markers can highlight regions of the genome that may influence disease risk and can guide drug discovery, including repurposing of approved therapeutics. This has the potential to make clinical trials more successful, faster, and cheaper as the discovery process is grounded in human genetics.

But GWAS alone do not reveal which gene is affected or through what mechanism a variant acts. About 90% of GWAS-identified SNPs fall within non-coding regions of the genome, which do not code for proteins but often play key roles in regulating gene activity.

To pinpoint the mechanisms, researchers integrate GWAS with gene expression studies such as expression quantitative trait locus (eQTL) analysis. eQTL analysis measures how genetic variation influences the mRNA produced from a gene, either by directly affecting its coding sequence or by altering regulatory elements that control its expression. Correlating these effects across many individuals reveals the regulatory architecture of the genome, links genotype to phenotype, and shows how genetic variation drives disease risk.
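
As a rough illustration of the statistical idea behind eQTL analysis, the sketch below regresses a gene's expression on the number of alternate alleles each donor carries (0, 1, or 2). The function name `eqtl_effect` and the toy numbers are hypothetical and not taken from any study; real eQTL pipelines also adjust for covariates such as ancestry and batch, and correct for multiple testing.

```python
from statistics import mean


def eqtl_effect(dosages, expression):
    """Least-squares slope and intercept of expression regressed on allele dosage."""
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical toy data: alternate-allele counts for six donors
# and the normalized expression of one gene in each donor.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, intercept = eqtl_effect(dosages, expression)
print(f"expression change per allele copy: {slope:.2f}")
print(f"baseline expression at dosage 0: {intercept:.2f}")
```

A slope clearly different from zero suggests that expression shifts with each additional copy of the allele, which is the kind of signal an eQTL study looks for across many individuals.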

Sequencing technologies further power these insights. Bulk sequencing averages signals across tissues, while newer single-cell sequencing reveals exactly which cell types are influenced, with the granularity needed to understand disease biology and design precise therapies.

The Evolution (and Revolution) of Sequencing Technologies

RNA sequencing (RNA-seq), first introduced in 2008, measures the gene expression of a sample, providing a view of all RNA transcripts present in the cells. After extraction, the RNA is reverse-transcribed into cDNA. Adaptors are ligated to the ends of the cDNA to create a sequencing library, which is then amplified to generate sufficient material for sequencing.

The starting material is RNA extracted from a heterogeneous mixture of cells, so the data represent an average gene expression profile across hundreds of thousands to millions of cells. As a result, an observed increase in the expression of a gene associated with a SNP may reflect upregulation in any subset of cells, without revealing which specific cell types are responsible. This averaging effect obscures whether the altered expression occurs in the cell types most relevant to the disease under study, potentially masking key cellular drivers of pathogenesis.
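
The averaging effect can be made concrete with a small sketch. It assumes, as a simplification, that the bulk signal is the abundance-weighted mean of per-cell-type expression; the cell type names, fractions, and expression values are hypothetical.

```python
def bulk_expression(fractions, expression_by_type):
    """Bulk signal modeled as the abundance-weighted mean of per-cell-type expression."""
    return sum(fractions[t] * expression_by_type[t] for t in fractions)


# Hypothetical tissue in which a rare cell type makes up 10% of cells.
fractions = {"common_type": 0.9, "rare_type": 0.1}
reference = {"common_type": 10.0, "rare_type": 10.0}
risk = {"common_type": 10.0, "rare_type": 20.0}  # expression doubles only in the rare type

rare_fold_change = risk["rare_type"] / reference["rare_type"]
bulk_fold_change = bulk_expression(fractions, risk) / bulk_expression(fractions, reference)

print(f"fold change in the rare cell type: {rare_fold_change:.2f}")
print(f"fold change seen in bulk: {bulk_fold_change:.2f}")
```

Under these assumptions a twofold change confined to the rare cell type appears as only a small shift in the bulk profile, and the bulk value alone cannot say which cell type produced it.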

Single cell RNA sequencing (scRNA-seq) overcomes this limitation by enabling transcriptome profiling at single cell resolution.

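In scRNA-seq techniques, tissue samples are dissociated into a single cell suspension. The mRNA within each cell is then labeled with a cell-specific barcode, and individual transcripts are tagged with unique molecular identifiers (UMIs) so that amplification duplicates can be recognized and counted once.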
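
To show what the barcodes make possible downstream, here is a minimal sketch of molecule counting. It assumes each sequenced read has already been reduced to a (cell barcode, gene, UMI) triple; the barcodes, UMIs, and the `count_molecules` helper are hypothetical, and real pipelines also correct sequencing errors in barcodes and UMIs.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell, gene); repeated triples are amplification copies."""
    unique_molecules = set(reads)  # identical triples collapse to one molecule
    counts = defaultdict(int)
    for cell, gene, _umi in unique_molecules:
        counts[(cell, gene)] += 1
    return dict(counts)


# Hypothetical reads: the first two are copies of the same original molecule.
reads = [
    ("cellA", "IRX3", "AAGT"),
    ("cellA", "IRX3", "AAGT"),
    ("cellA", "IRX3", "CCTA"),
    ("cellB", "IRX3", "AAGT"),
]

molecule_counts = count_molecules(reads)
for (cell, gene), n in sorted(molecule_counts.items()):
    print(cell, gene, n)
```

The cell barcode assigns each transcript to its cell of origin, while the UMI keeps amplification from inflating the count.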

How individual cells and their associated transcripts are barcoded differs across scRNA-seq technologies. In combinatorial barcoding approaches like Evercode™, for instance, the cells are fixed and permeabilized so that each cell becomes its own reaction chamber. Poly(dT) primers initiate reverse transcription at the 3′ ends of polyadenylated mRNAs, and random primers bind to all RNA species, including non-coding RNAs.

This dual-priming strategy supports unbiased transcript coverage, a clear advantage in GWAS follow-up, where about 90% of the SNPs lie in non-coding regions of the genome. A split-pool barcoding process then labels cells over successive rounds, so the number of possible barcode combinations grows exponentially with the number of rounds.
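
The arithmetic behind split-pool barcoding can be sketched as follows. The plate size (96 wells), the number of rounds, and the cell count are illustrative assumptions rather than the specification of any particular kit, and the collision estimate assumes cells are distributed uniformly at random in every round.

```python
def barcode_combinations(wells_per_round, rounds):
    """Number of distinct barcode combinations after the given number of split-pool rounds."""
    return wells_per_round ** rounds


def expected_collision_fraction(n_cells, n_barcodes):
    """Chance that a given cell shares its full barcode with at least one other cell,
    assuming every cell lands in a uniformly random well in each round."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


# Illustrative assumption: 96 wells per round.
for rounds in (1, 2, 3):
    print(f"{rounds} round(s): {barcode_combinations(96, rounds):,} combinations")

combos = barcode_combinations(96, 3)
collision = expected_collision_fraction(10_000, combos)
print(f"expected collision fraction for 10,000 cells: {collision:.4f}")
```

Each extra round multiplies the barcode space by the number of wells, which is why a few rounds are enough to keep most cells uniquely labeled.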

Regardless of the technology, by integrating scRNA-seq with GWAS and eQTL data—known as single cell eQTL (sc-eQTL) analysis—researchers can uncover how genetic variants affect gene expression in specific cell types or states. This refinement allows for precise mapping of the cell-type-specific regulatory architecture underlying disease-associated loci.
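
One common way to approach this, sketched here under simplifying assumptions, is to average expression per donor within each cell type and then test the genotype effect separately in each cell type. The donors, cell types, values, and function names are hypothetical; published sc-eQTL studies use far larger cohorts, covariate adjustment, and formal significance testing.

```python
from collections import defaultdict
from statistics import mean


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def cell_type_eqtl(cells, genotypes):
    """cells: (donor, cell_type, expression) per cell; genotypes: donor -> allele dosage."""
    per_group = defaultdict(list)
    for donor, cell_type, expr in cells:
        per_group[(cell_type, donor)].append(expr)

    # Average the cells of each donor within each cell type.
    donor_means = defaultdict(dict)
    for (cell_type, donor), values in per_group.items():
        donor_means[cell_type][donor] = mean(values)

    # Regress the per-donor means on genotype, one cell type at a time.
    effects = {}
    for cell_type, means in donor_means.items():
        donors = sorted(means)
        effects[cell_type] = slope(
            [genotypes[d] for d in donors], [means[d] for d in donors]
        )
    return effects


# Hypothetical data: the variant changes expression in T cells but not in B cells.
genotypes = {"d1": 0, "d2": 1, "d3": 2}
cells = [
    ("d1", "T cell", 1.0), ("d1", "T cell", 1.2),
    ("d2", "T cell", 2.0), ("d2", "T cell", 2.2),
    ("d3", "T cell", 3.0), ("d3", "T cell", 3.2),
    ("d1", "B cell", 5.0), ("d1", "B cell", 5.0),
    ("d2", "B cell", 5.0), ("d2", "B cell", 5.0),
    ("d3", "B cell", 5.0), ("d3", "B cell", 5.0),
]

effects = cell_type_eqtl(cells, genotypes)
for cell_type, effect in effects.items():
    print(f"{cell_type}: expression change per allele copy = {effect:.2f}")
```

In this toy data the genotype effect appears only in T cells, the kind of cell-type-specific signal that would be diluted if all cells were pooled.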

What Bulk Missed, Single-Cell Revealed: The True Role of the FTO Locus

For years, GWAS were combined with bulk RNA-seq, enabling the detection of expression quantitative trait loci (eQTLs) and establishing statistical links between SNPs and gene expression changes in disease-relevant tissues. While impactful, this approach often lacked the resolution needed to fully elucidate cellular mechanisms and therapeutic targets.

For instance, a GWAS in 2007 linked variants of the gene FTO to obesity. Variants in FTO were strongly associated with body mass index, but the affected region was non-coding, and no one knew how it influenced adipose tissue biology.

Through a series of studies, researchers discovered that these variants don’t affect FTO itself but instead disrupt long-range enhancers that regulate the nearby genes IRX3 and IRX5. These transcription factors control the fate of preadipocytes — the precursors to fat cells.

Later, single cell transcriptomic and epigenomic studies of human adipose tissue showed that IRX3/IRX5 are specifically expressed in progenitor cells and drive the differentiation toward white (energy-storing) rather than brown (thermogenic) adipocytes. This insight rewrote the functional interpretation of the FTO locus: the obesity risk allele suppresses thermogenesis not by altering metabolism directly, but by shifting the cell fate landscape during adipogenesis.

Filling the Gaps with Granular Resolution

While thousands of genetic variants have been identified and linked to alterations in protein function associated with disease, the precise mechanisms by which these variants contribute to the development of many conditions are still largely unclear. Transcriptomic research is now catching up.

In one example, researchers reanalyzed human pancreatic islet tissue using scRNA-seq, focusing on the transcriptional profiles of insulin-producing beta cells. By integrating these single-cell data with GWAS signals and eQTL mapping, they made a striking discovery: in individuals with type 2 diabetes, beta cells begin to lose their identity, adopting transcriptional programs normally seen in alpha cells. What makes this finding remarkable is that it links a GWAS-discovered variant directly to a cell-state–specific regulatory mechanism, a relationship that would have been invisible in bulk-tissue analysis. Only through the resolution of scRNA-seq could the researchers observe this subtle, disease-associated shift in transcriptional identity within a subset of pancreatic beta cells.

An Atlas of Single Cell eQTLs

In a landmark study published in Cell Genomics, scientists constructed a single cell eQTL atlas from immune cells collected from a genetically diverse population. By profiling gene expression in T cells, B cells, monocytes, dendritic cells, and other immune subsets, they directly mapped disease-associated GWAS variants to the exact immune cells in which they exert regulatory effects.

This cell-level resolution allowed researchers to show that a variant linked to rheumatoid arthritis modulates gene expression mostly in CD4 memory T cells and immature B cells. Similarly, they identified an autophagy-related gene as a causal gene for ulcerative colitis risk, with the strongest effect in NK cells, an important component of the intestinal mucosal immune system with an increasingly recognized role in the disease. These insights could not be captured with bulk RNA-seq.

By resolving the immune system into its cellular constituents, the authors redefined how we understand immune-mediated genetic risk, tracing it to specific actors in the immune landscape that can now be targeted.

Conclusions

The ultimate goal of GWAS and post-GWAS analysis is not just to map variants to genes, but to uncover causal mechanisms that can be translated into diagnostic tools, drug targets, or personalized therapies.

The examples described underscore that adding scRNA-seq to detect mechanisms invisible in bulk data is more than a technical advance: it changes the interpretive power of GWAS and lays the groundwork for a new era of cell type–targeted therapies grounded in genetic evidence.

About the Author

Laura Tabellini Pierre

Laura Tabellini Pierre, MSc, is a scientific and technical writer at Parse Biosciences with extensive experience in immunology, encompassing both academic and R&D research.