Reversal as a Drug Discovery Strategy: The Search for Perturbations That Restore Normal Cell States
Using Tahoe 100M to prioritize drugs by their ability to reverse the cancer cell state
Many diseases are, at their core, diseases of cell state. A cell that should be quiescent is proliferating. A cell that should differentiate is stuck. A cell that should signal to its neighbors has gone silent, or is shouting the wrong message. The genome may carry the initiating lesion, but the disease lives in the transcriptome: the full pattern of genes a cell turns on and off, the molecular identity that dictates its behavior.
This reframing suggests a powerful and underexplored way to evaluate drugs. Rather than asking only whether a compound kills a diseased cell or inhibits a specific target, we can ask a more fundamental question: Does this drug shift the cell’s transcriptomic identity back toward normal?
This is the principle of cell-state reversal. If you can measure what makes a diseased cell different from its healthy counterpart at the level of gene expression, and you can measure what a drug does to that same cell’s transcriptome, then you can score every drug by how effectively it closes the gap. A drug that reverses the disease signature (pushing expression back toward the healthy reference) is, in a precise and measurable sense, restoring the cell toward its normal state.
The concept is general: it applies anywhere you have a well-defined disease transcriptome and a compendium of drug-induced expression changes. But cancer is a particularly compelling place to start. Tumors are driven by well-characterized genetic lesions, such as mutations in the oncogenes KRAS, BRAF, and PIK3CA, that rewire gene expression in ways that have been extensively profiled in patients. And cancer is far more than uncontrolled proliferation: it involves aberrant differentiation, metabolic reprogramming, disrupted cell-cell signaling, and immune evasion. A drug that partially normalizes any of these dimensions might never register as a hit in a classical viability screen, yet could be profoundly relevant in a patient.
What’s been missing is the data infrastructure to test this idea at scale. You need three things: a dense compendium of drug-induced transcriptomic responses, coverage of many cell lines or other cancer models, and patient-derived reference profiles to define the “target”, the normal state you’re trying to recover.
Tahoe 100M provides the first two at a scale that didn’t previously exist: 100 million single-cell transcriptomic measurements spanning 60,000 drug–cell line experiments. Here, we apply cell-state reversal to colorectal cancer as a proof of concept — and show that the results are remarkably consistent with known biology, while surfacing signals that go beyond what viability assays alone can reveal.
The Logic of Cell-State Reversal
The dominant paradigm in cancer drug screening is phenotypic viability: treat cells, measure death. This approach gave rise to most classical chemotherapies and, combined with the molecular revolution in oncology, continues to power targeted drug development. Viability screens are valuable; they tell you which cellular features confer drug sensitivity.
But they capture only one axis of a drug’s effect. In vitro, this limitation is especially acute: the simplified microenvironment lacks immune-mediated killing, stromal interactions, and the metabolic context of a real tumor. A compound that restores differentiation programs, rebalances metabolism, or re-engages normal signaling architecture won’t necessarily kill cells in a dish, but it might reshape the disease in a patient.
Cell-state reversal offers a complementary lens. Instead of asking “does this drug kill the cell?”, we ask: does this drug make the cell’s gene expression look less like a tumor and more like normal tissue? This reframing opens the aperture to the full spectrum of cancer hallmarks, not just proliferation and survival, but differentiation, metabolism, and intercellular communication, and may ultimately enhance clinical translatability.
The analytical framework is conceptually simple:
First, for each drug–cell line pair in Tahoe 100M, we compute a differential expression profile (using DESeq2, relative to plate-matched DMSO controls). This captures what the drug does to the cell’s transcriptome.
Second, from patient-derived data, we compute a tumor-versus-normal differential expression profile for the relevant cancer type, defining the transcriptomic “distance” between diseased and healthy tissue.
Third, we correlate the two profiles. A strong negative Pearson correlation means the drug is reversing the cancer expression pattern, pushing the transcriptome back toward normal.
Validation: The Framework Recovers Known Biology
We applied this framework to colorectal cancer (CRC), using patient-derived epithelial cell profiles from Joanito et al. (Nature Genetics, 2022) to define tumor–normal differences for the iCMS2 and iCMS3 transcriptional subtypes. Drug-induced profiles were computed across nine CRC cell lines in Tahoe 100M.
The first test is whether drugs with known activity in CRC show stronger cancer-state reversal than the background. We defined positive control pairs: BRAF inhibitors (dabrafenib, encorafenib, vemurafenib) in BRAF-V600E cell lines, the (K)RAS inhibitor RMC-6236 in KRAS-mutant cell lines, and the multi-kinase inhibitor regorafenib across all CRC lines. RMC-6236 is not yet approved for CRC, but has shown promising clinical results in KRAS-mutated epithelial cancers.
The result was unambiguous: positive control pairs were strongly enriched in negative correlation values (mean Pearson coefficient –0.10 vs. –0.04 for all other pairs; Mann–Whitney U p = 3.17 × 10⁻²⁰; Figure 1). The framework recovers known drug–cancer relationships from first principles, using only transcriptomic data.
Figure 1: Distribution of Pearson correlation coefficients across all drug–cell line pairs, with positive controls (red stars) clustered in the left tail.
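For readers who want to see what the enrichment test is doing, here is a minimal standard-library sketch of a Mann–Whitney U test using the normal approximation with tie correction. The scores below are made up, and the two-sided p-value is an assumption of this sketch (the sidedness of the reported test is not stated above). In practice a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, z, two-sided p) via the normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    ranks = average_ranks(list(a) + list(b))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: each group of t tied values contributes t^3 - t.
    tie_sizes = {}
    for r in ranks:
        tie_sizes[r] = tie_sizes.get(r, 0) + 1
    tie_term = sum(t**3 - t for t in tie_sizes.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all values are identical; the test is undefined")
    z = (u - n1 * n2 / 2) / sqrt(variance)
    p_two_sided = 2 * min(NormalDist().cdf(z), 1 - NormalDist().cdf(z))
    return u, z, p_two_sided


# Made-up reversal scores for illustration only.
positive_controls = [-0.30, -0.22, -0.15, -0.12, -0.08]
background = [-0.10, -0.05, -0.02, 0.00, 0.03, 0.06, 0.10]

u, z, p = mann_whitney_u(positive_controls, background)
# z < 0 means the positive controls tend to have lower (more negative) scores.
print(f"U = {u}, z = {z:.2f}, two-sided p = {p:.4f}")
```

The normal approximation is only reasonable for samples that are not tiny; the toy lists here are short purely to keep the example readable.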
A closer look at RMC-6236 reinforces the signal. For the iCMS3 subtype, which is enriched for KRAS gain-of-function mutations, we observed greater cancer-state reversal in KRAS-mutant cell lines compared to wild-type lines, consistently across all three doses tested (Figure 2). The drug’s transcriptomic effect is genotype-specific, exactly as the biology predicts.
Figure 2: RMC-6236 reversal scores across CRC cell lines in iCMS3, stratified by KRAS mutation status and dose.
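The stratified comparison behind this figure can be sketched as a simple group-and-average. The cell-line labels, dose values, and scores below are hypothetical placeholders, and averaging within each group is an assumption of this sketch rather than a description of how the figure was built.

```python
from collections import defaultdict
from statistics import mean

# Made-up reversal scores for illustration only (not Tahoe 100M results).
# Cell-line labels and dose values are hypothetical placeholders.
records = [
    {"line": "line_A", "kras": "mutant", "dose": 0.05, "score": -0.08},
    {"line": "line_A", "kras": "mutant", "dose": 0.5, "score": -0.14},
    {"line": "line_A", "kras": "mutant", "dose": 5.0, "score": -0.21},
    {"line": "line_B", "kras": "mutant", "dose": 0.05, "score": -0.06},
    {"line": "line_B", "kras": "mutant", "dose": 0.5, "score": -0.12},
    {"line": "line_B", "kras": "mutant", "dose": 5.0, "score": -0.18},
    {"line": "line_C", "kras": "wild-type", "dose": 0.05, "score": -0.02},
    {"line": "line_C", "kras": "wild-type", "dose": 0.5, "score": -0.03},
    {"line": "line_C", "kras": "wild-type", "dose": 5.0, "score": -0.05},
]


def mean_score_by_genotype_and_dose(records):
    """Average the reversal score within each (KRAS status, dose) group."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["kras"], rec["dose"])].append(rec["score"])
    return {key: mean(scores) for key, scores in groups.items()}


means = mean_score_by_genotype_and_dose(records)
for dose in sorted({rec["dose"] for rec in records}):
    gap = means[("mutant", dose)] - means[("wild-type", dose)]
    # gap < 0 means stronger reversal in KRAS-mutant lines at this dose
    print(f"dose {dose}: mutant minus wild-type = {gap:+.3f}")
```

A genotype-specific effect shows up as a negative gap at every dose, which is the pattern described above for RMC-6236 in iCMS3.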
Mechanism of Action Reveals a Hierarchy of Reversal
When we aggregate reversal scores by drug mechanism of action (MOA), a coherent picture emerges (Figure 3).
MEK, RAF, JAK/STAT, and PI3K/AKT inhibitors show the strongest reversal of the cancer expression signature. This tracks directly with CRC genetics: gain-of-function mutations in KRAS, BRAF, and NRAS are among the most frequent drivers of colorectal cancer, and activating mutations in PIK3CA are similarly common. Several drugs within these MOA classes are already part of the standard of care or under active clinical investigation.
EGFR/ERBB inhibitors show more modest but above-average reversal. This is consistent with the fact that while EGFR itself is less frequently mutated or overexpressed in CRC, EGFR-targeted agents (cetuximab, panitumumab) are key components of approved combination regimens — their clinical benefit may be more context-dependent than the on-target biology alone would suggest.
Chemotherapies tell an interesting story. As a class, they are expected to shift cells to a different state rather than reverting them to normal — consistent with their mechanism as broadly cytotoxic agents. But within chemotherapies, DNA synthesis and repair inhibitors produce greater reversal than microtubule inhibitors. This distinction mirrors clinical practice: DNA-targeting agents (5-FU, oxaliplatin) form the backbone of CRC standard-of-care regimens like FOLFOX and FOLFOXIRI, while microtubule inhibitors do not.
JAK/STAT inhibitors also score highly, reflecting the frequent activation of IL-6/JAK/STAT3 signaling in colorectal tumors. Clinical translation of JAK/STAT inhibition in CRC has been disappointing so far, but this MOA remains under active investigation, and the transcriptomic data suggest the biological rationale is sound.
Figure 3: Reversal scores aggregated by mechanism of action, with targeted therapies in green, chemotherapies in yellow, and unclear MOA in gray.
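Aggregating by mechanism of action amounts to grouping per-pair scores by an MOA annotation and summarizing each group. The sketch below uses the median as the summary statistic, which is an assumption (the text does not specify one); the drug names, annotations, and scores are made up.

```python
from collections import defaultdict
from statistics import median

# Hypothetical drug -> MOA annotation and made-up scores (illustration only).
moa_of = {
    "drug_1": "MEK inhibitor",
    "drug_2": "MEK inhibitor",
    "drug_3": "Microtubule inhibitor",
    "drug_4": "DNA synthesis/repair inhibitor",
}
scores = [
    ("drug_1", -0.20), ("drug_1", -0.16), ("drug_2", -0.18),
    ("drug_3", 0.01), ("drug_3", -0.01),
    ("drug_4", -0.07), ("drug_4", -0.05),
]


def rank_moas(scores, moa_of):
    """Return (MOA, median score) pairs, strongest reversal first."""
    by_moa = defaultdict(list)
    for drug, score in scores:
        # Drugs without an annotation fall into an "unclear" bucket.
        by_moa[moa_of.get(drug, "unclear")].append(score)
    summary = {moa: median(vals) for moa, vals in by_moa.items()}
    return sorted(summary.items(), key=lambda item: item[1])


ranking = rank_moas(scores, moa_of)
for moa, med in ranking:
    print(f"{moa}: median reversal score {med:+.2f}")
```

Sorting in ascending order puts the most negative medians, and therefore the strongest reversal, at the top.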
The Top 20: From Expected Hits to Unexpected Insights
Zooming in on the top 20 drugs prioritized for the iCMS2 subtype (Figure 4, showing the dose with the strongest reversal), the list reads like a who’s-who of CRC-relevant pharmacology — with a few surprises.
The expected hits are well represented: MEK inhibitors (trametinib, cobimetinib, binimetinib), BRAF-V600E inhibitors (encorafenib, vemurafenib), the KRAS inhibitor RMC-6236, and the PI3K/mTOR inhibitor bimiralisib. The topoisomerase II inhibitors idarubicin and mitoxantrone appear, as do the JAK/STAT inhibitor tofacitinib, the MET inhibitor capmatinib, and the nuclear export inhibitor selinexor. MEK inhibitors have historically not been particularly successful in advanced or metastatic CRC, but a recent clinical trial sponsored by Recursion showed promising results in familial adenomatous polyposis (FAP), a genetic disorder characterized by disseminated early-stage adenomas.
One compound that merits particular attention is elimusertib, nominally an ATR inhibitor. In a previous analysis of Tahoe 100M, we reported that elimusertib’s transcriptomic signature more closely resembles that of MEK inhibitors — a finding that is independently reinforced here by its ranking among the top reversal compounds alongside bona fide MEK inhibitors.
But perhaps the most intriguing signal comes from sodium salicylate. A recent placebo-controlled trial (Martling et al., New England Journal of Medicine, 2025) demonstrated that aspirin (acetylsalicylic acid) significantly increased 3-year disease-free survival (HR ≈ 0.55) among stage I–III CRC patients harboring PI3K/AKT pathway-activating mutations. In Tahoe 100M, however, salicylate and salicylic acid produce stronger cancer signature reversal than aspirin itself. This is a mechanistically meaningful distinction: aspirin’s additional acetyl group is what enables cyclooxygenase inhibition, so a stronger signal from the non-acetylated form suggests a COX-independent mechanism driving the transcriptomic reversal. This is the kind of observation that is only possible when you have the data density to compare closely related chemical structures across the same biological models.
Figure 4: Heatmap of top 20 drugs for iCMS2, showing Pearson correlation (reversal score) across nine CRC cell lines. Positive control pairs are highlighted in red.
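A ranking like this one can be sketched in two steps: pick one dose per drug, then sort drugs by their score at that dose. The sketch below assumes that "strongest reversal" means the dose with the most negative mean score across cell lines; that selection rule, the function name, and all values are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def top_reversers(rows, n=20):
    """Rank drugs by their best dose.

    rows: iterable of (drug, dose, cell_line, score) tuples.
    Returns up to n tuples of (drug, best_dose, mean_score, per_line_scores),
    strongest reversal (most negative mean) first.
    """
    by_drug_dose = defaultdict(dict)
    for drug, dose, line, score in rows:
        by_drug_dose[(drug, dose)][line] = score

    best = {}  # drug -> (mean score, dose, per-line scores) at its best dose
    for (drug, dose), per_line in by_drug_dose.items():
        avg = mean(per_line.values())
        if drug not in best or avg < best[drug][0]:
            best[drug] = (avg, dose, per_line)

    ranked = sorted(best.items(), key=lambda item: item[1][0])
    return [(drug, dose, avg, per_line) for drug, (avg, dose, per_line) in ranked[:n]]


# Made-up scores with placeholder drug and cell-line names (illustration only).
rows = [
    ("drug_X", 0.5, "line_A", -0.10), ("drug_X", 0.5, "line_B", -0.06),
    ("drug_X", 5.0, "line_A", -0.22), ("drug_X", 5.0, "line_B", -0.18),
    ("drug_Y", 0.5, "line_A", -0.03), ("drug_Y", 0.5, "line_B", 0.01),
    ("drug_Y", 5.0, "line_A", -0.05), ("drug_Y", 5.0, "line_B", -0.01),
    ("drug_Z", 5.0, "line_A", 0.04), ("drug_Z", 5.0, "line_B", 0.02),
]

top = top_reversers(rows, n=2)
for drug, dose, avg, per_line in top:
    print(f"{drug} @ dose {dose}: mean {avg:+.2f}, per line {per_line}")
```

The per-line scores returned for each drug are the rows one would plot in a drug-by-cell-line heatmap.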
What This Means — and Where It Goes
Across colorectal cancer models, cell-state reversal consistently prioritizes compounds that align with known biology: drugs targeting the oncogenic pathways most frequently activated in CRC rise to the top, while broadly cytotoxic agents that shift rather than revert the cancer state rank lower. The framework doesn’t just recover textbook pharmacology — it draws distinctions within drug classes (DNA synthesis inhibitors vs. microtubule inhibitors, salicylate vs. aspirin) that are consistent with clinical evidence.
This is what becomes possible when you have the data density of Tahoe 100M: 100 million cells, 60,000 experiments, hundreds of compounds profiled across genetically characterized cancer models. Cell-state reversal is not a replacement for viability screens; it’s a complementary lens that captures dimensions of drug activity that viability alone cannot see. As large-scale perturbational transcriptomic resources continue to expand, this approach may become a foundational tool for how we discover, repurpose, and personalize the next generation of therapies. The most exciting applications will likely extend beyond cancer, to disease areas where the objective is not to kill the cell but to rewire it back to normal.

It seems like you guys criticize viability screens for lacking "immune-mediated killing, stromal interactions, and metabolic context." However, doesn't your data and model still rely on cell lines in a dish? Even if you measure the transcriptome, a cell line in a plastic well still lacks the immune system and stroma, right?
How well does this do compared to CMap, which has long used differential expression to match drug signatures to disease signatures?