Large language models (LLMs) have revolutionized artificial intelligence (AI). Although originally designed for text, these models turn out to be highly effective for biological sequences like DNA and proteins. We were among the first to develop protein language models (pLMs) as global predictors of protein structure and protein function. We also showed that protein and DNA language models can accurately distinguish between disease-causing and benign mutations. The Brandes Lab continues to work on challenges that would unlock the full potential of genomic AI to prediagnose and treat disease and understand our genomes.

Join Us!

If you are excited about this research agenda, come work with us.

1. Predict diverse mutation effects in coding & noncoding regions

We seek to further improve mutation effect predictions with protein and DNA language models. Existing variant effect prediction algorithms try to predict whether a given variant is damaging or neutral, but variant effect is not a one-dimensional phenomenon. For example, different mutations in the same gene may lead to loss-of-function, gain-of-function or dominant-negative effects. We use modern AI, which is inherently high-dimensional, to tease apart these different effects. We also use data from high-throughput experiments such as deep mutational scans and single-cell RNA-sequencing to refine and improve our predictions over the mutational landscape.
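To make the idea of scoring a mutation with a language model concrete, here is a minimal sketch of one widely used approach: the log-likelihood ratio between the mutant and wild-type residue at a position. This is an illustration under stated assumptions, not necessarily the lab's method. The function name `llr_score` and the probabilities in `toy_probs` are hypothetical; a real model would supply the per-position probabilities.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def llr_score(position_probs, wild_type, mutant):
    """Log-likelihood ratio of the mutant vs. the wild-type residue.

    position_probs maps each amino acid to the probability a language
    model assigns to it at one position (hypothetical model output).
    Negative scores mean the model finds the mutant less plausible.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


# Made-up probabilities standing in for a real model's output.
toy_probs = {aa: 0.01 for aa in AMINO_ACIDS}
toy_probs["L"] = 0.60  # wild-type residue, strongly preferred
toy_probs["I"] = 0.20  # chemically similar substitution

# Renormalise so the probabilities sum to one.
total = sum(toy_probs.values())
toy_probs = {aa: p / total for aa, p in toy_probs.items()}

conservative = llr_score(toy_probs, "L", "I")  # leucine -> isoleucine
radical = llr_score(toy_probs, "L", "P")       # leucine -> proline

print(f"L->I: {conservative:.3f}")
print(f"L->P: {radical:.3f}")
```

A single number like this is exactly the one-dimensional view described above: it ranks mutations by plausibility but cannot by itself separate loss-of-function from gain-of-function or dominant-negative effects.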

2. Genetic engineering

We leverage our improved models of genetic effects to search for mutation combinations that optimize the genetic background of cells, for example to make immune cells more potent against tumors in cancer immunotherapy.

3. Improve genomic foundation models

A key bottleneck in genomic AI is insufficiently challenging benchmarks that mislead model developers. We see human genetics as providing uniquely challenging problems that only models with true knowledge about the underlying biology can tackle. Equipped with improved benchmarks, we seek to train better genomic foundation models that can address these important tasks. We also train phylogeny-aware models that would allow foundation models to scale better.

4. Incorporate variant effect prediction in statistical genetics to implicate rare variants and establish causality

Genome-wide association studies (GWAS) and polygenic risk scores (PRS) are purely statistical: they search for genetic variants correlated with disease status without knowing anything about the molecular effects of these variants. In contrast, we leverage variant effect predictions, especially those made by frontier AI models, to guide GWAS and PRS towards variants more likely to have an effect. These functional priors are especially important in the presence of limited evidence (rare mutations) and when attempting to distinguish between causal and non-causal associations. We test our methods on large-scale genetic cohorts such as the UK Biobank.
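One simple way to see how a functional prior can guide an association test is weighted multiple testing: each variant's p-value is divided by a weight derived from its predicted effect, with weights averaging one so the overall error rate is still controlled. The sketch below uses a weighted Bonferroni procedure as a stand-in. It is an assumption chosen for illustration, not a description of the lab's methods, and the function name, the p-values and the prior scores are all hypothetical.

```python
def weighted_bonferroni(p_values, prior_scores, alpha=0.05):
    """Return one True/False per variant: significant after weighting?

    prior_scores are non-negative predicted-effect scores (hypothetical).
    They are rescaled to weights with mean 1, so variants predicted to be
    functional get a more lenient threshold and the others a stricter one.
    """
    m = len(p_values)
    if m == 0 or m != len(prior_scores):
        raise ValueError("p_values and prior_scores must be non-empty and equal in length")
    mean_score = sum(prior_scores) / m
    if mean_score <= 0:
        raise ValueError("prior_scores must have a positive mean")
    weights = [s / mean_score for s in prior_scores]
    threshold = alpha / m  # the usual Bonferroni threshold
    return [w > 0 and p / w <= threshold for p, w in zip(p_values, weights)]


# Four made-up variants: two with modest association evidence, two with none.
p_values = [0.02, 0.02, 0.5, 0.5]
# Made-up predicted-effect scores: variants 1 and 3 look functional.
prior_scores = [0.9, 0.1, 0.9, 0.1]

unweighted = weighted_bonferroni(p_values, [1.0] * 4)
weighted = weighted_bonferroni(p_values, prior_scores)

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

In this toy example, no variant passes the unweighted threshold, while the first variant passes once its high predicted effect is taken into account. The second variant has the same p-value but a low predicted effect, so it does not.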

5. Clinical implementation of genomic AI in genetic testing

We seek to modify the existing clinical guidelines to make better use of advanced genomic AI, for example when diagnosing rare genetic diseases. To do that, we evaluate the capacity of these models to provide better clinical evidence compared to existing protocols.

* * * * *

To learn more about the philosophy behind this research agenda, you can read this blog post.