Projects

Evaluating the utility of cis-regulatory element graphs for modeling gene regulation

Cis-regulatory elements (CREs) play a key role in transcriptional regulation. While the growing availability of genomic data has facilitated the annotation of CREs, the rules and mechanisms that govern how CREs regulate genes are still not fully understood. Here we propose a new graph-based framework for modeling CRE-gene interactions, which will allow for improved understanding of gene regulation and the creation of new resources for the computational and machine learning community.


Multi-Omics DACC: The Data Analysis and Coordination Center for the collaborative multi-omics for health and disease initiative

The NIH is establishing a new Multi-Omics for Health and Disease Consortium to apply multi-omic technologies to study health and disease in ancestrally diverse populations. This project aims to establish a data analysis and coordination center to coordinate and support the Consortium’s activities and maximize its success. The center will manage consortium data, coordinate and contribute to protocol development and data analysis, create a multi-dimensional dataset and a data portal, and provide outreach to disseminate consortium results.


A comprehensive genomic community resource of transcriptional regulation

The human genome contains approximately 20,000 genes; in order to function properly, individual cells in the human body must precisely regulate the rate at which they use each of these genes through processes called transcriptional regulation. Segments of non-coding DNA outside of genes, called regulatory elements, mediate these processes, and differences in DNA sequence within regulatory elements can predispose individuals to developing particular traits or diseases. This project aims to build a comprehensive resource of regulatory elements in the human genome, including deep-learning models of their regulatory syntax, which will help researchers to understand how these regulatory elements function and to better design follow-up research to understand and treat disease.

igSCREEN: An integrative data and annotation platform of gene regulation for immune-mediated disease research

Our genetic background underpins our responses to a wide variety of diseases, including infections, autoimmune diseases, and cancer. Machine learning and deep-sequencing-based genomics technologies are revolutionizing our understanding of immunobiology and contributing to clinical predictions and recommendations. We propose to draw on the wealth of genetic and epigenetic data and our expertise in large-scale computational genomics to build a data integration and visualization platform that supports a wide range of immune-mediated disease research.

Expanding transcriptional regulation resources to aid in the prioritization and interpretation of non-coding disease variants

We aim to enhance our existing resources for understanding transcriptional regulation (SCREEN and FactorBook) so that they can be used to prioritize and annotate non-coding variants implicated in Mendelian diseases. We propose to create ARGO (Aggregate Rank Generator), a novel web applet for prioritizing non-coding variants. ARGO will rank variants using a combination of sequence, element-level annotations, and gene properties, incorporating rank aggregation methods for a robust analysis. It will guide users in selecting annotations and thresholds using curated sets of functionally validated variants. Additionally, we propose to integrate large language models (LLMs) to enable natural language querying, enhancing the accessibility and usability of our resources. Our approach aims to provide dynamic, user-tailored variant prioritization and to make complex genomic data more approachable for a wider range of researchers, including those with a clinical focus.
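
To illustrate the general idea of rank aggregation mentioned above, the following is a minimal sketch of one simple scheme, mean-rank aggregation. It is not ARGO's actual method, which is not specified here. The annotation names, variant identifiers, and scores are hypothetical, and the sketch assumes that a higher score is better for every annotation.

```python
from statistics import mean


def rank_scores(scores, higher_is_better=True):
    """Map each variant to a rank (1 = best); tied scores share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover the run of variants tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks


def aggregate_ranks(annotations):
    """Rank variants within each annotation, then sort them by mean rank.

    `annotations` maps an annotation name to a {variant: score} dict.
    Only variants scored by every annotation are kept.
    """
    per_annotation = [rank_scores(scores) for scores in annotations.values()]
    shared = set.intersection(*(set(r) for r in per_annotation))
    mean_rank = {v: mean(r[v] for r in per_annotation) for v in shared}
    # Sort by mean rank, breaking ties by variant name for a stable result.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical scores for three variants under three annotation types.
annotations = {
    "sequence_conservation": {"var1": 0.9, "var2": 0.4, "var3": 0.7},
    "element_activity": {"var1": 0.2, "var2": 0.8, "var3": 0.6},
    "gene_constraint": {"var1": 0.5, "var2": 0.3, "var3": 0.9},
}

for variant, avg_rank in aggregate_ranks(annotations):
    print(f"{variant}\t{avg_rank:.2f}")
```

Working on ranks rather than raw scores means annotations measured on different scales contribute equally, which is one reason rank aggregation tends to be robust to any single outlying annotation.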