

Future Opportunities for Genome Sequencing and Beyond:
A Planning Workshop for the National Human Genome Research Institute

Summary of Challenge Talks

Peptide Antigen Display and Recognition: a New Fusion of Genomics and Proteomics? - David Haussler

  • Large-scale measurement and predictive modeling of the human cell-mediated immune response will have applications in medicine, including cancer, infectious diseases, autoimmune diseases, aging, and tissue and stem cell transplantation.
  • One concrete challenge is to create a master reference database and computational model of T-cell antigen recognition events in humans. This will require large-scale sequencing, and also the development of large-scale experimental methods to determine displayed peptide antigens coupled with T-cell receptors (TCRs) that recognize them.
  • Specifically, the proposal involves:
    • Calibrating and refining genome, RNA, and TCR repertoire sequencing methodologies, along with mass spectrometry methods, to separately determine HLA type, activity of the peptide antigen display pathway, and the repertoire of MHC Class I and II displayed peptides on antigen-presenting cells (APCs), along with the repertoire of receptor CDR3 regions (both alpha and beta) on antigen-recognizing T-cells.
    • Initiating a challenge to the community to do the following:
      • develop new technology to collect simultaneous data on millions of MHC Class I and II recognition events (four items per event: displayed peptide chain; HLA allele that displays the peptide; CDR3 amino acid chain from the TCR beta chain that recognizes displayed peptide; corresponding amino acid chain from TCR alpha chain);
      • catalog recognition events and create stochastic "imputation" models that predict which recognition events will occur given genome, RNA-seq and TCR repertoire data;
      • develop scalable methods for inhibition, enhancement, and creation of T-cells with specific receptors, both for research and for therapeutic purposes.
    • Investigating how to extend this to the much harder problem of B-cell/antibody recognition.
  • This project has potential for enabling more precise control of human immune response, with broad applications for human health.
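
The four items per recognition event listed above map naturally onto a small record type. The following is a minimal sketch of how such events might be cataloged, not a format proposed in the talk: the class name `RecognitionEvent`, the helper `index_by_display`, and all sequence and allele strings are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionEvent:
    """One recognition event, with the four items named in the proposal."""
    peptide: str      # displayed peptide chain
    hla_allele: str   # HLA allele that displays the peptide
    cdr3_beta: str    # CDR3 amino acid chain from the TCR beta chain
    cdr3_alpha: str   # corresponding amino acid chain from the TCR alpha chain


def index_by_display(events):
    """Group receptor pairs by the (HLA allele, peptide) they recognize."""
    index = defaultdict(set)
    for ev in events:
        index[(ev.hla_allele, ev.peptide)].add((ev.cdr3_beta, ev.cdr3_alpha))
    return dict(index)


# Placeholder data only; these are not real alleles or sequences.
events = [
    RecognitionEvent("PEPTIDEAA", "HLA-X*00:01", "CASSAAAF", "CAVAAAF"),
    RecognitionEvent("PEPTIDEAA", "HLA-X*00:01", "CASSBBBF", "CAVBBBF"),
    RecognitionEvent("PEPTIDEBB", "HLA-X*00:02", "CASSAAAF", "CAVAAAF"),
]
catalog = index_by_display(events)
print(len(catalog[("HLA-X*00:01", "PEPTIDEAA")]))  # number of distinct receptor pairs
```

Keying the catalog on the allele and peptide together reflects that the same peptide displayed by a different HLA allele is treated here as a different recognition target; that choice is an assumption of this sketch.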

Using the Negative Correlation between Polygenic Load and Rare Variant Burden (Nancy's 100M Project) - Nancy Cox

  • Existing high throughput genotype data can be used to calculate an individual's polygenic load of common variants. Plotting polygenic load versus rare variant burden demonstrates an inverse axis of risk for many common diseases.
  • This inverse axis of risk can be used to inform efficient study designs, and to develop more powerful ways to analyze and interpret sequencing data.
  • Specifically, the proposal involves:
    • Amassing new and existing GWAS and exome chip genotyping on 2-3 million samples, and characterizing individual polygenic load for multiple common diseases.
    • Sequencing affected individuals with low polygenic load and unaffected individuals with high polygenic load for each disease.
    • Building use cases to address clinical utility and regulatory/reimbursement issues.
  • This approach allows us to leverage the extensive research to date on common variants to more efficiently study rare variants.
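
The summary does not say how polygenic load is computed. A common approach, assumed here, is a weighted sum of risk-allele dosages, with per-variant weights taken from prior GWAS effect sizes. The sketch below uses that assumption and then picks the two groups the proposal would sequence; the function names, cutoffs, and sample values are hypothetical.

```python
def polygenic_load(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant).

    Assumption: weights are per-variant effect sizes from earlier GWAS.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def select_for_sequencing(samples, low_cutoff, high_cutoff):
    """Return IDs of affected samples with low load and unaffected samples with high load.

    samples: iterable of (sample_id, load, affected) tuples.
    The cutoffs are hypothetical; a real study would choose them per disease.
    """
    selected = []
    for sample_id, load, affected in samples:
        if affected and load <= low_cutoff:
            selected.append(sample_id)   # affected despite low common-variant load
        elif not affected and load >= high_cutoff:
            selected.append(sample_id)   # unaffected despite high common-variant load
    return selected


# Illustrative values only.
samples = [
    ("s1", 0.1, True),
    ("s2", 0.9, True),
    ("s3", 0.9, False),
    ("s4", 0.1, False),
]
print(select_for_sequencing(samples, low_cutoff=0.2, high_cutoff=0.8))
```

Under the inverse relationship described above, the affected, low-load group is where rare variants that raise risk would be expected to concentrate, which is why those samples are prioritized for sequencing.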

The Human Cell Atlas - Aviv Regev

  • Core technologies, sample preparation techniques and computational methods have advanced to a point that allows for high-volume single cell analysis.
  • The Human Cell Atlas would be a public resource that catalogs cell types, states, locations, transitions and lineages of human cells, and should be complemented with similar work in model organisms.
  • Specifically, the proposal involves:
    • Developing a pilot project in a small number of complementary systems (e.g. blood, lung, liver).
    • Driving costs down to enable characterization at 15 cents per cell.
    • Forming a large-scale consortium with standard, controlled processes and shared analytical tools.
    • Profiling 500 million cells, selected to be representative of, and allow good resolution of, the major cell types and states in the human body.
  • This atlas will provide foundational knowledge that could be used to test the function of genetic variants, improve understanding of disease heterogeneity and ultimately pave the way for characterization of individual patients.
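
As a rough sense of scale, the two figures in the proposal can be combined. This back-of-the-envelope sketch assumes the 15-cent target applies to all 500 million cells, which the summary does not state explicitly; the constant and function names are illustrative.

```python
CELLS_TO_PROFILE = 500_000_000      # from the proposal
TARGET_COST_CENTS_PER_CELL = 15     # from the proposal


def total_cost_dollars(cells, cents_per_cell):
    """Total characterization cost in whole dollars, using integer cents to avoid float error."""
    return cells * cents_per_cell // 100


print(total_cost_dollars(CELLS_TO_PROFILE, TARGET_COST_CENTS_PER_CELL))  # 75000000
```

This covers per-cell characterization only; it says nothing about sample collection, consortium overhead, or analysis costs.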

Measuring the Functional Consequences of Very Large Numbers of Human Genetic Variants - Jay Shendure

  • The interpretation of genetic variation is a mission-critical challenge. 
  • Rich databases of experimental measures can serve as training data for better computational algorithms.
  • Specifically, the proposal involves:
    • Experimentally measuring the functional consequences of 10 million variants.
    • Utilizing available or maturing technologies to generate variation in both coding and non-coding regions, including multiplexing to study all possible point mutations in an enhancer and all possible codon swaps in a protein.
    • Performing functional assays and cataloging the results (this involves technology and assay development).
  • This resource will also provide foundational information to improve our understanding of the biology of genomes.
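
To illustrate what "all possible point mutations" and "all possible codon swaps" mean combinatorially, the sketch below enumerates both for a short sequence. It is an illustration of the variant space, not the experimental method: it assumes uppercase A/C/G/T input, and it interprets a codon swap as replacing one codon with any of the other 63 triplets (stop codons included). The function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def point_mutations(seq):
    """Yield (position, ref, alt, mutated_seq) for every single-base substitution."""
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                yield i, ref, alt, seq[:i] + alt + seq[i + 1:]


def codon_swaps(coding_seq):
    """Yield (codon_index, ref, alt, mutated_seq) for every single-codon replacement."""
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    all_codons = ["".join(c) for c in product(BASES, repeat=3)]  # 64 triplets
    for start in range(0, len(coding_seq), 3):
        ref = coding_seq[start:start + 3]
        for alt in all_codons:
            if alt != ref:
                yield start // 3, ref, alt, coding_seq[:start] + alt + coding_seq[start + 3:]


print(len(list(point_mutations("ACGT"))))  # 12: three alternatives at each of 4 positions
print(len(list(codon_swaps("ATGGCC"))))    # 126: 63 alternatives at each of 2 codons
```

The counts grow linearly with sequence length (3 per base, 63 per codon), which is what makes saturating a single enhancer or protein tractable with multiplexed methods.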

Not Your Father's PDF: New Forms of Knowledge Representation for Genomic Sequence - Daniel Masys

  • Currently, genomic data is summarized in written reports that are attached to the medical record. 
  • Future studies should address the following limitations: only a small number of clinically relevant findings are reported, and reports discard the rest of the generated sequence data; the document reporting format is not amenable to parsing for automated machine interpretation; and the report is static, but science is changing rapidly.
  • The goal is to create a closed-loop, public computing infrastructure that learns and improves over time, for guiding clinical care based on molecular variation.
  • Specifically, the proposal involves:
    • Creating an information commons with a set of clinical decision support packages.
    • Designing the system so that it learns and improves whether or not the clinicians take the suggested advice (include automated tracking of events, outcomes and user decisions).
    • Including system-generated alerts that provide learning opportunities for the physician at the time of diagnostic testing and therapy decisions.
  • This work is resource-generating and will advance technology and approaches throughout medicine. Genomics should drive innovation in this area, since the complexity and volume of information in genomic-enabled healthcare exceeds human cognitive capacity for decision making.
