Research Training Needs in Statistical Genetics / Genetic Epidemiology Workshop Summary

National Human Genome Research Institute

National Institutes of Health
U.S. Department of Health and Human Services


Conveners: Jeremy M. Berg, Ph.D., National Institute of General Medical Sciences
AND
Francis S. Collins, M.D., Ph.D., National Human Genome Research Institute
Terrace Level Conference Room
5635 Fishers Lane, Rockville, MD

Wednesday, May 21, 2008, 10 A.M. to 4 P.M.

Background

The Directors of the National Institute of General Medical Sciences (NIGMS) and the National Human Genome Research Institute (NHGRI) hosted a workshop to address a concern of NIH Leadership Forum participants: that there is not a sufficiently large cadre of trained scientists to develop methods and analyze the vast amount of data generated by population genomics studies employing current and rapidly emerging technologies. A small group of leaders in the fields of statistical genetics and genetic epidemiology (from both the extramural and the intramural communities) was convened to discuss the issue.

Session I: The Challenge

The meeting began with Jeremy Berg, Ph.D., briefly describing the imbalance between supply and demand for scientists trained to develop methods and analyze population genomics data of increasing size and complexity, and the need for NIH to respond to this increasingly critical personnel problem. He stated that in 2004, NIGMS spearheaded a trans-NIH effort to increase the number of biostatisticians trained in the fundamentals of the discipline, with the expectation that specialization in specific areas would occur later in the trainee's career. This generic program (not disease- or tissue-specific) was also meant to serve the entire NIH. The Program Announcement [grants.nih.gov] was active for three years and was then folded into the standard NIH National Research Service Award (NRSA) institutional training T32 mechanism (PA-06-468) [grants1.nih.gov]. The participants were encouraged to discuss their ideas freely and, although the budget is uncertain, not to let this constrain the exchange of ideas.

Francis Collins spoke to the many scientific opportunities for which new methods will need to be developed and sufficient numbers of individuals will need to be trained to do the analyses. Emerging sequencing technologies, while sorely needed, will only increase this need. One example of the opportunities and challenges was the observation that the pilot phase of 1000 Genomes resulted in 390 gigabytes of information being submitted to GenBank; this was equivalent to the entire content of GenBank at the time (see footnote 1). Among the biological projects driving the data acquisition are: Genome-Wide Association Studies (GWAS); Genes, Environment, and Health Initiative (GEI); Genetic Association Information Network (GAIN) [fnih.org]; 1000 Genomes [1000genomes.org]; The Cancer Genome Atlas (TCGA) [cancergenome.nih.gov]; Human Microbiome Project [commonfund.nih.gov]; and the Genotype-Tissue Expression Resource (GTEx), a proposed new NIH RoadMap project to obtain genotype and expression data on 30 tissues from 1000 individuals. The technologies to generate sequencing data include the new sequencing machines developed by 454 Life Sciences, Applied Biosystems, and Illumina, Inc.

There were two presentations on supply and demand for training. The first presentation was by Alexander (Alec) Wilson, Ph.D., who provided data on the number of gene markers versus the number of members belonging to the International Genetic Epidemiology Society (IGES). Approximately two-thirds of the members are from the United States. He used IGES membership as a proxy for the number of statistical geneticists / genetic epidemiologists and made the assumption that while the number of gene markers is finite, this number does not represent projects. His analyses showed the following:

  1. The number of markers was stable around 100 from 1980 until around 2000.
    Between 2000 and 2005, the number of markers went up rapidly to approximately one million.

  2. The number of IGES members rose from about 100 in 1980 to about 150 in 1990. From 1990 to 1995, the number rose from 150 to about 350.
    From 1995 to 2005, the number rose to 500; a small decline was noted in 2006.
    It was noted that these numbers reflect the number of members attending the meeting for that year, but could still serve as a proxy for the membership.
    Comparing the number of markers to the number of members, the ratio of markers to members changed precipitously from ~1:1 in the years 1980 to 2000 to ~2,000:1 in the years 2000 to 2005.
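
The ratio above can be checked with quick arithmetic. The following is a minimal sketch that uses only the approximate figures quoted in this summary (about 100 markers and 100 members in 1980; about one million markers and 500 members in 2005). The table and function names are illustrative assumptions and were not part of the original presentation.

```python
# Approximate figures as quoted in this summary; these are rounded values,
# not the underlying IGES data set.
MARKERS = {1980: 100, 2005: 1_000_000}
MEMBERS = {1980: 100, 2005: 500}


def markers_per_member(year: int) -> float:
    """Return the number of markers per IGES member for a tabulated year."""
    return MARKERS[year] / MEMBERS[year]


for year in (1980, 2005):
    print(f"{year}: about {markers_per_member(year):,.0f} marker(s) per member")
```

With these rounded inputs, the 1980 figure is one marker per member and the 2005 figure is two thousand markers per member.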

There was some discussion about whether the IGES membership represents the entire community, given that the availability of on-line journals removes the need for individuals to join a society in order to receive the journal, and that some individuals who are intensively involved in the development of methodologies and analyses do not identify as statistical geneticists or genetic epidemiologists.

Two separate but complementary efforts were described to collect data on training needs in the area of statistical genetics and genetic epidemiology. The first was described by Alexander Wilson. He will be working with IGES to poll the community in a more systematic way to assess training needs, etc. He presented a draft of the type of data to be collected. During the lunch break, participants reviewed and pre-tested the form. Prior to the beginning of Session II, the participants discussed the idea and the data elements to be collected. The participants thought that this was a useful exercise and agreed to provide feedback to Alec by June 1. It was suggested that members of the Genetic Analysis Workshop and American Society of Human Genetics also be asked to complete the form.

The second presentation of training needs, given in a tag-team fashion by Bettie Graham, Ph.D., and Shawn Drew, Ph.D., profiled the NIH National Research Service Award (NRSA) institutional training grants funded in 2007. The data showed:

  1. There are 25 NIH training grants that support training in statistical genetics and genetic epidemiology.
  2. There were 44 predoctoral and 23 postdoctoral positions supported in the area of statistical genetics / genetic epidemiology.
    Only one predoctoral position was unfilled. All postdoctoral positions were filled.
  3. The retention rate on training grants for this discipline was 95 percent.
  4. There was no support for 372 foreign students / trainees and 107 US students/trainees wanting training in these areas.
  5. Most students / trainees were supported on research grants or training grants.
  6. 83 percent of students were employed immediately upon completion of their training, with about 75 percent of them going directly into academia,
    although two training programs reported that a high percentage of their students were employed in industry.
  7. Because this field requires a strong background in mathematics and statistics, most program directors thought that it would be easier to cross-train individuals who had solid quantitative skills, although they did acknowledge that there are some biologists who do have strong backgrounds in mathematics and statistics.
  8. Several challenges were identified, such as:
    • The need for more mentors
    • The need to establish relationships with faculty in complementary departments
    • Quantitative scientists who lack wet-lab experience
    • Lack of training opportunities for foreign students
  9. There is a need for more programs at the undergraduate level to provide opportunities for students to participate in this type of research, and for graduate and medical schools to require stronger quantitative skills for admission.
  10. PIs need research grant support. There are specific problems with peer review for this growing field.
  11. NRSA programs need to be more flexible in the number of years needed for training for effective cross-disciplinary training.
  12. Foreign students should be supported.
  13. Faculty with joint appointments can be successful recruiters.

Session II: Discussion

Scientists now have an enormous amount of data available to them, thanks to the many resources that have become available to the scientific community: the reference sequence of the human genome, the catalogue of genetic similarities and differences in several populations, and the continually decreasing cost of large-scale genome sequencing, which will make it possible to sequence entire mammalian genomes rapidly and inexpensively. In the not too distant past, most data analyses were gene-by-gene; more and more, analyses are genome-wide. As a result, new analytical methods are needed to organize and evaluate the data, and more trained individuals are needed to design appropriate applications. Initial review of the number of individuals being trained in statistical genetics and genetic epidemiology indicates that more trained scientists in these fields are needed. The participants considered this an urgent problem and made the following recommendations:

Undergraduate Level: The earlier students are exposed to research in these areas, the more likely they will choose to major in one of these fields in graduate school. Comments related to undergraduates included:

NIGMS, through its MARC-U*STAR Program Announcement [grants.nih.gov], provides opportunities for grantee institutions to develop distance learning courses and other curricular offerings (e.g., methods to integrate quantitative sciences to study biological phenomena) as a way to supplement course offerings.

Graduate Level: There was a general agreement that training in the quantitative sciences should be strengthened in all science graduate education programs and that individuals enrolled in statistical genetics and genetic epidemiology programs should be required to have a minimal set of core competencies determined by the community.

Post Doctoral Level: The discussion centered on enhancing the skills of postdoctoral fellows.

Career Paths: A portion of the meeting was set aside to discuss career paths. Some of the comments were:

A strong case was made for providing opportunities for master's-level students. Properly trained, these individuals can make significant contributions to the research effort and would free up time for principal investigators to pursue the original work that results in the publications necessary for achieving and maintaining tenure-track status.

NOTE: The participants viewed the current urgency to train more statistical geneticists and genetic epidemiologists as primarily indicative of a serious problem with the United States' system of education. Some of the issues noted were:

  1. The mathematics skills of U.S. primary and secondary school students need to be strengthened significantly. The lack of mathematical skills affects not only those interested in the quantitative sciences, but also those interested in the life sciences, since all of these fields are becoming inundated with volumes of data.

  2. Graduate schools and medical schools should require more courses in the quantitative sciences for admission and should require more quantitative courses in their curricula. (Note: Jeremy Berg, Ph.D., pointed out that all NIGMS training grants require quantitative training, regardless of program area.)

  3. The larger community needs to do a better job of communicating the excitement of this particular area of science to undergraduate students, for example by relating the science to popular television programs such as CSI (Crime Scene Investigation); by increasing the visibility of the science to the general public through advertising on billboards, public transportation vehicles, etc.; and by touting the fact that there are no unemployed individuals in these areas of science.

  4. Faculties in schools of public health and medical schools need to collaborate more.

  5. Technology is very important and is changing rapidly. In order to answer questions in a meaningful way, it will be necessary to have an ever increasing number of tools in one's armamentarium.

  6. The name of this discipline (statistical genetics and genetic epidemiology) has been split for many decades. An effort to find a name that describes the field and is easily interpretable by those outside the community might help improve the "branding."

  7. Since industry is also an attractive place of employment for individuals trained in statistical genetics and genetic epidemiology, it would be very helpful to develop a partnership to assist in training.

Action Items: The group will meet via a conference call (date and time to be determined) to discuss the report and follow-up actions:

Invited Experts:

  1. By June 1, provide feedback on the handout Alec Wilson provided during the working lunch session.
    Alec may be reached at afw@mail.nih.gov or (410) 550-7510.

  2. Discuss ways to effectively "Brand" this field.

  3. Work with scientists in the field to develop Core Competencies.
    What areas should NIH require as a set of core skills trainees in the field must know?

  4. Collect data on the review of applications in this field; the perception is that the study sections do not have the appropriate expertise.
    Most applications deal with methods development and use the disease as a test bed. Documentation is necessary in order to present a compelling case to the NIH.

  5. Discuss ways for faculty in schools of public health and agriculture to become involved in human genetics/genomics studies.

NIH staff:

  1. Send information to the group on NSF / NIGMS joint mathematical biology research program.

    Response: NSF / NIGMS joint mathematical biology research program [nsf.gov]. NIGMS, through a joint partnership with NSF, offers an initiative to support research in the area of mathematical biology. Both agencies recognize the need and urgency for additional research at the boundary between the mathematical sciences and the life sciences. This program is designed to encourage new collaborations at this interface, as well as to support existing ones.

  2. Consider providing additional training opportunities in statistical genetics/genetic epidemiology by targeting pre-doctoral fellowships (F31s) and supplements to train individuals in statistical genetics and genetic epidemiology. Also, consider developing a post baccalaureate program focused in this area.

  3. Determine if a grant program should be created to develop distance learning training courses in the field.

  4. Discuss ways to present findings from this workshop to the NIH Leadership.

  5. Encourage R01 support for research in statistical genetics / genetic epidemiology.

Footnotes

1 NOTE from Adam Felsenfeld. GenBank is an archive of assembled data or projects (even small ones, like sequences of individual genes). However, the primary product of 1000 Genomes is individual reads. These are deposited into the Short Read Archive, and as such are not assembled (there will eventually be derivative data from these reads, like assemblies, which will go into the assembly archive, and SNPs, which will go into dbSNP, etc.). So, while it is true that the number of bases from three weeks of 1000 Genomes production was more than double the bases in GenBank, it is probably better to compare with the amount of data in the Trace archive, which has been in existence for about five years and archives individual trace data from the 3730 platform. 1000 Genomes in its initial deposition was about 10 percent of the total amount in Trace. That is about five or six times the previous deposition rate, realized essentially instantaneously, and the rate will climb steeply.

Last Updated: March 13, 2012