The last few years have seen a dramatic increase in the number of publicly available complete genome sequences and annotations. At the same time, advances in technology have allowed individual researchers to perform experiments that generate tens of thousands of data points. This massive increase in data poses challenges for the individual biologist, requiring large-scale data analysis capabilities that are best handled using computational approaches. Dr. Wolfsberg's research focuses on developing methodologies to integrate sequence, annotation, and experimentally generated data to assist bench biologists in quickly and easily analyzing results from their large-scale experiments.
Several recent projects have required that short sequences be mapped back to the genome or transcriptome from which they were derived. As neither existing heuristics nor simple pattern-matching approaches are well-suited for the task, Dr. Wolfsberg's group has developed a suite of algorithms to rapidly align sequences under 25 nucleotides in length. One of these programs is designed to map tens of thousands of sequence tags to whole genomes in only a few minutes, allowing for mismatches. A faster version has been developed for use when the sequence tags start or end with a common pattern, such as a specific restriction enzyme site. A third program is optimized to search for a single degenerate sequence, such as a consensus transcription factor-binding site, in a complete genome.
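As a rough illustration of the two search problems described above, the following Python sketch maps a short tag to a sequence while tolerating mismatches, and searches both strands for a degenerate consensus written in IUPAC codes. It is not the group's software: the function names (map_tag, find_degenerate_sites) are hypothetical, and the naive scan shown here is far slower than the optimized programs the paragraph describes.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def map_tag(genome, tag, max_mismatches=1):
    """Naive scan (illustrative only): return (start, mismatches) for each
    forward-strand window within max_mismatches of the tag. Starts are 0-based."""
    genome, tag = genome.upper(), tag.upper()
    k = len(tag)
    hits = []
    for i in range(len(genome) - k + 1):
        mismatches = sum(1 for a, b in zip(genome[i:i + k], tag) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


def find_degenerate_sites(genome, consensus):
    """Return sorted (start, strand) pairs for every match of a degenerate
    consensus on either strand. Starts are 0-based forward-strand coordinates."""
    genome = genome.upper()
    consensus = consensus.upper()
    hits = set()
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        # A lookahead group lets overlapping matches be reported as well
        pattern = re.compile("(?=(" + "".join(IUPAC[base] for base in motif) + "))")
        for match in pattern.finditer(genome):
            hits.add((match.start(), strand))
    return sorted(hits)
```

For example, find_degenerate_sites(genome, "WGATAR") would scan both strands for a GATA-like consensus. A palindromic consensus is reported once per strand at the same position.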
A related research effort has been to determine the genomic context of a set of coordinates, such as those obtained using one of the alignment algorithms described above. Graphical genome browsers are not practical for analyzing large sets of coordinates. Thus, Dr. Wolfsberg's group has developed algorithms that compare the positions of interest to the coordinates of features displayed in a genome browser, such as genes or conserved sequences. Given a set of genomic regions as input, the programs identify either the overlapping or the closest genomic annotation. For example, they can provide a list of coordinates that are upstream or downstream of genes, or highlight regions that are conserved across species. To evaluate the statistical significance of the computationally determined findings, Dr. Wolfsberg's laboratory has developed methods for extracting sequences at random that have the same biological characteristics as the sequences being analyzed in a given experiment. The genomic context of these control sequences is then determined, and the resulting information is used to establish p-values associated with the experimental data.
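The general idea of annotation lookup and control-based significance testing can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the laboratory's method: the function names are hypothetical, features are assumed to be a non-empty, sorted, non-overlapping list of (start, end, name) tuples with inclusive coordinates, strand is ignored, and control positions are drawn uniformly at random rather than matched to the biological characteristics of the experimental sequences.

```python
import bisect
import random


def nearest_feature(pos, features):
    """Return (name, distance) for the feature overlapping or closest to pos.
    Distance is 0 when pos falls inside a feature.
    Assumes features is non-empty, sorted by start, and non-overlapping."""
    starts = [f[0] for f in features]
    i = bisect.bisect_right(starts, pos)  # features[i - 1] starts at or before pos
    best = None
    for j in (i - 1, i):  # only the two neighbouring features can be closest
        if 0 <= j < len(features):
            start, end, name = features[j]
            if start <= pos <= end:
                dist = 0
            elif pos < start:
                dist = start - pos
            else:
                dist = pos - end
            if best is None or dist < best[1]:
                best = (name, dist)
    return best


def empirical_p_value(observed_positions, features, genome_length,
                      n_trials=1000, seed=0):
    """Estimate how often random control sets overlap features at least as
    often as the observed positions do (uniform controls: a simplification)."""
    rng = random.Random(seed)

    def count_overlaps(positions):
        return sum(1 for p in positions if nearest_feature(p, features)[1] == 0)

    observed = count_overlaps(observed_positions)
    as_extreme = 0
    for _ in range(n_trials):
        control = [rng.randrange(genome_length) for _ in observed_positions]
        if count_overlaps(control) >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero
    return (as_extreme + 1) / (n_trials + 1)
```

In practice the control sets would need to mirror the experimental sequences (for example, in length or in proximity to a restriction site), since uniform sampling can overstate significance.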
Dr. Wolfsberg's group has used these sequence mapping and annotation programs for a wide range of projects. For example, the group has collaborated with researchers in locating transcriptionally active regions of DNA by finding sites that are sensitive to deoxyribonuclease (DNase), and in exploring gene expression patterns by identifying genome-wide consensus binding sequences for selected transcription factors. Dr. Wolfsberg is currently collaborating with researchers at the National Heart, Lung, and Blood Institute in assessing the efficacy and safety of retroviruses used as vectors in gene therapy studies.
More specifically, Dr. Wolfsberg's group is studying the positions at which retroviruses and retroviral vectors integrate into the host genome during retroviral gene therapy. Recent studies have shown that one of the common retroviruses used in gene therapy, the Moloney murine leukemia virus (MLV), can integrate into genes and disrupt their function. In a clinical trial of retroviral gene therapy, four patients with X-linked severe combined immunodeficiency developed leukemia after the MLV vector integrated near a proto-oncogene, thereby activating it. In an attempt to identify alternative vectors for retrovirus-mediated gene therapy, Dr. Wolfsberg's group has performed a systematic analysis of the integration patterns of avian sarcoma leukosis virus (ASLV) in the rhesus macaque, and has monitored three macaques for more than four years after treatment with a vector based on simian immunodeficiency virus (SIV). These studies have shown that both ASLV and SIV appear to be safer alternatives to MLV for gene therapy. Thus, optimized vectors based on either of these viruses may be considered for future gene therapy trials.
Last Reviewed: May 18, 2014