Why was it so difficult to fully complete the human genome sequence?

The Human Genome Project ended in 2003, but genomic researchers had not yet determined every last base (or letter) of the human genome sequence. Instead, they had only completed about 92% of the sequence at that time. Why did they stop there?

Telomere-to-telomere infographic

Woman looking at an incomplete human genome sequence, with A, C, T and Gs, but missing some.

She has doubts and says "Hmm, this doesn't look complete to me."

Reason 1

The human genome contains a massive amount of DNA.

The human genome consists of about 3 billion bases in a precise order, each of which can be represented by a letter (G, A, T or C). A genome's sequence cannot be read out end-to-end. Rather, researchers must first determine the sequence of random pieces of DNA and then use those smaller sequences to put the whole genome sequence back together like a massive puzzle.
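The puzzle analogy can be made concrete with a toy example. The Python sketch below is only an illustration under simplifying assumptions (error-free fragments, no repeats, a made-up 17-letter "genome"); it is not the method used by the Human Genome Project. It repeatedly joins the two fragments whose ends overlap the most, a simple greedy strategy. The names `overlap` and `greedy_assemble` are hypothetical.

```python
from __future__ import annotations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of pieces until no overlaps remain."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return pieces  # nothing left to join
        merged = pieces[best_i] + pieces[best_j][best_len:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)


# Three overlapping pieces of a made-up sequence, given in shuffled order
fragments = ["CTTACCGA", "GATTACAGG", "ACAGGCTTAC"]
print(greedy_assemble(fragments))  # ['GATTACAGGCTTACCGA']
```

With only three short pieces the puzzle is easy. A real genome involves many millions of pieces, and the repetitive stretches described under Reason 2 can mislead a simple overlap rule like this one.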

Telomere-to-telomere infographic - Reason 1

Woman looking at a map of the United States of America that shows the distance from Houston, Texas to Boston, Massachusetts (1,604 miles).

If you printed out ~3 billion letters of the human genome in size 12 font, it would stretch from Boston to Houston. Note: Only 20 letters of DNA sequence shown.

To imagine what ~3 billion letters of the human genome sequence would look like, multiply this by 153 million! Road trip, anyone?
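The numbers in the infographic can be checked with a few lines of arithmetic. This Python sketch uses only the figures quoted above (20 letters shown, a multiplier of 153 million, and 1,604 miles); the variable names are made up for illustration.

```python
LETTERS_SHOWN = 20            # letters of DNA sequence visible in the infographic
MULTIPLIER = 153_000_000      # "multiply this by 153 million"
MILES = 1_604                 # Houston to Boston, as stated on the map

total_letters = LETTERS_SHOWN * MULTIPLIER
letters_per_mile = total_letters / MILES

print(f"{total_letters:,} letters in total")            # 3,060,000,000 letters in total
print(f"about {letters_per_mile:,.0f} letters per mile")
```

The total comes to just over 3 billion letters, which matches the "~3 billion" figure, or roughly 1.9 million letters for every mile of the trip.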

Reason 2

Some parts of our DNA are painfully repetitive.

Some sections of the human genome sequence consist of long, repetitive stretches of letters that are difficult to put in the right place. Over the past two decades, researchers developed new technologies to read longer stretches of DNA, going from only about 500 letters at a time to more than 100,000, which allowed them to assemble the full length of the most difficult repeats.
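A small example shows why read length matters. The Python sketch below uses a made-up toy sequence with two copies of a repeat (far shorter than anything in a real genome) and a hypothetical helper, `find_positions`, that lists every place a read could fit. A short read that lies entirely inside the repeat fits in several places, while a longer read that reaches the unique letters on both sides fits in exactly one.

```python
def find_positions(genome: str, read: str) -> list:
    """Return every start index where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # allow overlapping matches
    return positions


# Hypothetical toy sequence: unique flanks around two copies of a repeat
repeat = "ACACACACAC"
genome = "GGTTC" + repeat + "TTGCA" + repeat + "CGGAT"

short_read = "ACACAC"                  # lies entirely inside the repeat
long_read = "TTC" + repeat + "TTG"     # spans the repeat into both unique flanks

print(find_positions(genome, short_read))  # [5, 7, 9, 20, 22, 24]
print(find_positions(genome, long_read))   # [2]
```

The short read cannot be placed with confidence because it matches both copies of the repeat. The long read anchors itself in the unique sequence on either side, so its position is unambiguous.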

Telomere-to-telomere infographic - Reason 2

Lady on the left holding a short read sequence of a chromosome (Chromosome 1) asking "Where does this go?"

Lady on the right holding a long read sequence of a chromosome (Chromosome 2) saying "Mine matches with #1!"

Reason 3

The first 92% was hard. The last 8% was excruciating.

Those DNA repeats and other obstacles stood between the genomic researchers and the final 8% of the human genome sequence until new laboratory and computational technologies were developed. It took almost twice as long to finish the last 8% of the human genome as it did the first 92%!

Telomere-to-telomere infographic - Reason 3

Graph showing the percent of human genome sequence released, with steep increase from late 1990s through early 2000s. X axis represents years: 1990 to 2000 to 2010 to 2020. Y axis represents percentage from 0% to 100%.

Late 1990s, approximately 10% of the human genome sequence was released.

Early 2000s through 2010s, 92% of the human genome sequence was released.

Late 2010s through 2020, the remaining 8% of the human genome sequence was released.

To the left of the graph, a scientist celebrates with her hands in the air, saying "Phew, we did it!"

Reason 4

The last 8% needed a generation of dedicated genomic researchers with a vision.

Even with new technologies, genome sequencing is still tough, time-consuming work that requires a lot of skill and dedication. The current generation of genomic researchers are true perfectionists, and they brought everything together to finally complete the human genome sequence.

Telomere-to-telomere infographic - Reason 4

Jigsaw puzzle consisting of 4 pieces with the words (clockwise, from left to right): Complexity, Technology, Cost and Patience.

Messages accompany each puzzle piece:

A lady sits on the Complexity puzzle piece working on her laptop with the message "Be strong, little computer!"

A gentleman attempts to join Technology with Complexity (to the left) and Cost (below), with the message "These new methods are so powerful!"

Another gentleman attempts to join the Cost puzzle piece with Patience (to the left), with the message "It's much cheaper to sequence DNA now!"

Another lady holds the Patience puzzle piece in place, with a message "This is tedious!"

Last updated: August 10, 2021