The first draft of the human "pan-genome" has been released. What is its significance?

Did you know that it has been 20 years since the first draft of the human genome sequence was released? Most of the data in that sequence came from a single white volunteer, who accounted for more than 70% of it. The rest was a mixture of the genomes of at least three other people.

Why was it done this way? Probably because sequencing one person's genome was too expensive at the time. According to the final statistics, scientists from six countries spent a total of $4.2 billion to complete the first draft of the human genome. The scientists may have hoped to capture as much human genetic information as possible in a single genome project, so they mixed DNA from different people. There were also technical constraints: building physical maps and using the other methods of the day required a large amount of DNA, and if all of it had come from one person, a great deal of blood would have had to be drawn. So although the data came from at least four people, the resulting genome map was presented as that of "one person."

In the past 20 years, tens of thousands of studies on human genes have been based on the genome sequence completed by the Human Genome Project. However, this reference genome still has many problems. First, sequencing technology was limited at the time, so the chromosomes in that genome were not complete but had many "holes", especially in regions rich in repetitive sequences such as telomeres and centromeres. When it was published in 2003, the genome was actually only 92% complete, and it took scientists another 20 years to finish the remaining part. Second, although the genomes of different human individuals are on average more than 99.6% identical, the remaining 0.4% underlies human diversity: differences in hair color, height, skin color and so on are determined by that 0.4%. These differences cannot be fully described by the sequence map from the Human Genome Project, because it represents the genome of only "one person".

Thanks to advances in technology and the continuous efforts of scientists, a complete telomere-to-telomere map of the human genome was published in 2022, filling almost all the "holes" left by the Human Genome Project. For the first time we could see a truly complete map of "one" human genome, and the first problem mentioned above was resolved. Then, in early May 2023, four articles published in Nature and Nature Biotechnology pushed the human genome into the "pan-genome" era, that is, an era in which everyone's genetic characteristics can be represented. Today, let's talk about this latest series of advances.

First of all, what is a pan-genome? A pan-genome is the sum of all genomic information within a species, so it covers more genetic diversity than a single reference genome. The most complete pan-genome would be the sum of the genomes of all individuals of the species.
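The "sum of all genomic information" idea can be pictured as a set union. The sketch below is only a toy illustration: the individuals and gene names are hypothetical, the function name `summarize_pangenome` is made up for this example, and the split into "core" (present in everyone) and "accessory" (present in only some) is standard pan-genome terminology that the article itself does not use. Real pan-genomes are built from full DNA sequences, not from lists of gene names.

```python
from typing import Dict, Set, Tuple


def summarize_pangenome(
    genomes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, accessory) gene sets for a group of individuals.

    pan       = genes seen in at least one individual (the union)
    core      = genes seen in every individual (the intersection)
    accessory = genes seen in some individuals but not all
    """
    if not genomes:
        raise ValueError("need at least one individual")
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    accessory = pan - core
    return pan, core, accessory


# Hypothetical toy data: three individuals, five made-up gene names.
genomes = {
    "individual_A": {"gene1", "gene2", "gene3"},
    "individual_B": {"gene1", "gene2", "gene4"},
    "individual_C": {"gene1", "gene3", "gene5"},
}

pan, core, accessory = summarize_pangenome(genomes)
print("pan-genome:", sorted(pan))
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

In this toy case a single reference built from individual_A alone would miss gene4 and gene5, which is the kind of gap a pan-genome is meant to close.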

The four articles are: "A draft human pangenome reference", "Increased mutation and gene conversion within human segmental duplications", and "Recombination between heterologous human acrocentric chromosomes", all published in Nature; and "Pangenome graph construction from genome alignments with Minigraph-Cactus", published in Nature Biotechnology.

Let us summarize the results of these four studies. First, the pan-genome draft was obtained by analyzing complete personal genome data from 47 individuals of diverse origins. Compared with GRCh38, the human reference sequence in wide use today, the draft adds 119 million base pairs (a base pair is two complementary bases paired in the DNA double helix) and 1,115 gene duplications.

Compared with GRCh38, using this draft allows 104% more structural variants to be detected per haplotype. It also fills in about 210 Mb (megabases) of DNA sequence that GRCh38 lacks, of which 151 Mb was completely unknown before and 59 Mb was only a predicted sequence produced by earlier computer simulation. Gaps like these bias the data in any study that relies on the reference. They also show that there are still many regions of the human genome map that we do not understand and that the map still needs to be improved.

Second, scientists built a map of single nucleotide variants (SNVs) that contains millions of previously uncharacterized SNVs. The new pan-genome map also describes how variable segmental duplications are, that is, regions where highly identical DNA sequences are repeated at one or more other sites in the genome. Such duplicated sequences can give rise to genomic variation, which in turn affects an individual's phenotypic traits and risk of disease.

Third, using the human pan-genome draft, scientists observed a pattern of recombination between the short arms of heterologous acrocentric chromosomes. This points to a mechanism of DNA exchange between different chromosomes that had previously been speculated about but not confirmed.

Fourth, researchers improved the way the pan-genome reference itself is built. In this study, scientists demonstrated the "Minigraph-Cactus" pan-genome pipeline, which can create a pan-genome directly from whole-genome alignments and which was applied to fruit fly genomes as well as human ones. This provides more comprehensive information for understanding genomic variation between species and between individuals in the future.
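To give a feel for what "creating a pan-genome from alignments" means, here is a deliberately tiny sketch. It is not the Minigraph-Cactus algorithm and does not use its software: the function `build_toy_graph`, the two short sequences, and the one-node-per-aligned-base design are all assumptions made for illustration. The idea it shows is simply that positions where aligned genomes agree collapse into shared nodes, while positions where they differ become branches in a graph.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Node = Tuple[int, str]  # (alignment column, base)


def build_toy_graph(
    aligned: Dict[str, str],
) -> Tuple[Dict[Node, Set[Node]], Dict[str, List[Node]]]:
    """Build a toy variation graph from equal-length gapped sequences.

    Each non-gap base becomes a node keyed by (column, base), so
    sequences that agree in a column share a node. Each sequence is
    stored as a path, and consecutive path nodes are joined by an edge.
    """
    if len({len(seq) for seq in aligned.values()}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    edges: Dict[Node, Set[Node]] = defaultdict(set)
    paths: Dict[str, List[Node]] = {}
    for name, seq in aligned.items():
        path = [(col, base) for col, base in enumerate(seq) if base != "-"]
        paths[name] = path
        for left, right in zip(path, path[1:]):
            edges[left].add(right)
    return edges, paths


# Hypothetical alignment: one substitution (column 2) and one insertion (column 4).
aligned = {
    "reference": "ACGT-A",
    "sample":    "ACCTGA",
}

edges, paths = build_toy_graph(aligned)
nodes = {node for path in paths.values() for node in path}
branch_points = sorted(node for node, nxt in edges.items() if len(nxt) > 1)

print("number of nodes:", len(nodes))
print("branch points:", branch_points)
print("reference spelled from its path:", "".join(b for _, b in paths["reference"]))
```

Both sequences remain fully recoverable as paths through the same graph, which is why a graph of this kind can stand in for many individual genomes at once.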

Of course, these results are only an intermediate stage in human pan-genome research. The full project aims to observe and describe the genetic diversity of 350 individuals, and what has been completed so far is only a small part of that. The researchers plan to finish sequencing the genomes of these 350 people by mid-2024.

Finally, from the Human Genome Project to the current pan-genome project, we have seen contributions from Chinese scientists, and their share has gradually grown since BGI represented China in the Human Genome Project and took on 1% of the work. This time, the corresponding author of two of the four articles is Dr. Li Heng from China, a leading figure in the field of genome research. There are also many Chinese names in the author lists.

We also hope that in the future, there will be more Chinese voices and more contributions from China in the field of human genome research.

This article is a work supported by Science Popularization China Starry Sky Project

Author: Tiangeng

Reviewer: Tao Ning (Associate Researcher, Institute of Biophysics, Chinese Academy of Sciences)

Produced by: China Association for Science and Technology Department of Science Popularization

Producer: China Science and Technology Press Co., Ltd., Beijing Zhongke Xinghe Culture Media Co., Ltd.
