The Science Of Health: Welcome back to “The Science Of Health”, ABP Live’s weekly health column. Last week, we discussed different types of cancers in women. This week, we will discuss how the new human reference genome, which represents more diversity, will help researchers understand the link between genes and health.
In 2003, the human genome was sequenced for the first time. The resulting sequence, known as the human reference genome, has been used as a standard for understanding diseases, evolution and the role of genes, and for comparing other genetic data.
However, the human reference genome has not been without flaws. Although it drew on DNA from about 20 individuals, about 70 per cent of its data came from a single man of predominantly African-European background.
Now researchers have developed a reference “pangenome” using the genomes of 47 individuals from across the world. It captures more human genetic diversity than any previous reference. This will give researchers a broader understanding of health and illness when they compare the genome of any individual with the reference pangenome.
The findings were recently published in a set of papers in Nature and other journals.
The existing reference genome
The existing reference genome is based on DNA sequenced as part of the Human Genome Project, which decoded about 92 per cent of the human genome. The remaining eight per cent, which contains DNA from a region called heterochromatin, was sequenced in 2022. Heterochromatin is gene-poor, highly condensed, and largely transcriptionally silent, which means most of it is not read out to make proteins.
Even after the heterochromatin region of the human genome was sequenced, the 0.2 to one per cent of DNA that represents diversity remained poorly characterised.
The earlier human reference genome tells us little about the 0.2 to one per cent of genetic sequence that makes each human on Earth different from others, because 70 per cent of its data comes from a single person. It does not represent several genetic variants found in non-European populations. Some of the world's health disparities are attributed to this bias in biomedical data.
The new reference pangenome
Researchers at the Human Pangenome Reference Consortium, a collaboration launched in 2019 to address the lack of diversity in the original human reference genome, have decoded 99 per cent of each sequence in the pangenome with high accuracy. They have also characterised the 0.2 to one per cent of the genome that represents diversity.
By sequencing the genomes of the 47 individuals, the researchers identified about 120 million DNA base pairs that were previously unseen or were in a different location in the previous reference.
According to a statement released by Rockefeller University, Erich D Jarvis, one of the primary investigators involved in the research, had been working on the Vertebrate Genomes Project, which aims to sequence all 70,000 vertebrate species. Along with collaborating labs, he decided to use the same advanced sequencing and computational techniques to obtain high-quality diploid genome sequences and reveal the variation within a single vertebrate species: humans (Homo sapiens).
The scientists drew a diverse set of samples from the 1000 Genomes Project, a public database of sequenced human genomes from more than 2,500 individuals. These individuals represent 26 populations that differ from each other geographically and ethnically. The majority of the samples chosen by the researchers belong to people from Africa.
How it helps health research
The statement quoted Jarvis as saying that the complex genomic collection represents human genetic diversity significantly more accurately than anything captured before. With a greater breadth and depth of genetic data at their disposal, and higher-quality genome assemblies, researchers can improve their understanding of the link between genes and disease traits, he said. This will help accelerate clinical research.
Humans carry two copies of almost every chromosome, because every person inherits one set of chromosomes from each parent. The analysis of 47 people therefore yielded 94 distinct genome sequences, two for each person.
The researchers then used advanced computational techniques to analyse the 94 sequences. As mentioned earlier, they identified 120 million DNA base pairs that were previously unseen (or were in a different location in the previous reference). Of these, 90 million derive from structural variations (these are differences in DNA caused by stretches of chromosomes being moved, deleted, or duplicated).
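The arithmetic behind these figures is simple enough to check by hand, but a short sketch makes the relationships explicit. The Python below is only an illustration using the rounded numbers quoted in this article; the function and constant names are hypothetical and are not taken from the consortium's own analysis.

```python
# Rounded figures as quoted in the article (not exact study values).
INDIVIDUALS = 47
COPIES_PER_PERSON = 2                 # one set of chromosomes from each parent
NEW_BASE_PAIRS = 120_000_000          # previously unseen or relocated base pairs
STRUCTURAL_BASE_PAIRS = 90_000_000    # portion attributed to structural variation


def genome_sequence_count(individuals: int, copies: int = COPIES_PER_PERSON) -> int:
    """Number of genome sequences obtained from diploid individuals."""
    return individuals * copies


def structural_share(structural: int, total: int) -> float:
    """Fraction of the newly identified base pairs that come from structural variation."""
    return structural / total


if __name__ == "__main__":
    print(genome_sequence_count(INDIVIDUALS))  # 94
    print(f"{structural_share(STRUCTURAL_BASE_PAIRS, NEW_BASE_PAIRS):.0%}")  # 75%
```

On these rounded numbers, roughly three quarters of the newly identified DNA is tied to structural variation.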
Jarvis noted this as an important discovery, because structural variants are known to play a major role in health, as well as in population-specific diversity. “With so many new ones identified, there's going to be a lot of new discoveries that weren’t possible before,” he was quoted as saying.
Other key takeaways
One important implication of the effort relates to the major histocompatibility complex (MHC). This is a cluster of genes that code for proteins that help the immune system recognise antigens, such as those from the SARS-CoV-2 virus.
With older sequencing methods, it was difficult to study MHC diversity. Jarvis noted that the new pangenome shows much greater diversity than expected. “This new information will help us understand how immune responses against specific pathogens vary among people,” he was quoted as saying.
It could also lead to better methods to match organ transplant donors with patients, or to identify people at risk of developing autoimmune diseases.