A couple of months back, we reported on a study showing that genetic tests for an inherited heart disorder were more likely to come back with false positive results for black Americans than for whites. The study prompted many in our industry to urge scientists to incorporate more ethnic diversity in their studies. So far, biology has been too Eurocentric; the databases are implicitly racist, they argue.
Perhaps no dataset in human genomics is referenced more than the human reference genome, GRCh38. This is the “Rosetta Stone” of genomics, used by scientists and clinicians everywhere who are assembling and studying genomes. Valerie Schneider is a scientist at the NCBI who works every day on GRCh38. She says major strides, enabled in part by better sequencing technologies, have been made lately to add diversity to GRCh38 and to create other reference genomes for various populations around the globe.
The genomes in these new projects include a Han, a Puerto Rican, a Yoruban, a Colombian, a Gambian, a Luhya, a Vietnamese, and one or two more Europeans.
“The sequence from these genomes is planned for correcting errors and adding new ‘alt loci’ to the reference genome. But these new assemblies are also intended to stand on their own as complements to the reference,” says Valerie.
Valerie reminds us that it’s still early days in genomics. There’s so much diversity in the human population that her team is not sure whether having a single reference for each of these ethnic groups will be sufficient.
With more reference genomes comes the challenge of how best to compare and visualize them. There is a major need for tools that can show large nests of sequence as opposed to a linear reference, she says in today’s interview.
What is Valerie’s take on the term “reference-quality genomes,” and how will a better reference genome improve precision medicine?