March 27, 2025

New Genomics Tool Accelerates Biomedical Breakthroughs

A School of Medicine scientist and collaborators have developed a much-needed new tool to increase the efficiency of genomic research and accelerate the development of new ways to improve human health.

UVA researcher Nathan Sheffield, PhD, has spent four years developing a new data standard to ensure that scientists are comparing apples to apples when doing genomic analysis. This type of analysis helps researchers understand the operating instructions for our cells and see how those instructions are carried out. The resulting insights help us understand the workings of both healthy cells and unhealthy ones, pointing us to new ways to treat and prevent disease.

Genomics is a complex field that involves analyzing vast amounts of data. This work is complicated by the number of researchers involved and the varying ways they have named “reference sequences” over the years. Reference sequences are essential tools in genomic research, often representing genetic information compiled from multiple individuals. Researchers rely on these sequences to identify gene variations driving genetic diseases and to understand how diseased cells behave differently than normal cells.

Sheffield’s new standard, called refget Sequence Collections, will streamline genomics research by letting scientists identify reference sequences more quickly and efficiently. This will help ensure that the results of a genomic analysis are accurate and repeatable, which in turn can accelerate medical breakthroughs and improve our understanding of the clinical relevance of genetic variation.

“Imagine a class where each student had a different version of the book. Maybe the words are slightly different, the page numbers don’t match, the chapter titles and numbers aren’t the same and the study questions are in a different order. Those differences in the reference text would make it hard for the students to communicate with each other about what they’re learning, even if the general ideas behind the reference are basically the same,” Sheffield said. “If the students could identify each version of the text exactly, and also get detailed comparisons showing how they differ, that would make it much easier to communicate ideas and compare results. In the same way, refget Sequence Collections can tame the chaos of slightly different references, improving collaboration, sharing and reproducibility of research results based on genomic data.”

New Genomics Tool

For scientists, trying to identify the exact reference sequence used for published results can be a major burden. It’s time-consuming and involves guesswork – the type of toil you might assume could be done automatically but often is not. Sheffield’s new tool addresses that problem, helping scientists eliminate drudge work while ensuring they are comparing their data to the same references.

The tool serves as an important addition to the more than 40 genomic-research resources developed by members of the Global Alliance for Genomics and Health (GA4GH). GA4GH is a not-for-profit that sets standards and develops policies to expand genomic data use within a human-rights framework.

GA4GH previously developed refget sequences to simplify reference-sequence identification by assigning unique identifiers to single genomic sequences. Sheffield’s new tool takes the next step, assigning names to groups of reference sequences, such as all the DNA sequences that correspond to a whole reference genome.
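
For readers who work with code, the idea of naming a single sequence and then naming a whole group can be illustrated with a short Python sketch. It assumes identifiers are computed from the content itself, using a SHA-512 hash truncated to 24 bytes and base64url-encoded, the digest style GA4GH uses for sequence identifiers. The function names and the way the collection is summarized here are hypothetical simplifications for illustration, not the published specification, which defines its own canonical encoding.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest, truncated to 24 bytes, then base64url-encoded."""
    digest = hashlib.sha512(data).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")


def sequence_id(sequence: str) -> str:
    """Identifier for one sequence, derived only from its (uppercased) letters."""
    return sha512t24u(sequence.upper().encode("ascii"))


def collection_id(sequences: dict[str, str]) -> str:
    """Identifier for a named group of sequences.

    Hypothetical, simplified scheme: hash a JSON summary of the names,
    lengths and per-sequence identifiers. The real standard specifies its
    own attributes and canonical JSON rules.
    """
    summary = {
        "names": list(sequences),
        "lengths": [len(s) for s in sequences.values()],
        "sequences": [sequence_id(s) for s in sequences.values()],
    }
    canonical = json.dumps(summary, sort_keys=True, separators=(",", ":"))
    return sha512t24u(canonical.encode("utf-8"))


# Two toy "genomes" with identical DNA but different chromosome naming.
genome_a = {"chr1": "ACGTACGT", "chr2": "GGCCTTAA"}
genome_b = {"1": "ACGTACGT", "2": "GGCCTTAA"}

print(sequence_id(genome_a["chr1"]) == sequence_id(genome_b["1"]))  # same sequence
print(collection_id(genome_a) == collection_id(genome_b))           # different collection
```

In this toy example the individual sequences receive matching identifiers because their letters are identical, while the two collections receive different identifiers because their naming differs. That is the kind of "same book, different edition" distinction described above.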

This will bring much-needed organization to genomic research while also addressing long-standing challenges that have slowed scientific breakthroughs, Sheffield says. Now automation will free scientists from the important but tedious grind of hunting up reference sequences, allowing them to focus their attention on advancing discoveries that will benefit human health.

“I hope this standard helps solve some of the difficulty the scientific community has faced integrating genomic and epigenomic data,” Sheffield said. “With a standardized, approved way to refer to references, we can accelerate the understanding we gain from integrating results across many experiments.”

About the Research

Work on the new tool was an international collaboration led by Sheffield; Timothé Cezard at EMBL’s European Bioinformatics Institute; Andy Yates at EMBL’s European Bioinformatics Institute; Sveinung Gundersen at ELIXIR Norway; Shakuntala Baichoo at Peter Munk Cardiac Centre-Artificial Intelligence; and Rob Davies at Wellcome Sanger Institute, with support from LSG Work Stream Manager Reggan Thomas at EMBL’s European Bioinformatics Institute and Work Stream Co-Leads Oliver Hofmann at the University of Melbourne and Geraldine Van der Auwera at Seqera.

Sheffield holds appointments in the School of Medicine’s Departments of Genome Sciences and Biochemistry and Molecular Genetics, as well as in UVA’s School of Data Science and in UVA’s Department of Biomedical Engineering, a joint program of the School of Medicine and the School of Engineering and Applied Science.

To keep up with the latest medical research news from UVA, subscribe to the Making of Medicine blog.

Media Contact
Joshua Barney

Deputy Public Information Officer

Email  |  jdb9a@virginia.edu

Phone  |  434.906.8864