Thousands of missing variants in the UK Biobank are recoverable by genome realignment.

Citation
Jia, T., et al. “Thousands of Missing Variants in the UK Biobank Are Recoverable by Genome Realignment.” Annals of Human Genetics, vol. 84, no. 3, 2020, pp. 214-220.
Center UCSD-UCLA
Author Tongqiu Jia, Brenton Munson, Hana Lango Allen, Trey Ideker, Amit R Majithia
Keywords DNA, exome, Genetics, sequence alignment, sequence analysis
Abstract

The UK Biobank is an unprecedented resource for human disease research. In March 2019, 49,997 exomes were made publicly available to investigators. Here we note that thousands of variant calls are unexpectedly absent from this dataset, with 641 genes showing zero variation. We show that the reason for this was an erroneous read alignment to the GRCh38 reference. The missing variants can be recovered by modifying read alignment parameters to correctly handle the expanded set of contigs available in the human genome reference. Given the size and complexity of such population scale datasets, we propose a simple heuristic that can uncover systematic errors using summary data accessible to most investigators.
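
The abstract does not spell out the heuristic. One plausible reading, offered here only as an assumption, is to count variant calls per gene from summary data and flag every targeted gene that shows none. In the minimal Python sketch below, the function name, the gene labels, and the input format are hypothetical illustrations and are not taken from the paper.

```python
from collections import Counter


def genes_without_variants(targeted_genes, variant_gene_labels):
    """Return the targeted genes that have no variant calls at all.

    targeted_genes: iterable of gene names expected to be covered.
    variant_gene_labels: iterable with one gene name per observed variant
        (hypothetical summary-data format, assumed for illustration).
    """
    # Count how many variant calls were reported for each gene.
    counts = Counter(variant_gene_labels)
    # A gene with a count of zero is a candidate for a systematic error.
    return sorted(g for g in set(targeted_genes) if counts[g] == 0)


# Toy example with made-up gene labels.
targeted = ["GENE_A", "GENE_B", "GENE_C"]
observed = ["GENE_A", "GENE_A", "GENE_C"]
print(genes_without_variants(targeted, observed))  # → ['GENE_B']
```

A gene flagged this way is not necessarily affected by an alignment error. The list is a starting point for closer inspection of the underlying reads.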

Year of Publication
2020
Journal
Annals of Human Genetics
Volume
84
Issue
3
Number of Pages
214-220
Date Published
12/2020
ISSN Number
1469-1809
DOI
10.1111/ahg.12383
Alternate Journal
Ann Hum Genet
PMID
32232836
PMCID
PMC7402360