Systems and Methods for Ultra-Fast, Powerful, Efficient, and Accurate Detection of Segments Identical by Descent in Biobank-Scale Cohorts

Description:

With genotyping data now available for very large samples, there is an increasing need for tools that can efficiently identify genetic relationships among all individuals in a sample. One fundamental measure of the genetic relationship between a pair of individuals is identity by descent (IBD): chromosomal segments shared by two individuals due to common ancestry. However, efficiently identifying IBD segments among a large number of genotyped individuals is a challenging computational problem. Some methods, such as GERMLINE, use fast dictionary lookup of short seed sequence matches to achieve near-linear time efficiency. However, the number of short seed matches often scales super-linearly in real population data.
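
The seed-lookup idea can be illustrated with a minimal Python sketch. It is not GERMLINE's actual implementation: the function name `seed_matches`, the fixed non-overlapping windows, and the 0/1 haplotype encoding are assumptions made for illustration only. Hashing each window takes time linear in the number of haplotypes, but every pair inside a shared bucket must still be enumerated, which is where the super-linear growth in seed matches comes from.

```python
from collections import defaultdict
from itertools import combinations


def seed_matches(haplotypes, seed_len):
    """Map each haplotype pair (i, j) to the window starts where both carry
    an identical seed. Hypothetical helper, for illustration only."""
    n_sites = len(haplotypes[0])
    matches = defaultdict(list)
    # Slide over non-overlapping windows of seed_len sites.
    for start in range(0, n_sites - seed_len + 1, seed_len):
        buckets = defaultdict(list)
        # Dictionary lookup: haplotypes with the same seed share a bucket.
        for idx, hap in enumerate(haplotypes):
            buckets[tuple(hap[start:start + seed_len])].append(idx)
        # Every pair in a bucket is a candidate match; this step is
        # quadratic in bucket size.
        for members in buckets.values():
            for i, j in combinations(members, 2):
                matches[(i, j)].append(start)
    return dict(matches)
```

A full seed-and-extend method would then try to extend each candidate pair into a longer segment; that step is omitted here.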

Collaborators at the University of Texas Health Science Center at Houston and the University of Central Florida have developed a novel approach for IBD detection named RaPID. Taking advantage of an efficient population genotype index, the Positional Burrows-Wheeler Transform (PBWT) developed by Dr. Richard Durbin, the RaPID technology adjusts parameters to optimize detection power and accuracy for IBD segments. The tool maintains detection power and accuracy comparable to existing mainstream algorithms, scales almost linearly with sample size, and is orders of magnitude faster than existing mainstream algorithms such as GERMLINE and IBDseq. With the RaPID technology, it is feasible to identify IBD segments among hundreds of thousands to millions of individuals, a sample size that will become reality in a few years due to the popularity of genetic ancestry companies.
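
To give a sense of the underlying index, the following Python sketch shows the general PBWT technique of keeping haplotypes sorted by their reversed prefixes and reporting pairwise exact matches of at least a minimum length. It is a simplified illustration of the published PBWT idea, not the RaPID implementation, and it does not include RaPID's parameter tuning. The function name `pbwt_long_matches`, the 0/1 haplotype lists, and the `(i, j, start, end)` output tuples are assumptions chosen for this example.

```python
def pbwt_long_matches(haplotypes, min_len):
    """Return (i, j, start, end) for every pair of haplotypes that is identical
    on sites start..end-1, with end - start >= min_len, and whose match stops
    at `end` (an allele difference or the end of the data)."""
    n_haps = len(haplotypes)
    n_sites = len(haplotypes[0])
    order = list(range(n_haps))  # haplotypes sorted by reversed prefix before site k
    div = [0] * n_haps           # div[p]: first site where order[p] matches order[p-1]
    matches = []

    def report(k, lo, hi):
        # Positions lo..hi-1 in `order` match each other on >= min_len sites before k.
        for p in range(lo, hi):
            start = 0
            for q in range(p + 1, hi):
                start = max(start, div[q])  # match start for the pair (p, q)
                a, b = order[p], order[q]
                if k == n_sites or haplotypes[a][k] != haplotypes[b][k]:
                    matches.append((min(a, b), max(a, b), start, k))

    for k in range(n_sites + 1):
        # Split the sorted order into blocks whose neighbours share a long match.
        lo = 0
        for p in range(1, n_haps + 1):
            if p == n_haps or div[p] > k - min_len:
                if p - lo > 1:
                    report(k, lo, p)
                lo = p
        if k == n_sites:
            break
        # Stable partition by the allele at site k, updating the divergence values.
        zeros, ones, div_zeros, div_ones = [], [], [], []
        p_div = q_div = k + 1
        for idx, d in zip(order, div):
            p_div, q_div = max(p_div, d), max(q_div, d)
            if haplotypes[idx][k] == 0:
                zeros.append(idx)
                div_zeros.append(p_div)
                p_div = 0
            else:
                ones.append(idx)
                div_ones.append(q_div)
                q_div = 0
        order, div = zeros + ones, div_zeros + div_ones
    return matches
```

Each site is processed with a single pass over the haplotypes plus the cost of reporting matches, so the sorted order never has to be rebuilt from scratch. This sketch requires exact identity over the whole segment and is meant only to show the indexing idea.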

 

Publications:

Ultra-fast Identity by Descent Detection in Biobank-Scale Cohorts using Positional Burrows-Wheeler Transform

https://www.biorxiv.org/content/early/2017/01/26/103325

 

Inventors:

Degui Zhi

Xiaoming Liu

Shaojie Zhang

Ardalan Naseri

 

Intellectual Property Status:

This technology is available for licensing.

Patent Information:
Category(s):
Software
For Information, Contact:
Hannah Nelson
Senior Technology License Associate
University of Texas Health Science Center At Houston
hannah.m.nelson@uth.tmc.edu