Description:
As genotyping data become available for very large samples, there is an increasing need for tools that can efficiently identify genetic relationships among all individuals in a sample. One fundamental measure of the genetic relationship between a pair of individuals is identity by descent (IBD): chromosomal segments that two individuals share because of common ancestry. Efficiently identifying IBD segments among a large number of genotyped individuals is, however, a challenging computational problem. Some methods, such as GERMLINE, use fast dictionary lookup of short seed sequence matches to achieve near-linear running time, but the number of short seed matches often grows super-linearly with sample size in real population data.
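The seed-lookup idea can be illustrated with a minimal Python sketch. It is not GERMLINE's implementation: the function name `seed_matches`, the fixed non-overlapping windows, and the 0/1 haplotype encoding are all assumptions made for illustration. Each haplotype is cut into windows of `seed_len` sites, the window contents are used as dictionary keys, and every pair of haplotypes that lands in the same bucket is reported as a seed match.

```python
from collections import defaultdict
from itertools import combinations


def seed_matches(haps, seed_len):
    """Return (i, j, window_start) for every pair of haplotypes that is
    identical over a whole window of seed_len sites (illustrative only)."""
    n_sites = len(haps[0])
    pairs = []
    for start in range(0, n_sites - seed_len + 1, seed_len):
        # Dictionary lookup: identical windows hash to the same bucket.
        buckets = defaultdict(list)
        for idx, hap in enumerate(haps):
            buckets[tuple(hap[start:start + seed_len])].append(idx)
        # A bucket with b members yields b * (b - 1) / 2 seed pairs.
        for members in buckets.values():
            for i, j in combinations(members, 2):
                pairs.append((i, j, start))
    return pairs


haps = [
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
print(seed_matches(haps, seed_len=2))
```

Building the buckets takes time linear in the number of haplotypes, but the pair enumeration inside each bucket is quadratic in the bucket size, which is where the super-linear growth in seed matches comes from when many individuals share a common short haplotype.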
Collaborators at the University of Texas Health Science Center at Houston and the University of Central Florida have developed a novel approach to IBD detection named RaPID. Taking advantage of an efficient population genotype index, the Positional Burrows-Wheeler Transform (PBWT) developed by Dr. Richard Durbin, the RaPID technology adjusts its parameters to optimize the power and accuracy of IBD segment detection. The tool maintains detection power and accuracy comparable to those of existing mainstream algorithms, scales almost linearly with sample size, and is orders of magnitude faster than existing mainstream algorithms such as GERMLINE and IBDseq. With the RaPID technology, it is feasible to identify IBD segments among hundreds of thousands to millions of individuals, a sample size that will become a reality in a few years due to the popularity of genetic ancestry companies.
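To give a sense of how a PBWT index exposes shared segments, the following Python sketch maintains the positional prefix order and divergence array from Durbin's PBWT and reports maximal matches of at least `min_len` sites. It is a simplified illustration, not the RaPID algorithm and not Durbin's optimized reporting routine: the function name `pbwt_long_matches`, the 0/1 haplotype encoding, and the pairwise enumeration inside each block are assumptions chosen for readability.

```python
from itertools import combinations


def pbwt_long_matches(haps, min_len):
    """Return sorted (i, j, start, end) tuples such that haplotypes i and j
    are identical on sites start..end-1, the match cannot be extended in
    either direction, and end - start >= min_len (min_len >= 1)."""
    m, n = len(haps), len(haps[0])
    order = list(range(m))  # haplotypes sorted by reversed prefix before site k
    div = [0] * m           # div[p]: start of match between order[p], order[p-1]
    matches = []

    def report(block, k):
        # block holds consecutive positions in `order` whose pairwise
        # matches ending at site k are all at least min_len long.
        for s, t in combinations(block, 2):
            i, j = order[s], order[t]
            if k < n and haps[i][k] == haps[j][k]:
                continue  # the match continues past site k
            start = max(div[s + 1:t + 1])
            matches.append((min(i, j), max(i, j), start, k))

    for k in range(n + 1):
        block = []
        for pos in range(m):
            if div[pos] > k - min_len:  # match with predecessor too short
                report(block, k)
                block = []
            block.append(pos)
        report(block, k)
        if k == n:
            break
        # Stable partition by the allele at site k, updating divergence.
        p = q = k + 1
        a0, a1, d0, d1 = [], [], [], []
        for idx, start in zip(order, div):
            p, q = max(p, start), max(q, start)
            if haps[idx][k] == 0:
                a0.append(idx)
                d0.append(p)
                p = 0
            else:
                a1.append(idx)
                d1.append(q)
                q = 0
        order, div = a0 + a1, d0 + d1
    return sorted(matches)


haps = [
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
print(pbwt_long_matches(haps, min_len=2))
```

The update step touches each haplotype once per site, so maintaining the index costs time proportional to the number of haplotypes times the number of sites. Long matches always sit next to each other in the sorted order, which is what lets a PBWT-based method avoid comparing every pair of individuals.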
Publications:
Ultra-fast Identity by Descent Detection in Biobank-Scale Cohorts using Positional Burrows-Wheeler Transform
https://www.biorxiv.org/content/early/2017/01/26/103325
Inventors:
Degui Zhi
Xiaoming Liu
Shaojie Zhang
Ardalan Naseri
Intellectual Property Status:
This technology is available for licensing.