A discriminative modeling approach for inferring ancestry and correcting phase in large population genetics data sets [electronic resource]
- Brian K. Maples.
- Physical description
- 1 online resource.
All items must be viewed on site
- Call number: 3781 2014 M (In-library use)
- Maples, Brian K.
- Bustamante, Carlos, primary advisor.
- Altman, Russ, advisor.
- Owen, Art B., advisor.
- Stanford University Program in Biomedical Informatics.
- Local ancestry inference is an important step in both medical genetics studies and demographic studies, because many human populations are the result of admixture, the interbreeding of distinct ancestral populations. The recent drastic increase in sample sizes and marker densities of population genetic data, particularly from whole-genome sequencing, gives computational methods an opportunity to harness these data to infer fine-scale local ancestry accurately. However, current approaches to local ancestry inference can accurately detect only continental-level ancestry, and they are too computationally expensive to handle fully sequenced human genomes. Thus there is a need for methods that can use massive population genetics data sets to infer fine-scale ancestry rapidly and robustly. In this thesis, I describe my contributions toward this goal. First, I describe a method I developed called RFMix, which uses conditional random fields parameterized by random forests to train rapidly on massive data sets, infer fine-scale local ancestry, and correct phase. Second, I evaluate RFMix on simulated and real data sets, compare it to other methods, and apply it to real data sets to infer demographic histories. Finally, I develop a pipeline for generating reference panels from large databases containing mislabeled and unlabeled samples, and I apply it to the massive AncestryDNA genetic database to show that using local ancestry inference as an intermediate analysis step gives better global ancestry estimates than traditional direct approaches.
- Publication date
- 2014.
- Submitted to the Program in Biomedical Informatics.
- Thesis (Ph.D.)--Stanford University, 2014.
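As a rough illustration of the kind of decoding the abstract describes, here is a minimal sketch of labeling chromosome windows with ancestries under a linear-chain model, then turning those local calls into a global ancestry estimate. It is not RFMix: no random forests are trained here. The hypothetical `window_probs` input stands in for per-window ancestry probabilities that a classifier might output (used as node potentials), and a single constant `switch_prob` (also an assumption) replaces whatever transition model the thesis actually uses. `viterbi_ancestry` and `global_ancestry` are illustrative names, not RFMix functions, and phase correction is not shown.

```python
import math
from typing import List, Sequence

# Floor for probabilities so log() never sees zero (an illustrative choice).
EPS = 1e-12


def viterbi_ancestry(window_probs: Sequence[Sequence[float]],
                     switch_prob: float) -> List[int]:
    """Most likely ancestry label for each window under a linear-chain model.

    window_probs[w][k]: hypothetical probability that window w derives from
    ancestry k (stands in for a classifier's output, used as node potentials).
    switch_prob: hypothetical, constant probability that adjacent windows
    differ in ancestry, split evenly across the other ancestries.
    """
    n_anc = len(window_probs[0])
    stay = math.log(1.0 - switch_prob)
    move = math.log(switch_prob / (n_anc - 1)) if n_anc > 1 else -math.inf

    def trans(j: int, k: int) -> float:
        # Log transition score from ancestry j to ancestry k.
        return stay if j == k else move

    # score[k]: best log-score of any label path ending in ancestry k.
    score = [math.log(max(p, EPS)) for p in window_probs[0]]
    backptrs: List[List[int]] = []
    for probs in window_probs[1:]:
        new_score, ptr = [], []
        for k in range(n_anc):
            best_j = max(range(n_anc), key=lambda j: score[j] + trans(j, k))
            new_score.append(score[best_j] + trans(best_j, k)
                             + math.log(max(probs[k], EPS)))
            ptr.append(best_j)
        score = new_score
        backptrs.append(ptr)

    # Trace back from the best final label to recover the full path.
    path = [max(range(n_anc), key=lambda k: score[k])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def global_ancestry(path: Sequence[int], window_lengths: Sequence[float],
                    n_anc: int) -> List[float]:
    """Fraction of total window length assigned to each ancestry."""
    totals = [0.0] * n_anc
    for k, length in zip(path, window_lengths):
        totals[k] += length
    genome = sum(window_lengths)
    return [t / genome for t in totals]


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6],
             [0.9, 0.1], [0.1, 0.9], [0.2, 0.8]]
    path = viterbi_ancestry(probs, switch_prob=0.05)
    print(path)  # [0, 0, 0, 0, 1, 1]
    print(global_ancestry(path, [1.0] * 6, n_anc=2))
```

With `switch_prob=0.05`, the weak evidence in the third window (0.6 for ancestry 1) is smoothed into the surrounding ancestry-0 segment; with `switch_prob=0.5` and two ancestries, transitions carry no information and the path falls back to the per-window argmax. Weighting calls by window length is one simple way to turn local ancestry into a genome-wide proportion.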