Discovery and visualization of latent structure with applications to the microbiome [electronic resource]
- Kris Sankaran.
- Physical description
- 1 online resource.
At the library
Limited on-site access
Researchers in the Stanford community can request to view these materials in the Special Collections Reading Room. Entry to the Reading Room is by appointment only.
- Call number: 3781 2018 S (in-library use)
- Human microbiomes, the collections of bacteria living around and within the human body, are complex ecological systems, and describing their structure and function in different contexts is important from both basic scientific and medical perspectives. Viewed through a statistical lens, many microbiome analyses can be framed in terms of discovering and describing latent structure. For example, this structure might reflect sudden environmental shocks that affect certain subsets of species, or it may illuminate gradual shifts in community composition. In this thesis, we survey and develop ideas from the data visualization and probabilistic modeling literatures that we have found useful for identifying and characterizing such structure in the microbiome. On the data visualization front, we describe the focus-plus-context and linking principles and introduce new R packages that use these ideas to facilitate visualization of hierarchical collections of time series. These tools streamline the navigation of complex data, guiding researchers towards plausible statistical models. We then turn our attention to modeling, motivated by the fact that microbiome species abundance data often have effectively low-dimensional evolutionary, temporal, and count structure. We characterize and review methods appropriate for three classes of common microbiome data analysis problems: dimensionality reduction, multitable integration, and regime detection. For dimensionality reduction, we explore basic probabilistic latent variable models, focusing on mixed-membership and matrix factorization techniques. For multitable integration, we contrast nonparametric ordination, structured regularization, and probabilistic modeling approaches. For regime detection, we compare variants of hidden Markov, dynamical systems, and changepoint models, along with baselines that do not take time structure into account.
Throughout, we illustrate visualization and modeling techniques using real human gut microbiome data. Code and data for all experiments are publicly available online.
- Submitted to the Department of Statistics.
- Thesis (Ph.D.)--Stanford University, 2018.