1 - 7
Online 1. Double coset Markov chains [2022]
- Simper, Mackenzie Alice, author.
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
-
Markov chains and random processes are ubiquitous in statistical and scientific applications. The main contribution of this thesis is to introduce the study of Markov chains on double cosets. Given a group and two subgroups, the double cosets define equivalence classes of the group. A random walk on the group then induces a random process on the set of equivalence classes. When is this random process also a Markov chain? Surprisingly, some of the examples capture scientifically (and mathematically) interesting special cases. This thesis develops a general theory for double coset Markov chains. Several cases are investigated in detail. Special attention is given to the example in which parabolic subgroups of the permutation group are indexed by contingency tables. In each of the examples, the general theory developed here is applied to understand the eigenvalues, eigenfunctions, and mixing times of the Markov chains. In the second half of this thesis, two more new random processes are studied: a Markov chain on a space of coalescent trees (also related to a double coset space) and an urn model with an infinite color space. The long-term behavior of each is analyzed.
- Chin, Alex, author.
- [Stanford, California] : [Stanford University], 2019.
- Description
- Book — 1 online resource.
- Summary
-
This thesis presents new methodology for handling interference in randomized experiments. Interference, a phenomenon in which individuals interact with each other, is widely prevalent in the social and natural sciences, and has major implications for how experiments are optimally designed and analyzed. I first provide an introduction to interference, including examples and a relevant brief history of causal inference. Next, I demonstrate how researchers can use Stein's method to establish limiting distributional results for estimators under interference. The modern tools afforded by Stein's method allow one to analyze certain regimes of arbitrarily dense interference, which goes beyond the analysis capabilities of existing tools. In the subsequent chapter, I develop new model-based adjustment estimators for estimating the global average treatment effect. The adjustment variables can be constructed from functions of the treatment assignment vector, and the researcher can use a collection of any functions correlated with the response, turning the problem of detecting interference into a feature engineering problem. The final chapter proposes new methods for designing and analyzing stochastic seeding strategies, which are an appealing way of leveraging network structure for marketing, public health, and behavioral interventions. New importance sampling estimators adapted to this setting can greatly improve precision over existing approaches. This thesis is interdisciplinary in nature. Stein's method (Chapter 2), regression adjustments (Chapter 3), and importance sampling (Chapter 4) all command spheres of influence in certain sectors of the literature, and are here repurposed in new domains. I hope that my work shows how existing statistical technology can arise in new arenas of application while simultaneously giving rise to new methodological questions and problems, and in this way, I hope my work is useful for both practitioners and methodologists.
Online 3. Making causal conclusions from heterogeneous data sources [2020]
- Rosenman, Evan Taylor Ragosa, author.
- [Stanford, California] : [Stanford University], 2020
- Description
- Book — 1 online resource
- Summary
-
The modern proliferation of large observational databases -- in fields such as e-commerce and electronic health -- presents challenges and opportunities for applied researchers. Such data can contain rich information about causal effects of interest, but the effects can only be estimated if we make untestable assumptions and carefully model the assignment mechanism. Experimental data provides a "virtuous" counterpart for the purposes of inferring causal effects, but randomized trials are often limited in size and, consequently, lack precision. In this thesis, we consider problems of "data fusion," in which observational and experimental datasets are used together to estimate causal effects. The problem is considered from three angles. First, we develop methods for merging experimental and observational causal effect estimates in the case when all confounding variables are measured in the observational studies. Next, we remove the unconfoundedness assumption, which leads to a new class of estimators based on a shrinkage approach. Finally, we propose a novel solution for designing experiments informed by observational studies, making use of the regret minimization framework. Throughout, we deploy tools from disparate areas of the literature, including Empirical Bayes, decision theory, and convex optimization.
- Wang, Guanyang, author.
- [Stanford, California] : [Stanford University], 2020
- Description
- Book — 1 online resource
- Summary
-
Markov chain Monte Carlo (MCMC) methods, such as the Metropolis-Hastings algorithm and the Gibbs sampler, are ubiquitous in almost every quantitative subject of study, including physics, chemistry, statistics, biology, and computer science. In this thesis, we focus on the following two kinds of sampling problems: 1. How to efficiently sample binary matrices with fixed row and column totals uniformly at random? 2. How to draw samples from 'doubly intractable' distributions? For both problems, we explore the theoretical properties of the existing MCMC algorithms and develop new methods to improve them.
- Arthur, Joseph G., author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
The comparison of individual genome sequences is a key task for modern studies of population genetics, genotype-phenotype associations, and genome evolution. The problem is difficult in part because commonly used DNA sequencing hardware produces reads that are orders of magnitude smaller than the size of a single human chromosome. The detection of large genomic mutations known as structural variants (SVs) from these short sequencing reads has emerged as a particularly challenging problem. Numerous methods targeting this problem have been proposed, but it is difficult to assess their performance on real data since the ground truth is typically unknown. Moreover, complex SVs that escape detection by conventional algorithms are known to exist. We propose here a solution to both the complex SV detection problem and the issue of evaluating accuracy on real data.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request via Aeon |
3781 2018 A | In-library use |
Online 6. Mathematical investigations into fundamental population-genetic statistics and models [2018]
- Arbisser, Ilana Marisa, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
This dissertation explores three projects that take a theoretical approach to studying the underlying models and statistics that are used in population genetics research. Each focuses on a fundamental model or statistic in population genetics, building on the long history of theoretical research in the field while relating the quantity of interest to current research in population genetics. The first two projects relate to the fundamental population genetic model, the coalescent, and the third project explores Wright's FST. The first project consists of modeling a biological process to understand the signature it leaves in data. "A Markov Model of the Coalescent with Recombination and Population Substructure" develops an extension of the coalescent model that incorporates recombination and population substructure. This project offers a framework to simulate random neutral processes that could produce genomic signatures similar to those of another biological phenomenon, horizontal gene transfer. The second project uses probability theory to study the random genealogies invoked in the coalescent process: coalescent trees. A better understanding of the coalescent model itself can inform the development and use of the statistics derived from the coalescent. "On the Joint Distribution of Height and Length of Trees Under the Coalescent" explores the joint distribution of coalescent tree height, Hn, and length, Ln. Understanding the relationship of height and length is important to the development and analysis of statistics that estimate the length from observed data as a proxy for the height. The third project relates to understanding the properties of a biological statistic, FST. "FST and the Triangle Inequality for Biallelic Markers" explores the use of FST as a measure of distance. FST is not a true distance metric because it does not satisfy the triangle inequality, and we show that biallelic FST fails the triangle inequality everywhere. We also show that biallelic FST always fails to be a tree-like distance for distinct allele frequencies. We explore the consequences for analyses that take in a distance matrix, such as spatial analyses, like multidimensional scaling, and tree-building algorithms, like neighbor-joining.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request via Aeon |
3781 2018 A | In-library use |
Online 7. Not just hopeful monsters : phenotypic and evolutionary properties of very large adaptive mutations [2022]
- Kinsler, Grant Richard, author.
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
-
Understanding which mutations drive adaptation and mapping these mutations to their phenotypic and fitness consequences is a major goal of evolutionary biology. In particular, constructing these genotype-phenotype-fitness maps of adaptive mutations has the promise to answer numerous puzzles in the field of evolution. One especially tricky puzzle is the existence of examples of very large adaptive mutations. Theoretical work, alongside evidence from quantitative genetics, suggests that organisms are integrated entities and mutations of very large effect should be doomed to be very strongly deleterious. However, there are numerous examples of very large adaptive mutations in nature. In this thesis, I tackle this apparent paradox by leveraging barcode lineage tracking technology to measure the fitness of very large adaptive mutations that arose in an evolution experiment conducted with yeast. First, I test the limits of this technology to measure fitness, finding a major source of technical variation that affects many technologies that use sequencing data as a quantitative measure. However, after accounting for this and other technical variation, we find that substantial variation persists between replicate experiments conducted on different days, which represents strong context-dependency in fitness. Second, I leverage this context dependency by conducting fitness measurement experiments in a large collection of subtly varying environments. We then use these data to gain insight into the structure of the genotype-phenotype-fitness maps for these very large adaptive mutations. We find that these mutations are able to exist because they are locally modular, affecting a small number of phenotypes that contribute to fitness in the environment they evolved in. However, these mutations are also globally pleiotropic, affecting many phenotypes that contribute to fitness in other environments. Altogether, this in-depth study of very large adaptive mutations reveals a diversity of phenotypic and evolutionary properties hidden from mere glances at a mutation's fitness, or the gene or pathway the mutation affects.