- Fan, Lin, author.
- [Stanford, California] : [Stanford University], 2023.
- Description
- Book — 1 online resource.
- Summary
In Chapters 2-4, we develop approximations for the distribution of regret of multi-armed bandit algorithms. These approximations yield new insights about the exploration-exploitation trade-off in bandit environments. Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that algorithms that are optimal over certain exponential families can achieve expected regret that grows at $\log(T)$ rates with the time horizon $T$, as specified by the Lai-Robbins lower bound. In Chapter 2, we show that when one uses such optimized algorithms, the resulting regret distribution necessarily has a very heavy tail, specifically, that of a truncated Cauchy distribution. Furthermore, for $p>1$, the $p$'th moment of the regret distribution grows much faster than poly-$\log(T)$, in particular as a power of $T$. We show that optimized UCB algorithms are also fragile in an additional sense, namely when the problem is even slightly mis-specified, the regret can grow much faster than the conventional theory suggests. Our arguments are based on standard change-of-measure ideas, and indicate that the most likely way that regret becomes larger than expected is when the optimal arm returns below-average rewards in the first few arm plays, thereby causing the algorithm to believe that the arm is sub-optimal. To alleviate the fragility issues exposed, we show that UCB algorithms can be modified so as to ensure a desired degree of robustness to mis-specification. In doing so, we also show a sharp trade-off between the amount of UCB exploration and the tail exponent of the resulting regret distribution. In Chapter 3, we establish strong laws of large numbers and central limit theorems for the regret of two of the most popular bandit algorithms: Thompson sampling and UCB. Here, our characterizations of the regret distribution complement the characterizations of the tail of the regret distribution developed in Chapter 2.
The tail characterizations there are associated with atypical bandit behavior on trajectories where the optimal arm mean is under-estimated, leading to mis-identification of the optimal arm and large regret. In contrast, our SLLNs and CLTs here describe the typical behavior and fluctuation of regret on trajectories where the optimal arm mean is properly estimated. We find that Thompson sampling and UCB satisfy the same SLLN and CLT, with the asymptotics of both the SLLN and the (mean) centering sequence in the CLT matching the asymptotics of expected regret. Both the mean and variance in the CLT grow at $\log(T)$ rates with the time horizon $T$. Asymptotically as $T \to \infty$, the variability in the number of plays of each sub-optimal arm depends only on the rewards received for that arm, which indicates that each sub-optimal arm contributes independently to the overall CLT variance. In Chapter 4, we establish diffusion approximations for the behavior of Thompson sampling and related sampling-based algorithms. In the regime where the gaps between arm means scale as
Online 2. Data-driven sustainability : advancing electric vehicle adoption and carbon accounting using artificial intelligence and geospatial analytics [2023]
- Oladeji, Olamide, author.
- [Stanford, California] : [Stanford University], 2023.
- Description
- Book — 1 online resource.
- Summary
This dissertation explores data-driven, computational techniques for sustainability decision-making, particularly focusing on the transportation sector and the carbon accounting landscape. Given the urgency of large-scale decarbonization efforts underscored by the Intergovernmental Panel on Climate Change, this work proposes and evaluates methods to help decision-makers navigate complex interactions between humans, engineering systems, and ecological systems, taking into account factors such as data accessibility and uncertainty. The dissertation is divided into two parts. The first, encompassing chapters two and three, delves into how computational techniques and data can expedite the decarbonization of the transportation sector. The second part, which includes chapters four and five, emphasizes elevating societal understanding and accounting of carbon emissions in the context of sustainable decision-making. Altogether, this dissertation underscores the importance of comprehensive, computationally-enhanced, data-driven, and uncertainty-aware approaches to sustainability decision-making, thus providing practical pathways for effective climate change mitigation.
- Gelauff, Lodewijk Leendert, author.
- [Stanford, California] : [Stanford University], 2023.
- Description
- Book — 1 online resource.
- Summary
Access to digital devices is increasingly prevalent, and after the 2020 pandemic more people than ever before know how to use the basic functionalities of these devices. Governments (more generally, decision makers) have an opportunity to engage in a dialogue with their residents (more generally, stakeholders) who may otherwise not find the time or opportunity to participate. Bringing citizen engagement processes online comes with a responsibility to ensure that they are well understood and that their implementation does not exacerbate inequities in society. It is essential that decision makers have access to tools, methods and manuals to implement these processes in a practical setting. This thesis describes online technologies that have already helped decision makers to engage their stakeholders in participatory budgeting, city budgeting and deliberation. In Participatory Budgeting, decision makers make a portion of their budget available to residents to propose and prioritize projects that they see as the most important for their community. I present data collected from 124 budgeting elections on the PB Stanford online voting platform which allows us to compare voting methods (including both elicitation and aggregation) and ballot designs. We find that while ballot complexity is significantly correlated with the median time spent on the ballot by the voter, there is no such correlation with the abandonment rate. By comparing two ballots from the same voters, we can compare elicitation methods. We confirm that a knapsack ballot can be extracted from a sufficiently large K-ranking elicited ballot (facilitating implementation on a paper ballot) and can compare allocations from different aggregation methods from the same K-ranking elicited ballots. We are also able to establish a better understanding of how the voting method and interface affect the average cost of selected projects.
Earlier work observed for a small number of elections that K-approval voting tends to result in higher average cost in aggregate. We confirm that for most ballot pairs in our dataset, the average cost is higher with K-approval voting than with knapsack voting -- but also find examples where knapsack voting results in significantly higher average cost. We measure both the average cost across all projects selected by a voter, and across the most expensive projects selected by a voter -- providing a more nuanced insight into voting behaviors. Organizers often have an express goal to make these democratic engagement processes inclusive and equitable. Online advertising can be used by decision makers to reach out to segments of the population that are otherwise hard to reach. We report findings from field studies in Durham and Greensboro (North Carolina) from the perspective of an advertiser trying to achieve a demographically balanced cohort of respondents. We report that these targeting methods are inaccurate and can result in fewer people (in total and from the targeted group) participating than if general targeting were used with the same budget. We analyze the data to inform assumptions that a possible targeting algorithm would have to consider, and explore how audiences based on a list of registered voters segmented on known attributes could be used for equitable advertising. When participants are entirely self-selected, or when demographics are not able to distinguish populations of interest, analysis after the fact may be especially important. We report a study with data from a 2020 city budgeting exercise from Austin (Texas) that saw a hundredfold increase in responses after the killing of George Floyd by police officers during the feedback process. We analyze the data from this and a subsequent process, and find that this exogenous shock to the voting process resulted in a change of opinion (rather than a differential turnout).
We show how clustering of opinions can be used to offer additional insights into the nature of the shift, and propose a framework for analyzing minority opinions in such processes. Finally, we describe a video conferencing platform that facilitates deliberations without the need for a human moderator, overcoming the need to convene in-person and to train human moderators. The platform has been used in more than 2,000 small group discussions with about 20,000 unique participants as of April 2023, on topics ranging from the Chilean constitution to the Metaverse Terms of Service. We present the design of the platform and evaluate its efficacy. Comparing surveys and metrics from an online exercise with those from a previous, comparable offline exercise, we find that the platform performed on par. We provide preliminary evidence that the platform leads to increased gender participation compared to in-person deliberation, and also performs well on equitable participation across income and education level.
Online 4. The development of entrepreneurial skills and motivation : the role of institutions [2023]
- Gope, Khonika, author.
- [Stanford, California] : [Stanford University], 2023.
- Description
- Book — 1 online resource.
- Summary
Recognizing the pivotal role of entrepreneurship in driving innovation and economic growth, a spectrum of institutions, encompassing government bodies, universities, and business accelerators, has formulated diverse strategies to promote entrepreneurship. These efforts involve various facets, such as formal educational programs, mentoring initiatives, regulatory reforms, and infrastructural support. These institutional endeavors collectively aim to cultivate entrepreneurial skills and motivation among aspiring entrepreneurs. Though extant academic work explores the interaction of institutions and entrepreneurship from diverse angles and provides significant insights, it remains unclear whether and how institutions actually help in developing entrepreneurial skills and motivation. Across three tightly-linked papers, this dissertation addresses this gap. Focusing on accelerators, an emerging institution in the entrepreneurship ecosystem, the first paper explores their impact on an entrepreneur's subsequent professional engagements and their long-term impact on the entrepreneurial ecosystem. I use a regression discontinuity design on accepted and almost accepted participants from a unique data set of 30,000 applicants from Startup Chile. The second paper explores the impact of bureaucratic institutions on entrepreneurial opportunity recognition in a randomized field experiment in Bangladesh. The third paper explores the strategies and performance of institutional entrepreneurs, analyzing more than 8,000 entrepreneurs from an alumni survey dataset. Together, these papers help us address endogeneity issues in prior works and make several contributions to the literature on entrepreneurship and organization theory. They also offer valuable insights for management practitioners and policymakers.
Online 5. Direct & indirect effects of extreme weather in a warming world [2023]
- Reed, Brian Andrew, author.
- [Stanford, California] : [Stanford University], 2023
- Description
- Book — 1 online resource
- Summary
Extreme weather seems to be becoming more common and more extreme. A new literature on production networks shows that direct effects from extreme weather spread over supply chains and lead to indirect effects at distant companies. However, existing papers have not driven home their relevance for climate damage estimates. We examine the direct and indirect effects of extreme heat and extreme precipitation on public companies in the U.S. We use 20 years of quarterly financial data and more than 600,000 firm location records to econometrically estimate the direct effect of extreme weather at a company's establishments. We next estimate the indirect effect of extreme weather at a company's suppliers' establishments. We use data from the latest Climate Model Intercomparison Project, CMIP6, to project changes in the incidence of extreme heat and extreme precipitation at company establishments over the next twenty years. We find that the projected indirect effects are roughly half the size of the projected direct effects. We then estimate the direct effects for different subgroups, and we compare the effects of extremes with the effects of warmer or wetter weather than average. We find substantial heterogeneity in how different types of firms are affected by extreme weather. We close by reviewing climate modeling for near-term predictions as the climate changes from its historical norms. We use the climate projections data from our firm analysis to review the accuracy and uncertainty of these models, with the goal of helping provide an operational guide to individuals who will be using these climate data for adaptation and planning
Online 6. Interdisciplinary approaches to wildfire risk management [2023]
- Horing, Jill, author.
- [Stanford, California] : [Stanford University], 2023.
- Description
- Book — 1 online resource.
- Summary
Wildfires pose a significant threat to the state of California. While climate change is contributing to increased fire risk, human activities, including ignitions, alterations to natural landscapes, and development patterns, also play a substantial role in shaping the risk. Moreover, the management actions in response to this risk have societal impacts themselves. This dissertation first frames the challenge of wildfire risk management by considering the tradeoffs between the cost of taking action, the reduction in wildfire damages, and welfare losses. It explores how misaligned incentives and the uneven distribution of costs and benefits lead individual decision-makers to undertake suboptimal mitigation measures. This dissertation then delves deeper into two specific management strategies: the efforts of investor-owned utilities to reduce powerline ignition risk and the actions of individuals in protecting their homes and safety. Using a variety of data collection techniques and methods from economics, risk analysis, and data science, this research provides insights into the costs, drivers, and effectiveness of these strategies. The findings highlight the need for aligning policies and incentives to minimize the impact of both wildfires and mitigation measures and ultimately seek to inform research and decision-making towards effective wildfire management.
Online 7. Leveraging game structure in modern optimization [2023]
- Jin, Yujia, author.
- [Stanford, California] : [Stanford University], 2023.
- Description
- Book — 1 online resource.
- Summary
First-order algorithms have risen in prominence in modern optimization, partly due to their successful application to problems in data science and machine learning, especially in settings with large dimensionality and dataset size. In this thesis, we focus on large-scale optimization problems with game structure, either explicitly or implicitly. By uncovering and leveraging such game structure, we design accelerated or randomized first-order algorithms which obtain nearly-linear or sublinear time for a wide range of fundamental decision making tasks. Towards achieving our results we take a modern twist on two classic optimization techniques, acceleration and randomization, demonstrating their versatile utility in modern algorithm design for game-structured optimization tasks.
Online 8. Market design for non-profit applications [2023]
- Allman, Maxwell Harrison Stoller, author.
- [Stanford, California] : [Stanford University], 2023
- Description
- Book — 1 online resource
- Summary
This dissertation is a collection of three essays, each studying a market design setting where the exchange of money is not allowed. There are many real world markets where the exchange of money is considered either legally or ethically unacceptable, such as the allocation of school seats, donor organs, or subsidized housing. For these kinds of markets, the market designer must use other strategies to influence the behavior of the participants and affect the outcomes of the market. Each of the three chapters analyzes how the rules of a given kind of market will affect the welfare of the participants along dimensions such as efficiency, fairness and diversity. Chapters one and two study theoretical models: the first is a model of one-sided assignment where a set of items is being allocated to a set of participants, and the second is a model of two-sided matching where doctors and hospitals first interview to learn more about their preferences before being matched. Chapter three describes an applied project where we leveraged the theoretical and empirical literature on school choice to help redesign the student assignment policy in the San Francisco Unified School District (SFUSD)
Online 9. Optimal response to epidemics : models to inform policy [2023]
- Rao, Jueli Isabelle, author.
- [Stanford, California] : [Stanford University], 2023.
- Description
- Book — 1 online resource.
- Summary
Epidemics, such as COVID-19, can have profound and far-reaching impacts on societies and individuals worldwide. Infectious disease outbreaks can result in millions of lives lost and place a tremendous burden on public health resources. Additionally, the opioid epidemic in the U.S. has been further fueled by COVID-19, leading to record drug overdose deaths in 2021. In the face of these epidemics, policy makers face difficult decisions in allocating limited resources to improve population health. To address these challenges, this dissertation focuses on the development of mathematical models to inform critical decisions in public policy, specifically optimizing resource allocation to control epidemics. Chapter 2 centers on the opioid epidemic. We develop a dynamic model to assess the effectiveness of interventions for controlling the U.S. opioid epidemic. We show that reductions in opioid prescriptions are necessary but may lead to a short-term increase in heroin overdose deaths, and thus must be combined with scale-up of treatment for addicted individuals -- but that even with immediate policy changes, significant morbidity and mortality will still occur. Our analysis provides critically needed evidence-informed recommendations for reducing opioid-related morbidity and mortality in the U.S. Chapters 3, 4 and 5 focus on developing interpretable models to guide the allocation of limited vaccines to control the spread of an infectious disease. In Chapter 3 we first consider an SIR (susceptible, infected, recovered) model with interacting population groups and a single allocation of vaccine. By approximating the disease dynamics, we derive intuitive analytical conditions characterizing the optimal solution for four different objectives: minimize infections, deaths, life years lost and QALYs lost due to deaths. In Chapter 4, we extend this work to a dynamic setting and develop a method for allocating vaccines over time.
In Chapter 5, we further extend the work to an endemic setting, considering vaccine booster doses to take into account waning immunity from vaccination. We show that the approximated optimal solution is an all-or-nothing allocation based on a prioritized list of population groups given by the analytical conditions. Numerical simulations show that the analytical solutions achieve near-optimal results with objective function values significantly better than would be obtained using simple allocation rules such as allocation proportional to population group size. By leveraging interdisciplinary approaches, this dissertation aims to aid in decision making in the areas of opioid abuse, COVID-19, and epidemic control. Importantly, the work provides general theoretical frameworks that can be adapted to other public health challenges for epidemic control.
Online 10. Project recon : a computational framework for and analysis of the California parole hearing system [2023]
- Hong, Yun, author.
- [Stanford, California] : [Stanford University], 2023
- Description
- Book — 1 online resource
- Summary
Parole decisions can tip a sentence toward fifteen years or fifty. Despite the great power that parole boards hold, their decision processes are poorly documented and largely hidden from public scrutiny. Parole hearings produce almost no structured data, only an unstructured transcript of hearing dialogue several hundred pages in length. In the following dissertation, we use natural language processing to analyze the transcripts of 35,105 parole hearings held between 2007 and 2019 for candidates serving life sentences in California, totaling approximately five million pages. Through regression analyses of data extracted from the transcripts, after controlling for relevant case factors, we find that several factors outside of the candidate's control explain hearing outcomes. We find that commissioners vary widely in their punitiveness in previously unobserved ways; the assignment to a particular commissioner significantly influences the hearing outcome. Racial disparities limit the quality of legal representation that parole candidates receive as well as their voice in the hearing dialogue, and both significantly predict the parole outcome after again controlling for case factors. Previous analyses of parole systems have been limited by the unavailability of structured data or the task of hand-annotating hearing transcripts. Our results thus provide the most comprehensive picture of a parole system studied to date. While our results carry direct implications for legislative parole reform, our methodology—using machine learning to analyze legal hearings—can be extended to many other procedures in criminal and administrative law with limited structured data
Online 11. The yellow brick road to artificial intelligence : an empirical study of developers developing artificial intelligent conversational socialbots [2023]
- Jain, Prachee, author.
- [Stanford, California] : [Stanford University], 2023
- Description
- Book — 1 online resource
- Summary
Artificial Intelligence (AI) technologies are increasingly becoming ubiquitous and invisible - nudging, making recommendations, influencing and making decisions, providing information, forming long-term relationships with users, or merely providing company. Despite this prevalence, few studies have examined how these technologies are designed and developed. This study examines how developers of AI technologies, faced with high levels of ambiguity, approach their work. Development of artificial intelligent (AI) technologies presents a unique phenomenon in which two of the three input-process-output variables are ambiguous. There is opacity in cause-effect, that is, it is difficult if not impossible to know how inputs to AI technologies are related to outputs, thereby making multiple interpretations of causality plausible. The evaluation of outputs of AI technologies is often based on datasets called ground truth that are inherently ambiguous and dependent upon subjective decisions such as what categories to include in the classification system, and on the interpretation of people assigning these categories to different entities in the dataset. Through this ethnographic study of the development process of conversational AI technologies, I find that developers engage in three ambiguity attitudes. They avoid ambiguity by using manually coded, rule-based response generation techniques to exert control over the output of the technology. They also exhibit ambiguity seeking by employing opaque, deep learning, large language models to auto-generate responses to build resilience in the technology to produce an output in unexpected situations in which the technology would otherwise 'fall off the cliff' or fail. At the same time, developers attempt to resolve ambiguity by engaging in a process of building an empirical understanding from first principles of the phenomenon being automated, by ad hoc experimentation with proxy metrics and intuitions.
I call this process 'reverse-building of phenomena.' Developers who embraced ambiguity and built resilient technologies fared better in the competition than those who did not. I contribute to an understanding of how modern-day work is changing for developers with the advent of opaque and ambiguous artificial intelligent technologies
Online 12. Beyond the status quo remote work : how workers gain and lose status in their organizations amid shifts to remote work [2022]
- Hinds, Rebecca Anne, author.
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
Over the years, there has been a pervasive stigma associated with remote work. Remote workers are typically conferred low status in their organizations and are afforded fewer resources than their "on-site" colleagues. Yet, the COVID-19 pandemic has shaped new conceptions of remote workers and the viability of remote work for the future. As more and more organizations are adopting remote and hybrid work and as remote workers are no longer a minority group in many organizations, remote workers seem to have gained relative status in their organizations. There is a new understanding that remote work is "real" work and workers seem to have more authority than ever before to adopt remote work arrangements. Yet we have minimal understanding of the microdynamics underlying how these shifts related to remote workers' status in organizations are playing out. This dissertation draws on ethnographic methods to examine the status-ridden processes through which workers come to be remote workers and hybrid workers (or fail to become such). It demonstrates how these status dynamics play out through the materiality of technology and through high-status actors' "status contests" and theorizes the less visible ways in which remote workers are gaining and losing status in their organizations. This two-chapter dissertation contributes to research on occupational jurisdiction, the sociology of classification, and remote work, while also offering practical implications aimed at helping organizational leaders make strategic decisions about remote and hybrid work moving forward
Online 13. A computational approach to criminal justice reform [2022]
- Chohlas-Wood, Alex, author
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
In recent years, activists and policymakers have become increasingly concerned about the use of data and algorithms in criminal justice settings, fearing that their use will reinforce demographic disparities and perpetuate punitive policies. This dissertation demonstrates how the careful application of such approaches also has the potential to reduce disparities and incarceration in a series of real-world applications. I begin by reviewing a collection of analytic techniques to identify unnecessary and discriminatory police stop practices. Next, I review two new prosecutor-oriented algorithms to aid the decision to charge or dismiss a case after an arrest has occurred. I subsequently review a group of algorithms and analyses intended to reduce the use of incarceration, including risk assessment instruments, pretrial behavioral nudges, and post-prison re-entry programs. I conclude the dissertation by discussing the larger context surrounding these approaches, including some risks and limitations. Overall, my dissertation demonstrates that computational approaches are a valuable tool to advance reform in the criminal justice system
Online 14. Cyber risks in networked autonomous systems [2022]
- Goldfrank, Joseph Abraham, author.
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
Operation of autonomous unmanned vehicles introduces new risks about which decisionmakers have neither exhaustive statistics nor similar systems from which to derive priors. This model-based risk analysis combines algorithms used for autonomous control, Monte Carlo simulations, and learning parameters from data to improve the risk model's performance. The results inform high-level decisionmakers on when and how best to employ autonomous unmanned vehicles in security and military applications where risk tolerance is higher than for civilian applications, while explicitly maintaining high-risk decisions as the responsibility of human decisionmakers, even when software is used in the process of executing those decisions. This risk analysis has applications in use of unmanned vehicles for localization of radio-frequency threats and maritime tracking of non-cooperative targets using linear array sonar
Online 15. Dynamic matching : a queueing perspective [2022]
- Kerimov, Süleyman, author.
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
This dissertation focuses on frictions that arise in various dynamic marketplaces such as kidney exchange, labor markets, and logistics. The central question that we ask is how heterogeneity, network structure, liquidity, and stochasticity, which cause these frictions, affect our ability to design simple policies that achieve efficient outcomes. In Chapters 2 and 3, we analyze dynamic matching markets. When agents arrive to the market over time, an inherent trade-off arises between short- and long-term allocative efficiency. For example, kidney exchange platforms, which arrange exchanges between incompatible patient-donor pairs, can form a match as soon as it becomes feasible, or wait for the market to thicken in order to generate exchanges that yield more life years from transplants. This trade-off raises several questions. How should agents be matched optimally over time? If the market is cleared periodically, how does the period length affect allocative efficiency at different times? How does stochastic demand impact desirable clearing times? We study these questions from a queueing perspective, and we propose simple batching and greedy policies with a strong performance guarantee: these policies (nearly) maximize the total match value simultaneously at all times. This suggests that the tension between short- and long-term allocative efficiency is essentially moot. In Chapter 4, we analyze scrip systems, which serve as an alternative to sustain cooperation, improve efficiency, and mitigate free riding in economies without monetary transfers. Agents request and provide service over time, and scrips are used as artificial currency to pay for service provision. We study the possibility of agents sustaining cooperation when the market is thin, in the sense that only few agents are available to provide the requested service.
We analyze the stability of the scrip distribution of agents, assuming that among the available agents, the one with the minimum amount of scrips is selected to provide service. The analysis suggests that even with minimal liquidity in the market, cooperation can be sustained by balancing service provisions among agents. Simulations based on kidney exchange data suggest that scrip systems can lead to efficient outcomes in kidney exchange by sustaining cooperation between hospitals
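The minimum-scrip selection rule described in this summary can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the dissertation's model: the uniform choice of requester, the fixed number of available providers, the one-scrip price, and the rule that a requester with no scrips goes unserved are all hypothetical choices made for illustration.

```python
import random


def simulate_scrip(num_agents=10, initial_scrips=5, periods=1000,
                   num_available=2, seed=0):
    """Toy scrip economy; all parameters are illustrative assumptions."""
    rng = random.Random(seed)
    scrips = [initial_scrips] * num_agents
    for _ in range(periods):
        # Assumption: one uniformly random agent requests service per period.
        requester = rng.randrange(num_agents)
        if scrips[requester] == 0:
            continue  # Assumption: a requester without scrips is not served.
        # Thin market: only a few of the other agents are available.
        others = [a for a in range(num_agents) if a != requester]
        available = rng.sample(others, num_available)
        # Selection rule from the summary: the available agent holding
        # the fewest scrips provides the service.
        provider = min(available, key=lambda a: scrips[a])
        scrips[requester] -= 1  # Assumption: service costs one scrip.
        scrips[provider] += 1
    return scrips
```

Calling `simulate_scrip()` returns the final scrip holdings; the total number of scrips is conserved, so the spread of the returned list gives a rough view of how evenly the rule balances service provision.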
Online 16. Dynamic stochastic models for experimentation and matching [2022]
- Wu, Linjia, author.
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
-
The goal of this thesis is to study key questions arising in matching and experimentation in time-varying stochastic models. In the first part, we study optimal design and statistical inference of switchback experiments. In the second part, we focus on an optimal matching policy for a centralized dynamic matching market.
Online 17. Efficient universal estimators for symmetric property estimation [2022]
- Shiragur, Kirankumar, author.
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
-
Given i.i.d. samples from an unknown distribution, estimating its symmetric properties is a classical problem in information theory, statistics, operations research and computer science. Symmetric properties are those that are invariant to label permutations and include popular functionals such as entropy and support size. Over the past decade, the study of time and sample complexities for estimating properties of distributions has received great attention, leading to computationally efficient and sample-optimal estimators for various symmetric properties. Most of these estimators were property-specific, and the design of a single estimator that is sample-optimal for any symmetric property remained a central open problem in the area. In a seminal result, Acharya et al. showed that computing an approximate profile maximum likelihood (PML) distribution, a distribution that maximizes the likelihood of the observed multiset of frequencies, allows statistically optimal estimation of various symmetric properties. However, since its introduction by Orlitsky et al. in 2004, efficient computation of approximate PML distributions remained a well-known open problem. In our work, we resolved this question by designing the first efficient algorithm for computing an approximate PML distribution. More broadly, our investigation has led to a deeper understanding of various computational and statistical aspects of PML and universal estimators. Additional results include the design of better algorithms for deterministic permanent approximation, new rounding algorithms, faster optimization methods and novel techniques for statistical analysis.
Online 18. Essays in machine learning in finance [2022]
- Ye, Ye, active 2022 author.
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
-
The bond market is one of the largest financial markets, with $52.9 trillion of debt outstanding for the US market as of 2021. The implied interest rate for borrowing at different horizons is the fundamental object for this market. However, a complete set of interest rates is not observed and must be estimated from noisy market data. In two papers, we develop machine learning methods to precisely estimate the term structure of interest rates and to understand and manage interest-rate-related risks. In the first paper, we introduce a robust, flexible and easy-to-implement method for estimating the yield curve from Treasury securities. This method is non-parametric and optimally learns basis functions in reproducing kernel Hilbert spaces with an economically motivated smoothness reward. We provide a closed-form solution of our machine learning estimator as a simple kernel ridge regression, which is straightforward and fast to implement. We show in an extensive empirical study on U.S. Treasury securities that our method strongly dominates all parametric and non-parametric benchmarks, which positions our method as the new standard for yield curve estimation. In the second paper, we develop a sparse factor model for bond returns that unifies non-parametric term structure estimation with cross-sectional factor modeling. Building on the modeling framework of the first paper, we estimate an optimal set of sparse basis functions, which maps into a cross-sectional conditional factor model. Our estimated factors are investable portfolios of traded assets that replicate the full term structure and are sufficient to hedge against interest rate changes. In an extensive empirical study on U.S. Treasury securities, we show that the term structure of excess returns is well explained by four factors. We introduce a new measure for the time-varying complexity of bond markets based on the exposure to higher-order factors.
Online 19. Essays on trustworthy data-driven decision making [2022]
- Si, Nian, author.
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
-
Data-driven decision-making systems are deployed ubiquitously in practice, and they have been drastically changing the world and people's daily lives. As more and more decisions are made by automatic data-driven systems, it becomes increasingly critical to ensure that such systems are responsible and trustworthy. In this thesis, I study decision-making problems in realistic contexts and build practical, reliable, and trustworthy methods for their solutions. Specifically, I discuss the robustness, safety, and fairness issues in such systems. In the first part, we enhance the robustness of decision-making systems via distributionally robust optimization. Statistical errors and distributional shifts are two key factors that degrade models' performance in deployment environments, even if the models perform well in the training environment. We use distributionally robust optimization (DRO) to design robust algorithms that account for statistical errors and distributional shifts. In Chapter 2, we study distributionally robust policy learning using historical observational data in the presence of distributional shifts. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts, with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm on synthetic datasets and demonstrate that it provides the robustness that is missing from standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.
In Chapter 3, we focus on the impact of statistical errors in distributionally robust optimization. We study the asymptotic normality of distributionally robust estimators as well as the properties of an optimal confidence region induced by the Wasserstein distributionally robust optimization formulation. In the second part, we study A/B tests under a safety budget. Safety is crucial to the deployment of any new features in online platforms, as a minor mistake can deteriorate the whole system. Therefore, A/B tests are the standard practice to ensure the safety of new features before launch. However, A/B tests themselves may still be risky, as the new features are exposed to real user traffic. We formulate and study an optimal A/B testing experimental design that minimizes the probability of false selection under pre-specified safety budgets. In our formulation based on ranking and selection, experiments need to stop immediately if the safety budgets are exhausted before the experiment horizon. We apply large deviations theory to characterize optimal A/B testing policies and design associated asymptotically optimal algorithms for A/B testing with safety constraints. In the third part, we study the fairness testing problem. Algorithmic decisions may still possess biases and could be unfair to different genders and races. Testing whether a given machine learning algorithm is fair emerges as a question of first-order importance. In this part, we present a statistical testing framework to detect if a given machine learning classifier fails to satisfy a wide range of group fairness notions. The proposed test is a flexible, interpretable, and statistically rigorous tool for auditing whether exhibited biases are intrinsic to the algorithm or due to the randomness in the data.
The statistical challenges, which may arise from multiple impact criteria that define group fairness and are discontinuous in the model parameters, are conveniently tackled by projecting the empirical measure onto the set of group-fair probability models using optimal transport. This statistic is efficiently computed using linear programming, and its asymptotic distribution is explicitly obtained. The proposed framework can also be used to test composite fairness hypotheses and fairness with multiple sensitive attributes. The optimal transport testing formulation improves interpretability by characterizing the minimal covariate perturbations that eliminate the bias observed in the audit.
Online 20. Experimental design and decision-making in marketplace platforms [2022]
- Li, Hannah Qiuhan, author.
- [Stanford, California] : [Stanford University], 2022
- Description
- Book — 1 online resource
- Summary
-
Online platforms often rely on experiments to aid decision-making. When considering a new change, they test the intervention on a subset of the users before deciding whether to launch platform-wide. However, in the setting of marketplace platforms, prior work shows that treatment effect estimates can be biased. Users in a market interact with each other, which violates the Stable Unit Treatment Value Assumption (SUTVA), creates biased estimates, and may impact the resulting decisions made from these experiments. We develop models to capture market dynamics and investigate the effect of interference on different designs and estimators. In particular, we are able to highlight and formalize the relationship between the magnitude of the treatment effect bias in commonly run experiments and the level of supply and demand imbalance in the market. Building on these insights, we propose a novel class of experimental designs and estimators using two-sided randomization (TSR) as a method to reduce bias. In addition, we show that the commonly used standard error estimates are also biased in these marketplace settings. We analyze the impact of the statistical biases on the resulting decisions based on the experiment, show that both forms of bias interact to negatively impact decision-making, and propose practical methods to mitigate such biases.