Online 1. Proactive congestion control [2019]
- Jose, Lavanya, author.
- [Stanford, California] : [Stanford University], 2019.
- Description
- Book — 1 online resource.
- Summary
-
Most congestion control algorithms rely on a reactive control system that detects congestion and then marches carefully toward a desired operating point (e.g., by modifying the window size or picking a rate). These algorithms often take hundreds of RTTs to converge—an increasing problem in networks with short-lived flows. Motivated by the need for fast congestion control, this thesis focuses on a different class of congestion control algorithms, called proactive explicit rate-control (PERC) algorithms, which decouple the rate calculation from congestion signals in the network. The switches and NICs exchange control messages to run a distributed algorithm that picks explicit rates for each flow. PERC algorithms proactively schedule flows to be sent at these explicit rates. They take as input the set of flows and the network link speeds and topology, but not a congestion signal. As a result, they converge faster, and their convergence time depends only on fundamental "dependency chains" (couplings between links that carry common flows), which are a property of the traffic matrix and the network topology. We argue that congestion control should converge in a time limited only by these fundamental dependency chains. Our main contribution is s-PERC, a new practical distributed proactive scheduling algorithm. It is the first PERC algorithm to provably converge in bounded time without requiring per-flow state or network synchronization. It converges to the exact max-min fair allocation in 6N rounds, where N is the number of links in the longest dependency chain. In simulation and on a P4-programmed FPGA hardware test bed, s-PERC converges an order of magnitude faster than reactive schemes such as TCP, DCTCP, and RCP. Long flows complete in close to the ideal time, while short-lived flows are prioritized, making it relevant for data centers and wide-area networks (WANs).
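The thesis's headline result is convergence to the exact max-min fair allocation. As a point of reference (and not the s-PERC algorithm itself, which runs distributed on switches and NICs without per-flow state), here is a minimal centralized water-filling sketch that computes that allocation for a given set of links and flow paths:

```python
# Centralized water-filling sketch of the max-min fair allocation that
# s-PERC converges to. Illustrative only; not the s-PERC algorithm.
# `links` maps link id -> capacity; `flows` maps flow id -> list of links on its path.

def max_min_fair(links, flows):
    rates = {}
    remaining_cap = dict(links)
    active = {l: {f for f, path in flows.items() if l in path} for l in links}
    unfrozen = set(flows)
    while unfrozen:
        # Bottleneck link: smallest equal share among its unfrozen flows.
        bottleneck, share = min(
            ((l, remaining_cap[l] / len(active[l] & unfrozen))
             for l in links if active[l] & unfrozen),
            key=lambda x: x[1])
        # Freeze every unfrozen flow crossing the bottleneck at that fair share.
        for f in active[bottleneck] & unfrozen:
            rates[f] = share
            unfrozen.discard(f)
            for l in flows[f]:
                remaining_cap[l] -= share
    return rates

if __name__ == "__main__":
    links = {"A": 10.0, "B": 4.0}                       # capacities (e.g., Gb/s)
    flows = {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]}
    print(max_min_fair(links, flows))                   # {'f2': 2.0, 'f3': 2.0, 'f1': 8.0}
```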
- Also online at
-
Online 2. Algorithms and lower bounds for parameter-free online learning [2018]
- Cutkosky, Ashok, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
Training a machine learning model today involves minimizing a loss function on datasets that are often gigantic, and so almost all practically relevant training algorithms operate in an online manner by reading in small chunks of the data at a time and making updates to the model on the fly. As a result, online learning, a popular way to analyze optimization algorithms operating on data streams, is at the heart of modern machine learning pipelines. In order to converge to the optimal model as quickly as possible, online learning algorithms all require some user-specified parameters that reflect the shape of the loss or statistics of the input data. Examples of such parameters include the size of the gradients of the losses, the distance from some initial model to the optimal model, and the amount of variance in the data, among others. Since the true values for these parameters are often unknown, the practical implementation of online learning algorithms usually involves simply guessing (called "tuning"), which is both inefficient and inelegant. This motivates the search for parameter-free algorithms that can adapt to these unknown values. Prior algorithms have achieved adaptivity to many different unknown parameters individually; for example, one may adapt to unknown gradient sizes given a known distance to the optimal model, or adapt to the unknown distance given a known bound on gradient size. However, no algorithm could adapt to both parameters simultaneously. This work introduces new lower bounds, algorithms, and analysis techniques for adapting to many parameters at once. We begin by proving a lower bound showing that adapting to both the size of the gradients and the distance to the optimal model simultaneously is fundamentally much harder than adapting to either individually, and proceed to develop the first algorithm to meet this lower bound, obtaining optimal adaptivity to both parameters at once. We then expand upon this result to design algorithms that adapt to more unknown parameters, including the variance of the data, different methods for measuring distances, and upper or lower bounds on the second derivative of the loss. We obtain these results by developing new techniques that convert non-parameter-free optimization algorithms into parameter-free algorithms. In addition to providing new and more adaptive algorithms, the relative simplicity of non-parameter-free algorithms allows these techniques to significantly reduce the complexity of many prior analyses.
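To make the tuning problem concrete, the textbook guarantee below (standard background, not a result of the thesis) shows where the unknown parameters enter: online gradient descent on convex, G-Lipschitz losses with a fixed step size achieves the stated regret bound, but the step size itself requires knowing both the gradient bound G and the comparator distance D in advance. Parameter-free algorithms aim to recover essentially this guarantee, up to logarithmic factors, without knowing either.

```latex
% Online gradient descent with step size \eta = D / (G \sqrt{T}):
% for any comparator u with \|u - w_1\| \le D,
R_T(u) \;=\; \sum_{t=1}^{T} \ell_t(w_t) \;-\; \sum_{t=1}^{T} \ell_t(u)
\;\le\; G \, D \, \sqrt{T}.
% Both G and D appear in \eta; supplying them is exactly what "tuning" does.
```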
- Also online at
-
Online 3. Capturing human behavior and language for interactive systems [2018]
- Fast, Ethan, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
From smart homes that prepare coffee when we wake, to phones that know not to interrupt us during important conversations, our collective visions of human-computer interaction (HCI) imagine a future in which computers understand a broad range of human behaviors. Today our systems fall short of these visions, however, because this range of behaviors is too large for designers or programmers to capture manually. In this thesis I will present three systems that mine and operationalize an understanding of human life from large text corpora. The first system, Augur, focuses on what people do in daily life: capturing many thousands of relationships between human activities (e.g., taking a phone call, using a computer, going to a meeting) and the scene context that surrounds them. The second system, Empath, focuses on what people say: capturing hundreds of linguistic signals through a set of pre-generated lexicons, and allowing computational social scientists to create new lexicons on demand. The final system, Codex, explores how similar models can empower an understanding of emergent programming practice across millions of lines of open source code. Across these projects, I will demonstrate how semi-supervised and unsupervised learning can enable many new applications and analyses for interactive systems.
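As a schematic of the lexicon-based analysis Empath performs (the categories, word lists, and analyze() helper below are illustrative stand-ins, not the published Empath API), a document can be scored by normalized counts against each category's word list:

```python
# Toy lexicon-based text analysis in the spirit of Empath: each category
# is a word list, and a document is scored by normalized per-category counts.
from collections import Counter
import re

LEXICONS = {
    "social":           {"friend", "party", "talk", "meet", "family"},
    "work":             {"office", "meeting", "email", "deadline", "project"},
    "negative_emotion": {"sad", "angry", "worried", "afraid", "upset"},
}

def analyze(text, normalize=True):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for category, words in LEXICONS.items():
        counts[category] = sum(1 for t in tokens if t in words)
    if normalize and tokens:
        return {c: n / len(tokens) for c, n in counts.items()}
    return dict(counts)

print(analyze("I was worried about the deadline, so I called a friend."))
```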
- Also online at
-
Online 4. Combining algorithms and humans for large-scale data integration [2018]
- Verroios, Vasilis, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
The number of enterprises that exploit the data they collect in their decision-making processes has been rapidly growing. A fundamental component in achieving this goal is data integration, the process of combining and unifying data from multiple sources. Through data integration, an enterprise can find the underlying types of entities and entity attributes in the collected data, and map the data to those types and attributes. Enterprises expect constant improvements in the quality of the data integration outcome. At the same time, the volume and diversity of the collected data increase at a fast pace, as enterprises decide to monitor and collect data from more of their operations or other external sources. To satisfy the quality-improvement requirements, human involvement in data integration has been proposed and applied in recent years: the main idea is to use human intelligence in tasks that require an understanding of data semantics or involve data containing images, video, or natural language. In this thesis, we study approaches for different data integration tasks that combine machine algorithms and human intelligence. In particular, we focus on three data integration tasks: entity resolution, top-k item detection, and data aggregation. In entity resolution, for each underlying entity in a set of records (e.g., people's face images), the goal is to create one cluster containing only the records that refer to the same entity (e.g., the same person). In top-k item detection, the input is a set of records (e.g., restaurant images) and the goal is to find the top-k records based on some criteria (e.g., how nice the restaurant looks in the image). In data aggregation, the goal is to create a summary of some input content (e.g., a 5-minute summary of a 2-hour movie, or a summary of the main points in the reviews for a movie or product). The approaches we develop provide solutions to a spectrum of challenges that hybrid machine-human data-integration approaches face. Furthermore, our approaches provide significant performance, accuracy, and monetary-cost gains compared to state-of-the-art alternatives. The gains come from efficiently solving a number of fundamental problems for hybrid data-integration approaches: a) combining human errors with machine algorithm errors, b) selecting the best questions to ask humans, c) selecting between different interfaces for human questions, d) managing the resources available for human questions, e) preprocessing data before human curation is applied, and f) creating a context describing the whole dataset, to assist humans in accurately answering questions.
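As an illustration of the entity-resolution task described above (a toy sketch, not the thesis's hybrid algorithms), pairwise "same entity?" judgments from humans and/or a matching model can be aggregated and the records clustered with union-find; the majority-vote rule and threshold are illustrative simplifications:

```python
# Toy entity resolution: combine noisy pairwise judgments (human or machine)
# by majority vote, then cluster records with union-find.

class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def resolve(records, judgments, threshold=0.5):
    """judgments: {(r1, r2): [True/False votes from workers or models]}"""
    uf = UnionFind(records)
    for (a, b), votes in judgments.items():
        if sum(votes) / len(votes) > threshold:   # majority says "same entity"
            uf.union(a, b)
    clusters = {}
    for r in records:
        clusters.setdefault(uf.find(r), []).append(r)
    return list(clusters.values())

records = ["img1", "img2", "img3"]
judgments = {("img1", "img2"): [True, True, False],
             ("img2", "img3"): [False, False, True]}
print(resolve(records, judgments))   # [['img1', 'img2'], ['img3']]
```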
- Also online at
-
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2018 V | In-library use |
Online 5. Compositional visual intelligence [2018]
- Johnson, Justin, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
The field of computer vision has made enormous progress in the last few years, largely due to convolutional neural networks. Despite success on traditional computer vision tasks, our systems are still a long way from the general visual intelligence of people. An important facet of visual intelligence is composition: an understanding of the whole derives from an understanding of the parts. To achieve the goal of compositional visual intelligence, we must explore new computer vision tasks, create new datasets, and develop new models that exploit compositionality. In this dissertation I will discuss my work on three different computer vision tasks involving language, where embracing compositionality helps us build systems with richer visual intelligence. I will first discuss image captioning: traditional systems generate short sentences describing images, but by decomposing images into regions and descriptions into phrases we can generate two types of richer descriptions: dense captions and paragraphs. Second, I will discuss visual question answering: existing datasets consist primarily of short, simple questions; to study more complex questions requiring compositional reasoning, we introduce a new benchmark dataset where existing methods fall short. We then propose an explicitly compositional model for visual question answering that internally converts questions to functional programs, and executes these programs by composing neural modules. Third, I will discuss text-to-image generation: existing systems can retrieve or generate simple images of a single object conditioned on text descriptions, but struggle with more complex descriptions. By replacing freeform natural language with compositional scene graphs of objects and relationships, we can retrieve and generate complex images containing multiple objects.
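To make the program-composition idea concrete, here is a toy, purely symbolic sketch of executing a question as a functional program over a scene. In the dissertation the modules are neural networks operating on image features, so everything below (the scene encoding, module names, and program format) is an illustrative assumption:

```python
# Toy "neural module network" analogue: a question is mapped to a program,
# and the program is executed by composing per-function modules over a scene.

SCENE = [
    {"shape": "cube",   "color": "red",  "size": "large"},
    {"shape": "sphere", "color": "blue", "size": "small"},
    {"shape": "cube",   "color": "blue", "size": "small"},
]

MODULES = {
    "filter_color": lambda objs, c: [o for o in objs if o["color"] == c],
    "filter_shape": lambda objs, s: [o for o in objs if o["shape"] == s],
    "count":        lambda objs: len(objs),
}

def execute(program, scene):
    """program: list of (module_name, args) applied left to right."""
    state = scene
    for name, args in program:
        state = MODULES[name](state, *args)
    return state

# "How many blue cubes are there?"
program = [("filter_color", ("blue",)), ("filter_shape", ("cube",)), ("count", ())]
print(execute(program, SCENE))   # 1
```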
- Also online at
-
- Poole, Benjamin, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
Neural networks are a core component of both biological and artificial intelligence. Despite advances in training artificial networks to solve complex tasks, they are often brittle in ways that biological networks are not, and our understanding of how they work remains incomplete. In this thesis, I take steps towards applying machine learning to better understand biological neural networks, and leverage ideas from biology to improve artificial neural networks. First, I present a new set of computational tools for processing, analyzing and understanding large-scale datasets in neuroscience. Applying these tools to calcium imaging from direction-selective neurons in the fruit fly visual system, I identify the algorithm the fly uses for computing motion, resolving a nearly 60-year-old debate in fly visual neuroscience. In the second part of this thesis, I draw inspiration from neuroscience to advance the capabilities of artificial neural networks. By augmenting neural networks with higher-dimensional synapses, I drastically improve the ability of neural networks to acquire new information without forgetting old information when learning sequences of tasks. I also demonstrate how Poisson-like noise often found in biological systems can be incorporated into artificial neural networks to improve their sparsity, robustness, and ability to generalize. Finally, I demonstrate the reason neural networks are so expressive by analyzing signal propagation in random neural networks using tools from Riemannian geometry and mean field theory. Increasing the depth of even random neural networks exponentially increases their expressivity, allowing them to disentangle highly complex manifolds in input space. The results presented in this thesis highlight the increasingly strong relationship between machine learning and neuroscience, both in terms of tools for understanding neural systems as well as inspiration for the capabilities and mechanisms needed for building artificial intelligence.
- Also online at
-
Online 7. Data science for human well-being [2018]
- Althoff, Christopher Tim, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
The popularity of wearable and mobile devices, including smartphones and smartwatches, has generated an explosion of detailed behavioral data. These massive digital traces provide us with an unparalleled opportunity to realize new types of scientific approaches that enable novel insights about our lives, health, and happiness. However, gaining actionable insights from these data requires new computational approaches that turn observational, scientifically "weak" data into strong scientific results and can computationally test domain theories at scale. In this dissertation, we describe novel computational methods that leverage digital activity traces at the scale of billions of actions taken by millions of people. These methods combine insights from data mining, social network analysis, and natural language processing to improve our understanding of physical and mental well-being: (1) We show how massive digital activity traces reveal unknown health inequality around the world, and (2) how personalized predictive models can support targeted interventions to combat this inequality. (3) We demonstrate that modeling the speed of user search engine interactions can improve our understanding of sleep and cognitive performance. (4) Lastly, we describe how natural language processing methods can help improve counseling services for millions of people in crisis.
- Also online at
-
Online 8. Deep 3D representation learning [2018]
- Su, Hao, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
Among all digital representations we have for real physical objects, 3D is arguably the most expressive encoding. 3D representations allow storage and manipulation of high-level information (e.g. semantics, affordances, function) as well as low-level features (e.g. appearance, materials) about the object. How much we can understand and transform the 3D world is thus largely determined by the performance of algorithms that analyze and create 3D data. While 3D visual computing has predominantly focused on single 3D models or small model collections, the amount of accessible 3D models has increased by several orders of magnitude during the past few years. This significant growth pushes us to redefine 3D visual computing from the perspective of big 3D data. In this thesis, a series of topics on data-driven 3D visual computing is discussed. These include: constructing an information-rich large-scale 3D model repository, generating synthetic data for supervising neural networks, and learning end-to-end neural networks for analysis and synthesis of 3D geometries. Under the guiding principle of using large-scale 3D data for representation learning, the efforts described in this thesis have led to top-performing algorithms for pure-3D data processing, as well as 3D-assisted semantic, geometric and physical property inference from 2D images. The thesis concludes by describing several promising directions for future research.
- Also online at
-
Online 9. Design for collective action [2018]
- Salehi, Niloufar, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
From Twitter hashtags such as #metoo to protests by Amazon Mechanical Turk workers in the public sphere, collectives come together online to make progress on shared issues. Research has long celebrated the web's affordances for galvanizing such coordinated action. By lowering the costs of communication, the web promises to enable distributed collectives to gather and act around shared issues. However, in practice even highly motivated collectives often fail to envision shared outcomes and act on them collectively. This dissertation introduces social computing systems that directly center trust and familiarity to more effectively support collective action. I will demonstrate this concept through three projects. The first, Hive, explores how social systems can help build strong networks by organizing a collective into small teams, then intermixing viewpoints by gradually rotating team membership. I deployed this project with Mozilla to reimagine accessible web browsing with disability advocates online. The second project, Huddler, demonstrates an algorithmic solution to the problem of teams benefiting from highly familiar members but experiencing unpredictable availability. Finally, the third project, Dynamo, shows how structured human labor can help move efforts forward when they stall. I undertook this project in collaboration with worker rights advocates on Amazon Mechanical Turk. Through the body of work in this dissertation I envision alternate roles for social computing ecosystems in society that directly support efforts for social change.
- Also online at
-
Online 10. Design of programmable, energy-efficient reconfigurable accelerators [2018]
- Prabhakar, Raghu, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
Current trends in technology scaling, coupled with increasing compute demands under a limited power budget, have spurred research into specialized accelerator architectures. Field-Programmable Gate Arrays (FPGAs) have gained traction in the past few years as energy-efficient accelerator substrates, as they avoid the energy overheads of instruction-based processor architectures by using a statically programmable data path. FPGA architectures support fine-grained reconfigurability at the bit level, which provides the flexibility required to implement arbitrary state machines and data paths. However, bit-level reconfigurability in FPGAs creates programming inefficiencies: low-level programming models limit the accessibility of FPGAs to expert hardware designers and complicate the compiler flow, which often takes several hours. Furthermore, bit-level reconfigurability also creates architectural inefficiencies, which increase area and power overheads while reducing compute density. This dissertation addresses both the programming and architectural inefficiencies of FPGAs by describing a compiler flow and a new reconfigurable architecture based on high-level parallel patterns. Parallel patterns are high-level programming abstractions underlying several domain-specific languages (DSLs) that capture parallelism, memory access information, and data locality in applications. To address programming inefficiencies, a new representation based on composable, parameterized hardware templates is proposed that is designed to be targeted from parallel patterns. These templates are designed to capture nested parallelism and locality in applications, and are parameterized to expose the application design space to the compiler. A compiler flow is described that performs two key transformations, tiling and metapipelining, to automatically translate parallel patterns into templates and hardware. Evaluation of the compiler framework on a Stratix V FPGA shows speedups of up to 39.4× over an optimized baseline. To address architectural inefficiencies, this dissertation proposes a new coarse-grained reconfigurable architecture (CGRA) called Plasticine. Plasticine is built with reconfigurable primitives that natively exploit SIMD, pipelined parallelism, and coarse-grained parallelism at multiple levels. A configurable on-chip memory system with programmable address generation, address interleaving across banks, and buffering enables efficient exploitation of data locality and sustained compute throughput for various access patterns. Pipelined, static interconnects at multiple bus widths allow communication at multiple granularities while minimizing area overhead. Dedicated off-chip address generators and scatter-gather units maximize DRAM bandwidth utilization for dense and sparse accesses. With an area footprint of 113 mm² in a 28-nm process and a 1-GHz clock, Plasticine has a peak floating-point performance of 12.3 single-precision TFLOPS and a total on-chip memory capacity of 16 MB, consuming a maximum power of 49 W. Plasticine provides an improvement of up to 76.9× in performance-per-watt over a conventional FPGA across a wide range of dense and sparse applications.
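As a loose software analogy for the tiling transformation mentioned above (an illustrative sketch, not the compiler's output or the Plasticine programming interface), a large reduction can be split into tiles sized for on-chip buffers, with an outer loop streaming tiles from memory and an inner parallel reduction per tile:

```python
# Illustration of tiling: an outer loop streams fixed-size tiles (as a
# metapipelined stage would) while an inner parallel pattern reduces each tile.
import numpy as np

def tiled_dot(a, b, tile=1024):
    """Dot product computed tile by tile; `tile` stands in for an on-chip buffer size."""
    acc = 0.0
    for start in range(0, len(a), tile):            # outer: tile streaming
        acc += float(np.dot(a[start:start + tile],  # inner: parallel reduction
                            b[start:start + tile]))
    return acc

x = np.random.rand(10_000)
y = np.random.rand(10_000)
assert abs(tiled_dot(x, y) - float(np.dot(x, y))) < 1e-6
```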
- Also online at
-
Online 11. Empirical evaluation of privacy regulation [electronic resource] [2018]
- Mayer, Jonathan Robert.
- 2018.
- Description
- Book — 1 online resource.
- Summary
-
For decades, legal and policy analysis of data privacy issues has tended to be informed by factual assumptions rather than rigorous empirics. The motivation for this dissertation is to recast these assumptions into a set of empirical computer science research questions. Decision makers are, in essence, relying on a set of heuristics to analyze data privacy issues. Are the assumptions behind those heuristics accurate, and what are the consequences of applying those heuristics? The first chapter of this dissertation examines third-party web tracking and advertising, an area that has vexed policymakers in both the United States and Europe. The chapter contributes new methodologies for measuring web tracking, including instrumenting an ordinary web browser and simulating user activity (FourthParty) and collecting tracking-related data directly from ordinary users (BrowserSurvey). The results include new perspectives on the web tracking marketplace, identification of numerous non-cookie tracking technologies, and evidence of inconsistent performance and usability from consumer control mechanisms. The chapter also presents a new design for privacy-preserving third-party advertising (Tracking Not Required) that would provide a specific privacy guarantee, would not require modifying browsers, and would be externally auditable. The second chapter analyzes the distinction between metadata and content, a concept that is foundational to surveillance law and policy in the United States and worldwide. The chapter contributes a new crowdsourcing methodology for studying telephone metadata privacy. It then uses a crowdsourced dataset to demonstrate that telephone metadata is densely interconnected, susceptible to re-identification, and enables highly sensitive inferences. The third chapter assesses data territoriality, another surveillance distinction that is shared by the United States and across the globe. Results from simulated browsing activity indicate that data territoriality is a poor fit for modern web architecture; Americans unknowingly send a large volume of domestic online activity outside the United States, and foreign citizens send a large volume of (from their perspective) domestic online activity into the United States. The conclusion considers the relationship between the computer science community and policymaking about data privacy. It suggests how computer scientists can and must play a greater role in government decision making, to ensure that policy and law reflect the best available privacy science.
- Also online at
-
- Heule, Stefan, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
The ability to automatically reason about programs and extract useful information from them is very important and has received a lot of attention from both the academic community and practitioners in industry. Scaling such program analyses to real systems is a significant challenge, as real systems tend to be very large and complex, and often at least part of the system is not available for analysis. A common solution to this problem is to manually write models for the parts of the system that are not analyzable. However, writing these models is both challenging and time consuming. Instead, we propose the use of guided randomized search to find models automatically, and we show how this idea can be applied in three diverse contexts. First, we show how we can use guided randomized search to automatically find models for opaque code, a common problem in program analysis. Opaque code is code that is executable but whose source code is unavailable or difficult to process. We present a technique to first observe the opaque code by collecting partial program traces and then automatically synthesize a model. We demonstrate our method by learning models for a collection of array-manipulating routines. Second, we tackle automatically learning a formal specification for the x86-64 instruction set. Many software analysis and verification tools depend, either explicitly or implicitly, on correct modeling of the semantics of x86-64 instructions. However, formal semantics for the x86-64 ISA are difficult to obtain and often written manually with great effort. Instead, we show how to automatically synthesize formal semantics for 1,795 instruction variants of x86-64. Crucial to our success is a new technique, stratified synthesis, that allows us to scale to longer programs. We evaluate the specification we learned and find that it contains no errors, unlike all manually written specifications we compare against. Third, we consider the problem of program optimization on recent CPU architectures. These modern architectures are incredibly complex and make it difficult to statically determine the performance of a program. Using guided randomized search with a new cost function, we are able to outperform the previous state-of-the-art on several metrics, sometimes by a wide margin.
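To illustrate guided randomized search in miniature (a toy sketch over a made-up instruction set, not the thesis's synthesis procedure for x86-64), one can mutate a candidate program at random and accept changes guided by a cost function over test cases:

```python
# Guided randomized search over a toy instruction set: propose a random
# mutation, score it against test cases, and accept improving (or, with
# small probability, worsening) moves.
import random, math

OPS = ["inc", "dec", "double", "nop"]

def run(prog, x):
    for op in prog:
        if op == "inc":      x += 1
        elif op == "dec":    x -= 1
        elif op == "double": x *= 2
    return x

def cost(prog, tests):
    # Distance between produced and desired outputs over the test cases.
    return sum(abs(run(prog, x) - y) for x, y in tests)

def search(tests, length=4, iters=20000, beta=1.0):
    prog = [random.choice(OPS) for _ in range(length)]
    c = cost(prog, tests)
    for _ in range(iters):
        cand = list(prog)
        cand[random.randrange(length)] = random.choice(OPS)   # mutate one op
        c2 = cost(cand, tests)
        if c2 <= c or random.random() < math.exp(-beta * (c2 - c)):
            prog, c = cand, c2
    return prog, c

# Synthesize a program equivalent to f(x) = 2x + 2 on a few tests.
tests = [(0, 2), (1, 4), (3, 8)]
print(search(tests))   # e.g. (['inc', 'double', 'nop', 'nop'], 0)
```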
- Also online at
-
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2018 H | In-library use |
Online 13. Human-centric machine learning : enabling machine learning for high-stakes decision-making [2018]
- Lakkaraju, Himabindu, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
Domains such as law, healthcare, and public policy often involve highly consequential decisions which are predominantly made by human decision-makers. The growing availability of data pertaining to such decisions offers an unprecedented opportunity to develop machine learning models which can help humans in making better decisions. However, the applicability of machine learning to such scenarios is limited by certain fundamental challenges: a) The data is selectively labeled, i.e., we only observe the outcomes of the decisions made by human decision-makers and not the counterfactuals. b) The data is prone to a variety of selection biases and confounding effects. c) The successful adoption of the models that we develop depends on how well decision-makers can understand and trust their functionality; however, most of the existing machine learning models are primarily optimized for predictive accuracy and are not very interpretable. In this dissertation, we develop novel computational frameworks which address the aforementioned challenges, thus paving the way for large-scale deployment of machine learning models and algorithms to address problems of significant societal impact. We first discuss how to build interpretable predictive models and explanations of complex black box models which can be readily understood and consequently trusted by human decision-makers. We then outline novel evaluation strategies which allow us to reliably compare the quality of human and algorithmic decision-making while accounting for challenges such as selective labels and confounding effects. Lastly, we present approaches which can diagnose and characterize biases (systematic errors) in human decisions and algorithmic predictions.
- Also online at
-
Online 14. Identifying bias in human and machine decisions [2018]
- Corbett-Davies, Samuel James, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
Many suspect that decisions in employment, college admissions, lending, and the criminal justice system are systematically biased against certain groups. To date, efforts to provide statistical support for these concerns have often been hampered by subtle statistical flaws that are known to affect popular tests for discrimination. The first part of this dissertation introduces these tests and their flaws, before proposing two new statistical tests for discrimination. The threshold test uses a hierarchical Bayesian model to identify the standards that decision-makers apply to different protected groups, while risk-adjusted regression aims to determine whether observed differences in treatment are justified by legitimate policy goals. Decisions aided by machine learning algorithms have also come under scrutiny for bias. As humans use machines to inform more and more consequential decisions, it is becoming crucial to understand how bias might occur in algorithmic systems. The study of identifying bias in algorithmic decisions is young, and many of the initial approaches suffer from the same statistical limitations as tests for bias in human decisions. The second part of this dissertation explores these approaches to algorithmic fairness, demonstrates their deficiencies, and provides a framework for the development of fair algorithmic decisions. Throughout this dissertation we apply our methods to large datasets from the US criminal justice system, studying citizens' interactions with the police and defendants' interactions with the courts. Despite this criminal justice focus, our work is applicable to studying bias in any domain, whether the decisions are made by humans or algorithms.
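As a schematic of the risk-adjusted regression idea (simulated data and a deliberately simplified risk model, not the dissertation's analysis), one can regress decisions on a protected-group indicator while controlling for estimated risk and inspect the residual group coefficient:

```python
# Sketch of risk-adjusted regression: a group coefficient near zero, after
# controlling for legitimate risk, is consistent with treatment differences
# being explained by risk. Data and interpretation here are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000
group = rng.integers(0, 2, n)                  # protected attribute (0/1)
risk = rng.beta(2, 5, n)                       # estimated risk of the outcome
# Simulated decisions driven by risk plus a small group effect.
logit = -1.0 + 4.0 * risk + 0.5 * group
decision = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

X = sm.add_constant(np.column_stack([group, risk]))
fit = sm.Logit(decision, X).fit(disp=0)
print(fit.params)   # [const, group, risk]; the group term estimates residual disparity
```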
- Also online at
-
Online 15. Lattice-based non-interactive argument systems [2018]
- Wu, David J., author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
Non-interactive argument systems are an important building block in many cryptographic protocols. In this work, we begin by studying non-interactive zero-knowledge (NIZK) arguments for general NP languages. In a NIZK argument system, a prover can convince a verifier that a statement is true without revealing anything more about the statement. Today, NIZK arguments can be instantiated from random oracles, or, in the common reference string (CRS) model, from trapdoor permutations, pairings, or indistinguishability obfuscation. Notably absent from this list are constructions from lattice assumptions, and realizing NIZKs (for general NP languages) from lattices has been a long-standing open problem. In this work, we make progress on this problem by giving the first construction of a multi-theorem NIZK argument from standard lattice assumptions in a relaxed model called the preprocessing model, where we additionally assume the existence of a trusted setup algorithm that generates a proving key (used to construct proofs) and a verification key (used to verify proofs). Moreover, by basing hardness on lattice assumptions, our construction gives the first candidate that plausibly resists quantum attacks. We then turn our attention to constructing succinct non-interactive arguments (SNARGs) for general NP languages. SNARGs enable verifying computations with substantially lower complexity than that required for classic NP verification. Prior to this work, all SNARG constructions relied on random oracles, pairings, or indistinguishability obfuscation. This work gives the first lattice-based SNARG candidates. In fact, we show that one of our new candidates satisfies an appealing property called "quasi-optimality," which means that the SNARG simultaneously minimizes both the prover complexity and the proof size (up to polylogarithmic factors). This is the first quasi-optimal SNARG from any concrete cryptographic assumption. Again, because of our reliance on lattice-based techniques, all of our new candidates resist quantum attacks (in contrast to existing pairing-based constructions).
- Also online at
-
Online 16. Learning in the rational speech acts model [2018]
- Monroe, William Charles, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
When a person says something that has multiple possible interpretations, which interpretation stands out as the most likely intended meaning often depends on context outside the utterance itself: salient objects in the environment, utterances the speaker could have chosen but didn't, common-sense knowledge, etc. Systematically predicting these contextual effects is a major unsolved problem in computational natural language understanding. A recently-developed framework, known in cognitive science as the rational speech acts (RSA) model, proposes that speaker and listener reason probabilistically about each other's goals and private knowledge to produce interpretations that differ from literal meanings. The framework has shown promising experimental results in predicting a wide variety of previously hard-to-model contextual effects. This dissertation describes a variety of methods combining RSA approaches to context modeling with machine learning methods of language understanding and production. Learning meanings of utterances from examples avoids the need to build an impractically large, brittle lexicon, and having models of both speaker and listener also provides a way to reduce the search space by sampling likely subsets of possible utterances and meanings. Using recently-collected corpora of human utterances in simple language games, I show that a combination of RSA and machine learning yields more human-like models of utterances and interpretations than straightforward machine learning classifiers. Furthermore, the RSA insight relating the listener and speaker roles enables the use of a generation model to improve understanding, as well as suggesting a new way to evaluate natural language generation systems in terms of an understanding task.
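The RSA recursion itself is compact enough to state numerically. The following sketch runs the standard literal-listener / pragmatic-speaker / pragmatic-listener chain on a classic toy reference game; the lexicon, alpha, and costs are illustrative, and this is the base model rather than the learned models developed in the dissertation:

```python
# Basic RSA recursion: L0 (literal listener), S1 (pragmatic speaker that
# soft-maximizes informativity minus cost), L1 (pragmatic listener).
import numpy as np

referents = ["face_with_glasses", "face_with_hat", "plain_face"]
utterances = ["glasses", "hat", "face"]
# Literal semantics: lexicon[u][r] = 1 if utterance u is literally true of referent r.
lexicon = np.array([[1, 0, 0],    # "glasses"
                    [0, 1, 0],    # "hat"
                    [1, 1, 1]])   # "face"

alpha, cost = 1.0, np.zeros(len(utterances))

def normalize(m, axis):
    return m / m.sum(axis=axis, keepdims=True)

L0 = normalize(lexicon.astype(float), axis=1)                            # P_L0(r | u)
S1 = normalize(np.exp(alpha * (np.log(L0.T + 1e-12) - cost)), axis=1)    # P_S1(u | r)
L1 = normalize(S1.T, axis=1)                                             # P_L1(r | u), uniform prior

print(dict(zip(referents, L1[utterances.index("face")])))
# Pragmatically, "face" most likely refers to the plain face, since the
# speaker would have said "glasses" or "hat" otherwise.
```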
- Also online at
-
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2018 M | In-library use |
Online 17. Molecular machine learning with DeepChem [electronic resource] [2018]
- Ramsundar, Bharath.
- 2018.
- Description
- Book — 1 online resource.
- Summary
-
Machine learning has widely been applied to image, video, and speech datasets, but has not yet achieved broad penetration into chemistry, materials science, or other molecular design applications. However, over the last few years, machine learning and deep learning have achieved notable successes in predicting properties of molecular systems. In this thesis, I present a series of deep learning algorithms that demonstrate strong predictive improvements across a wide range of biochemical tasks such as assay activity modeling, toxicity prediction, protein-ligand binding affinity calculation, and chemical retrosynthesis. In addition to these algorithmic improvements, I introduce the comprehensive benchmark suite MoleculeNet for molecular machine learning algorithms (https://moleculenet.ai) and demonstrate how the technology of one-shot learning can be used for drug discovery applications. The work presented in this thesis culminated in my design and construction of DeepChem (https://deepchem.io), an open source package for molecular machine learning, which has achieved broad adoption among biotech startups, pharmaceutical companies, and research groups. DeepChem has attracted a thriving community of open source developers and looks to continue growing and expanding as a vibrant research tool.
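For context, here is the kind of end-to-end workflow DeepChem and MoleculeNet support: load a benchmark dataset, train a graph-convolution model, and evaluate with ROC-AUC. This sketch assumes a DeepChem 2.x release; exact module and argument names may differ across versions:

```python
# Sketch of a MoleculeNet benchmark run with DeepChem (API names per the
# DeepChem 2.x tutorials; treat specifics as assumptions).
import numpy as np
import deepchem as dc

# Tox21 toxicity benchmark, featurized for graph convolutions.
tasks, (train, valid, test), transformers = dc.molnet.load_tox21(
    featurizer="GraphConv")

model = dc.models.GraphConvModel(n_tasks=len(tasks), mode="classification")
model.fit(train, nb_epoch=10)

metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
print(model.evaluate(test, [metric], transformers))
```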
- Also online at
-
- Chaganty, Arun Tejasvi, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
In natural language tasks such as knowledge base population, text summarization or open-response question answering, a significant challenge is simply evaluating the performance of automated systems because of the large diversity of possible outputs. Existing fully-automatic methods for evaluating these systems rely on an incomplete set of annotated references which lead to systematic biases against certain system improvements: in other words, genuinely good ideas are systematically discarded simply because of limitations in our evaluation methodology. As a result, human evaluation, which can be prohibitively expensive, has remained the most trusted mode of evaluation for these tasks. In this work, we show how one can decrease the costs of incorporating human feedback through the design of appropriate statistical estimators. First, we consider the "finite incompleteness" setting where the output space is too large to exhaustively annotate, but we may still expect significant overlap between the output of different systems. Naively combining annotations from different systems leads to a representation bias. Here, we show that the cost of obtaining human feedback can be significantly amortized by using a novel importance-reweighted estimator. We apply this estimator to design a new evaluation methodology for knowledge base population and empirically show that the cost of evaluating precision and recall within this framework can be reduced by a factor of 4. Next, we consider the "infinite incompleteness" setting wherein few, if any, systems ever produce identical output. Traditionally, the community has relied on similarity-based automatic metrics such as BLEU or ROUGE to compare the outputs produced by different systems. Unfortunately, these metrics have been shown to correlate poorly with human judgment and thus introduce bias in evaluation. We derive an unbiased estimator that optimally combines these automatic metrics with human feedback. Our theoretical results allow us to characterize potential cost reductions only in terms of the tasks' subjectivity, measured by inter-annotator variance, and the automatic metrics' quality, measured by correlation with human judgments. On two popular natural language generation tasks, question answering and summarization, we empirically show that we can currently achieve at most a 7-13% reduction in cost, exposing fundamental limitations in debiasing current automatic metrics. Finally, we turn our attention to incompleteness in the training data, particularly in low-resource settings. Here, our machine learning systems simply have not seen sufficient training data for a particular phenomenon to accurately make predictions for them at test time. To tackle this incarnation of incompleteness, we train a system "on the job" by requesting human feedback in real time while the model is deployed, to economically fill in holes in the training data and thus resolve uncertainty in the model. Our key idea here is to cast the problem as a stochastic game based on Bayesian decision theory, which allows us to balance latency, cost, and accuracy objectives in a principled way. When tested on three classification tasks (named-entity recognition, sentiment classification, and image classification), we obtain an order of magnitude reduction in cost compared to full human annotation even when starting from zero training examples, while also boosting performance relative to a classical supervised model trained on the expert-provided labels.
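The cost-amortization idea can be illustrated with a simple control-variate estimator that combines a cheap automatic metric with a small human-annotated subsample (a sketch in the spirit of the approach, not the exact estimators derived in the thesis):

```python
# Control-variate combination of an automatic metric g with human scores f:
# estimate E[f] as mean(g) over all outputs plus a correction mean(f - g)
# over the human-annotated subsample. If g correlates with f, variance drops,
# so fewer human judgments are needed. Data below is simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)

def combined_estimate(g_all, f_sub, g_sub):
    return g_all.mean() + (f_sub - g_sub).mean()

n, m = 10_000, 200                       # m = size of the human-annotated subsample
f = rng.normal(0.6, 0.1, n)              # "true" human scores
g = f + rng.normal(0.0, 0.05, n)         # correlated automatic metric
idx = rng.choice(n, m, replace=False)

print("human-only estimate:", f[idx].mean())
print("combined estimate  :", combined_estimate(g, f[idx], g[idx]))
print("true mean          :", f.mean())
```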
- Also online at
-
Online 19. Neural reading comprehension and beyond [2018]
- Chen, Danqi, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
Teaching machines to understand human language documents is one of the most elusive and long-standing challenges in Artificial Intelligence. This thesis tackles the problem of reading comprehension: how to build computer systems to read a passage of text and answer comprehension questions. On the one hand, we think that reading comprehension is an important task for evaluating how well computer systems understand human language. On the other hand, if we can build high-performing reading comprehension systems, they would be a crucial technology for applications such as question answering and dialogue systems. In this thesis, we focus on neural reading comprehension: a class of reading comprehension models built on top of deep neural networks. Compared to traditional sparse, hand-designed feature-based models, these end-to-end neural models have proven to be more effective in learning rich linguistic phenomena and improved performance on all the modern reading comprehension benchmarks by a large margin. This thesis consists of two parts. In the first part, we aim to cover the essence of neural reading comprehension and present our efforts at building effective neural reading comprehension models, and more importantly, understanding what neural reading comprehension models have actually learned, and what depth of language understanding is needed to solve current tasks. We also summarize recent advances and discuss future directions and open questions in this field. In the second part of this thesis, we investigate how we can build practical applications based on the recent success of neural reading comprehension. In particular, we pioneered two new research directions: 1) how we can combine information retrieval techniques with neural reading comprehension to tackle large-scale open-domain question answering; and 2) how we can build conversational question answering systems from current single-turn, span-based reading comprehension models. We implemented these ideas in the DrQA and CoQA projects and we demonstrate the effectiveness of these approaches. We believe that they hold great promise for future language technologies.
- Also online at
-
Online 20. Noising and denoising natural language [2018]
- Xie, Ziang, author.
- [Stanford, California] : [Stanford University], 2018.
- Description
- Book — 1 online resource.
- Summary
-
Written communication is a crucial human activity, and one that has increasingly been altered by technology, including Artificial Intelligence. In this thesis, we describe an application of AI to build computational writing assistants that suggest edits to text, with the aim of correcting errors, improving fluency, and modifying style. Existing AI writing assistants are highly limited in their ability to provide useful suggestions due to the challenge of data sparsity. When state-of-the-art neural sequence transduction models are fed inputs that do not match the training distribution, low-quality and noninterpretable outputs often result. Thus such models must be trained on large parallel corpora—often on the order of millions of sentence pairs—to attain good performance, and even then, they continue to improve with more data. With the end goal of developing more effective AI writing assistants, this thesis addresses the challenge of data sparsity by investigating the effects and applications of noise in the sequence modeling setting. We begin with a theoretical analysis of the effects of sequence-level noise that illustrates the insufficiency of existing approaches for understanding and modeling such noise. With this understanding in place, we develop a method for synthesizing more diverse and realistic noise in natural language, thus remedying the need for parallel data for the task of "denoising" or suggesting edits to writing. To demonstrate the broader applicability of this method, we describe an extension to generate stylistic edits without parallel data between different styles. We close by describing an AI writing assistant that we deployed to validate the methods proposed in this thesis, along with findings to improve such AI assistants in production.
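To illustrate the noising idea (a toy sketch with made-up edit operations and rates, far simpler than the realistic noising schemes proposed in the thesis), one can corrupt clean sentences to synthesize (noisy, clean) training pairs for an edit-suggestion model:

```python
# Toy noising function: drop words, swap adjacent words, and introduce common
# confusion errors, producing (noisy, clean) pairs for a denoising model.
import random

CONFUSIONS = {"their": "there", "than": "then", "your": "you're"}

def noise(sentence, p_drop=0.1, p_swap=0.1, p_confuse=0.3):
    out = []
    for w in sentence.split():
        if random.random() < p_drop:
            continue                                    # drop a word
        if w.lower() in CONFUSIONS and random.random() < p_confuse:
            w = CONFUSIONS[w.lower()]                   # common confusion error
        out.append(w)
    if len(out) > 1 and random.random() < p_swap:       # swap two adjacent words
        i = random.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return " ".join(out)

clean = "They finished their project sooner than expected."
pairs = [(noise(clean), clean) for _ in range(3)]       # synthetic training pairs
for noisy, target in pairs:
    print(noisy, "->", target)
```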
- Also online at
-