- Collection
- Undergraduate Theses, School of Engineering
In this paper, we develop an algorithm that optimizes logarithmic utility in pairs trading. We assume price processes for two assets, with transaction cost linear with respect to the rate of change in portfolio weights. We then solve the optimization problem via a linear programming approach to approximate dynamic programming. Our simulation results show that when asset price volatility and transaction cost are sufficiently high, our ADP strategy offers significant benefits over the chosen baseline strategy. Our baseline strategy is an optimized version of a pairs trading heuristic studied in the literature.
In this paper, we develop an algorithm that optimizes logarithmic utility in pairs trading. We assume price processes for two assets, with transaction cost linear with respect to the rate of change in portfolio weights. We then solve the optimization problem via a linear programming approach to approximate dynamic programming. Our simulation results show that when asset price volatility and transaction cost are sufficiently high, our ADP strategy offers significant benefits over the chosen baseline strategy. Our baseline strategy is an optimized version of a pairs trading heuristic studied in the literature.
2. Design of large scale nudge engines [electronic resource] [2015] Online
- Book
- 1 online resource.
Congestion is a widespread problem in modern urban transportation networks; hundreds of billions of dollars are lost each year due to wasted time, extra fuel consumption, traffic accidents, etc. A significant fraction of these losses is due to peak-hour congestion on road networks, which occurs when the demand for transport exceeds capacity over a short time period each day. In this thesis, we explore the feasibility of reducing peak-hour road traffic congestion using incentives to shift drivers' commute times. We first discuss a practical implementation of such an incentive program — CAPRI (Congestion And Parking Relief Incentives). This program aimed to reduce peak-hour vehicular traffic into and out of Stanford University. Commuters who sign up for CAPRI earn points for the "good trips" they make, and these points can be redeemed for rewards (both monetary and in-kind). CAPRI also includes the capability to personalize incentives based on users' historical behavior. To complement the implementation, we develop a theoretical model for optimally reducing the cost of peak-hour congestion with targeted incentives. We study the evolution of congestion on a highway under time-varying demands using fluid models of traffic flow. We then examine the effect of shifting users' commute times on congestion. We show that ideas from the theory of optimal transport of measures can be used to develop cost-effective incentive schemes to reduce congestion. Specifically, we show that the "cost of congestion" and the "cost of nudging" are closely related to the Wasserstein distance between measures. We use this relationship to formulate linear programming problems to compute personalized recommendations and incentives to nudge drivers to the off-peak hour. We find that the resultant reduction in the cost of congestion is significant and that it helps to prioritize commuters for nudging based on their origin and destination.
Congestion is a widespread problem in modern urban transportation networks; hundreds of billions of dollars are lost each year due to wasted time, extra fuel consumption, traffic accidents, etc. A significant fraction of these losses is due to peak-hour congestion on road networks, which occurs when the demand for transport exceeds capacity over a short time period each day. In this thesis, we explore the feasibility of reducing peak-hour road traffic congestion using incentives to shift drivers' commute times. We first discuss a practical implementation of such an incentive program — CAPRI (Congestion And Parking Relief Incentives). This program aimed to reduce peak-hour vehicular traffic into and out of Stanford University. Commuters who sign up for CAPRI earn points for the "good trips" they make, and these points can be redeemed for rewards (both monetary and in-kind). CAPRI also includes the capability to personalize incentives based on users' historical behavior. To complement the implementation, we develop a theoretical model for optimally reducing the cost of peak-hour congestion with targeted incentives. We study the evolution of congestion on a highway under time-varying demands using fluid models of traffic flow. We then examine the effect of shifting users' commute times on congestion. We show that ideas from the theory of optimal transport of measures can be used to develop cost-effective incentive schemes to reduce congestion. Specifically, we show that the "cost of congestion" and the "cost of nudging" are closely related to the Wasserstein distance between measures. We use this relationship to formulate linear programming problems to compute personalized recommendations and incentives to nudge drivers to the off-peak hour. We find that the resultant reduction in the cost of congestion is significant and that it helps to prioritize commuters for nudging based on their origin and destination.
3. Efficient learning in sequential optimization [electronic resource] [2015] Online
- Book
- 1 online resource.
We consider a broad class of online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback. In these problems, the decision-maker repeatedly chooses among a set of possible actions, observes an outcome, and receives a reward representing the utility derived from this outcome. She is uncertain about the underlying system and is therefore initially unsure of which action is best. However, as outcomes are observed, she is able to learn over time to make increasingly effective decisions. Her objective is to choose actions sequentially so as to maximize the expected cumulative reward. We focus on three algorithmic approaches that accommodate flexible statistical modeling, and are capable of experimenting efficiently in broad classes of problems. The first part of the thesis focuses on a design principle known as optimism in the face of uncertainty, which underlies many of the most effective exploration algorithms. We provide a regret bound for an optimistic algorithm that applies broadly and can be specialized to many specific model classes. Our bound depends on a new notion of dimension that measures the degree of dependence among actions. We compare our notion to the Vapnik-Chervonenkis dimension, and explain why that and other measures of dimension used in the supervised literature do not suffice when it comes to analyzing optimistic algorithms. We then turn our attention to Thompson sampling, an elegant algorithm for learning in online optimization problems with partial feedback. We derive a close theoretical connection between Thompson sampling and optimistic algorithms. Due to the connection we derive, existing analysis available for specific optimistic algorithms immediately translates to expected regret bounds for Thompson sampling. The second part of the thesis pushes beyond the optimistic principle, and offers a fresh, information-theoretic, perspective on the exploration/exploitation tradeoff. We first revisit Thompson sampling from this perspective and provide novel regret bounds that scale with the entropy of the optimal action distribution. Then, we propose a new algorithm--information-directed sampling (IDS)--and study its performance. IDS quantifies the amount learned by selecting an action through an information theoretic measure, and selects actions by optimizing an objective that explicitly balances attaining a high immediate reward and selecting informative actions. We provide a general regret bound for IDS, demonstrate strong performance in simulation, and show through simple analytic examples that it can dramatically outperform Thompson sampling due to the way it quantifies information.
We consider a broad class of online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback. In these problems, the decision-maker repeatedly chooses among a set of possible actions, observes an outcome, and receives a reward representing the utility derived from this outcome. She is uncertain about the underlying system and is therefore initially unsure of which action is best. However, as outcomes are observed, she is able to learn over time to make increasingly effective decisions. Her objective is to choose actions sequentially so as to maximize the expected cumulative reward. We focus on three algorithmic approaches that accommodate flexible statistical modeling, and are capable of experimenting efficiently in broad classes of problems. The first part of the thesis focuses on a design principle known as optimism in the face of uncertainty, which underlies many of the most effective exploration algorithms. We provide a regret bound for an optimistic algorithm that applies broadly and can be specialized to many specific model classes. Our bound depends on a new notion of dimension that measures the degree of dependence among actions. We compare our notion to the Vapnik-Chervonenkis dimension, and explain why that and other measures of dimension used in the supervised literature do not suffice when it comes to analyzing optimistic algorithms. We then turn our attention to Thompson sampling, an elegant algorithm for learning in online optimization problems with partial feedback. We derive a close theoretical connection between Thompson sampling and optimistic algorithms. Due to the connection we derive, existing analysis available for specific optimistic algorithms immediately translates to expected regret bounds for Thompson sampling. The second part of the thesis pushes beyond the optimistic principle, and offers a fresh, information-theoretic, perspective on the exploration/exploitation tradeoff. We first revisit Thompson sampling from this perspective and provide novel regret bounds that scale with the entropy of the optimal action distribution. Then, we propose a new algorithm--information-directed sampling (IDS)--and study its performance. IDS quantifies the amount learned by selecting an action through an information theoretic measure, and selects actions by optimizing an objective that explicitly balances attaining a high immediate reward and selecting informative actions. We provide a general regret bound for IDS, demonstrate strong performance in simulation, and show through simple analytic examples that it can dramatically outperform Thompson sampling due to the way it quantifies information.
4. Generalized low rank models [electronic resource] [2015] Online
- Book
- 1 online resource.
Principal components analysis (PCA) is a well-known technique for approximating a tabular data set by a low rank matrix. This dissertation extends the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal, and other data types. This framework encompasses many well known techniques in data analysis, such as nonnegative matrix factorization, matrix completion, sparse and robust PCA, k-means, k-SVD, and maximum margin matrix factorization. The method handles heterogeneous data sets, and leads to coherent schemes for compressing, denoising, and imputing missing entries across all data types simultaneously. It also admits a number of interesting interpretations of the low rank factors, which allow clustering of examples or of features. We propose several parallel algorithms for fitting generalized low rank models, and describe implementations and numerical results.
Principal components analysis (PCA) is a well-known technique for approximating a tabular data set by a low rank matrix. This dissertation extends the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal, and other data types. This framework encompasses many well known techniques in data analysis, such as nonnegative matrix factorization, matrix completion, sparse and robust PCA, k-means, k-SVD, and maximum margin matrix factorization. The method handles heterogeneous data sets, and leads to coherent schemes for compressing, denoising, and imputing missing entries across all data types simultaneously. It also admits a number of interesting interpretations of the low rank factors, which allow clustering of examples or of features. We propose several parallel algorithms for fitting generalized low rank models, and describe implementations and numerical results.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2015 U | In-library use |
5. Medical image similarity perception for content-based image retrieval [electronic resource] [2014] Online
- Book
- 1 online resource.
Content-based image retrieval (CBIR) of perceptually similar images for medical decision support may improve the accuracy and efficiency of radiological diagnosis. A robust reference standard for perceptual similarity in medical images is vital to CBIR applications, but is challenging to create due to the large amount of observer data necessary as well as high inter-observer variability. The aims of this thesis are (1) to develop techniques for creating reference standards based on perceptual similarity that are scalable for large databases, and (2) to model inter-reader variability in observer assessments of image similarity. This thesis first discusses a novel technique for predicting visual similarity for pairs of images in a database from perceptual data obtained by viewing each image individually. In an observer study using 19 CT liver lesions in the portal venous phase, three radiologists provided point-wise ratings for visual attributes of images containing liver lesions displayed individually and also for perceptual similarity in all pair-wise combinations of these images. Using the data generated from viewing images individually, this work develops a scheme to generate ratings of pair-wise similarity using linear fitting, and demonstrates that this is a scalable technique for developing a reference standard. This work also develops a statistical model from these data that may be used to predict inter-reader variability and accuracy of similarity estimates in larger sets of data. Finally, this thesis discusses leveraging compressive sensing techniques, which may be used to complete highly sparse matrices, to impute similarity matrices from a small subset of observer ratings. By being scalable for large databases, this technique may be a reliable and efficient method for creating similarity reference standards. The results of this work thus provide a better understanding of perception of medical image similarity, which may in turn be used to train and validate medical decision support systems.
Content-based image retrieval (CBIR) of perceptually similar images for medical decision support may improve the accuracy and efficiency of radiological diagnosis. A robust reference standard for perceptual similarity in medical images is vital to CBIR applications, but is challenging to create due to the large amount of observer data necessary as well as high inter-observer variability. The aims of this thesis are (1) to develop techniques for creating reference standards based on perceptual similarity that are scalable for large databases, and (2) to model inter-reader variability in observer assessments of image similarity. This thesis first discusses a novel technique for predicting visual similarity for pairs of images in a database from perceptual data obtained by viewing each image individually. In an observer study using 19 CT liver lesions in the portal venous phase, three radiologists provided point-wise ratings for visual attributes of images containing liver lesions displayed individually and also for perceptual similarity in all pair-wise combinations of these images. Using the data generated from viewing images individually, this work develops a scheme to generate ratings of pair-wise similarity using linear fitting, and demonstrates that this is a scalable technique for developing a reference standard. This work also develops a statistical model from these data that may be used to predict inter-reader variability and accuracy of similarity estimates in larger sets of data. Finally, this thesis discusses leveraging compressive sensing techniques, which may be used to complete highly sparse matrices, to impute similarity matrices from a small subset of observer ratings. By being scalable for large databases, this technique may be a reliable and efficient method for creating similarity reference standards. The results of this work thus provide a better understanding of perception of medical image similarity, which may in turn be used to train and validate medical decision support systems.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2014 F | In-library use |
6. Multi-armed bandits with side information [electronic resource] [2014] Online
- Book
- 1 online resource.
Development of personalized strategies has attracted much attention in recent years. Advances in Information Technology has led to the inundation of information, which has provided the impetus for the development of personalized strategies in diverse fields such as medicine, marketing, finance, and education. A methodological framework for the development of these personalized strategies is the theory of multi-armed bandits with side information, also called "contextual" bandits theory, which is an extension of classical (i.e., context-free) multi-armed bandits. This thesis represents an attempt to develop this theory To begin with, we consider the design of clinical trials for developing and testing biomarker-guided personalized therapies. Biomarker-guided personalized therapies offer great promise to improve drug development and patient care, but also pose difficult challenges in designing clinical trials for the development and validation of these therapies. After a review of the existing approaches, we describe new adaptive designs to address these challenges, first for clinical trials in new drug development and then for comparative effectiveness trials involving approved treatments. With the insight from the real-world application, we next consider general multi-armed bandits with side information, and categorizes the approaches to address the multi-armed bandit problems with side information into two groups. One approach uses parametric regression model and the other uses nonparametric regression for the relationship between the reward function of the side information for each arm. We review the existing literature and describe new treatment allocation policies inspired by the design of the biomarker-guided personalized therapy problem. We show that the new treatment allocation policies are applicable to a much more general class of problems than the policies proposed in the literature, and show the improvement of the new policies in simulation studies.
Development of personalized strategies has attracted much attention in recent years. Advances in Information Technology has led to the inundation of information, which has provided the impetus for the development of personalized strategies in diverse fields such as medicine, marketing, finance, and education. A methodological framework for the development of these personalized strategies is the theory of multi-armed bandits with side information, also called "contextual" bandits theory, which is an extension of classical (i.e., context-free) multi-armed bandits. This thesis represents an attempt to develop this theory To begin with, we consider the design of clinical trials for developing and testing biomarker-guided personalized therapies. Biomarker-guided personalized therapies offer great promise to improve drug development and patient care, but also pose difficult challenges in designing clinical trials for the development and validation of these therapies. After a review of the existing approaches, we describe new adaptive designs to address these challenges, first for clinical trials in new drug development and then for comparative effectiveness trials involving approved treatments. With the insight from the real-world application, we next consider general multi-armed bandits with side information, and categorizes the approaches to address the multi-armed bandit problems with side information into two groups. One approach uses parametric regression model and the other uses nonparametric regression for the relationship between the reward function of the side information for each arm. We review the existing literature and describe new treatment allocation policies inspired by the design of the biomarker-guided personalized therapy problem. We show that the new treatment allocation policies are applicable to a much more general class of problems than the policies proposed in the literature, and show the improvement of the new policies in simulation studies.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2014 K | In-library use |
7. Problems, models, and algorithms in data-driven energy demand management [electronic resource] [2014] Online
- Book
- 1 online resource.
A compelling vision for the electricity grid of the 21st century is that of a highly-instrumented system that integrates distributed generation from renewable and conventional sources where superior monitoring allows a targeted, localized, dynamic matching of demand and supply while maintaining a high degree of overall stability. To better monitor demand, utilities have recently deployed massive advanced sensing infrastructure (smart meters) to collect energy consumption data at fine (sub-hourly) time scales from large consumer populations; thus, there is urgent need formalize the new problems and develop the appropriate models, scalable algorithms, and methodologies that can leverage this new information to improve grid operations. The key tension in shaping demand is that while benefits from demand-side management programs are relevant in the aggregate (over many consumers), consumption change happens at the level of the indivdual consumer. As such, incentive schemes (e.g., dynamic pricing) that aim to change certain aspects of the average consumer's consumption may not be optimal for any particular} real consumer. Thus, the perspective this thesis takes is that of data-driven energy program targeting, i.e., using smart meter readings for identifying high-potential types of consumers for certain demand-response and energy-efficiency programs, and designing tailored controls and incentives to improve their usage behavior. This is as much a computational and engineering problem as a management and marketing one. The central contribution of this thesis is on methodology for quantifying uncertainty in individual energy consumption, and relating it to the potential for flexibility for the design and operation of certain demand-side programs. In particular, three algorithmic and modeling contributions are presented that are motivated by the question of comparing and benchmarking the impact and potential of individual consumers to providing flexibility for demand-side management. First, it is noted that individual consumption is empirically observed to be highly volatile; as such no matter how good a predictive model, part of consumption will remain uncertain. Here, this variability is shown to be related to the stress each consumer places on the grid (through their respective cost-of-service); moreover a scalable clustering algorithm is proposed to uncover patterns in variability as encoded in typical distribution functions of consumption. Second, a model of individual consumption is proposed that interprets smart meter readings as the observed outcome of latent, temperature-driven decisions to use either heating, air conditioning, or no HVAC at all; algorithms for learning such response models are introduced that are based on the maximum likelihood estimation framework. The dynamic consumption model is validated experimentally by emphasizing the intended end-use of statistical modeling when comparing with ground-truth data. A third methodological contribution leverages the statistical description of individual consumer response to weather to derive normative, tailored control schedules for thermally-sensitive appliances. These actions are optimal in the sense that they both satisfy individual effort constraints, and contribute to reducing uncertainty in the aggregate over a large population. In addition to the algorithmic and modeling contributions, this thesis presents at great length the application of the methods developed here to realistic situations of segmentation and targeting large populations of consumers for demand-side programs. We illustrate our models and algorithms on a variety of data sets consisting of heterogeneous sources - electricity usage, weather information, consumer attributes - and of various sizes, from a few hundred households in Austin, TX to 120,000 households in Northern California. We validate our dynamic consumption model experimentally, emphasizing the end purpose of decisions made using the outcome of the statistical representation of consumption. Finally, we discuss the two sides of the data coin - increased effectiveness in program management vs potential loss of consumer privacy - in an experimental study in which we argue that certain patterns in consumption as extracted from smart meter data may in some cases aid in predicting relevant consumer attributes (such as the presence of large appliances and lifestyles such as employment or children), but not many others. This, in turn, can enable the the program administrator or marketer to target those consumers whose actual data indicates that they might respond to the program, and may contribute to the debate on what consumers unwillingly reveal about themselves when using energy.
A compelling vision for the electricity grid of the 21st century is that of a highly-instrumented system that integrates distributed generation from renewable and conventional sources where superior monitoring allows a targeted, localized, dynamic matching of demand and supply while maintaining a high degree of overall stability. To better monitor demand, utilities have recently deployed massive advanced sensing infrastructure (smart meters) to collect energy consumption data at fine (sub-hourly) time scales from large consumer populations; thus, there is urgent need formalize the new problems and develop the appropriate models, scalable algorithms, and methodologies that can leverage this new information to improve grid operations. The key tension in shaping demand is that while benefits from demand-side management programs are relevant in the aggregate (over many consumers), consumption change happens at the level of the indivdual consumer. As such, incentive schemes (e.g., dynamic pricing) that aim to change certain aspects of the average consumer's consumption may not be optimal for any particular} real consumer. Thus, the perspective this thesis takes is that of data-driven energy program targeting, i.e., using smart meter readings for identifying high-potential types of consumers for certain demand-response and energy-efficiency programs, and designing tailored controls and incentives to improve their usage behavior. This is as much a computational and engineering problem as a management and marketing one. The central contribution of this thesis is on methodology for quantifying uncertainty in individual energy consumption, and relating it to the potential for flexibility for the design and operation of certain demand-side programs. In particular, three algorithmic and modeling contributions are presented that are motivated by the question of comparing and benchmarking the impact and potential of individual consumers to providing flexibility for demand-side management. First, it is noted that individual consumption is empirically observed to be highly volatile; as such no matter how good a predictive model, part of consumption will remain uncertain. Here, this variability is shown to be related to the stress each consumer places on the grid (through their respective cost-of-service); moreover a scalable clustering algorithm is proposed to uncover patterns in variability as encoded in typical distribution functions of consumption. Second, a model of individual consumption is proposed that interprets smart meter readings as the observed outcome of latent, temperature-driven decisions to use either heating, air conditioning, or no HVAC at all; algorithms for learning such response models are introduced that are based on the maximum likelihood estimation framework. The dynamic consumption model is validated experimentally by emphasizing the intended end-use of statistical modeling when comparing with ground-truth data. A third methodological contribution leverages the statistical description of individual consumer response to weather to derive normative, tailored control schedules for thermally-sensitive appliances. These actions are optimal in the sense that they both satisfy individual effort constraints, and contribute to reducing uncertainty in the aggregate over a large population. In addition to the algorithmic and modeling contributions, this thesis presents at great length the application of the methods developed here to realistic situations of segmentation and targeting large populations of consumers for demand-side programs. We illustrate our models and algorithms on a variety of data sets consisting of heterogeneous sources - electricity usage, weather information, consumer attributes - and of various sizes, from a few hundred households in Austin, TX to 120,000 households in Northern California. We validate our dynamic consumption model experimentally, emphasizing the end purpose of decisions made using the outcome of the statistical representation of consumption. Finally, we discuss the two sides of the data coin - increased effectiveness in program management vs potential loss of consumer privacy - in an experimental study in which we argue that certain patterns in consumption as extracted from smart meter data may in some cases aid in predicting relevant consumer attributes (such as the presence of large appliances and lifestyles such as employment or children), but not many others. This, in turn, can enable the the program administrator or marketer to target those consumers whose actual data indicates that they might respond to the program, and may contribute to the debate on what consumers unwillingly reveal about themselves when using energy.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2014 A | In-library use |
8. Scheduling, revenue sharing, and user behavior for aggregated demand response [electronic resource] [2014] Online
- Book
- 1 online resource.
The modern electric grid can trace its origins back over 100 years. Naturally, many changes and improvements have occurred during this time as new policies and new technologies are incorporated. Although these changes have, in the main, made distributing electrical energy more reliable and economical, the essential role of the grid has not changed significantly - generate sufficient power to match the quantity of power consumed. It has long been acknowledged that the ability for the operator to dynamically adjust both the quantity of power consumed as well as the quantity of power generated would be a major aid in ensuring the electric grid continues to operate economically. Attempting to adjust the consumption of energy in response to challenges faced by the operator is an approach known as demand response and it is the focus of this thesis. Although the concept of demand response is not new, substantial questions remain to be answered. Questions such as how demand response programs could be implemented for residential users, how participants in these programs should be compensated, and how the actions of such residential users can be modeled. This dissertation seeks to answer these questions by establishing and analyzing analytical models. We analyze demand response programs from the point of view of a ``Load Serving Entity, '' or ``aggregator, '' which engages in the energy markets by coordinating and controlling the participants in the program. We describe a novel procedure for direct control of non-preemptive loads by an aggregator. This program is based on a greedy algorithm which schedules the underlying loads in a manner that seeks to reduce the error between the final aggregate load profile and a predefined target profile. Algorithms are presented for two scenarios -- firstly, the case where the loads are known in advance, and secondly where the load demands are stochastic. The aggregator will realize revenue from participating in the energy markets, a portion of which must be returned to the individual users as compensation for being a part of the program. The problem of fairly compensating these users is challenging, and we propose a solution by modeling the scenario in a game theoretic setting and utilizing the concept of the Shapley Value in order to calculate a fair compensation. Finally, we analyze data from an existing demand response program, and model user behavior using a discrete choice modeling framework in order to determine what factors influence the participants response to a program.
The modern electric grid can trace its origins back over 100 years. Naturally, many changes and improvements have occurred during this time as new policies and new technologies are incorporated. Although these changes have, in the main, made distributing electrical energy more reliable and economical, the essential role of the grid has not changed significantly - generate sufficient power to match the quantity of power consumed. It has long been acknowledged that the ability for the operator to dynamically adjust both the quantity of power consumed as well as the quantity of power generated would be a major aid in ensuring the electric grid continues to operate economically. Attempting to adjust the consumption of energy in response to challenges faced by the operator is an approach known as demand response and it is the focus of this thesis. Although the concept of demand response is not new, substantial questions remain to be answered. Questions such as how demand response programs could be implemented for residential users, how participants in these programs should be compensated, and how the actions of such residential users can be modeled. This dissertation seeks to answer these questions by establishing and analyzing analytical models. We analyze demand response programs from the point of view of a ``Load Serving Entity, '' or ``aggregator, '' which engages in the energy markets by coordinating and controlling the participants in the program. We describe a novel procedure for direct control of non-preemptive loads by an aggregator. This program is based on a greedy algorithm which schedules the underlying loads in a manner that seeks to reduce the error between the final aggregate load profile and a predefined target profile. Algorithms are presented for two scenarios -- firstly, the case where the loads are known in advance, and secondly where the load demands are stochastic. The aggregator will realize revenue from participating in the energy markets, a portion of which must be returned to the individual users as compensation for being a part of the program. The problem of fairly compensating these users is challenging, and we propose a solution by modeling the scenario in a game theoretic setting and utilizing the concept of the Shapley Value in order to calculate a fair compensation. Finally, we analyze data from an existing demand response program, and model user behavior using a discrete choice modeling framework in order to determine what factors influence the participants response to a program.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2014 O | In-library use |
9. Heuristics for large scale dynamic programming [electronic resource] [2013] Online
- Book
- 1 online resource.
Dynamic programming algorithms are used to solve problems in a diverse variety of fields, including Decision Analysis, Control Systems, Communication Systems, Dynamic Pricing and Bidding, Networking, Supply Chain Management, and Planning. These applications can be challenging, with exponential complexity, so exact solutions are rarely tractable. Therefore, improvements in the efficiency and quality of approximate dynamic programming algorithms can provide significant benefit. One promising approach is on-line Reinforcement Learning, algorithms which balance exploration and exploitation on Markov decision process applications. In practice, however, these algorithms are usually applied off-line to generate a control policy which is then fixed and used on-line. This work focuses on improving off-line policy generation on both discrete and continuous state-space processes, without the distraction of exploitation. It introduces a variety of modifications to existing algorithms and replacements for those algorithms including backtracking, exact and approximate Kalman filter value modeling, variance feedback, visitation allocation, and particle filter optimization. Backtracking re-orders the update calculations to accelerate the learning process for value iteration. Exact Kalman filter value modeling develops a Bayesian dynamic linear Gaussian representation of value using basis functions. Approximate Kalman filter value modeling is a simplified version that appears to be both higher-performance and more robust than exact Kalman filter value modeling. Variance feedback is an adaptive method for updating a value model that exploits prior knowledge while avoiding overconfidence. Visitation allocation is a heuristic to better manage the exploration process. Particle filter optimization is a stochastic optimization technique that can be applied to many policy search problems, including those that are non-convex. These techniques are demonstrated on a variety of benchmark problems, including both discrete and continuous, discounted and undiscounted, and finite and infinite horizon processes.
Dynamic programming algorithms are used to solve problems in a diverse variety of fields, including Decision Analysis, Control Systems, Communication Systems, Dynamic Pricing and Bidding, Networking, Supply Chain Management, and Planning. These applications can be challenging, with exponential complexity, so exact solutions are rarely tractable. Therefore, improvements in the efficiency and quality of approximate dynamic programming algorithms can provide significant benefit. One promising approach is on-line Reinforcement Learning, algorithms which balance exploration and exploitation on Markov decision process applications. In practice, however, these algorithms are usually applied off-line to generate a control policy which is then fixed and used on-line. This work focuses on improving off-line policy generation on both discrete and continuous state-space processes, without the distraction of exploitation. It introduces a variety of modifications to existing algorithms and replacements for those algorithms including backtracking, exact and approximate Kalman filter value modeling, variance feedback, visitation allocation, and particle filter optimization. Backtracking re-orders the update calculations to accelerate the learning process for value iteration. Exact Kalman filter value modeling develops a Bayesian dynamic linear Gaussian representation of value using basis functions. Approximate Kalman filter value modeling is a simplified version that appears to be both higher-performance and more robust than exact Kalman filter value modeling. Variance feedback is an adaptive method for updating a value model that exploits prior knowledge while avoiding overconfidence. Visitation allocation is a heuristic to better manage the exploration process. Particle filter optimization is a stochastic optimization technique that can be applied to many policy search problems, including those that are non-convex. These techniques are demonstrated on a variety of benchmark problems, including both discrete and continuous, discounted and undiscounted, and finite and infinite horizon processes.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2013 T | In-library use |
10. Modeling and analysis of the role of energy storage for renewable integration [electronic resource] [2013] Online
- Book
- 1 online resource.
The renewable energy generated from wind and solar radiation is increasing rapidly. The power imbalance due to the mismatch between the available renewable generation and load is a significant challenge to the reliable operation of the electric power grid. When the renewable generation is too high, the excess power has to be curtailed. When there is not enough renewable generation, conventional generation from gas turbines can be used to balance the excess load. However, this offsets the environmental benefits of renewable generation, and there is operation cost incurred. Energy storage can be charged by the excess renewable generation and be discharged to meet the excess load and hence reduces the power imbalance. What are the optimal control policies of the electric power system when energy storage is present? How much power imbalance can energy storage reduce? How much cost of integrating renewable generation can energy storage reduce? How much energy storage is needed? This dissertation provides some quantitative answers to these questions by establishing and analyzing simple analytic models. The models are based on the multi-timescale operation of the electric power grid. We define the net load at an operating interval as the difference between the load and the renewable generation. The predictions of the net load are made at multiple timescales, such as, day ahead, hour ahead, and minutes ahead of the operating interval. The power imbalance at each timescale is the difference between the predicted net loads of adjacent timescales. The conventional generation and energy storage charging and discharging operations are scheduled to reduce the power imbalance at each timescale. We first consider the power imbalance problem at each timescale separately and formulate it as an infinite horizon stochastic control problem. The input is the power imbalance process, the controls are the charging and discharging operations of the energy storage, and the output is the residual power imbalance. We show that the greedy control policy of energy storage minimizes the average magnitude of the residual power imbalance. Modeling the power imbalance process as a sequence of independently and identically distributed Laplace random variables at short timescales and as a weakly dependent stationary process at longer timescales, we establish closed form expressions of the minimum cost functions for some cases. We corroborate the analytic results with numerical results using wind power dataset from National Renewable Energy Laboratory (NREL) and Bonneville Power Administration (BPA). To optimize the scheduled generation across multiple timescales, we model the electricity markets as follows. As the operating interval approaches, we have more observations enabling us to predict the net load at the operating interval more accurately. However, the price for scheduling generation for the operating interval at the electricity market increases. We wish to minimize the cost for scheduling generation for the operating interval at multiple electricity markets and the risk from the mismatch between the random net load and the scheduled generation at real time. This risk limiting dispatch problem can be formulated as a finite horizon stochastic control problem, and the optimal policy is a simple threshold policy. We extend the risk limiting dispatch framework to include energy storage. To simplify the analysis, we consider fast-response energy storage and slow energy storage separately. We first consider the risk limiting dispatch problem for a single operating interval in which the fast-response energy storage can be operated at real time during the operating interval to reduce the risk. We show that the optimal policy is still a threshold policy and develop efficient algorithms to compute the thresholds. Next, we consider the risk limiting dispatch problem for multiple operating intervals in which the slow storage can be charged by excess generation at one interval and be discharged to meet the excess load at a later interval. We develop two approximate algorithms and compare them using numerical examples.
The renewable energy generated from wind and solar radiation is increasing rapidly. The power imbalance due to the mismatch between the available renewable generation and load is a significant challenge to the reliable operation of the electric power grid. When the renewable generation is too high, the excess power has to be curtailed. When there is not enough renewable generation, conventional generation from gas turbines can be used to balance the excess load. However, this offsets the environmental benefits of renewable generation, and there is operation cost incurred. Energy storage can be charged by the excess renewable generation and be discharged to meet the excess load and hence reduces the power imbalance. What are the optimal control policies of the electric power system when energy storage is present? How much power imbalance can energy storage reduce? How much cost of integrating renewable generation can energy storage reduce? How much energy storage is needed? This dissertation provides some quantitative answers to these questions by establishing and analyzing simple analytic models. The models are based on the multi-timescale operation of the electric power grid. We define the net load at an operating interval as the difference between the load and the renewable generation. The predictions of the net load are made at multiple timescales, such as, day ahead, hour ahead, and minutes ahead of the operating interval. The power imbalance at each timescale is the difference between the predicted net loads of adjacent timescales. The conventional generation and energy storage charging and discharging operations are scheduled to reduce the power imbalance at each timescale. We first consider the power imbalance problem at each timescale separately and formulate it as an infinite horizon stochastic control problem. The input is the power imbalance process, the controls are the charging and discharging operations of the energy storage, and the output is the residual power imbalance. We show that the greedy control policy of energy storage minimizes the average magnitude of the residual power imbalance. Modeling the power imbalance process as a sequence of independently and identically distributed Laplace random variables at short timescales and as a weakly dependent stationary process at longer timescales, we establish closed form expressions of the minimum cost functions for some cases. We corroborate the analytic results with numerical results using wind power dataset from National Renewable Energy Laboratory (NREL) and Bonneville Power Administration (BPA). To optimize the scheduled generation across multiple timescales, we model the electricity markets as follows. As the operating interval approaches, we have more observations enabling us to predict the net load at the operating interval more accurately. However, the price for scheduling generation for the operating interval at the electricity market increases. We wish to minimize the cost for scheduling generation for the operating interval at multiple electricity markets and the risk from the mismatch between the random net load and the scheduled generation at real time. This risk limiting dispatch problem can be formulated as a finite horizon stochastic control problem, and the optimal policy is a simple threshold policy. We extend the risk limiting dispatch framework to include energy storage. To simplify the analysis, we consider fast-response energy storage and slow energy storage separately. We first consider the risk limiting dispatch problem for a single operating interval in which the fast-response energy storage can be operated at real time during the operating interval to reduce the risk. We show that the optimal policy is still a threshold policy and develop efficient algorithms to compute the thresholds. Next, we consider the risk limiting dispatch problem for multiple operating intervals in which the slow storage can be charged by excess generation at one interval and be discharged to meet the excess load at a later interval. We develop two approximate algorithms and compare them using numerical examples.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2013 S | In-library use |
11. Sufficient statistics for team decision problems [electronic resource] [2013] Online
- Book
- 1 online resource.
Decentralized control problems involve multiple controllers, each having access to different measurements but working together to optimize a common objective. Despite being extremely difficult to solve in general, a common thread behind the more tractable cases is the identification of sufficient statistics for each controller, i.e. reductions of the measurements for each controller that do not sacrifice optimal performance. These sufficient statistics serve to greatly reduce the controller search space, thus making the problem easier to solve. In this dissertation, we develop for the first time a general theory of sufficient statistics for team decision problems, a fundamental type of decentralized control problem. We give rigorous definitions for team decisions and team sufficient statistics, and show how team decisions based only on these sufficient statistics do not affect optimal performance. In a similar spirit to the Kalman filter, we also show how to gracefully update the team sufficient statistics as the state evolves and additional measurements are collected. Finally, we show how to compute team sufficient statistics for partially nested problems, a large class of team decision problems that tend to have easier solutions. These team sufficient statistics have intuitive and compelling interpretations. We also show general conditions when these team sufficient statistics can be updated without the state growing in size. To illustrate the results, we give examples for finite-state systems and systems whose variables are jointly Gaussian.
Decentralized control problems involve multiple controllers, each having access to different measurements but working together to optimize a common objective. Despite being extremely difficult to solve in general, a common thread behind the more tractable cases is the identification of sufficient statistics for each controller, i.e. reductions of the measurements for each controller that do not sacrifice optimal performance. These sufficient statistics serve to greatly reduce the controller search space, thus making the problem easier to solve. In this dissertation, we develop for the first time a general theory of sufficient statistics for team decision problems, a fundamental type of decentralized control problem. We give rigorous definitions for team decisions and team sufficient statistics, and show how team decisions based only on these sufficient statistics do not affect optimal performance. In a similar spirit to the Kalman filter, we also show how to gracefully update the team sufficient statistics as the state evolves and additional measurements are collected. Finally, we show how to compute team sufficient statistics for partially nested problems, a large class of team decision problems that tend to have easier solutions. These team sufficient statistics have intuitive and compelling interpretations. We also show general conditions when these team sufficient statistics can be updated without the state growing in size. To illustrate the results, we give examples for finite-state systems and systems whose variables are jointly Gaussian.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2013 W | In-library use |
12. Three problems in high-dimensional statistical parameter estimation [electronic resource] [2013] Online
- Book
- 1 online resource.
Statistical parameter estimation is the process of estimating parameters of a system from direct or indirect observations. As noted by Bradly Efron in his recent book [29], at its inception parameter estimation was about simple questions using large datasets, e.g., estimating mean and variance of the age of a population using survey data. Later it evolved into problems involving still simple questions but now small amount of data. Examples included hypothesis testing and estimating parameters of a communication channel using short training sequences. This setting resulted in a fascinating literature and promotion of ideas like statistical eciency and power. In recent years, a new class of parameter estimation has been emerging which is best exemplified by the Netflix challenge. These problems are characterized by vast amount of input data but at the same time complex and large set of parameters to be estimated. In the case of the Netflix challenge, the data consist of about 10^8 known movie ratings and the challenge was to estimate about 10^10 unknown ratings. In this work, we consider three problems of this type, namely, learning and controlling high- dimensional dynamical systems, and statistical signal processing for time of flight mass spectrometry. We present both theoretical and experimental results. First, we consider the problem of learning high-dimensional dynamical systems. A continuous time autonomous dynamical system is described by a stochastic differential equation while a discrete time system is described by a state evolution equation. We consider the problem of learning the drift coefficient of a p-dimensional stochastic differential equation from a sample path of length T. We assume that the drift is parametrized by a high-dimensional vector, and study the support recovery problem in the limit in which both p and T tend to infinity. In particular, we prove a general lower iv bound on the sample-complexity T by using a characterization of mutual information as a time integral of conditional variance, due to Kadota, Zakai, and Ziv. For linear stochastic differential equations, the drift coefficient is parametrized by a p-̳>̳ p matrix which describes which degrees of freedom interact under the dynamics. In this case, we analyze an `1-regularized least squares estimator and prove an upper bound on T that nearly matches the lower bound on specific classes of sparse matrices. Linear dynamical systems are important modeling tools for many applications, ranging from robotics and control to biology and chemistry. However, in some applications the ultimate goal is to efficiently control an unknown or poorly known system, i.e., the reinforcement learning of the underlying dynamical system. A linear quadratic dynamical system is describe by a state evolution equation and a cost function. The dynamics is similar to that of an autonomous linear dynamical system with the addition of a control term. The cost at each time is a quadratic function of the state and the control exerted at that step. In the second part of this thesis we study the problem of adaptive control of a high dimensional linear quadratic system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of O(\sqrt{T}) was shown, apart form logarithmic factors. However, this bound scales exponentially with p, the dimension of the state space. In this work we consider the case where the matrices describing the dynamic of the LQ system are sparse and their dimensions are large. We present an adaptive control scheme that achieves a regret bound of O(p\sqrt{T}), apart from logarithmic factors. In particular, our algorithm has an average cost of (1+\epsilon) times the optimum cost after T = polylog(p)O(1/\epsilon^2). This is in comparison to previous work on the dense dynamics where the algorithm requires time that scales exponentially with dimension in order to achieve regret of e̳ times the optimal cost. We describe the applications of this result in the emerging area of computational advertising, in particular targeted online advertising and advertising in social networks. In the third part of this thesis we tend to the problem of statistical signal processing for time of flight mass spectrometry. Time of flight mass spectrometry is a technique for measuring the mass of ionized chemical species. Traditionally, because of the high data rate and the requirement for the online processing of the signal by the instrument, time of fight mass spectrometers employed only rudimentary signal processing. We employ a simple modification to the conventional time of flight mass spectrometry where a variable and (pseudo)-random pulsing rate is used which allows for traces from different pulses to overlap. This modification requires little alteration to the currently employed hardware. However, it requires a reconstruction method to recover the spectrum from highly aliased traces. We propose and demonstrate an efficient algorithm that can process massive time of flight mass spectrometry data using computational resources that can be considered modest with today's standards. We demonstrate how our method can be employed to increase the throughput of an instrument by an order of magnitude. We expect this to extend the applicability of TOFMS to new domains.
Statistical parameter estimation is the process of estimating parameters of a system from direct or indirect observations. As noted by Bradly Efron in his recent book [29], at its inception parameter estimation was about simple questions using large datasets, e.g., estimating mean and variance of the age of a population using survey data. Later it evolved into problems involving still simple questions but now small amount of data. Examples included hypothesis testing and estimating parameters of a communication channel using short training sequences. This setting resulted in a fascinating literature and promotion of ideas like statistical eciency and power. In recent years, a new class of parameter estimation has been emerging which is best exemplified by the Netflix challenge. These problems are characterized by vast amount of input data but at the same time complex and large set of parameters to be estimated. In the case of the Netflix challenge, the data consist of about 10^8 known movie ratings and the challenge was to estimate about 10^10 unknown ratings. In this work, we consider three problems of this type, namely, learning and controlling high- dimensional dynamical systems, and statistical signal processing for time of flight mass spectrometry. We present both theoretical and experimental results. First, we consider the problem of learning high-dimensional dynamical systems. A continuous time autonomous dynamical system is described by a stochastic differential equation while a discrete time system is described by a state evolution equation. We consider the problem of learning the drift coefficient of a p-dimensional stochastic differential equation from a sample path of length T. We assume that the drift is parametrized by a high-dimensional vector, and study the support recovery problem in the limit in which both p and T tend to infinity. In particular, we prove a general lower iv bound on the sample-complexity T by using a characterization of mutual information as a time integral of conditional variance, due to Kadota, Zakai, and Ziv. For linear stochastic differential equations, the drift coefficient is parametrized by a p-̳>̳ p matrix which describes which degrees of freedom interact under the dynamics. In this case, we analyze an `1-regularized least squares estimator and prove an upper bound on T that nearly matches the lower bound on specific classes of sparse matrices. Linear dynamical systems are important modeling tools for many applications, ranging from robotics and control to biology and chemistry. However, in some applications the ultimate goal is to efficiently control an unknown or poorly known system, i.e., the reinforcement learning of the underlying dynamical system. A linear quadratic dynamical system is describe by a state evolution equation and a cost function. The dynamics is similar to that of an autonomous linear dynamical system with the addition of a control term. The cost at each time is a quadratic function of the state and the control exerted at that step. In the second part of this thesis we study the problem of adaptive control of a high dimensional linear quadratic system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of O(\sqrt{T}) was shown, apart form logarithmic factors. However, this bound scales exponentially with p, the dimension of the state space. In this work we consider the case where the matrices describing the dynamic of the LQ system are sparse and their dimensions are large. We present an adaptive control scheme that achieves a regret bound of O(p\sqrt{T}), apart from logarithmic factors. In particular, our algorithm has an average cost of (1+\epsilon) times the optimum cost after T = polylog(p)O(1/\epsilon^2). This is in comparison to previous work on the dense dynamics where the algorithm requires time that scales exponentially with dimension in order to achieve regret of e̳ times the optimal cost. We describe the applications of this result in the emerging area of computational advertising, in particular targeted online advertising and advertising in social networks. In the third part of this thesis we tend to the problem of statistical signal processing for time of flight mass spectrometry. Time of flight mass spectrometry is a technique for measuring the mass of ionized chemical species. Traditionally, because of the high data rate and the requirement for the online processing of the signal by the instrument, time of fight mass spectrometers employed only rudimentary signal processing. We employ a simple modification to the conventional time of flight mass spectrometry where a variable and (pseudo)-random pulsing rate is used which allows for traces from different pulses to overlap. This modification requires little alteration to the currently employed hardware. However, it requires a reconstruction method to recover the spectrum from highly aliased traces. We propose and demonstrate an efficient algorithm that can process massive time of flight mass spectrometry data using computational resources that can be considered modest with today's standards. We demonstrate how our method can be employed to increase the throughput of an instrument by an order of magnitude. We expect this to extend the applicability of TOFMS to new domains.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2013 I | In-library use |
13. Convex methods for approximate dynamic programming [electronic resource] [2012] Online
- Book
- 1 online resource.
his thesis studies the use of convex optimization in stochastic control problems. We propose methods based on convex optimization for approximate dynamic programming. Dynamic programming is a standard approach to many stochastic control problems, which involves decomposing the problem into a sequence of subproblems to solve for a global minimizer, called the value function. Traditional dynamic programming approaches focus on problems with finite state space and action space, and finding a solution gets prohibitively difficult as the state space or action space grows. With the exception of a few special cases, dynamic programming (DP) is difficult to carry out for general problems. In such cases, approximate dynamic programming (ADP) gives a method for finding a good, if not optimal, policy. In this work, we rely on our ability to (numerically) solve convex optimization problems with great speed and reliability. Using custom generated solvers we can speed up computation by orders of magnitude. In this thesis, we use convex optimization in conjunction with approximate dynamic programming to find the optimal (or suboptimal, but good) policies for an array of stochastic control problems. In the first part of the thesis, we consider applications where we have access to a good controller (which may be very complex): this could be a pilot controlling an aircraft, or a model predictive controller with a long lookahead horizon. In such cases, we can observe the actions chosen in different states, but we do not have access to the underlying objective function used by the controller (such as the pilot). We propose methods that use convex optimization to impute the underlying objective function. This results in a controller that can mimic the behavior of the original controller, often with the complexity reduced greatly. In the second part of the thesis, we develop the mathematical framework and present algorithms for carrying out projected value iteration using quadratic approximate value functions. Although there are no theoretical guarantees, we observe that in practice we achieve very good performance in a reasonable number of steps. We will consider problems in a variety of applications, such as consumer behavior modeling, robotics, finance, input-constrained control, and supply chain management.
his thesis studies the use of convex optimization in stochastic control problems. We propose methods based on convex optimization for approximate dynamic programming. Dynamic programming is a standard approach to many stochastic control problems, which involves decomposing the problem into a sequence of subproblems to solve for a global minimizer, called the value function. Traditional dynamic programming approaches focus on problems with finite state space and action space, and finding a solution gets prohibitively difficult as the state space or action space grows. With the exception of a few special cases, dynamic programming (DP) is difficult to carry out for general problems. In such cases, approximate dynamic programming (ADP) gives a method for finding a good, if not optimal, policy. In this work, we rely on our ability to (numerically) solve convex optimization problems with great speed and reliability. Using custom generated solvers we can speed up computation by orders of magnitude. In this thesis, we use convex optimization in conjunction with approximate dynamic programming to find the optimal (or suboptimal, but good) policies for an array of stochastic control problems. In the first part of the thesis, we consider applications where we have access to a good controller (which may be very complex): this could be a pilot controlling an aircraft, or a model predictive controller with a long lookahead horizon. In such cases, we can observe the actions chosen in different states, but we do not have access to the underlying objective function used by the controller (such as the pilot). We propose methods that use convex optimization to impute the underlying objective function. This results in a controller that can mimic the behavior of the original controller, often with the complexity reduced greatly. In the second part of the thesis, we develop the mathematical framework and present algorithms for carrying out projected value iteration using quadratic approximate value functions. Although there are no theoretical guarantees, we observe that in practice we achieve very good performance in a reasonable number of steps. We will consider problems in a variety of applications, such as consumer behavior modeling, robotics, finance, input-constrained control, and supply chain management.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2012 K | In-library use |
14. Design and implementation of stochastic control policies via convex optimization [electronic resource] [2012] Online
- Book
- 1 online resource.
In this dissertation we consider the design and implementation of control policies for stochastic control problems with arbitrary dynamics, objective, and constraints. In some very special cases, these problems can be solved analytically. For instance, when the dynamics are linear, and the objective is quadratic, the optimal control policy is linear state feedback. Another simple case where the optimal policy can be computed exactly is when the state and action spaces are finite, in which case methods such as value iteration or policy iteration can be used. When the state and action spaces are infinite, but low dimensional, the optimal control problem can be solved by gridding or other discretization methods. In general however, the optimal control policy cannot be tractably computed. In such situations, there are many methods for finding suboptimal controllers that hopefully achieve a small objective value. One particular method we will discuss in detail is approximate dynamic programming (ADP), which relies on an expression for the optimal policy in terms of the value function of the stochastic control problem. In ADP we use the same expression as the optimal policy, but replace the true value function with an approximation. Another widely-used suboptimal policy is receding horizon control (RHC), also known as model predictive control (MPC). In MPC, we solve an optimization problem at each time step to determine a plan of action over a fixed time horizon, and then apply the first input from the plan. At the next time step we repeat the planning process, solving a new optimization problem, with the time horizon shifted one step forward. In the design of policies such as these, we must choose parameters, such as approximate value functions, terminal costs, time horizon, to achieve good performance. Ideally, we would like to be able to compare the performance of a suboptimal controller with the optimal performance, which we cannot compute. In this dissertation, we describe a general method for obtaining a function that lower bounds the value function of a stochastic control problem. Our method yields a numerical lower bound on the optimal objective value, as well as a value function underestimator that can be used as a parameter for ADP or MPC. We can then compare our bound to the performance achieved by our policies. If the gap between the two is small, we can conclude that the policies are nearly optimal, and the bound is nearly tight. Thus, our method simultaneously yields suboptimal policy designs, as well as a way to certify their performance. Our underestimator/bound is non-generic, in the sense that it does not simply depend on problem dimensions and some basic assumptions about the problem data. Instead, they are computed (numerically) for each specific problem instance. We will see that for many problem families, our method is based on solving a convex optimization problem, thus avoiding the `curses of dimensionality' usually associated with dynamic programming. One drawback of the suboptimal policies we design is that an optimization problem must be solved at each time step to determine the input to apply to the system. Using conventional optimization solvers this can take seconds, if not minutes. Thus, applications of these policies have been traditionally limited to systems with relatively slow dynamics, with sampling times measured in seconds, minutes, or hours. In the second part of this dissertation, we outline a collection of optimization methods that exploit the particular structure of the control problem. Our custom methods are up to around 1000 times faster compared with generic optimization packages such as SDPT3 or SeDuMi. These advances, combined with ever-increasing computing power, extends the application of optimization based policies to a wide range of applications, including those with millisecond or microsecond sampling periods.
In this dissertation we consider the design and implementation of control policies for stochastic control problems with arbitrary dynamics, objective, and constraints. In some very special cases, these problems can be solved analytically. For instance, when the dynamics are linear, and the objective is quadratic, the optimal control policy is linear state feedback. Another simple case where the optimal policy can be computed exactly is when the state and action spaces are finite, in which case methods such as value iteration or policy iteration can be used. When the state and action spaces are infinite, but low dimensional, the optimal control problem can be solved by gridding or other discretization methods. In general however, the optimal control policy cannot be tractably computed. In such situations, there are many methods for finding suboptimal controllers that hopefully achieve a small objective value. One particular method we will discuss in detail is approximate dynamic programming (ADP), which relies on an expression for the optimal policy in terms of the value function of the stochastic control problem. In ADP we use the same expression as the optimal policy, but replace the true value function with an approximation. Another widely-used suboptimal policy is receding horizon control (RHC), also known as model predictive control (MPC). In MPC, we solve an optimization problem at each time step to determine a plan of action over a fixed time horizon, and then apply the first input from the plan. At the next time step we repeat the planning process, solving a new optimization problem, with the time horizon shifted one step forward. In the design of policies such as these, we must choose parameters, such as approximate value functions, terminal costs, time horizon, to achieve good performance. Ideally, we would like to be able to compare the performance of a suboptimal controller with the optimal performance, which we cannot compute. In this dissertation, we describe a general method for obtaining a function that lower bounds the value function of a stochastic control problem. Our method yields a numerical lower bound on the optimal objective value, as well as a value function underestimator that can be used as a parameter for ADP or MPC. We can then compare our bound to the performance achieved by our policies. If the gap between the two is small, we can conclude that the policies are nearly optimal, and the bound is nearly tight. Thus, our method simultaneously yields suboptimal policy designs, as well as a way to certify their performance. Our underestimator/bound is non-generic, in the sense that it does not simply depend on problem dimensions and some basic assumptions about the problem data. Instead, they are computed (numerically) for each specific problem instance. We will see that for many problem families, our method is based on solving a convex optimization problem, thus avoiding the `curses of dimensionality' usually associated with dynamic programming. One drawback of the suboptimal policies we design is that an optimization problem must be solved at each time step to determine the input to apply to the system. Using conventional optimization solvers this can take seconds, if not minutes. Thus, applications of these policies have been traditionally limited to systems with relatively slow dynamics, with sampling times measured in seconds, minutes, or hours. In the second part of this dissertation, we outline a collection of optimization methods that exploit the particular structure of the control problem. Our custom methods are up to around 1000 times faster compared with generic optimization packages such as SDPT3 or SeDuMi. These advances, combined with ever-increasing computing power, extends the application of optimization based policies to a wide range of applications, including those with millisecond or microsecond sampling periods.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2012 W | In-library use |
15. Directed learning [electronic resource] [2012] Online
- Book
- 1 online resource.
In machine learning, it is common to treat estimation of model parameters separately from subsequent use of the model to guide decisions. In particular, the learning process typically aims to maximize ``goodness of fit'' without consideration of decision objectives. In this dissertation, we propose a new approach -- directed learning -- which factors decision objectives into the model fitting procedure in order to improve decision quality. We develop and analyze directed learning algorithms for three classes of problems. In the first case, we consider a problem where linear regression analysis is used to guide decision making. We propose directed regression, an efficient algorithm that takes into account the decision objective when computing regression coefficients. We demonstrate through a computational study that directed regression can generate significant performance gains, and establish a theoretical result that motivates it. This setting is then extended to a multi-stage decision problem as our second case, and we show that a variation of directed regression, directed time-series regression, improves performance in this context as well. Lastly, we consider a problem that involves estimating a covariance matrix and making a decision based on that estimate. Such problems arise in portfolio management among other areas, and a common approach is to employ principal component analysis (PCA) to estimate a parsimonious factor model. We propose directed PCA, an efficient algorithm that accounts for the decision objective in the selection of components, and demonstrate through experiments that it leads to significant improvement. We also establish through a theoretical result that the possible degree of improvement can be unbounded.
In machine learning, it is common to treat estimation of model parameters separately from subsequent use of the model to guide decisions. In particular, the learning process typically aims to maximize ``goodness of fit'' without consideration of decision objectives. In this dissertation, we propose a new approach -- directed learning -- which factors decision objectives into the model fitting procedure in order to improve decision quality. We develop and analyze directed learning algorithms for three classes of problems. In the first case, we consider a problem where linear regression analysis is used to guide decision making. We propose directed regression, an efficient algorithm that takes into account the decision objective when computing regression coefficients. We demonstrate through a computational study that directed regression can generate significant performance gains, and establish a theoretical result that motivates it. This setting is then extended to a multi-stage decision problem as our second case, and we show that a variation of directed regression, directed time-series regression, improves performance in this context as well. Lastly, we consider a problem that involves estimating a covariance matrix and making a decision based on that estimate. Such problems arise in portfolio management among other areas, and a common approach is to employ principal component analysis (PCA) to estimate a parsimonious factor model. We propose directed PCA, an efficient algorithm that accounts for the decision objective in the selection of components, and demonstrate through experiments that it leads to significant improvement. We also establish through a theoretical result that the possible degree of improvement can be unbounded.
16. Efficient algorithms for collaborative filtering [electronic resource] [2012] Online
- Book
- 1 online resource.
Collaborative filtering is a novel statistical technique to obtain useful information or to make predictions based on data from multiple agents. A large number of such datasets are naturally represented in matrix form. Typically, there exists a matrix M from which we know a (typically sparse) subset of entries M_ij for (i, j) in some set E. The problem then is to predict/approximate the unseen entries. This framework of matrix completion is extremely general and applications include personalized recommendation systems, sensor positioning, link prediction and so on. Low rank models have traditionally been used to learn useful information from such datasets. Low-dimensional representations simplify the description of the dataset and often yield predictive powers. As an added benefit, it is easier to store and retrieve low dimensional representations. Finally, many computationally intensive operations such as matrix multiplication and inversion are simplified with low dimensional representations. Singular Value Decomposition (SVD) has traditionally been used to find the lowdimensional representation of a fully revealed matrix. There are numerous algorithms for computing the SVD of a matrix including several parallel implementations and implementations for sparse matrices. However, when the matrix is only partially observed, we show that SVD techniques are sub-optimal. In this work, we will develop algorithms to learn a low rank model from a partially revealed matrix. These algorithms are computationally efficient and highly parallelizable. We will show that the proposed algorithms achieve a performance close to the fundamental limit in a number of scenarios. Finally, the algorithms achieve significantly better performance than the state-of-the-art algorithms on many real collaborative filtering datasets.
Collaborative filtering is a novel statistical technique to obtain useful information or to make predictions based on data from multiple agents. A large number of such datasets are naturally represented in matrix form. Typically, there exists a matrix M from which we know a (typically sparse) subset of entries M_ij for (i, j) in some set E. The problem then is to predict/approximate the unseen entries. This framework of matrix completion is extremely general and applications include personalized recommendation systems, sensor positioning, link prediction and so on. Low rank models have traditionally been used to learn useful information from such datasets. Low-dimensional representations simplify the description of the dataset and often yield predictive powers. As an added benefit, it is easier to store and retrieve low dimensional representations. Finally, many computationally intensive operations such as matrix multiplication and inversion are simplified with low dimensional representations. Singular Value Decomposition (SVD) has traditionally been used to find the lowdimensional representation of a fully revealed matrix. There are numerous algorithms for computing the SVD of a matrix including several parallel implementations and implementations for sparse matrices. However, when the matrix is only partially observed, we show that SVD techniques are sub-optimal. In this work, we will develop algorithms to learn a low rank model from a partially revealed matrix. These algorithms are computationally efficient and highly parallelizable. We will show that the proposed algorithms achieve a performance close to the fundamental limit in a number of scenarios. Finally, the algorithms achieve significantly better performance than the state-of-the-art algorithms on many real collaborative filtering datasets.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2012 K | In-library use |
17. Incentive mechanisms for societal networks [electronic resource] [2012] Online
- Book
- 1 online resource.
A "societal network" is an interconnected structure of resources and consumers that drives a societal process. Examples include roads, electricity grids, health-care systems, and waste management networks. Societal networks are plagued by two pressing problems: (i) the demand for resources exceeds their supply, especially at "peak'' hours, and (ii) inefficiencies due to wastage are highly permeable, i.e., while the loss to an individual is very small, the overall loss to the societal network as a whole can be substantial. This suggests that small changes in the behavior of each user can lead to significant gains in the efficiency of societal networks. More importantly, such a change in each user is necessary to combat systemic problems such as congestion. There is a strong urgency to the problems in societal networks -- the scale of the problems has reached epic proportions. Fortunately, help is at hand. Over the past few decades, there have been great technological inventions such as the creation of the global Internet, the Web, cellular telephony and cloud computing. Now is the right time to address the problems in societal networks making use of the ubiquitous presence of smart sensing technology. This thesis presents a first attempt at that -- making use of a variety of sensing technology to monitor user behavior, informing the users about their actions and incentivizing them to adopt a desirable behavior. This thesis develops a general approach to affect change in the behavior of the users of societal networks. At the first level, it is based on fine-grained sensing of user behavior and communicating this information to a platform, and using incentives of monetary, social and other kinds to change user behavior. At the second level, the approach is concerned with the design of low-cost and accurate sensors, and backend algorithms to inform, engage and incentivize users to affect the desired behavior change. We describe some pilot projects which we have conducted to test and verify the effectiveness of this approach. The first project, INSTANT (Infosys-Stanford Traffic project), was conducted over 6 months in Bangalore, India, to incentivize roughly 14,000 commuters of Infosys Technologies to shift their commute to off-peak hours. The second project is a recycling experiment conducted over one week at Stanford to incentivize the members of the Stanford community to recycle. Third, we outline our contributions to Steptacular, an employee wellness program run at Accenture, USA. Finally, we describe CAPRI (Congestion and Parking Relief Incentives), a research project currently underway at Stanford University to incentivize off-peak travel and "off-center" parking.
A "societal network" is an interconnected structure of resources and consumers that drives a societal process. Examples include roads, electricity grids, health-care systems, and waste management networks. Societal networks are plagued by two pressing problems: (i) the demand for resources exceeds their supply, especially at "peak'' hours, and (ii) inefficiencies due to wastage are highly permeable, i.e., while the loss to an individual is very small, the overall loss to the societal network as a whole can be substantial. This suggests that small changes in the behavior of each user can lead to significant gains in the efficiency of societal networks. More importantly, such a change in each user is necessary to combat systemic problems such as congestion. There is a strong urgency to the problems in societal networks -- the scale of the problems has reached epic proportions. Fortunately, help is at hand. Over the past few decades, there have been great technological inventions such as the creation of the global Internet, the Web, cellular telephony and cloud computing. Now is the right time to address the problems in societal networks making use of the ubiquitous presence of smart sensing technology. This thesis presents a first attempt at that -- making use of a variety of sensing technology to monitor user behavior, informing the users about their actions and incentivizing them to adopt a desirable behavior. This thesis develops a general approach to affect change in the behavior of the users of societal networks. At the first level, it is based on fine-grained sensing of user behavior and communicating this information to a platform, and using incentives of monetary, social and other kinds to change user behavior. At the second level, the approach is concerned with the design of low-cost and accurate sensors, and backend algorithms to inform, engage and incentivize users to affect the desired behavior change. We describe some pilot projects which we have conducted to test and verify the effectiveness of this approach. The first project, INSTANT (Infosys-Stanford Traffic project), was conducted over 6 months in Bangalore, India, to incentivize roughly 14,000 commuters of Infosys Technologies to shift their commute to off-peak hours. The second project is a recycling experiment conducted over one week at Stanford to incentivize the members of the Stanford community to recycle. Third, we outline our contributions to Steptacular, an employee wellness program run at Accenture, USA. Finally, we describe CAPRI (Congestion and Parking Relief Incentives), a research project currently underway at Stanford University to incentivize off-peak travel and "off-center" parking.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2012 M | In-library use |
18. Strategic and adaptive execution [electronic resource] [2012] Online
- Book
- 1 online resource.
First, we consider a trader who aims to liquidate a large position in the presence of an arbitrageur who hopes to profit from the trader's activity. The arbitrageur is uncertain about the trader's position and learns from observed price fluctuations. This is a dynamic game with asymmetric information. We present an algorithm for computing perfect Bayesian equilibrium behavior and conduct numerical experiments. Our results demonstrate that the trader's strategy differs significantly from one that would be optimal in the absence of the arbitrageur. In particular, the trader must balance the conflicting desires of minimizing price impact and minimizing information that is signaled through trading. Accounting for information signaling and the presence of strategic adversaries can greatly reduce execution costs. Second, we consider a model in which a trader seeks to maximize expected risk-adjusted profit while trading a single security. In our second model, there is no arbitrageur, and each price change is a linear combination of observed factors, impact resulting from the trader's current and prior activity, and unpredictable random effects. The trader must learn coefficients of a price impact model while trading. We propose a new method for simultaneous execution and learning -- the confidence-triggered regularized adaptive certainty equivalent (CTRACE) policy -- and establish a polylogarithmic finite-time expected regret bound. This bound implies that CTRACE is efficient in the sense that the ([epsilon], [delta])-convergence time is bounded by a polynomial function of 1/[epsilon] and log(1/[delta]) with high probability. In addition, we demonstrate via Monte Carlo simulation that CTRACE outperforms the certainty equivalent policy and a recently proposed reinforcement learning algorithm that is designed to explore efficiently in linear-quadratic control problems.
First, we consider a trader who aims to liquidate a large position in the presence of an arbitrageur who hopes to profit from the trader's activity. The arbitrageur is uncertain about the trader's position and learns from observed price fluctuations. This is a dynamic game with asymmetric information. We present an algorithm for computing perfect Bayesian equilibrium behavior and conduct numerical experiments. Our results demonstrate that the trader's strategy differs significantly from one that would be optimal in the absence of the arbitrageur. In particular, the trader must balance the conflicting desires of minimizing price impact and minimizing information that is signaled through trading. Accounting for information signaling and the presence of strategic adversaries can greatly reduce execution costs. Second, we consider a model in which a trader seeks to maximize expected risk-adjusted profit while trading a single security. In our second model, there is no arbitrageur, and each price change is a linear combination of observed factors, impact resulting from the trader's current and prior activity, and unpredictable random effects. The trader must learn coefficients of a price impact model while trading. We propose a new method for simultaneous execution and learning -- the confidence-triggered regularized adaptive certainty equivalent (CTRACE) policy -- and establish a polylogarithmic finite-time expected regret bound. This bound implies that CTRACE is efficient in the sense that the ([epsilon], [delta])-convergence time is bounded by a polynomial function of 1/[epsilon] and log(1/[delta]) with high probability. In addition, we demonstrate via Monte Carlo simulation that CTRACE outperforms the certainty equivalent policy and a recently proposed reinforcement learning algorithm that is designed to explore efficiently in linear-quadratic control problems.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2012 P | In-library use |
19. Intermediated blind portfolio auctions [electronic resource] [2011] Online
- Book
- 1 online resource.
As much as 12% of the daily volume on the New York Stock Exchange, and similar volumes on other major world exchanges, involves the sale of portfolios by institutional investors to brokers through blind portfolio auctions. Such transactions typically take the form of a first-price sealed-bid auction in which the seller engages a few potential brokers and provides limited information about the portfolio being sold. Uncertainty about the portfolio contents reduces bids, effectively increasing the transaction cost paid by the seller. We consider the use of a trusted intermediary or equivalent cryptographic protocol to reduce transaction costs. In particular, we propose a mechanism through which each party provides relevant private information to an intermediary who ultimately reveals only the portfolio contents and price paid, and only to the seller and winning broker. Through analysis of a game-theoretic model, we demonstrate substantial potential benefits to sellers. For example, under reasonable assumptions a seller can reduce expected transaction costs by as much as 10% under a broker valuation model that induces efficient allocations at equilibrium, and as much as 33% under one that does not. We also consider the effects of intermediation on broker payoffs and social welfare as well as its performance in the context of the second-price auction. In addition, we consider the two-stage game in which sellers have the choice of selecting intermediation or not and show that under certain reasonable modeling assumptions that all sellers will select intermediation at equilibrium.
As much as 12% of the daily volume on the New York Stock Exchange, and similar volumes on other major world exchanges, involves the sale of portfolios by institutional investors to brokers through blind portfolio auctions. Such transactions typically take the form of a first-price sealed-bid auction in which the seller engages a few potential brokers and provides limited information about the portfolio being sold. Uncertainty about the portfolio contents reduces bids, effectively increasing the transaction cost paid by the seller. We consider the use of a trusted intermediary or equivalent cryptographic protocol to reduce transaction costs. In particular, we propose a mechanism through which each party provides relevant private information to an intermediary who ultimately reveals only the portfolio contents and price paid, and only to the seller and winning broker. Through analysis of a game-theoretic model, we demonstrate substantial potential benefits to sellers. For example, under reasonable assumptions a seller can reduce expected transaction costs by as much as 10% under a broker valuation model that induces efficient allocations at equilibrium, and as much as 33% under one that does not. We also consider the effects of intermediation on broker payoffs and social welfare as well as its performance in the context of the second-price auction. In addition, we consider the two-stage game in which sellers have the choice of selecting intermediation or not and show that under certain reasonable modeling assumptions that all sellers will select intermediation at equilibrium.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2011 P | In-library use |
20. Multi-camera vision for smart environments [electronic resource] [2011] Online
- Book
- 1 online resource.
Technology is blending into every part of our lives. Instead of having them as intruders in the environment, people prefer the computers to go into the background, and automatically help us at the right time, on the right thing, and in the right way. A smart environment system senses the environment and people in it, makes deductions, and then takes actions if necessary. Sensors are the "eyes" of smart environments. In this dissertation we consider vision sensors. Image and video data contain rich information, yet they are also challenging to interpret. From obtaining the raw image data from the camera sensors to achieving the application's goal in a smart environment, we need to consider three main components of the system, i.e., the hardware platform, vision analysis of the image data, and high-level reasoning pertaining to the end application. A generic vision-related problem, e.g., human pose estimation, can be approached based on different assumptions. In our approach the system constraints and application's high-level objective are considered and often define boundary conditions in the design of vision-based algorithms. A multi-camera setup requires distributed processing at each camera node and information sharing through the camera network. Therefore, computation capacity of camera nodes and the communication bandwidth can define two hard constraints for algorithm design. We first introduce a smart camera architecture and its specific local computation power and wireless communication constraints. We then examine how the problem of human pose detection can be formulated for implementation on this smart camera, and describe the steps taken to achieve real-time operation for an avatar-based gaming application. We then present a human activity analysis technique based on multi-camera fusion. The method defined a hierarchy of coarse and fine level activity classes and employs different visual features to detect the pose or activity in each class. The camera fusion methods studied in this dissertation include decision fusion and feature fusion. We show the results of experiments in three testbed environments and analyze the performance of the different fusion methods. Although computer vision already involves complicated techniques in interpreting the image data, it still constitutes just the sensing part of the smart environment system, and further application-related high-level reasoning is needed. Modeling such high-level reasoning will depend on the specific nature of the problem. We present a case study in this dissertation to demonstrate how the vision processing and high-level reasoning modules can be interfaced for making semantic inference based on observations. The case study aims to recognize objects in a smart home based on the observed user interactions. We make use of the relationship between objects and user activities to infer the objects when related activities are observed. We apply Markov logic network (MLN) to model such relationships. MLN enables intuitive modeling of relationships, and it also offers the power of graphical models. We show different ways of constructing the knowledge base in MLN, and provide experimental results.
Technology is blending into every part of our lives. Instead of having them as intruders in the environment, people prefer the computers to go into the background, and automatically help us at the right time, on the right thing, and in the right way. A smart environment system senses the environment and people in it, makes deductions, and then takes actions if necessary. Sensors are the "eyes" of smart environments. In this dissertation we consider vision sensors. Image and video data contain rich information, yet they are also challenging to interpret. From obtaining the raw image data from the camera sensors to achieving the application's goal in a smart environment, we need to consider three main components of the system, i.e., the hardware platform, vision analysis of the image data, and high-level reasoning pertaining to the end application. A generic vision-related problem, e.g., human pose estimation, can be approached based on different assumptions. In our approach the system constraints and application's high-level objective are considered and often define boundary conditions in the design of vision-based algorithms. A multi-camera setup requires distributed processing at each camera node and information sharing through the camera network. Therefore, computation capacity of camera nodes and the communication bandwidth can define two hard constraints for algorithm design. We first introduce a smart camera architecture and its specific local computation power and wireless communication constraints. We then examine how the problem of human pose detection can be formulated for implementation on this smart camera, and describe the steps taken to achieve real-time operation for an avatar-based gaming application. We then present a human activity analysis technique based on multi-camera fusion. The method defined a hierarchy of coarse and fine level activity classes and employs different visual features to detect the pose or activity in each class. The camera fusion methods studied in this dissertation include decision fusion and feature fusion. We show the results of experiments in three testbed environments and analyze the performance of the different fusion methods. Although computer vision already involves complicated techniques in interpreting the image data, it still constitutes just the sensing part of the smart environment system, and further application-related high-level reasoning is needed. Modeling such high-level reasoning will depend on the specific nature of the problem. We present a case study in this dissertation to demonstrate how the vision processing and high-level reasoning modules can be interfaced for making semantic inference based on observations. The case study aims to recognize objects in a smart home based on the observed user interactions. We make use of the relationship between objects and user activities to infer the objects when related activities are observed. We apply Markov logic network (MLN) to model such relationships. MLN enables intuitive modeling of relationships, and it also offers the power of graphical models. We show different ways of constructing the knowledge base in MLN, and provide experimental results.
Special Collections
Special Collections | Status |
---|---|
University Archives | Request on-site access |
3781 2011 W | In-library use |
Looking for different results?
Modify your search: Search all fields
Search elsewhere: Search WorldCat Search library website