In this paper we examine the problem of determining demonstration sufficiency for AI agents that learn from demonstrations: how can an AI agent self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? To address this problem we propose a novel self-assessment approach based on Bayesian inverse reinforcement learning and value-at-risk to enable agents that learn from demonstrations to compute high-confidence bounds on their performance and use these bounds to determine when they have a sufficient number of demonstrations. We propose and evaluate two definitions of sufficiency: (1) normalized expected value difference, which measures regret with respect to the expert's unobserved reward function, and (2) improvement over a baseline policy. We demonstrate how to formulate high-confidence bounds on both of these metrics. We evaluate our approach in simulation and demonstrate the feasibility of developing an AI system that can accurately evaluate whether it has received sufficient training data to guarantee, with high confidence, that it can match an expert's performance or surpass the performance of a baseline policy within some desired safety threshold. Comment: Appears in Proceedings of AAAI FSS-22 Symposium "Lessons Learned for Autonomous Assessment of Machine Abilities (LLAAMA)"
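The sufficiency test described in the abstract above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: that Bayesian inverse reinforcement learning has already produced posterior reward samples, that for each sample the optimal value (`v_opt`) and the value of the agent's policy (`v_eval`) are available, that normalized expected value difference is the value gap divided by the optimal value, and that the high-confidence bound is taken as a plain empirical quantile. All function names are hypothetical.

```python
import numpy as np

def normalized_evd(v_opt, v_eval, eps=1e-12):
    """Per-sample normalized expected value difference (assumed definition):
    (optimal value - agent value) / |optimal value| under each sampled reward."""
    v_opt = np.asarray(v_opt, dtype=float)
    v_eval = np.asarray(v_eval, dtype=float)
    return (v_opt - v_eval) / np.maximum(np.abs(v_opt), eps)

def value_at_risk(losses, alpha=0.95):
    """Empirical alpha-quantile of the sampled losses (alpha-value-at-risk)."""
    return float(np.quantile(np.asarray(losses, dtype=float), alpha))

def has_sufficient_demonstrations(v_opt, v_eval, threshold, alpha=0.95):
    """Declare sufficiency when the alpha-VaR of the loss is within the threshold."""
    return value_at_risk(normalized_evd(v_opt, v_eval), alpha) <= threshold

# Hypothetical values for four posterior reward samples.
v_opt = [10.0, 10.0, 10.0, 10.0]
v_eval = [10.0, 9.0, 8.0, 5.0]
print(has_sufficient_demonstrations(v_opt, v_eval, threshold=0.5))
```

For the baseline-improvement criterion, the same quantile check could be applied to the sampled difference between baseline value and agent value, again as an assumption rather than the paper's stated procedure.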
Planning under social interactions with other agents is an essential problem for autonomous driving. Because the actions of the autonomous vehicle both affect and are affected by other agents, the vehicle needs to infer the reactions of those agents efficiently. Most existing approaches formulate the problem as a generalized Nash equilibrium problem solved by optimization-based methods. However, these methods are computationally expensive and easily fall into local minima because the problem is non-convex. Monte Carlo Tree Search (MCTS) successfully tackles such issues in game-theoretic problems. However, because the interaction game tree grows exponentially, general MCTS still requires a huge number of iterations to reach the optimum. In this paper, we introduce an efficient game-theoretic trajectory planning algorithm based on general MCTS that incorporates a prediction algorithm as a heuristic. On top of it, a socially compliant reward and a Bayesian inference algorithm are designed to generate diverse driving behaviors and identify the other driver's driving preference. Results on datasets containing naturalistic driving behavior in highly interactive scenarios demonstrate the effectiveness of the proposed framework. Comment: IEEE Robotics and Automation Letters 2022 (RA-L with IROS option)
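The abstract above does not say how the prediction algorithm enters the tree search. One common way to use a predictor as a search heuristic is a PUCT-style selection rule, in which the predicted probability of each action scales the exploration bonus. The sketch below illustrates that general idea only; it is not the paper's algorithm, and the function name, data layout, and constant `c` are all assumptions.

```python
import math

def select_action(stats, priors, c=1.0):
    """Pick the child action with the highest prior-weighted upper confidence score.

    stats:  dict mapping action -> (visit_count, total_value)
    priors: dict mapping action -> probability from a prediction model (assumed given)
    """
    total_visits = sum(n for n, _ in stats.values())

    def score(action):
        n, w = stats[action]
        q = w / n if n > 0 else 0.0  # mean value observed so far
        bonus = c * priors[action] * math.sqrt(total_visits + 1) / (1 + n)
        return q + bonus

    return max(stats, key=score)

# Hypothetical node with two maneuvers and no visits yet: the prior decides.
stats = {"keep_lane": (0, 0.0), "yield": (0, 0.0)}
priors = {"keep_lane": 0.8, "yield": 0.2}
print(select_action(stats, priors))
```

With no visits the prior alone ranks the actions, so the search is steered toward predicted behavior first; as visit counts grow, the bonus shrinks and the observed mean value dominates.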
Yoshihashi, Ryota, Kawakami, Rei, You, Shaodi, Trinh, Tu Tuan, Iida, Makoto, and Naemura, Takeshi
Subjects
Computer Science - Computer Vision and Pattern Recognition
Abstract
Detecting tiny objects in a high-resolution video is challenging because the visual information is limited and unreliable. Specifically, the challenges include the very low resolution of the objects, MPEG artifacts due to compression, and a large search area with many hard negatives. Tracking is equally difficult because both the appearance and the motion estimation are unreliable. We found that combining these two challenging tasks yields mutual benefits. Following this idea, in this paper we present a neural network model called the Recurrent Correlational Network, where detection and tracking are jointly performed over a multi-frame representation learned through a single, trainable, and end-to-end network. The framework exploits a convolutional long short-term memory network for learning informative appearance changes for detection, while the learned representation is shared in tracking to enhance its performance. In experiments with datasets containing images of scenes with small flying objects, such as birds and unmanned aerial vehicles, the proposed method yielded consistent improvements in detection performance over deep single-frame detectors and existing motion-based detectors. Furthermore, our network performed as well as state-of-the-art generic object trackers when evaluated as a tracker on a bird image dataset. Comment: arXiv admin note: text overlap with arXiv:1709.04666
Tu T. Nguyen, Pham Thanh Tung, Nguyen Ngoc Tan, Nguyen Ngoc Linh, and Trinh Tu Luc
Infrastructures, Vol 7, Iss 123, p 123 (2022)
Subjects
deep belief network, parameter analysis, ground-penetrating radar, chloride contamination, concrete structure, and Technology
Abstract
Applications of the deep belief network (DBN) to practical engineering problems have recently emerged all over the world thanks to its accuracy and the availability of data. In this paper, a predictive model using a DBN was employed to investigate the factors that affect the ground-penetrating radar (GPR) signals from rebar embedded in concrete structures. Four variables, namely temperature, relative humidity, chloride contamination level, and rebar surface corrosion condition, were used as the model inputs. Comprehensive data acquired from previously published documents were used to establish the proposed DBN model. Temperature and chloride contamination level were shown to have significant effects on the GPR amplitude signal from rebar. In contrast, relative humidity and rebar surface corrosion condition were found to have minimal influence on the output of the proposed model. The DBN model can be used to predict the amplitude of GPR signals from the four inputs with a high level of accuracy. Specifically, the coefficient of determination (R²) was 0.9634 and 0.9681 for the testing dataset and the entire database, respectively.
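For reference, the coefficient of determination reported in the abstract above is conventionally computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below shows that standard formula; the function name and the sample amplitudes are hypothetical and are not taken from the study's data.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Hypothetical measured and predicted amplitudes.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(measured, predicted))
```

A value of 1 means the predictions match the measurements exactly, and a value of 0 means the model does no better than always predicting the mean.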
Yoshihashi, Ryota, Trinh, Tu Tuan, Kawakami, Rei, You, Shaodi, Iida, Makoto, and Naemura, Takeshi
Subjects
Computer Science - Computer Vision and Pattern Recognition
Abstract
While generic object detection has achieved large improvements with rich feature hierarchies from deep nets, detecting small objects with poor visual cues remains challenging. Motion cues from multiple frames may be more informative for detecting such hard-to-distinguish objects in each frame. However, how to encode discriminative motion patterns, such as deformations and pose changes that characterize objects, has remained an open question. To learn them and thereby realize small object detection, we present a neural model called the Recurrent Correlational Network, where detection and tracking are jointly performed over a multi-frame representation learned through a single, trainable, and end-to-end network. A convolutional long short-term memory network is used for learning informative appearance changes for detection, while the learned representation is shared in tracking to enhance its performance. In experiments with datasets containing images of scenes with small flying objects, such as birds and unmanned aerial vehicles, the proposed method yielded consistent improvements in detection performance over deep single-frame detectors and existing motion-based detectors. Furthermore, our network performed as well as state-of-the-art generic object trackers when evaluated as a tracker on the bird dataset. Comment: 10 pages, 8 figures
GROUND penetrating radar, HUMIDITY, DEEP learning, PREDICTION models, and WALLS