articles+ search results
30,779 articles+ results
1. NUQSGD: Improved Communication Efficiency for Data-Parallel SGD via Nonuniform Quantization [2019]

Ramezani-Kebrya, Ali, Faghri, Fartash, and Roy, Daniel M.
 Subjects

Computer Science - Machine Learning and Statistics - Machine Learning
 Abstract

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel. Alistarh et al. (2017) describe two variants of data-parallel SGD that quantize and encode gradients to lessen communication costs. For the first variant, QSGD, they provide strong theoretical guarantees. For the second variant, which we call QSGDinf, they demonstrate impressive empirical gains for distributed training of large neural networks. Building on their work, we propose an alternative scheme for quantizing gradients and show that it yields stronger theoretical guarantees than exist for QSGD while matching the empirical performance of QSGDinf.
Comment: 21 pages, 6 figures
 Full text View this record from Arxiv
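The quantization skeleton shared by these schemes can be sketched as follows. This is a uniform-level, QSGD-style sketch only; NUQSGD's contribution is a nonuniform placement of the levels, which is not reproduced here, and the function name and level count are illustrative:

```python
import numpy as np

def stochastic_quantize(g, s=4, rng=None):
    # QSGD-style stochastic quantization of a gradient vector to s uniform
    # levels per coordinate: keep the norm and sign, and randomly round the
    # magnitude so the quantized gradient is unbiased in expectation.
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    scaled = np.abs(g) / norm * s            # position among the s levels
    lower = np.floor(scaled)
    # round up with probability equal to the fractional part -> unbiased
    level = lower + (rng.random(g.shape) < (scaled - lower))
    return np.sign(g) * level * (norm / s)
```

Only the sign, the norm, and the small integer level indices need to be communicated, which is where the savings come from.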

Li, Zhizhong, Luo, Linjie, Tulyakov, Sergey, Dai, Qieyun, and Hoiem, Derek
 Subjects

Computer Science - Computer Vision and Pattern Recognition and Statistics - Machine Learning
 Abstract

We introduce a novel domain adaptation formulation from a synthetic dataset (source domain) to a real dataset (target domain) for the category of tasks with per-pixel predictions. The annotations of these tasks are relatively hard to acquire in the real world, such as single-view depth estimation or surface normal estimation. Our key idea is to introduce anchor tasks, whose annotations are (1) less expensive to acquire than the main task, such as facial landmarks and semantic segmentations; (2) shared in availability for both synthetic and real datasets so that they serve as an "anchor" between domains; and (3) aligned spatially with the main task annotations on a per-pixel basis so that they also serve as a spatial anchor between tasks' outputs. To further utilize spatial alignment between the anchor and main tasks, we introduce a novel freeze approach that freezes the final layers of our network after training on the source domain so that spatial and contextual relationships between tasks are maintained when adapting to the target domain. We evaluate our methods on two pairs of datasets, performing surface normal estimation in indoor scenes and on faces, using semantic segmentation and facial landmarks as anchor tasks separately. We show the importance of using anchor tasks in both synthetic and real domains, and that the freeze approach outperforms competing approaches, reaching results on facial images on par with a state-of-the-art system that leverages a detailed facial appearance model.
 Full text View this record from Arxiv
3. An Exploratory Analysis of the Latent Structure of Process Data via Action Sequence Autoencoder [2019]

Tang, Xueying, Wang, Zhi, Liu, Jingchen, and Ying, Zhiliang
 Subjects

Statistics - Machine Learning, Computer Science - Machine Learning, and Statistics - Applications
 Abstract

Computer simulations have become a popular tool for assessing complex skills such as problem-solving skills. Log files of computer-based items record the entire human-computer interactive process for each respondent. The response processes are very diverse, noisy, and of nonstandard formats. Few generic methods have been developed for exploiting the information contained in process data. In this article, we propose a method to extract latent variables from process data. The method utilizes a sequence-to-sequence autoencoder to compress response processes into standard numerical vectors. It does not require prior knowledge of the specific items or of human-computer interaction patterns. The proposed method is applied to both simulated and real process data to demonstrate that the resulting latent variables extract useful information from the response processes.
Comment: 28 pages, 13 figures
 Full text View this record from Arxiv

Liu, Daniel, Yu, Ronald, and Su, Hao
 Subjects

Computer Science - Computer Vision and Pattern Recognition, Computer Science - Cryptography and Security, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing, and Statistics - Machine Learning
 Abstract

The importance of training robust neural networks grows as 3D data is increasingly utilized in deep learning for vision tasks like autonomous driving. We examine this problem from the perspective of the attacker, which is necessary for understanding how neural networks can be exploited, and thus defended. More specifically, we propose adversarial attacks based on solving different optimization problems, like minimizing the perceptibility of our generated adversarial examples or maintaining a uniform density distribution of points across the adversarial object surfaces. Our four proposed algorithms for attacking 3D point cloud classification are all highly successful on existing neural networks, and we find that some of them are even effective against previously proposed point removal defenses.
Comment: 17 pages, source code available at https://github.com/Daniel-Liu-c0deb0t/Adversarial-point-perturbations-on-3D-objects
 Full text View this record from Arxiv

Sheriff, Mohammed Rayyan and Chatterjee, Debasish
 Subjects

Computer Science - Machine Learning, Electrical Engineering and Systems Science - Signal Processing, Mathematics - Optimization and Control, and Statistics - Machine Learning
 Abstract

In this article we expose the convex geometry of the class of coding problems that includes the likes of Basis Pursuit Denoising. We propose a novel reformulation of the coding problem as a convex-concave min-max problem. This particular reformulation not only provides a nontrivial method to update the dictionary in order to obtain better sparse representations with hard error constraints, but also gives further insight into the underlying geometry of the coding problem. Our results provide pointers to new ascent-descent type algorithms that could be used to solve the coding problem.
 Full text View this record from Arxiv

Moreno-Vera, Felipe
 Subjects

Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, and Statistics - Machine Learning
 Abstract

Currently, many applications in Machine Learning are based on defining new models to extract more information from data. In this context, Deep Reinforcement Learning, with its most common applications in video games like Atari, Mario, and others, has had an impact on how computers can learn by themselves using only information, called rewards, obtained from their actions. Many algorithms have been modeled and implemented based on Deep Recurrent Q-Learning as proposed by DeepMind and used in AlphaZero and Go. In this document, we propose Deep Recurrent Double Q-Learning, an implementation of Deep Reinforcement Learning using Double Q-Learning algorithms and recurrent networks like LSTM and DRQN.
Comment: Accepted paper at the LatinX in AI Workshop co-located with the International Conference on Machine Learning (ICML) 2019
 Full text View this record from Arxiv
7. ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search [2019]

Chu, Xiangxiang, Zhang, Bo, Li, Jixiang, Li, Qingyuan, and Xu, Ruijun
 Subjects

Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, and Statistics - Machine Learning
 Abstract

One-shot neural architecture search features fast training of a supernet in a single run. A pivotal issue for this weight-sharing approach is the lack of scalability. A simple adjustment with an identity block renders a scalable supernet, but it leads to unstable training, which makes the subsequent model ranking unreliable. In this paper, we introduce a linearly equivalent transformation to soothe training turbulence, and prove that such a transformed path is identical to the original one in terms of representational power. The overall method is named SCARLET (SCAlable supeRnet with Linearly Equivalent Transformation). We show through experiments that linearly equivalent transformations can indeed harmonize supernet training. With an EfficientNet-like search space and a multi-objective reinforced evolutionary backend, it generates a series of competitive models: Scarlet-A achieves 76.9% Top-1 accuracy on ImageNet, outperforming EfficientNet-B0 by a large margin; the shallower Scarlet-B exemplifies the proposed scalability, attaining the same 76.3% accuracy as EfficientNet-B0 with much fewer FLOPs; Scarlet-C scores a competitive 75.6% with comparable model size. The models and evaluation code are released at https://github.com/xiaomi-automl/SCARLET-NAS .
 Full text View this record from Arxiv

Piotrowski, Tomasz and Rykaczewski, Krzysztof
 Subjects

Computer Science - Machine Learning, Mathematics - Functional Analysis, and Statistics - Machine Learning
 Abstract

A recent analysis of a model of iterative neural networks in Hilbert spaces established fundamental properties of such networks, such as the existence of fixed point sets, convergence analysis, and Lipschitz continuity. Building on these results, we show that under a single mild condition on the weights of the network, one is guaranteed to obtain a neural network converging to its unique fixed point. We provide a bound on the norm of this fixed point in terms of the norms of the weights and biases of the network. We also show why this model of a feedforward neural network is not able to accommodate Hopfield networks under our assumption.
 Full text View this record from Arxiv
9. The Partial Response Network [2019]

Lisboa, Paulo J. G., Ortega-Martorell, Sandra, Cashman, Sadie, and Olier, Ivan
 Subjects

Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, and Statistics - Machine Learning
 Abstract

We propose a method to open the black box of the Multi-Layer Perceptron by inferring from it a simpler and generally more accurate generalized additive model. The resulting model comprises nonlinear univariate and bivariate partial responses derived from the original Multi-Layer Perceptron. The responses are combined using the Lasso and further optimised within a modular structure. The approach is generic and provides a constructive framework to simplify and explain the Multi-Layer Perceptron for any data set, opening the door for validation against prior knowledge. Experimental results on benchmarking datasets indicate that the partial responses are intuitive to interpret and the Area Under the Curve is competitive with Gradient Boosting, Support Vector Machines and Random Forests. The performance improvement compared with a fully connected Multi-Layer Perceptron is attributed to reduced confounding in the second stage of optimisation of the weights. The main limitation of the method is that it explicitly models only up to pairwise interactions. For many practical applications this will be optimal, but where that is not the case this will be indicated by the performance difference compared to the original model. The streamlined model simultaneously interprets and optimises this frequently used flexible model.
Comment: 10 pages, 5 figures
 Full text View this record from Arxiv
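The univariate partial responses the abstract builds on are essentially partial dependence computations over a trained network. A minimal sketch, where `model` and `grid` are illustrative stand-ins for a trained MLP and an evaluation grid:

```python
import numpy as np

def partial_response(model, X, feature, grid):
    # Univariate partial response: sweep one feature along a grid while
    # averaging the model's output over the rest of the dataset. These
    # curves are the building blocks later combined with a Lasso.
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v        # clamp the chosen feature to v
        out.append(model(Xv).mean())
    return np.array(out)
```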

Baker, Henrietta, Hallowell, Matthew R., and Tixier, Antoine J. P.
 Subjects

Computer Science - Machine Learning and Statistics - Machine Learning
 Abstract

This paper significantly improves on, and completes the validation of, the approach proposed in "Application of Machine Learning to Construction Injury Prediction" (Tixier et al. 2016 [1]). As in the original study, we use NLP to extract fundamental attributes from raw incident reports, and machine learning models are trained to predict safety outcomes (here, these outcomes are injury severity, injury type, body part impacted, and incident type). However, in this study, safety outcomes were not extracted via NLP but are independent (human annotations), eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original study. Other improvements brought by the current study include the use of (1) a much larger dataset, (2) two new models (XGBoost and linear SVM), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute importance scores. Finally, the injury severity outcome is well predicted, which was not the case in the original study. This is a significant advancement.
 Full text View this record from Arxiv

Orbes-Arteaga, Mauricio, Varsavsky, Thomas, Sudre, Carole H., Eaton-Rosen, Zach, Haddow, Lewis J., Sørensen, Lauge, Nielsen, Mads, Pai, Akshay, Ourselin, Sébastien, Modat, Marc, Nachev, Parashkev, and Cardoso, M. Jorge
 Subjects

Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, and Statistics - Machine Learning
 Abstract

Supervised learning algorithms trained on medical images will often fail to generalize across changes in acquisition parameters. Recent work in domain adaptation addresses this challenge and successfully leverages labeled data in a source domain to perform well on an unlabeled target domain. Inspired by recent work in semi-supervised learning, we introduce a novel method to adapt from one source domain to $n$ target domains (as long as there is paired data covering all domains). Our multi-domain adaptation method utilises a consistency loss combined with adversarial learning. We provide results on white matter lesion hyperintensity segmentation from brain MRIs using the MICCAI 2017 challenge data as the source domain and two target domains. The proposed method significantly outperforms other domain adaptation baselines.
Comment: DART MICCAI workshop 2019
 Full text View this record from Arxiv
12. N2D: (Not Too) Deep clustering via clustering the local manifold of an autoencoded embedding [2019]

McConville, Ryan, Santos-Rodriguez, Raul, Piechocki, Robert J., and Craddock, Ian
 Subjects

Computer Science - Machine Learning and Statistics - Machine Learning
 Abstract

Deep clustering has increasingly been demonstrating superiority over conventional shallow clustering algorithms. Deep clustering algorithms usually combine representation learning with deep neural networks to achieve this performance, typically optimizing a clustering and a non-clustering loss. In such cases, an autoencoder is typically connected with a clustering network, and the final clustering is jointly learned by both the autoencoder and the clustering network. Instead, we propose to learn an autoencoded embedding and then search this further for the underlying manifold. For simplicity, we then cluster this with a shallow clustering algorithm, rather than a deeper network. We study a number of local and global manifold learning methods on both the raw data and the autoencoded embedding, concluding that UMAP in our framework is best able to find the most clusterable manifold in the embedding, suggesting that local manifold learning on an autoencoded embedding is effective for discovering higher quality clusters. We quantitatively show across a range of image and time-series datasets that our method has competitive performance against the latest deep clustering algorithms, including outperforming the current state-of-the-art on several. We postulate that these results show a promising research direction for deep clustering.
 Full text View this record from Arxiv

Lawrence, Carolin, Kotnis, Bhushan, and Niepert, Mathias
 Subjects

Statistics - Machine Learning, Computer Science - Computation and Language, and Computer Science - Machine Learning
 Abstract

Neural sequence generation is typically performed token-by-token and left-to-right. Whenever a token is generated, only previously produced tokens are taken into consideration. In contrast, for problems such as sequence classification, bidirectional attention, which takes both past and future tokens into consideration, has been shown to perform much better. We propose to make the sequence generation process bidirectional by employing special placeholder tokens. Treated as a node in a fully connected graph, a placeholder token can take past and future tokens into consideration when generating the actual output token. We verify the effectiveness of our approach experimentally on two conversational tasks, where the proposed bidirectional model outperforms competitive baselines by a large margin.
Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019, Hong Kong, China
 Full text View this record from Arxiv

Yao, Xin, Huang, Tianchi, Wu, Chenglei, Zhang, Ruixiao, and Sun, Lifeng
 Subjects

Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, and Statistics - Machine Learning
 Abstract

Federated learning (FL) enables on-device training over distributed networks consisting of a massive number of modern smart devices, such as smartphones and IoT devices. However, the leading optimization algorithm in such settings, i.e., federated averaging (FedAvg), suffers from heavy communication costs and an inevitable performance drop, especially when the local data is distributed in a non-IID way. To alleviate this problem, we propose two potential solutions by introducing additional mechanisms to the on-device training. The first (FedMMD) adopts a two-stream model with an MMD (Maximum Mean Discrepancy) constraint, instead of the single model in vanilla FedAvg, to be trained on devices. Experiments show that the proposed method outperforms baselines, especially in non-IID FL settings, with a reduction of more than 20% in required communication rounds. The second is FL with feature fusion (FedFusion). By aggregating the features from both the local and global models, we achieve higher accuracy at lower communication cost. Furthermore, the feature fusion modules offer better initialization for newly incoming clients and thus speed up convergence. Experiments in popular FL scenarios show that our FedFusion outperforms baselines in both accuracy and generalization ability while reducing the number of required communication rounds by more than 60%.
Comment: This is a combination version of our papers in VCIP 2018 and ICIP 2019
 Full text View this record from Arxiv
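For context, the vanilla FedAvg server step that both proposed mechanisms modify can be sketched as follows; this is a minimal sketch with illustrative names, treating each client's parameters as a single flattened NumPy vector:

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    # Vanilla FedAvg server step: average the clients' parameter vectors,
    # weighted by each client's local dataset size.
    total = float(sum(client_sizes))
    return sum(p * (n / total) for p, n in zip(client_params, client_sizes))
```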

Brøndum, Rasmus Froberg, Michaelsen, Thomas Yssing, and Bøgsted, Martin
 Subjects

Statistics - Machine Learning, Computer Science - Machine Learning, and Statistics - Methodology
 Abstract

Regressing an outcome on class labels identified by unsupervised clustering is customary in many applications. However, it is common to ignore the misclassification of class labels caused by the learning algorithm, which potentially leads to serious bias in the estimated effect parameters. Due to its generality, we suggest redressing the situation by use of the simulation and extrapolation method. Performance is illustrated by simulated data from Gaussian mixture models. Finally, we apply our method to a study which regressed overall survival on class labels derived from unsupervised clustering of gene expression data from bone marrow samples of multiple myeloma patients.
 Full text View this record from Arxiv

Dudek, Grzegorz
 Subjects

Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, and Statistics - Machine Learning
 Abstract

The standard method of generating random weights and biases in feedforward neural networks with random hidden nodes selects them both from the uniform distribution over the same fixed interval. In this work, we show the drawbacks of this approach and propose a new method of generating random parameters. This method ensures that the most nonlinear fragments of the sigmoids, which are most useful in modeling target function nonlinearity, are kept in the input hypercube. In addition, we show how to generate activation functions with uniformly distributed slope angles.
 Full text View this record from Arxiv
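The bias-placement idea can be sketched as follows, assuming a logistic sigmoid and inputs scaled to the unit hypercube [0, 1]^dim. The function name and `w_scale` are illustrative, not the paper's exact procedure:

```python
import numpy as np

def random_hidden_params(n_hidden, dim, w_scale=10.0, seed=0):
    # Instead of drawing weights and biases independently from one fixed
    # interval, draw the weights, then set each bias so the sigmoid's
    # steepest (most nonlinear) region falls at a random point x0 inside
    # the input hypercube: sigmoid(W[i] @ x0[i] + b[i]) = 0.5 exactly at x0.
    rng = np.random.default_rng(seed)
    W = rng.uniform(-w_scale, w_scale, size=(n_hidden, dim))
    x0 = rng.uniform(0.0, 1.0, size=(n_hidden, dim))
    b = -np.sum(W * x0, axis=1)
    return W, b, x0  # x0 returned for inspection only
```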

Panigrahi, Abhishek, Shetty, Abhishek, and Goyal, Navin
 Subjects

Computer Science - Machine Learning and Statistics - Machine Learning
 Abstract

It is well-known that overparametrized neural networks trained using gradient-based methods quickly achieve small training error with appropriate hyperparameter settings. Recent papers have proved this statement theoretically for highly overparametrized networks under reasonable assumptions. The limiting case when the network size approaches infinity has also been considered. These results either assume that the activation function is ReLU, or they crucially depend on the minimum eigenvalue of a certain Gram matrix depending on the data, the random initialization and the activation function. In the latter case, existing works only prove that this minimum eigenvalue is nonzero and do not provide quantitative bounds. On the empirical side, a contemporary line of investigation has proposed a number of alternative activation functions which tend to perform better than ReLU at least in some settings, but no clear understanding has emerged. This state of affairs underscores the importance of theoretically understanding the impact of activation functions on training. In the present paper, we provide theoretical results about the effect of the activation function on the training of highly overparametrized 2-layer neural networks. We show that for smooth activations, such as tanh and swish, the minimum eigenvalue can be exponentially small depending on the span of the dataset, implying that training can be very slow. In contrast, for activations with a "kink," such as ReLU, SELU and ELU, all eigenvalues are large under minimal assumptions on the data. Several new ideas are involved. Finally, we corroborate our results empirically.
 Full text View this record from Arxiv
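The quantity at the center of this analysis can be estimated numerically. A Monte Carlo sketch of the minimum eigenvalue of a Gram matrix of the form K_ij = E_w[act(w.x_i) act(w.x_j)] with w ~ N(0, I) and no bias; this is an illustrative setup, not the paper's exact matrix or assumptions:

```python
import numpy as np

def min_gram_eigenvalue(X, act, n_w=20000, seed=0):
    # Monte Carlo estimate of K_ij = E_w[act(w.x_i) * act(w.x_j)] for
    # Gaussian weights w, then return the smallest eigenvalue of K.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_w, X.shape[1]))
    A = act(W @ X.T)              # (n_w, n) activations per sampled w
    K = A.T @ A / n_w             # empirical Gram matrix
    return np.linalg.eigvalsh(K)[0]
```

Comparing this estimate for ReLU against tanh on nearly collinear data is one way to see the gap the abstract describes.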
18. Safe global optimization of expensive noisy black-box functions in the $\delta$-Lipschitz framework [2019]

Sergeyev, Yaroslav D., Candelieri, Antonio, Kvasov, Dmitri E., and Perego, Riccardo
 Subjects

Mathematics - Optimization and Control, Computer Science - Machine Learning, Mathematics - Numerical Analysis, Statistics - Machine Learning, and 90C26, 65K05, 68T05, 68Q32
 Abstract

In this paper, the problem of safe global maximization (not to be confused with robust optimization) of expensive noisy black-box functions satisfying the Lipschitz condition is considered. The notion "safe" means that the objective function $f(x)$ during optimization should not violate a "safety" threshold, for instance, a certain a priori given value $h$ in a maximization problem. Thus, any new function evaluation must be performed at "safe points" only, namely, at points $y$ for which it is known that the objective function $f(y) > h$. The main difficulty here consists in the fact that the optimization algorithm should ensure that the safety constraint will be satisfied at a point $y$ before the evaluation of $f(y)$ is executed. Thus, it is required both to determine the safe region $\Omega$ within the search domain~$D$ and to find the global maximum within $\Omega$. An additional difficulty consists in the fact that these problems should be solved in the presence of noise. This paper starts with a theoretical study of the problem, and it is shown that even though the objective function $f(x)$ satisfies the Lipschitz condition, traditional Lipschitz minorants and majorants cannot be used due to the presence of noise. Then, a $\delta$-Lipschitz framework and two algorithms using it are proposed to solve the safe global maximization problem. The first method determines the safe area within the search domain and the second one executes the global maximization over the found safe region. For both methods a number of theoretical results related to their functioning and convergence are established. Finally, numerical experiments confirming the reliability of the proposed procedures are performed.
Comment: Submitted paper (35 pages, 44 figures, 4 tables); Yaroslav D. Sergeyev - corresponding author
 Full text View this record from Arxiv
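The classic noise-free version of the safety argument can be sketched as follows; the paper's $\delta$-Lipschitz framework generalizes exactly this reasoning to noisy evaluations, and the function name here is illustrative:

```python
def is_provably_safe(y, evaluated, L, h):
    # Noise-free Lipschitz safety check: if f has Lipschitz constant L and
    # the value fx = f(x) was observed at x, then f(y) >= fx - L*|y - x|.
    # The point y is provably safe when some such lower bound exceeds h.
    return any(fx - L * abs(y - x) > h for x, fx in evaluated)
```

`evaluated` is a list of (point, value) pairs already measured; the safe region $\Omega$ is the set of points passing this test.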
19. Pearson Distance is not a Distance [2019]

Solo, Victor
 Subjects

Statistics - Methodology and Statistics - Machine Learning
 Abstract

The Pearson distance between a pair of random variables $X,Y$ with correlation $\rho_{xy}$, namely $1-\rho_{xy}$, has gained widespread use, particularly for clustering, in areas such as gene expression analysis, brain imaging and cyber security. In all these applications it is implicitly assumed/required that the distance measures be metrics, thus satisfying the triangle inequality. We show, however, that Pearson distance is not a metric. We go on to show that this can be repaired by recalling the result, well known in other literature, that $\sqrt{1-\rho_{xy}}$ is a metric. We similarly show that a related measure of interest, $1-\rho_{xy}^2$, which is invariant to the sign of $\rho_{xy}$, is not a metric but that $\sqrt{1-\rho_{xy}^2}$ is. We also give generalizations of these results.
 Full text View this record from Arxiv
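A small numeric check (with correlation values chosen here for illustration, not taken from the paper) shows the triangle inequality failing for $1-\rho$ while holding for its square root on the same correlations:

```python
import numpy as np

# A valid 3x3 correlation matrix on which Pearson distance 1 - rho
# violates the triangle inequality d(X,Z) <= d(X,Y) + d(Y,Z).
rho = np.array([[1.0, 0.9, 0.7],
                [0.9, 1.0, 0.9],
                [0.7, 0.9, 1.0]])
assert np.all(np.linalg.eigvalsh(rho) >= 0)   # positive semidefinite

d = 1.0 - rho
# d(X,Z) = 0.3 but d(X,Y) + d(Y,Z) = 0.1 + 0.1 = 0.2: not a metric.
print(d[0, 2] > d[0, 1] + d[1, 2])            # True

ds = np.sqrt(1.0 - rho)
# The square-root version satisfies the inequality on the same data.
print(ds[0, 2] <= ds[0, 1] + ds[1, 2])        # True
```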
20. Model-based Lookahead Reinforcement Learning [2019]

Hong, Zhang-Wei, Pajarinen, Joni, and Peters, Jan
 Subjects

Computer Science - Machine Learning, Computer Science - Artificial Intelligence, and Statistics - Machine Learning
 Abstract

Model-based Reinforcement Learning (MBRL) allows data-efficient learning, which is required in real-world applications such as robotics. However, despite the impressive data-efficiency, MBRL does not achieve the final performance of state-of-the-art Model-free Reinforcement Learning (MFRL) methods. We leverage the strengths of both realms and propose an approach that obtains high performance with a small amount of data. In particular, we combine MFRL and Model Predictive Control (MPC). While MFRL's strength in exploration allows us to train a better forward dynamics model for MPC, MPC improves the performance of the MFRL policy by sampling-based planning. The experimental results on standard continuous control benchmarks show that our approach can achieve MFRL's level of performance while being as data-efficient as MBRL.
 Full text View this record from Arxiv
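The MPC side of such a combination can be sketched as a random-shooting planner warm-started by the model-free policy. All names, the Gaussian action perturbation, and the 1D toy interface below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def mpc_action(state, dynamics, reward, policy, horizon=5, n_samples=64,
               noise=0.1, seed=0):
    # Random-shooting MPC: sample action sequences around the policy's
    # suggestions, roll each one out through the learned dynamics model,
    # and execute the first action of the highest-return sequence.
    rng = np.random.default_rng(seed)
    best_return, best_first = -np.inf, policy(state)
    for _ in range(n_samples):
        s, total, first = state, 0.0, None
        for t in range(horizon):
            a = policy(s) + rng.normal(0.0, noise)
            if t == 0:
                first = a
            total += reward(s, a)
            s = dynamics(s, a)     # learned forward model, not the env
        if total > best_return:
            best_return, best_first = total, first
    return best_first
```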