Music has recently become ubiquitous as digital data, and the collections of some online music services now exceed ten million tracks. This growth, and the resulting changes in the music content industry, pose challenges for efficient and effective content search, retrieval and organization. The most common approaches to these needs rely on text-based metadata or user data. However, limitations of these methods, such as popularity bias, have prompted research into content-based methods that use audio data directly. Content-based methods generally consist of two processing modules: extracting features from audio, and training a system using the features and ground truth. The audio features, the main interest of this thesis, are conventionally designed in a highly engineered manner based on acoustic knowledge, as with mel-frequency cepstral coefficients (MFCCs) or chroma. As an alternative, there is increasing interest in learning features automatically from data without relying on domain knowledge or manual refinement. This approach to feature representation has been studied primarily in computer vision and speech recognition. In this thesis, we investigate learning-based feature representations with applications to content-based music information retrieval. Specifically, we propose a data processing pipeline that effectively learns short-term acoustic dependencies from musical signals and builds a song-level feature for music genre classification and music annotation/retrieval. By visualizing the learned acoustic patterns, we will attempt to interpret how they are associated with high-level musical semantics such as genre, emotion or song quality. Through a detailed analysis, we will show how the individual processing units in the pipeline and the meta-parameters of the learning algorithms affect performance.
In addition to these tasks, we also examine the feature learning approach for classification-based piano transcription. Through experiments on widely used datasets, we will show that the learned feature representations achieve results comparable to, or better than, those of state-of-the-art algorithms.