Multi-scale data integration frameworks for predicting outcomes in cancer [electronic resource]
- Haruka Itakura.
- Physical description
- 1 online resource.
- Itakura, Haruka.
- Gevaert, Olivier Michel Simonne, primary advisor.
- Altman, Russ, advisor.
- Mitchell, Beverly S., advisor.
- Stanford University. Program in Biomedical Informatics.
- Cancer research abounds with multi-scale data, from imaging to multi-modal molecular data such as genomic, epigenomic, transcriptomic, and proteomic data. Prediction models of clinical outcomes, including survival and therapeutic response, could capitalize on the richness of information that these data embody. In practice, however, the lack of effective methods for integrative data analysis leaves much of this latent knowledge untapped. For example, imaging data are routinely obtained for diagnostic purposes but are often underutilized in integrative analyses of cancer outcomes. By establishing inter-data correlations, imaging data have the potential to become noninvasive proxies for biopsy-acquired molecular data. Furthermore, traditional methods of data analysis have limited ability to extract knowledge from multi-scale data, which are large and heterogeneous and exhibit complex inter-data interactions. Yet most data integration efforts still rely on outmoded methods, which limit analyses to a small number of datasets and do not accommodate different data types. In this dissertation, I outline specific approaches to enhance knowledge extraction through integrative analyses that (1) directly relate imaging data to molecular data and (2) provide biomedical decision support (prediction of clinical outcomes) from multi-scale data. I applied these approaches, embodied in two frameworks, to the analysis of two brain cancers: glioblastoma (GBM) and low-grade glioma (LGG). The frameworks were designed to improve upon current standards of data analysis and to enhance knowledge extraction by incorporating machine learning techniques that boost information capture from each data source, adapting dedicated strategies for integrating multiple high-dimensional datasets, and including rigorous evaluation and validation strategies to ensure robust performance.
Using the first framework, I identified three novel image-based GBM subtypes and showed that each subtype not only confers differential survival probabilities but also embodies a unique set of differentially regulated signaling pathways, which could potentially be targeted for therapeutic effect. The use of imaging data to infer molecular information, including potential therapies, without biopsy supports the role of quantitative image features as noninvasive surrogates of underlying molecular activity. Using the second framework, I developed survival prediction models, built on specific strategies for integrating multi-omics data, that outperformed models built on current standards of analysis. This framework also generated predictive markers of survival in GBM and LGG, comprising a mix of previously unknown and known molecular entities with active or putative oncogenic roles. Applying this framework to other cancers is anticipated to facilitate both novel biomarker discovery and biomedical decision support for a variety of clinical outcomes, including treatment response and risk of recurrence.
- Submitted to the Program in Biomedical Informatics.
- Thesis (Ph.D.)--Stanford University, 2016.