Glycoconjugates constitute a major class of biomolecules that includes glycoproteins, glycosphingolipids and proteoglycans. The enzymatic process in which glycans (sugar chains) are linked to proteins or lipids is called glycosylation. Glycosylation is involved in many biological processes, both physiological and pathological, including host-pathogen interactions, tumour invasion, cell trafficking and signalling. Changes in glycan structure are thought to be at least partly responsible for the development of inflammation, infection, arteriosclerosis, immune defects and autoimmunity. Such changes have been observed in human diseases such as diabetes mellitus, rheumatoid arthritis and Alzheimer’s disease. Aberrant patterns of glycosylation are also a universal feature of cancer cells. The field of glycobiology thus shows great potential for the discovery of glycan biomarkers for disease diagnosis and prognosis. Here we focus specifically on N-glycans, that is, glycans attached to protein molecules via a nitrogen atom. This class of glycans is the best characterized. High-throughput hydrophilic interaction liquid chromatography (HILIC) analysis is a well-established technique for the separation and quantification of N-linked glycans released from glycoproteins. HILIC analysis quantifies the N-glycan structures in serum via a chromatogram, which is subsequently standardized and integrated. The data generated for each sample are a set of relative HILIC peak areas and, as a result, are compositional. To date, most statistical analyses of these glycan data have failed to account for their compositional nature. We compare and contrast three compositional data models for the glycan HILIC data: the Dirichlet, Nested Dirichlet and Logistic Normal models, with the intention of providing tools for the statistical analysis of compositional data in the glycobiology field. We use these three models for classification of disease/control cases in ovarian and lung cancer diagnosis applications. 
We discuss and compare these models in terms of their classification performance and goodness of fit.
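As a rough illustration of how a Dirichlet model could be used to classify a sample of relative peak areas, the sketch below closes raw peak areas to a composition and assigns the class with the highest log prior plus Dirichlet log-likelihood. The peak areas, the class-specific parameters `alpha` and the priors are hypothetical values chosen for illustration; parameter estimation, the Nested Dirichlet and Logistic Normal models, and the authors' actual procedure are not shown.

```python
import math


def closure(areas):
    """Rescale positive peak areas so that they sum to 1 (a composition)."""
    total = sum(areas)
    return [a / total for a in areas]


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at composition x (all parts > 0)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def classify(x, alpha_by_class, prior_by_class):
    """Return the class maximizing log prior + Dirichlet log-likelihood."""
    scores = {
        c: math.log(prior_by_class[c]) + dirichlet_logpdf(x, alpha_by_class[c])
        for c in alpha_by_class
    }
    return max(scores, key=scores.get)


# Hypothetical example: three integrated peak areas from one chromatogram.
peak_areas = [120.0, 60.0, 20.0]
x = closure(peak_areas)

# Hypothetical class parameters (in practice these would be estimated
# from training samples of each class).
alpha = {"control": [6.0, 3.0, 1.0], "case": [3.0, 3.0, 4.0]}
prior = {"control": 0.5, "case": 0.5}

label = classify(x, alpha, prior)
print(label)
```

Working on the log scale with `math.lgamma` avoids overflow in the normalizing constant when the Dirichlet parameters are large.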
Intramuscular fat (IMF) content and composition, particularly the oleic fatty acid content (OL), are major quality characteristics of fresh and dry-cured pork products. They are known to be related to nutritional, manufacturing and organoleptic properties, as well as to human health. It is known that IMF content is under genetic control, but little evidence is available for IMF composition, namely OL. There are very few estimates in the literature of genetic parameters for OL (Suzuki et al., 2006), and most of them are based on small data sets from experiments designed for other purposes (Ntawubizi et al., 2010; Sellier et al., 2010). However, genetic parameters associated with IMF and OL (i.e. heritability and genetic correlations with other relevant traits) are needed for developing selection criteria and optimum breeding strategies and programmes. IMF content is usually expressed as a percentage of dry or wet matter and OL as a percentage of total fatty acids in IMF. However, research in this field has so far not taken into account the compositional nature of these data (Aitchison, 1986). The purpose of the present contribution is to compare results from standard linear analyses with those from compositional data analyses for IMF and OL. Analyses were compared in terms of genetic parameter estimates, selection efficiency, and predictive capacity.
Brazil is the largest orange (Citrus sinensis) producer worldwide. The nutrient management of orange orchards is designed from experiments on a limited number of varieties. This knowledge is transferred to other varieties by diagnosing tissue nutrient composition. Nutrient diagnostic tools are based on nutrient concentration (critical minimum value or CMV) and ratio (Diagnosis and Recommendation Integrated System or DRIS) norms that disregard the compositional nature of analytical data and the limited number of nutrient ratios that can be diagnosed independently in a given composition. The diagnosis of cationic micronutrients is also biased by contamination from fungicides. Compositional data analysis, which can avoid such problems, was first applied to tissue analysis of agricultural crops using centered log ratios (Compositional Nutrient Diagnosis – CND-clr). The isometric log ratio (ilr) transformation is a new approach based on binary nutrient ratios and the principle of orthogonality (CND-ilr). Binary partitions can be defined and varietal nutrient profiles classified based on positive and negative nutrient interactions and meta-analysis. We analyzed 11 nutrients (N, S, P, K, Ca, Mg, B, Cu, Zn, Mn, Fe) in tissue samples across 108 orchard areas, i.e. 31 ‘Valencia’, 22 ‘Hamlin’, 20 ‘Pêra’, and 35 ‘Natal’. Nutrients were partitioned between macro- and micro-nutrients as well as anionic and cationic species. The effect size of varieties relative to ‘Valencia’ was quantified by the mean and standard deviation of ilr values across ilr coordinates. Specific varietal nutrient profiles and ilr norms were defined. To guide the correction of nutrient deficiencies by appropriate nutrient management, compositions can be varied by a perturbation vector applied to the nutrients with the largest and most negative influence on ilr differences from ilr norms until the Aitchison distance falls below a critical value.
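The ilr balance, Aitchison distance and perturbation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the three-part composition, the binary partition (part 0 against parts 1 and 2), the reference composition `norm` and the perturbation vector are all hypothetical, and they are not the nutrient partitions or ilr norms defined in the study.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean log."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def ilr_balance(parts, plus, minus):
    """ilr balance between two groups of part indices of a binary partition."""
    r, s = len(plus), len(minus)
    mean_log_plus = sum(math.log(parts[i]) for i in plus) / r
    mean_log_minus = sum(math.log(parts[i]) for i in minus) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_plus - mean_log_minus)


def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def perturb(x, p):
    """Perturbation: multiply part by part, then close."""
    return closure([a * b for a, b in zip(x, p)])


# Hypothetical three-part composition and reference ("norm") composition.
sample = closure([2.0, 1.0, 1.0])
norm = closure([1.0, 1.0, 1.0])

balance = ilr_balance(sample, plus=[0], minus=[1, 2])
before = aitchison_distance(sample, norm)

# Hypothetical perturbation that halves the relative weight of part 0.
corrected = perturb(sample, [0.5, 1.0, 1.0])
after = aitchison_distance(corrected, norm)
print(balance, before, after)
```

In this toy case the only non-zero balance is the one between part 0 and the other two parts, so its absolute value equals the Aitchison distance to the reference, and the chosen perturbation brings that distance to zero.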