Fogolari Federico, Corazza Alessandra, Viglino Paolo, and Esposito Gennaro
Algorithms for Molecular Biology, Vol 7, Iss 1, p 16 (2012)
Subjects
Biology (General), QH301-705.5, Genetics, and QH426-470
Abstract
Background: For many predictive applications a large number of models is generated and later clustered into subsets based on structural similarity. In most clustering algorithms an all-vs-all root mean square deviation (RMSD) comparison is performed, and most of the time is typically spent comparing non-similar structures. For sets with more than, say, 10,000 models this procedure is very time-consuming, and alternative, faster algorithms that restrict comparisons to the most similar structures would be useful. Results: We exploit the inverse triangle inequality on the RMSD between two structures given their RMSDs with a third structure. When the similarity search is restricted to a reasonably low RMSD threshold, this lower bound on the RMSD can be used to speed up similarity searches significantly. Tests are performed on large sets of decoys that are widely used as test cases for predictive methods, with a speed-up of up to 100 times with respect to all-vs-all comparison, depending on the set and parameters used. Sample applications are shown. Conclusions: The algorithm presented here allows fast comparison of large data sets of structures with limited memory requirements. As an example application we present the clustering of more than 100,000 fragments of length 5 from the top500H dataset into a few hundred representative fragments. A more realistic scenario is provided by the similarity search within the very large decoy sets used for the tests. Other applications include filtering nearly identical conformations in selected CASP9 datasets and clustering molecular dynamics snapshots. Availability: A Linux executable and a Perl script with examples are given in the supplementary material (Additional file 1). The source code is available upon request from the authors.
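The lower bound mentioned in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain coordinate RMSD without superposition as a stand-in metric, uses a single pivot structure, and the names `rmsd` and `neighbours_within` are hypothetical. For any pivot p, |RMSD(a,p) - RMSD(b,p)| <= RMSD(a,b) (often called the reverse triangle inequality), so a pair whose pivot distances differ by more than the threshold can be skipped without computing its RMSD.

```python
import math

def rmsd(a, b):
    """Plain coordinate RMSD between two equal-length lists of (x, y, z) tuples.
    Assumption: no optimal superposition is performed, to keep the sketch short."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

def neighbours_within(models, threshold, pivot_index=0):
    """Return (pairs, n_comparisons): all index pairs (i, j), i < j, whose RMSD
    is <= threshold, skipping pairs ruled out by the pivot lower bound."""
    pivot = models[pivot_index]
    d_pivot = [rmsd(m, pivot) for m in models]      # one pass: RMSD to the pivot
    order = sorted(range(len(models)), key=d_pivot.__getitem__)
    pairs = []
    comparisons = len(models)                       # the pivot comparisons above
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            # Lower bound on rmsd(i, j); since `order` is sorted by pivot
            # distance, every later j is at least as far, so stop here.
            if d_pivot[j] - d_pivot[i] > threshold:
                break
            comparisons += 1
            if rmsd(models[i], models[j]) <= threshold:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs), comparisons
```

Sorting by distance to the pivot turns the bound into an early exit from the inner loop; how much work is saved depends on how well the pivot distances spread the set, which is consistent with the abstract's remark that the speed-up depends on the set and parameters used.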
Tosatto Silvio CE, Fogolari Federico, and Colombo Giorgio
BMC Bioinformatics, Vol 6, Iss 1, p 301 (2005)
Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), and QH301-705.5
Abstract
Background: Estimators of free energies are routinely used to judge the quality of protein structural models. As these estimators still present inaccuracies, they are frequently evaluated by discriminating native or native-like conformations from large ensembles of so-called decoy structures. Results: A decoy set is obtained from snapshots taken from 5 long (100 ns) molecular dynamics (MD) simulations of the thermostable subdomain from chicken villin headpiece. An evaluation of the energy of the decoys is given using: i) a residue based contact potential supplemented by a term for the quality of dihedral angles; ii) a recently introduced combination of four statistical scoring functions for model quality estimation (FRST); iii) molecular mechanics with solvation energy estimated either according to the generalized Born surface area (GBSA) or iv) the Poisson-Boltzmann surface area (PBSA) method. Conclusion: The decoy set presented here has the following features which make it attractive for testing energy scoring functions: 1) it covers a broad range of RMSD values (from less than 2.0 Å to more than 12 Å); 2) it has been obtained from molecular dynamics trajectories, starting from different non-native-like conformations which have diverse behaviour, with secondary structure elements correctly or incorrectly formed, and in one case folding to a native-like structure. This allows not only for scoring of static structures, but also for studying, using free energy estimators, the kinetics of folding; 3) all structures have been obtained from accurate MD simulations in explicit solvent and after molecular mechanics (MM) energy minimization using an implicit solvent method. The quality of the covalent structure therefore does not suffer from steric or covalent problems.
The statistical and physical effective energy functions tested on the set behave differently when native simulation snapshots are included or not in the set and when averaging over the trajectory is performed.
Fogolari Federico, Dovier Agostino, and Dal Palù Alessandro
BMC Bioinformatics, Vol 5, Iss 1, p 186 (2004)
Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), and QH301-705.5
Abstract
Background: The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Results: Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have also been exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation, their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all-atom models with plausible structure. Results have been compared with a similar approach using a well-established technique such as molecular dynamics. Conclusions: The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying simplified protein models, which can be converted into realistic all-atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.
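To give a sense of the search space that constraint propagation has to prune, the following is a minimal sketch of the lattice itself, not of the Constraint Logic Programming model described in the abstract. It assumes the usual representation of the face-centered cubic lattice with 12 nearest-neighbour steps and counts self-avoiding chains by brute force; the names `FCC_STEPS` and `count_self_avoiding_walks` are hypothetical.

```python
from itertools import product

# The 12 nearest-neighbour steps of the face-centered cubic lattice:
# exactly two coordinates are +1 or -1 and the remaining one is 0.
FCC_STEPS = [v for v in product((-1, 0, 1), repeat=3)
             if sum(abs(c) for c in v) == 2]

def count_self_avoiding_walks(n_steps, path=((0, 0, 0),)):
    """Count chains of n_steps bonds starting at the origin in which no
    lattice point is visited twice (brute-force enumeration, no pruning)."""
    if n_steps == 0:
        return 1
    x, y, z = path[-1]
    total = 0
    for dx, dy, dz in FCC_STEPS:
        nxt = (x + dx, y + dy, z + dz)
        if nxt not in path:                 # self-avoidance constraint
            total += count_self_avoiding_walks(n_steps - 1, path + (nxt,))
    return total
```

The count grows by roughly an order of magnitude per added residue, so exhaustive enumeration is only feasible for very short chains; additional constraints such as known secondary structure or disulfide bridges reduce the number of admissible chains.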
Fogolari Federico, Molinari Henriette, and Berrera Marco
BMC Bioinformatics, Vol 4, Iss 1, p 8 (2003)
Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), and QH301-705.5
Abstract
Background: Contradictory evidence has been presented in the literature concerning the effectiveness of empirical contact energies for fold recognition. Empirical contact energies are calculated on the basis of information available from selected protein structures, with respect to a defined reference state, according to the quasi-chemical approximation. Protein-solvent interactions are estimated from residue solvent accessibility. Results: In the approach presented here, contact energies are derived from the potential of mean force theory; several definitions of contact are examined and their performance in fold recognition is evaluated on sets of decoy structures. The best definition of contact is tested, in a more realistic scenario, on all predictions including sidechains accepted in the CASP4 experiment. In 30 out of 35 cases the native structure is correctly recognized, and the best predictions are usually found among the 10 lowest-energy predictions. Conclusion: The definition of contact based on van der Waals radii of alpha carbon and side chain heavy atoms is seen to perform better than other definitions involving only alpha carbons, only beta carbons, all heavy atoms or only backbone atoms. An important prerequisite for the applicability of the approach is that the protein structure under study should not exhibit anomalous solvent accessibility compared to soluble proteins whose structure is deposited in the Protein Data Bank. The combined evaluation of a solvent accessibility parameter and contact energy allows for an effective gross screening of predictive models.
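The general idea of an empirical contact energy relative to a reference state can be sketched as follows. This is a simplified illustration under stated assumptions, not the parametrization used in the paper: the reference state is taken to be random pairing of contact ends according to how often each residue type appears in contacts, energies are in units of kT, solvent terms are ignored, and the function name `contact_energies` is hypothetical.

```python
import math
from collections import Counter

def contact_energies(contacts):
    """contacts: iterable of (type_a, type_b) residue-type pairs seen in contact.
    Returns {(a, b): energy in kT units} with each key sorted alphabetically.
    Energy is -ln(observed / expected); the expected count assumes contact
    ends pair at random (an assumed, simplified reference state)."""
    observed = Counter(tuple(sorted(pair)) for pair in contacts)
    total = sum(observed.values())
    ends = Counter()                        # how often each type ends a contact
    for (a, b), n in observed.items():
        ends[a] += n
        ends[b] += n
    n_ends = 2 * total
    energies = {}
    for (a, b), n in observed.items():
        fa, fb = ends[a] / n_ends, ends[b] / n_ends
        # Unordered pairs of different types can form in two ways.
        expected = total * fa * fb * (1 if a == b else 2)
        energies[(a, b)] = -math.log(n / expected)
    return energies
```

Pairs observed more often than the reference state predicts receive a negative (favourable) energy, and pairs observed less often receive a positive one; pair types never observed are simply absent from the result in this sketch.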
Protein aggregation, including the formation of dimers and multimers in solution, underlies an array of human diseases such as systemic amyloidosis, a fatal disease in which misfolding of native globular proteins damages the structure and function of affected organs. Different kinds of interactors can interfere with the formation of protein dimers and multimers in solution. A very special class of interactors are nanoparticles, thanks to the extremely efficient extension of their interaction surface. In particular, citrate-coated gold nanoparticles (cit-AuNPs) were recently investigated with the amyloidogenic protein $\beta$2-microglobulin ($\beta$2m). Here we present computational studies on two challenging models known for their enhanced amyloidogenic propensity, namely the naturally occurring $\Delta$N6 and D76N $\beta$2m variants, and disclose the role of cit-AuNPs in their fibrillogenesis. The proposed interaction mechanism lies in the interference of the cit-AuNPs with the protein dimers at the early stages of aggregation, which induces dimer disassembly. As a consequence, natural fibril formation can be inhibited. Relying on the comparison between atomistic simulations at multiple levels (enhanced sampling molecular dynamics and Brownian dynamics) and protein structural characterisation by NMR, we demonstrate that the cit-AuNP interactors are able to inhibit protein dimer assembly. As a consequence, natural fibril formation is also inhibited, as found in experiment. Comment: Published by RSC, under a Creative Commons Attribution 3.0 Unported Licence
The paper investigates a novel approach, based on Constraint Logic Programming (CLP), to predict the 3D conformation of a protein via fragment assembly. The fragments are extracted from a database of known protein structures by a preprocessor, also developed for this work, that clusters and classifies them according to similarity and frequency. The problem of assembling fragments into a complete conformation is mapped to a constraint solving problem and solved using CLP. The constraint-based model uses a medium discretization degree Cα-side chain centroid protein model that offers efficiency and a good approximation for space filling. The approach adapts existing energy models to the protein representation used and applies a large neighboring search strategy. The results show the feasibility and efficiency of the method. The declarative nature of the solution allows future extensions to be included, e.g., fragments of different sizes for better accuracy. Comment: special issue dedicated to ICLP 2010