PLoS Computational Biology [PLoS Comput Biol] 2018 Jun 21; Vol. 14 (6), pp. e1006176. Date of Electronic Publication: 2018 Jun 21 (Print Publication: 2018).
Algorithms, Computer Simulation, Learning, Machine Learning, Nucleic Acid Conformation, Problem Solving, RNA Folding physiology, Computer-Aided Design instrumentation, and RNA chemistry
We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture allowing a single model to be applied to arbitrary target structures of any length. After training it on randomly generated targets, we test it on the Eterna100 benchmark and find it outperforms all previous algorithms. Analysis of its solutions shows it has successfully learned some advanced strategies identified by players of the game Eterna, allowing it to solve some very difficult structures. On the other hand, it has failed to learn other strategies, possibly because they were not required for the targets in the training set. This suggests the possibility that future improvements to the training protocol may yield further gains in performance.
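The design task in this abstract (find a sequence that folds to a given target secondary structure) can be illustrated with a small helper. This is a minimal sketch, not the authors' agent: it assumes the target is written in dot-bracket notation, which the abstract does not specify, and the names `parse_dot_bracket` and `is_compatible` are hypothetical. It only checks that every paired position in the target holds a Watson-Crick or G-U wobble pair, which is a necessary condition for a solution but not a sufficient one; confirming the fold in silico would still require a folding engine.

```python
# Assumption: targets use dot-bracket notation ("(" and ")" for paired bases, "." for unpaired).
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every pair required by the target can form in the candidate sequence."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in parse_dot_bracket(structure))


if __name__ == "__main__":
    # Hypothetical target: a 4-base-pair stem closed by a tetraloop.
    print(is_compatible("GGGCGCAAGCCU", "((((....))))"))
```

A compatibility filter like this is cheap, so a design loop could apply it before calling any more expensive folding prediction.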
Author(s): Relly Brandman, Yigal Brandman, Vijay S. Pande
Introduction: A primary cause of coevolution between residues is biophysical interactions in the corresponding folded [...]
Coevolving residues in a multiple sequence alignment provide evolutionary clues of biophysical interactions in 3D structure. Despite a rich literature describing amino acid coevolution within or between proteins and nucleic acid coevolution within RNA, to date there has been no direct evidence of coevolution between protein and RNA. The ribosome, a structurally conserved macromolecular machine composed of over 50 interacting protein and RNA chains, provides a natural example of RNA/protein interactions that likely coevolved. We provide the first direct evidence of RNA/protein coevolution by characterizing the mutual information in residue triplets from a multiple sequence alignment of ribosomal protein L22 and neighboring 23S RNA. We define residue triplets as three positions in the multiple sequence alignment, where one position is from the 23S RNA and two positions are from the L22 protein. We show that residue triplets with high mutual information are more likely than residue doublets to be proximal in 3D space. Some high mutual information residue triplets cluster in a connected series across the L22 protein structure, similar to patterns seen in protein coevolution. We also describe RNA nucleotides for which switching from one nucleotide to another (or between purines and pyrimidines) results in a change in amino acid distribution for proximal amino acid positions. Multiple crystal structures for evolutionarily distinct ribosome species can provide structural evidence for these differences. For one residue triplet, a pyrimidine in one species is a purine in another, and RNA/protein hydrogen bonds are present in one species but not the other.
The results provide the first direct evidence of RNA/protein coevolution by using higher order mutual information, suggesting that biophysical constraints on interacting RNA and protein chains are indeed a driving force in their evolution.
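The triplet mutual information described in this record can be sketched from alignment columns. The abstract does not give the exact estimator, so the code below assumes one plausible definition: the mutual information between the RNA column and the joint state of the two protein columns, computed from plain frequency counts. The function names and the four-sequence toy alignment are hypothetical and chosen only to show a case where neither doublet is informative but the triplet is.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of a column."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired observations."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def triplet_mutual_information(rna_col, prot_col_a, prot_col_b):
    """Assumed definition: I(RNA; (protein A, protein B)) for one residue triplet."""
    return mutual_information(rna_col, list(zip(prot_col_a, prot_col_b)))


if __name__ == "__main__":
    # Made-up columns, one entry per aligned sequence (not data from the paper).
    rna = list("AGGA")
    prot_a = list("KKRR")
    prot_b = list("DEDE")
    print(mutual_information(rna, prot_a))                   # doublet RNA / protein A
    print(mutual_information(rna, prot_b))                   # doublet RNA / protein B
    print(triplet_mutual_information(rna, prot_a, prot_b))   # triplet
```

In this toy case the RNA base depends on the combination of the two amino acids rather than on either one alone, which is the kind of higher-order signal a doublet analysis would miss. Real alignments would also need corrections for finite sample size and phylogenetic bias, which this sketch omits.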
Yao, Yuan, Sun, Jian, Huang, Xuhui, Bowman, Gregory R., Singh, Gurjeet, Lesnick, Michael, Guibas, Leonidas J., Pande, Vijay S., and Carlsson, Gunnar
Journal of Chemical Physics; 4/14/2009, Vol. 130 Issue 14, p144115, 10p, 4 Diagrams, 2 Graphs
BIOMOLECULES, PROTEIN folding, CLUSTERING of particles, COMPUTER simulation, RNA, INTERMEDIATE state (Superconductors), MORSE theory, and TOPOLOGICAL dynamics
Characterization of transient intermediate or transition states is crucial for the description of biomolecular folding pathways, which is, however, difficult in both experiments and computer simulations. Such transient states are typically of low population in simulation samples. Even for simple systems such as RNA hairpins, there has recently been mounting debate over the existence of multiple intermediate states. In this paper, we develop a computational approach to explore the relatively low populated transition or intermediate states in biomolecular folding pathways, based on a topological data analysis tool, MAPPER, with simulation data from large-scale distributed computing. The method is inspired by the classical Morse theory in mathematics, which characterizes the topology of high-dimensional shapes via some functional level sets. In this paper we exploit a conditional density filter which enables us to focus on the structures on pathways, followed by clustering analysis on its level sets, which helps separate low populated intermediates from high populated folded/unfolded structures. A successful application of this method is given on a motivating example, an RNA hairpin with a GCAA tetraloop, where we are able to provide structural evidence from computer simulations on the multiple intermediate states and reveal different pictures of the unfolding and refolding pathways. The method is effective in dealing with a high degree of heterogeneity in distribution, capturing structural features in multiple pathways, and being less sensitive to the distance metric than nonlinear dimensionality reduction or geometric embedding methods. The methodology described in this paper admits various implementations or extensions to incorporate more information and adapt to different settings, which thus provides a systematic tool to explore the low-density intermediate states in complex biomolecular folding systems. [ABSTRACT FROM AUTHOR]
Simulating biologically relevant timescales at atomic resolution is a challenging task since typical atomistic simulations are at least two orders of magnitude shorter. Markov State Models (MSMs) provide one means of overcoming this gap without sacrificing atomic resolution by extracting long time dynamics from short simulations. MSMs coarse grain space by dividing conformational space into long-lived, or metastable, states. This is equivalent to coarse graining time by integrating out fast motions within metastable states. By varying the degree of coarse graining one can vary the resolution of an MSM; therefore, MSMs are inherently multi-resolution. Here we introduce a new algorithm, Super-level-set Hierarchical Clustering (SHC), which is, to our knowledge, the first algorithm focused on constructing MSMs at multiple resolutions. The key insight of this algorithm is to generate a set of super levels covering different density regions of phase space, then cluster each super level separately, and finally recombine this information into a single MSM. SHC is able to produce MSMs at different resolutions using different super density level sets. To demonstrate the power of this algorithm we apply it to a small RNA hairpin, generating MSMs at four different resolutions. We validate these MSMs by showing that they are able to reproduce the original simulation data. Furthermore, long time folding dynamics are extracted from these models. The results show that there are no metastable on-pathway intermediate states. Instead, the folded state serves as a hub directly connected to multiple unfolded/misfolded states which are separated from each other by large free energy barriers.
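The basic MSM estimation step that this abstract builds on can be sketched as follows. This is not the SHC algorithm itself: it assumes the conformations have already been assigned to discrete state indices, and it uses the simplest estimator, counting transitions at a fixed lag time and normalizing each row, with no reversibility constraint. The function names and the example trajectory are hypothetical.

```python
def count_transitions(trajectories, n_states, lag=1):
    """Count observed transitions i -> j separated by `lag` frames."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        # Pair each frame with the frame `lag` steps later.
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1
    return counts


def row_normalize(counts):
    """Turn a count matrix into transition probabilities, row by row."""
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that was never left (or never visited) keeps an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    # Made-up discretized trajectory: 0 = unfolded, 1 = folded.
    trajectories = [[0, 0, 1, 1, 0]]
    counts = count_transitions(trajectories, n_states=2, lag=1)
    print(counts)
    print(row_normalize(counts))
```

Changing how the states are defined (coarser or finer) while keeping this estimation step is what gives an MSM its resolution, which is the part SHC is designed to control.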
Structured RNAs must fold into their native structures and discriminate against a large number of alternative ones, an especially difficult task given the limited information content of RNA's nucleotide alphabet. The simplest motifs within structured RNAs are two helices joined by nonhelical junctions. To uncover the fundamental behavior of these motifs and to elucidate the underlying physical forces and challenges faced by structured RNAs, we computationally and experimentally studied a tethered duplex model system composed of two helices joined by flexible single- or double-stranded polyethylene glycol tethers, whose lengths correspond to those typically observed in junctions from structured RNAs. To dissect the thermodynamic properties of these simple motifs, we computationally probed how junction topology, electrostatics, and tertiary contact location influenced folding stability. Small-angle X-ray scattering was used to assess our predictions. Single- or double-stranded junctions, independent of sequence, greatly reduce the space of allowed helical conformations and influence the preferred location and orientation of their adjoining helices. A double-stranded junction guides the helices along a hinge-like pathway. In contrast, a single-stranded junction samples a broader set of conformations and has different preferences than the double-stranded junction. In turn, these preferences determine the stability and distinct specificities of tertiary structure formation. These sequence-independent effects suggest that properties as simple as a junction's topology can generally define the accessible conformational space, thereby stabilizing desired structures and assisting in discriminating against misfolded structures. Thus, junction topology provides a fundamental strategy for transcending the limitations imposed by the low information content of RNA primary sequence.
Part of understanding a molecule's conformational dynamics is mapping out the dominant metastable, or long lived, states that it occupies. Once identified, the rates for transitioning between these states may then be determined in order to create a complete model of the system's conformational dynamics. Here we describe the use of the MSMBuilder package (now available at http://simtk.org/home/msmbuilder/) to build Markov State Models (MSMs) to identify the metastable states from Generalized Ensemble (GE) simulations, as well as other simulation datasets. Besides building MSMs, the code also includes tools for model evaluation and visualization.
Bowman GR, Huang X, Yao Y, Sun J, Carlsson G, Guibas LJ, and Pande VS
Journal Of The American Chemical Society [J Am Chem Soc] 2008 Jul 30; Vol. 130 (30), pp. 9676-8. Date of Electronic Publication: 2008 Jul 01.
Algorithms, Computer Simulation, Models, Molecular, Nuclear Magnetic Resonance, Biomolecular methods, Thermodynamics, Nucleic Acid Conformation, and RNA chemistry
Hairpins are a ubiquitous secondary structure motif in RNA molecules. Despite their simple structure, there is some debate over whether they fold in a two-state or multi-state manner. We have studied the folding of a small tetraloop hairpin using a serial version of replica exchange molecular dynamics on a distributed computing environment. On the basis of these simulations, we have identified a number of intermediates that are consistent with experimental results. We also find that folding is not simply the reverse of high-temperature unfolding and suggest that this may be a general feature of biomolecular folding.
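For readers unfamiliar with replica exchange, the standard Metropolis swap test between two temperatures can be sketched as below. This shows the conventional parallel criterion, min(1, exp[(β_i − β_j)(E_i − E_j)]), and not the serial variant used in the study, whose details the abstract does not give. The function name, the choice of kcal/mol energy units, and the example energies are assumptions for illustration.

```python
import math

# Boltzmann constant in kcal/(mol*K); assumes energies are given in kcal/mol.
K_B = 0.0019872041


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for exchanging two replicas.

    Replica i currently runs at temp_i with potential energy energy_i,
    replica j at temp_j with energy_j.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Swaps that move the lower energy to the lower temperature are always accepted.
    return 1.0 if delta >= 0 else math.exp(delta)


if __name__ == "__main__":
    # Hypothetical energies: the colder replica is higher in energy, so the swap is accepted.
    print(swap_probability(-100.0, 300.0, -110.0, 350.0))
    # The reverse situation is accepted only with some probability below 1.
    print(swap_probability(-110.0, 300.0, -100.0, 350.0))
```

In a simulation, the returned probability would be compared against a uniform random number to decide whether the two replicas exchange temperatures.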
Sorin EJ, Engelhardt MA, Herschlag D, and Pande VS
Journal Of Molecular Biology [J Mol Biol] 2002 Apr 05; Vol. 317 (4), pp. 493-506.
Base Sequence, Hydrogen Bonding, Kinetics, Magnetic Resonance Spectroscopy, Models, Molecular, RNA genetics, RNA Stability, Solvents, Stochastic Processes, Temperature, Thermodynamics, Computer Simulation, Nucleic Acid Conformation, RNA chemistry, and RNA metabolism
Simulations of an RNA hairpin containing a GNRA tetraloop were conducted to allow the characterization of its secondary structure formation and dynamics. Ten 10 ns trajectories of the folded hairpin 5'-GGGC[GCAA]GCCU-3' were generated using stochastic dynamics and the GB/SA implicit solvent model at 300 K. Overall, we find the stem to be a very stable subunit of this molecule, whereas multiple loop conformations and transitions between them were observed. These trajectories strongly suggest that extension of the C6 base away from the loop occurs cooperatively with an N-type → S-type sugar pucker conversion in that residue and that similar pucker transitions are necessary to stabilize other looped-out bases. In addition, a short-lived conformer with an extended fourth loop residue (A8) lacking this stabilizing 2'-endo pucker mode was observed. Results of thermal perturbation at 400 K support this model of loop dynamics. Unfolding trajectories were produced using this same methodology at temperatures of 500 to 700 K. The observed unfolding events display three-state behavior kinetically (including native, globular, and unfolded populations) and, based on these observations, we propose a folding mechanism that consists of three distinct events: (i) collapse of the random unfolded structure and sampling of the globular state; (ii) passage into the folded region of configurational space as stem base-pairs form and gain helicity; and (iii) attainment of proper loop geometry and organization of loop pairing and stacking interactions. These results are considered in the context of current experimental knowledge of this and similar nucleic acid hairpins. (Copyright 2002 Elsevier Science Ltd.)
Brandman, Relly, Brandman, Yigal, and Pande, Vijay S.
PLoS ONE; Jan2012, Vol. 7 Issue 1, p1-8, 8p
RIBOSOMES, MACROMOLECULES, MOLECULAR dynamics, RNA, PROTEINS, and ENTROPY
The ribosome is a large macromolecular machine, and correlated motion between residues is necessary for coordinating function across multiple protein and RNA chains. We ran two all-atom, explicit solvent molecular dynamics simulations of the bacterial ribosome and calculated correlated motion between residue pairs by using mutual information. Because of the short timescales of our simulation (ns), we expect that dynamics are largely local fluctuations around the crystal structure. We hypothesize that residues that show coupled dynamics are functionally related, even on longer timescales. We validate our model by showing that crystallographic B-factors correlate well with the entropy calculated as part of our mutual information calculations. We reveal that A-site residues move relatively independently from P-site residues, effectively insulating A-site functions from P-site functions during translation. [ABSTRACT FROM AUTHOR]
Russell, Rick, Millett, Ian S., Tate, Mark W., Kwok, Lisa W., Nakatani, Bradley, Gruner, Sol M., Mochrie, Simon G.J., Pande, Vijay, Doniach, Sebastian, Herschlag, Daniel, and Pollack, Lois
Proceedings of the National Academy of Sciences of the United States of America; 4/2/2002, Vol. 99 Issue 7, p4266, 6p, 4 Diagrams, 3 Graphs
COMPACTING, RNA, and X-ray scattering
Examines the occurrence of rapid compaction during RNA folding. Use of X-ray scattering and computer simulations in determining the folding process of Tetrahymena; Formation of molten globule intermediates; General feature of protein folding.
NUCLEIC acids, RNA, PROTEIN folding, PROTEIN conformation, TRANSFER RNA, and STATICS
Recent studies in protein folding suggest that native state topology plays a dominant role in determining the folding mechanism, yet an analogous statement has not been made for RNA, most likely due to the strong coupling between the ionic environment and conformational energetics that make RNA folding more complex than protein folding. Applying a distributed computing architecture to sample nearly 5000 complete tRNA folding events using a minimalist, atomistic model, we have characterized the role of native topology in tRNA folding dynamics: the simulated bulk folding behavior predicts well the experimentally observed folding mechanism. In contrast, single-molecule folding events display multiple discrete folding transitions and compose a largely diverse, heterogeneous dynamic ensemble. This both supports an emerging view of heterogeneous folding dynamics at the microscopic level and highlights the need for single-molecule experiments and both single-molecule and bulk simulations in interpreting bulk experimental measurements. [Copyright Elsevier]