Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Introduction to Molecular Modeling in Rosetta, Lecture notes of Algorithms and Programming

An introduction to molecular modeling in Rosetta, covering topics such as protein structure and function, molecular energies, and simulations in PyRosetta. It explains the basic thermodynamic principle of protein folding and how Rosetta software uses it to explore many conformations of a protein given a set of predefined constraints, under an approximate free energy function, searching for the lowest-energy, and consequently biologically-relevant, conformation. The document also covers protein synthesis, protein-protein binding, and molecular forces that act on a conformation.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023

laalamani
laalamani 🇺🇸

5

(1)

1 document

1 / 57

Toggle sidebar

Related documents


Partial preview of the text

Download Introduction to Molecular Modeling in Rosetta and more Lecture notes Algorithms and Programming in PDF only on Docsity! PyRosetta Interactive Molecular Modeling for Proteins User’s Manual Jeffrey J. Gray Sidhartha Chaudhury Sergey Lyskov Johns Hopkins University, Baltimore, MD 2 Table of Contents Preface Unit 1 Introduction to Molecular Modeling in Rosetta 1.1 Protein Structure and Function 1.2 Introduction to Protein Structure 1.3 Introduction to Molecular Energies 1.4 Protein Structure and Energies in Rosetta 1.5 Getting Started in PyRosetta Unit 2 Protein Structure in PyRosetta 2.1 Exploring the Pose Object 2.2 Accessing and Manipulating Protein Geometry 2.3 Centroid and Full-atom representations of the protein 2.4 Parameter files for residues Unit 3 Calculating Energies in PyRosetta 3.1 Introduction to Scoring Functions 3.2 Creating and editing scoring functions 3.3 Scoring components 3.4 Accessing more detailed scoring information Unit 4 Simple simulations in PyRosetta 4.1 Introduction to the Monte Carlo Algorithm 4.2 Using the Monte Carlo Object 4.3 A simple Monte Carlo simulation 4.4 Using the Job Distributor Unit 5 Sampling – Movers 5.1 Introduction to Movers 5.2 Limiting the search space with the MoveMap 5.3 Simple Backbone movers 5.4 Fragment Movers 5.5 Energy Minimization 5.6 Other movers: SequenceMover, TrialMover, etc.. 5 Unit 1: Introduction to Molecular Modeling in Rosetta 1.1 Protein Structure and Function Proteins are one of the five major biological macromolecules, they are responsible for a variety of biochemical processes from structure, to signaling, to catalyzing essential biochemical reactions. Proteins are polymers of amino acids that are encoded by genes. In the processes of translation, a mRNA transcript from the nucleus is used to create the protein chain in the ribosome. After translation, this amino acid polymer (also known as a polypeptide chain) adopts the lowest free-energy conformation in solution, through a process called protein folding. Most naturally occurring proteins fold into a very specific shape or structure, and their function is directly a result of this structure. A typical folding energy for a protein is -10 kcal/mol, meaning that well over 99% of the protein in solution adopts this lowest-free energy conformation. Fig 1.1.1 Protein synthesis Like protein folding, almost all protein functions occur as a result of this basic thermodynamic principle: the protein will adopt the lowest energy conformation in a given environment. In almost all cases, this lowest energy conformation is a biologically-evolved, highly specific structure. For example, in protein-protein binding, in which two proteins are free in solution, the lowest energy conformation will be a highly specific complex between the two partners. The energy of a given conformation is a function of various molecular forces that act on that conformation, including electrostatics, Van der Waals interactions, hydrogen-bonding, and solvation energies. Since its inception in 1998, the Rosetta molecular modeling software has been designed for accurate protein structure prediction and design. Fundamentally, its algorithms use this same 6 basic thermodynamic principle: it explores many conformations of a protein given a set of pre- defined constraints, under an approximate free energy function, searching for the lowest-energy, and consequently biologically-relevant, conformation. For each algorithm, from docking, to folding, to design, it is essential to understand three things: 1. What are conformational constraints to the system? 2. What is conformational sampling strategy within these predefined constraints? 3. What is the energy function/score function that is being used to identify the lowest- energy conformation? 1.2 Introduction to Protein Structure The monomers of a polypeptide chain, called residues, form a polymer through peptide bonds. The polypeptide chain itself (hereafter referred to as the backbone) is made up entirely of an N-Cα bond, a Cα-C bond, and a C-N bond, repeated for each amino acid. The structure of this polymer can be described in two ways: first, as a set of Cartesian coordinates (x, y, z) that describe the position of every atom in the polymer, and second, as a internal coordinates, as a set of torsion angle values along all bonds (φ,ψ,ω) for each amino acid in the polymer. Fig 1.2.1 the polypeptide chain The specificity of protein structure and function arises from its amino acid sequence. Although all backbone atoms for all residues are identical, each residue in the polypeptide chain is one of 20 amino acids, shown as R-groups in the Fig 1.2.1. Amino acids differ from one another through their side-chains, defined as those non-backbone atoms bonded to the Cα atom. The twenty amino acids and their side-chains are shown in Fig 1.2.2. The structure of the residue side-chains can be described either in the Cartesian coordinates (x,y,z) of each side-chain atom, or as the torsion angle values of each rotatable bond in the side-chain (χ1, χ2, χ3…). Note that because different amino acids have different side- chains, they have a varying number of torsion angles, from no torsion angles in glycine, to four torsion angles in lysine. 7 Fig 1.2.2 Amino Acid types Finally, many protein systems in biology include more than a single protein chain. A protein complex, for example, by definition, involves the interaction of two proteins, or two protein chains. The relative spatial position and orientation of one polypeptide chain relative to another can be described exactly using three dimensions of translation (T) and three dimensions of rotation (R), based on a given fixed spatial reference frame. 1.3 Introduction to Molecular Energies The energy of a given conformation is a result of molecular forces that act on and between all non-bonded atoms in that conformation. In protein biophysics, the most significant energy contributions that favor one conformation over another are Van der Waals energies, steric repulsions, electrostatic energies, solvation energies, and hydrogen bonding energies. These component energies themselves functions of the atomic positions and atom types of the atoms in the protein structure. 10 | or stretches of molecules from one Pose object into another. | thats way more comments than I have ever written in a single | stretch before... | | Methods defined here: | __init__(...) | __init__( (object)arg1) -> None : | default constructor |... The second mode of PyRosetta is script-mode. This is the mode where most writing and development of PyRosetta protocols is done. Simply add the lines import the Rosetta library and initializing PyRosetta to the beginning of the Python script to access the Rosetta functions and objects. PyRosetta scripts can be easily viewed and edited through any major Text Editor, and shared, through the website at http://www.pyrosetta.org. Sample scripts can be downloaded from the ‘PyRosetta Scripts’ section of the website. 11 Unit 2: Protein structure in PyRosetta The pose object contains all the necessary information to completely define a protein system, including the amino acid sequence, the Cartesian coordinates, and internal coordinates of all proteins in the system. The Protein Databank File (PDB) is the standardized file-format for storing the Cartesian-coordinates for each atom in a protein, and is used in PyRosetta to input or output a protein structure. 2.1 Exploring the Pose object To load a PDB file you have to first create an empty pose object, and then assign a PDB structure to that pose: >>my_pose = Pose() >>pose_from_pdb( my_pose, “test.pdb” ) You can access a summary of the pose through: >>print my_pose PDB file name: test.pdb Total residues:116 Sequence: DAITHSILDWIDNLESPLSEKVSERSGYSKWHLQRMFKKETGHSLGQYRSRK… Fold tree: FOLD_TREE EDGE 1 116 -1 This summary contains the pdb file name that the pose originated from, the total number of residues in the pose, the amino acid sequence, and the fold tree. The fold tree (described in Section 7.2) describes the path of refolding of the pose. This information can also be accessed individually: >>print my_pose.total_residue() 116 >>print my_pose.sequence() DAITHSILDWIDNLESPLSEKVSERSGYSKWHLQRMFKKETGHSLGQYRSRK… >>print my_pose.fold_tree() FOLD_TREE EDGE 1 116 -1 Each residue in a pose is represented as a residue object within the pose. It can be accessed as (residue #2 as an example) below. Printing the residue displays the residue type, residue number (in pose numbering), all the atoms that make up the residue and their respective Cartesian coordinates. Again, this information can be accessed individually as well. >>print p.residue(2) ALA 2: N : 0.764 37.858 76.239 12 CA : 0.778 39.078 77.028 C : -0.624 39.484 77.368 O : -0.925 39.776 78.527 CB : 1.474 40.216 76.301 H : 1.417 37.762 75.474 HA : 1.307 38.885 77.961 1HB : 1.461 41.108 76.927 2HB : 2.506 39.936 76.09 3HB : 0.955 40.422 75.367 >>print p.residue(2).name() ALA Residue chemical properties that have been defined in the parameter files (see Section 2.4) can also be accessed: >>p.residue(2).is_polymer() >>p.residue(2).is_protein() >>p.residue(2).is_DNA() >>p.residue(2).is_RNA() >>p.residue(2).is_NA() >>p.residue(2).is_ligand() >>p.residue(2).is_polar() >>p.residue(2).is_aromatic() The residue numbering in a pose object is always consecutive, beginning at residue 1. This is in contrast to PDB files, in which each residue has both a residue number and chain letter. PyRosetta will read a PDB file and sequentially number the residues in the file as they are listed. The conversion between pose residue numbering and PDB residue number and chain letter can be access as: >>print my_pose.pdb_info().pdb2pose(‘I’,8) 239 >>print my_pose.pdb_info().pose2pdb(239) 8 I 2.2 Accessing and manipulation protein geometry through the pose The pose object stores the protein structure as both Cartesian coordinates and internal coordinates. We can access the internal coordinates for each residue as follows: >>res_num = 5 >>print my_pose.phi(res_num) >>print my_pose.psi(res_num) >>print my_pose.chi(chi_num, res_num) In the same way, new internal coordinates can be assigned at each residue: >>my_pose.set_phi(res_num, new_angle) Figure 5. Section 5 be querie >>pose.i <<True 2.4 Para Paramete miniros directory The full- A 6. Full-atom .6 describes d as to whet s_fullato meter files r files descr etta_data . atom residue s an examp (left) and c how to con her it is in f m() for residues ibing the ch base/che parameters le, the param entroid (rig vert a protei ull-atom mo in PyRoset emical and s mical/res are stored i eter file fo ht) represen n between f de using: ta tructural pr idue_type n the /fa_s r Threonine tations for T ull-atom and operties of e _sets dire tandard/ is shown be ryptophan centroid m ach residue ctory in the residue_t low: ode. A Pos is found in t PyRosetta ypes direct 15 e can he ory. 16 NAME THR IO_STRING THR T TYPE POLYMER #residue type AA THR ATOM N Nbb NH1 -0.47 ATOM CA CAbb CT1 0.07 ATOM C CObb C 0.51 ATOM O OCbb O -0.51 ATOM CB CH1 CT1 0.14 ATOM OG1 OH OH1 -0.66 ATOM CG2 CH3 CT3 -0.27 ATOM H HNbb H 0.31 ATOM HG1 Hpol H 0.43 ATOM HA Hapo HB 0.09 ATOM HB Hapo HA 0.09 ATOM 1HG2 Hapo HA 0.09 ATOM 2HG2 Hapo HA 0.09 ATOM 3HG2 Hapo HA 0.09 LOWER_CONNECT N UPPER_CONNECT C BOND N CA BOND N H BOND CA C BOND CA CB BOND CA HA BOND C O BOND CB OG1 BOND CB CG2 BOND CB HB BOND OG1 HG1 BOND CG2 1HG2 BOND CG2 2HG2 BOND CG2 3HG2 CHI 1 N CA CB OG1 CHI 2 CA CB OG1 HG1 PROTON_CHI 2 SAMPLES 3 60 -60 180 EXTRA 1 20 PROPERTIES PROTEIN POLAR NBR_ATOM CB NBR_RADIUS 3.4473 FIRST_SIDECHAIN_ATOM CB ACT_COORD_ATOMS OG1 END ICOOR_INTERNAL N 0.000000 0.000000 0.000000 N CA C ICOOR_INTERNAL CA 0.000000 180.000000 1.458001 N CA C ICOOR_INTERNAL C 0.000000 68.800049 1.523257 CA N C ICOOR_INTERNAL UPPER 149.999954 63.800026 1.328685 C CA N ICOOR_INTERNAL O 180.000000 59.199905 1.231016 C CA UPPER ICOOR_INTERNAL CB -121.983574 68.467087 1.539922 CA N C ICOOR_INTERNAL OG1 -0.000077 70.419235 1.433545 CB CA N ICOOR_INTERNAL HG1 0.000034 70.573135 0.960297 OG1 CB CA ICOOR_INTERNAL CG2 -120.544136 69.469185 1.520992 CB CA OG1 ICOOR_INTERNAL 1HG2 -179.978256 70.557961 1.089826 CG2 CB CA ICOOR_INTERNAL 2HG2 120.032188 70.525108 1.089862 CG2 CB 1HG2 ICOOR_INTERNAL 3HG2 119.987984 70.541740 1.089241 CG2 CB 2HG2 ICOOR_INTERNAL HB -120.292923 71.020676 1.089822 CB CA CG2 Residue identification information PDB atom names, Rosetta atom types, and partial charges Polymer connectivity information Bond connectivity information Defining side-chain torsion angles Defining proton side-chain torsion angle sampling Residue properties Defining parameters for neighbor calculations 17 ICOOR_INTERNAL HA -120.513664 70.221680 1.090258 CA N CB ICOOR_INTERNAL LOWER -149.999969 58.300030 1.328684 N CA C ICOOR_INTERNAL H 180.000000 60.849979 1.010000 N CA LOWER The centroid residue parameters can be found in /centroid/residue_types directory. As an example, the centroid parameter file for Threonine is shown below: NAME THR IO_STRING THR T TYPE POLYMER #residue type AA THR ATOM N Nbb NH1 -0.47 ATOM CA CAbb CT1 0.07 ATOM C CObb C 0.51 ATOM O OCbb O -0.51 ATOM CB CB CT1 0.14 ATOM H HNbb H 0.31 LOWER_CONNECT N UPPER_CONNECT C BOND N CA BOND N H BOND CA C BOND CA CB BOND C O PROPERTIES PROTEIN POLAR NBR_ATOM CEN NBR_RADIUS 3.025 FIRST_SIDECHAIN_ATOM CB ICOOR_INTERNAL N 0.000000 0.000000 0.000000 N CA C ICOOR_INTERNAL CA 0.000000 180.000000 1.458001 N CA C ICOOR_INTERNAL C 0.000000 68.800049 1.523257 CA N C ICOOR_INTERNAL UPPER 149.999954 63.800026 1.328685 C CA N ICOOR_INTERNAL O 180.000000 59.199905 1.231016 C CA UPPER ICOOR_INTERNAL CB -121.983574 68.467087 1.539922 CA N C ICOOR_INTERNAL LOWER -149.999969 58.300030 1.328684 N CA C ICOOR_INTERNAL H 180.000000 60.849979 1.010000 N CA LOWER ##centroid-specific ATOM CEN CEN_THR H 0.0 BOND CA CEN ICOOR_INTERNAL CEN -128.951279 72.516479 2.072556 CA N C Residue structure defined in internal coordinates Residue identification information PDB atom names, Rosetta atom types, and partial charges Polymer connectivity information Bond connectivity information Residue structure defined in internal coordinates Residue properties Defining parameters for neighbor calculations Centroid-specific information 20 3.3 Creating or editing a score function Instead of using a pre-made score function, such as ‘standard’ or ‘docking’, you can also create your own score function from scratch using the various scoring components available in Rosetta. By definition, an empty score function is a score function in which all component weights are set to zero. To add a component, simply set the weight of the desired component to a non-zero number. In the following example, we build a simple score function that includes only the repulsive LJ-potential and hydrogen bonding. >>my_scorefxn = ScoreFunction() >>my_scorefxn.set_weight(fa_rep, 1.0) >>my_scorefxn.set_weight(hbond_lr_bb, 1.0) >>my_scorefxn.set_weight(hbond_sr_bb, 1.0) 3.4 Accessing more detailed scoring information Beyond simply returning the total energy of a given pose, you can access a comprehensive breakdown of the contribution of each scoring component to the total score. >>my_scorefxn.show(my_pose) ------------------------------------------------------------ Scores Weight Raw Score Wghtd.Score ------------------------------------------------------------ fa_rep 1.000 981.311 981.311 hbond_sr_bb 1.000 -56.655 -56.655 hbond_lr_bb 1.000 -103.050 -103.050 --------------------------------------------------- Total weighted score: 821.606 The pose object also stores the latest energy calculations, and you can access this information through the energies() object. Through energies(), you can access a further breakdown of the scoring information on a residue by residue basis. >>res_num = 5 >>my_pose.energies().show(res_num) E fa_rep hbond_sr_bb hbond_lr_bb E(i) 5 0.53 0.00 0.00 You can also access individual scoring components from individual residues directly: >>my_pose.energies().residue_total_energies(res_num)[fa_rep] 0.532552907292 21 Unit 4: Simple simulations in PyRosetta In molecular modeling using PyRosetta we are generally searching a conformational space under a given energy function for the global minimum. The predominant sampling strategy used to search this conformational space is Monte Carlo-based sampling using a large number of short trajectories or paths. The lowest energy structure accessed in each trajectory is stored as a ‘decoy’. Theoretically, assuming adequate sampling and discrimination, the lowest energy decoy corresponds to the global minimum. 4.1 Introduction to the Monte Carlo sampling algorithm In Monte Carlo-base sampling, random perturbations, or ‘moves’ are made to a starting structure and those moves are either accepted or rejected based on the resulting change of energy due to that move. The most common basis for accepting or rejecting a move is through the Metropolis Criterion. The Metropolis criterion states that if the change in energy (ΔE) is less than zero, that is the move decreased the energy, then always accept the move. If the change in energy is greater than zero, then accept that move only some of the time. The probability of accepting that move is a function of how much it increased the energy by: P(ΔE) = e-ΔE/kT In practical terms this means: 1. A move is made to the structure 2. The energy is calculated and compared with the previous energy -> ΔE 3. If ΔE < 0 the move is accepted 4. If ΔE > 0, then: • The probability (P) of the move is calculated based on ΔE • A random number, i, is generated from 0 to 1 • If i < P, then the move is accepted, otherwise the move is rejected Note the role of kT in the calculation of the probability of acceptance. For a given ΔE, as kT increases, the probability of acceptance increases. At higher values of kT, the structure can more easily escape local minima, but the average energy of the structure is higher; at lower values of kT, the structure is more likely to get ‘stuck’ in a local minima, but will generally have a lower energy. In simulated annealing, kT starts at a high value and either linearly or geometrically decreases to a final value through the course of the simulation. This allows the structure to both escape local minima and also settle into a global minimum. 22 4.2 Monte Carlo Object The MonteCarlo object keeps track of all variables necessary to run Monte Carlo simulations in Rosetta and also applies the Metropolis criterion. The MonteCarlo object is initialized with a score function to calculate the energy, a pose object to serve as a reference structure, and the temperature, which is used in the Metropolis Criterion: >>kT = 1.0 >>mc = MonteCarlo(scorefxn, pose, kT) Following a move to the pose object, the Metropolis Criterion is applied using: >>mc.boltzmann(pose) Within this function, the energy of inputted pose is calculated using the score function and compared to the energy of the last accepted pose object. The Metropolis criterion is applied to the pose; if the move was accepted than the inputted pose remains unchanged, and the last accepted pose, within the MonteCarlo object, is updated. If the move was rejected, the inputted pose is switched to the last accepted pose, and the last accepted pose is unchanged. The lowest energy structure assessed by the MonteCarlo object can be accessed as well. The lowest energy structure is not only recovered at the end of the simulation, but often intermittently throughout the simulation as well. >>mc.recover_low(pose) For simulated annealing, the temperature (or kT) is decremented throughout the simulation. This can be done by changing the temperature that the MonteCarlo object uses for evaluating the Metropolis Criterion: >>kT = 2.0 >>mc.set_temperature(kT) In addition to applying the Metropolis Criterion, the Monte Carlo object stores a variety of information on acceptance and rejection: >>mc.show_scores() <<protocols.moves.MonteCarlo: MonteCarlo:: last_accepted_score,lowest_score: -8.02917 -8.02917 >>mc.show_counters() <<protocols.moves.MonteCarlo: unk trials= 60000; accepts= 0.6766; energy_drop/trial= 0.00465 >>mc.show_state() <<protocols.moves.MonteCarlo: MC: 1 -8.02917 -8.02917 -8.02917 -8.02917 0 0 0 2 25 While a decoy is being generated, the Job Distributor will create a temporary file called test_output_0001.pdb.in_progress. Once that decoy is complete, it will be renamed test_output_0001.pdb, for decoy #1. Through the use of these temporary files, the script can be run multiple times for multiple processors all working on the same pool of decoys. In addition to the decoy structures, a score file is generated that lists each decoy, its RMSD to a reference structure and a break-down of its score. This scorefile is stored as test_output.fasc for all-atom structures and test_output.sc for centroid structures. Additional information, such as particular measurements like a loop RMSD, or a specific residue-residue distance can be stored as an additional line in the scorefile can be added with the following line, just prior to outputting the decoy. >>jd.additional_decoy_info(“loop_RMSD” + str(loop_rmsd)” + “res49A_res20B” + str(res_dist)”) Note that if a MonteCarlo object is used in my_protocol, in the above example, it must be reset each time the function is called. Otherwise information from the previous decoy will be retained and recovered in the Monte Carlo object. This will lead to trajectories that are not independent. 26 Unit 5: Conformational sampling in PyRosetta – Movers Rosetta uses a large variety of structural perturbations, or ‘moves’ that are specifically designed for efficient conformational sampling for proteins. These include moves that alter backbone torsion angles, optimize side chain conformations, manipulate rigid-body positions of multiple protein chains. While in theory all these moves can be enacted using the functions that manipulate protein geometry that we’ve already learned (i.e. pose.residue().set_phi()), in practice, these moves are often very specific, complex, combinations of smaller perturbations, that have been designed to search the conformation space in a computationally efficient manner. These moves, called Movers, are among the most powerful features of Rosetta and have been rigorously benchmarked and tested on protocols published in scientific literature. 5.1 Introduction to the Mover base class Movers are one of the main archetypal classes in PyRosetta. After construction, their basic function is to be ‘applied’ to a pose, which, for most traditional movers, means perturbing the structure in some way. There are a large variety of Movers in PyRosetta. Learning to use them requires understanding three things: 1. Mover construction – what is needed to construct the mover? Some Movers are very simple and require almost nothing for construction. Others require many other objects that define how the Mover is implemented. 2. Mover options – what options exist to manipulate the Mover? In additional to options in during constructions, Movers often have a large number of variables that can be altered from the default settings. 3. Mover.apply(pose) – What does the mover do when you apply it to a Pose? The Mover will use all the instructions given to it during construction and option-setting to modify how the mover is implemented on a Pose object. Just as the ScoreFunction object is principally passed a Pose object and returns its energy with ScoreFunction(pose), a Mover is implemented on a Pose object with Mover.apply(pose). 5.2 Limiting the search space with MoveMap In Rosetta, we typically try to define the limits of the conformation space for a particular molecular modeling problem in terms of which degrees of freedom (in internal coordinates) we allow to be flexible, and which degrees of freedom we want to remain fixed. The MoveMap gives us a way to implement that with the Movers. 27 Movers typically apply changes on a protein structure by perturbing its internal coordinates, backbone torsion angles, side-chain torsion angles, and rigid-body jumps (see Unit 1). The MoveMap contains instructions about internal coordinates are allowed to move (be flexible) and which ones are to remain fixed. The MoveMap is a generic object that can be applied used with any Pose, but it is typically used for the specific application of a Mover or set of Movers. Construct a MoveMap as follows: >>movemap = MoveMap() On construction, all degrees of freedom are set to False, indicating that neither backbone torsion angles, nor side-chain torsion angles are allowed to move. We can set specific degrees of freedom to True, for example for residue 5. We can also set a range of residues to be true for backbone torsion angles: >>movemap.set_bb(2, True) >>movemap.set_chi(2,True) >>movemap.set_bb_true_range(5,10) We can allow the rigid-body orientation of one protein chain relative to another to be altered by allowing the jump (jump #1 in the example) that defines the orientation to be flexible: >>movemap.set_jump(1, True) Finally, you can view the instructions for the MoveMap: >>movemap.show(10) << resnum BB CHI 001 FALSE FALSE 002 TRUE TRUE 003 FALSE FALSE 004 FALSE FALSE 005 TRUE FALSE 006 TRUE FALSE 007 TRUE FALSE 008 TRUE FALSE 009 TRUE FALSE 010 TRUE FALSE 5.3 Backbone Movers: SmallMover and ShearMover The simplest movers that exist in Rosetta are the backbone movers SmallMover and ShearMover. They are used frequently to make small perturbations to the backbone structure for structural refinement and relaxation simulations. 30 A fragment library can be generated for a given protein sequence by going to Robetta website (http://robetta.bakerlab.org/fragmentsubmit.jsp) and submitting the desired sequence in FASTA format. The method for generating a fragment library involves searching a non-redundant subset of the Protein Data Bank for the 100 highest-frequency fragments that contain a similar sequence profile to the input sequence, in window-lengths of the same size as the fragment size. Theoretically, this method relies on the observation that local low-energy backbone conformations are partially a result of the local sequence. The fragment insertion method biases backbone sampling towards known low-energy conformations for a given local sequence. A FragmentMover requires a fragment library (a FragSet) from which to select fragments, and a MoveMap, which specifies which residues are allowed to be altered. A fragment library can be read in as follows, in this example, for a 3mer fragment library file named test_in_3mer.frag: >>fragset = ConstantLengthFragSet(3) >>fragset.read_fragment_file(“test_in_3mer.frag”) >>movemap = MoveMap() >>movemap.set_bb(True) >>frag_mover = ClassicFragmentMover(fragset, movemap) >>frag_mover.apply(pose) Like the SmallMover, standard fragment insertion that replace a 3-residue or 9-residue window of backbone torsion angles can have large downstream effects on the protein structure during the re-folding step, leading to drastic changes in the global structure. To address this, there is a fragment mover called SmoothFragmentMover, which selects fragments that minimize downstream effects. This leads to sampling of diverse local conformations without massively altering the global structure, making it ideal for structural refinement or relaxation. >>frag_mover = SmoothFragmentMover(fragset, movemap) >>frag_mover.apply(pose) 5.5 Energy Minimization Minimizing the energy with respect to certain flexible degrees of freedom is a quick and easy way to lower the energy of a given structure without altering its structure substantially. Often significant decreases in energy can be achieved with minute changes in backbone or side-chain torsions. Energy minimization is common in Rosetta and often follows explicit perturbations and precedes a Monte Carlo Metropolis Criterion step. Essentially, it provides the lowest energy structure in the immediate local vicinity of a conformation just after an explicit perturbation step. 31 Energy Minimization in Rosetta is carried out primarily through the MinMover. In the construction of the MinMover, one mainly needs to supply the energy function that is to be minimized and a MoveMap which defines which conformational degrees of freedom to minimize over. Additionally, one can provide the minimization type used and the threshold for minimization (i.e. when minimization is considered complete), also known as the ‘tolerance’. >>movemap = MoveMap() >>movemap.set_bb(True) >>scorefxn = create_score_function(‘standard’) >>tolerance = 0.01 >>min_type = “dfp_min” >>minmover = MinMover(movemap, scorefxn, min_type, tolerance, True) >>minmover.apply(pose) The MinMover can also be constructed with default settings where only specific options are changed later: >>minmover = MinMover() >>minmover.score_function(scorefxn) >>minmover.movemap(movemap) >>minmover.tolerance(tolerance) There are primarily two minimization methods used in Rosetta: Linear minimization, or steepest- descent minimization (linmin), and Davidson-Fletcher-Powell minimization (dfp_min). Linmin is computational cheaper than dfp_min, but generally minimizes less well. 5.6 Other types of movers In addition to the traditional movers that directly perturb the protein structure by altering their internal coordinates, there are other types of movers as well. Combination movers, such as the SequenceMover and RepeatMover, are essentially ‘mover containers’ that execute the mover(s) within them with some instruction. SequenceMover is a mover with a list of movers within it; when applied, it applies all the movers within it consecutively. >>sequence_mover = SequenceMover() >>sequence_mover.add_mover(small_mover) >>sequence_mover.add_mover(minmover) >>sequence_mover.apply(pose) In the above example, when sequence_mover.apply(pose) is called, the SequenceMover will apply small_mover and then minmover. RepeatMover is a mover that repeats the mover within it a user-specified number of times, in this case it will repeat the small_mover 5 times when it is applied to the pose. 32 >>repeats = 5 >>repeat_mover = RepeatMover(small_mover, repeats) >>repeat_mover.apply(pose) The most important of these ‘container’ movers is the TrialMover. The TrialMover contains a user-inputted Mover and MonteCarlo object. On TrialMover.apply(pose), it executes the mover contained within it, and then applies MonteCarlo.boltzmann() on the resulting pose, accepting or rejecting that move based on the Metropolis Criterion. >>mc = MonteCarlo(scorefxn, pose, kT) >>trial_mover = TrialMover(small_mover, mc) >>trial_mover.apply(pose) The most common trial movers used in Rosetta involve using a sequence mover that makes one or more explicit perturbations, followed by energy minimization, followed by the TrialMover’s Metropolis Criterion: >>smallmin = SequenceMover() >>smallmin.add_mover(small_mover) >>smallmin.add_mover(minmover) >>smallmintrial = TrialMover(smallmin, mc) >>smallmintrial.apply(pose) Finally, we have a mover that switches a Pose between full-atom and centroid-mode representation. This is useful during multi-scale protocols in which a protein starts out in centroid-mode for most of the conformational search and then is converted into full-atom mode for refinement. >>to_centroid = SwitchResidueTypeSetMover(‘centroid’) >>to_centroid.apply(pose) >>to_fullatom = SwitchResidueTypeSetMover(‘fa_standard’) >>to_fullatom.apply(pose) A pose that has just been converted into a full-atom pose has coordinates for all atoms at all side- chains but the torsion values for the side-chains is initialized to ‘0’. From here either the side- chain torsion angle conformations can be optimized to sensible values using side-chain packing (Unit 6), or they can be recovered from a reference full-atom structure, such as a starting structure (in the example below, starting_pose) using the ReturnSidechainMover: >>recover_sidechains = ReturnSidechainMover(starting_pose) >>recover_sidechains.apply( pose ) 35 ... 5 0 0 SER 6 0 0 GLY 7 1 0 CYS 8 1 0 THR 9 0 0 ASP ... The above functions fix all residues to their original rotamer, and allow repacking of only specific residues. 6.3 The Resfile One way to input specific, custom, instructions for side-chain packing is by using an input file called the resfile. The resfile contains the same information as the pack_task and allows the user to directly manipulate each residue. A resfile can be generated for a given pose or PDB file as follows: >>generate_resfile_from_pose( pose, “test.resfile”) >>generate_resfile_from_pdb( pdb_file, “test.resfile”) In order to edit the resfile you will have to open it with a Text Editor of your choice: start 1 A NATRO 2 A NATRO 3 A NATRO 4 A NATRO 5 A NATRO 6 A NATRO 7 A NATRO 8 A NATRO The first column is the residue number, the second column is the chain letter, and the third column contains instructions for that residue: NATRO – instructs the packer to keep the sidechain conformation fixed. The rotamer set for that residue contains only the native rotamer. NATAA – instructs the packer to maintain the original residue type, but allow the sidechain conformation of that residue to be optimized by the packer. The rotamer set for that residue contains all the rotamers from the rotamer library for that residue-type only. PIKAA XXXX – instructs the packer to allow the side-chain to be mutated into any one of the proceeding amino acids (one-letter codes). The rotamer set for this residue contains all the rotamers from the rotamer library for all of the allowable residue-types. ALLAA – instructs the packer to allow the side-chain to be mutated into any of the 20 amino acids. The rotamer set contains all of the rotamers in the rotamer library for all 20 amino acids. 36 A task can be created from the resfile using: >>packer_task = standard_packer_task( pose ) >>packer_task.read_resfile(“test.resfile”) >>packer = PackRotamersMover(scorefxn, packer_task) >>packer.apply(pose) Now, when you print the packer_task, it will reflect all of the instructions from the resfile. 6.4 TaskFactory A PackerTask is fairly limited. It applies only to the Pose it was constructed for and is difficult to modify on the fly. If the Pose amino-acid sequence is changed, the original PackerTask that was constructed with it becomes obsolete. In some cases the exact residues that are to be packed varies as the structure changes. For example, in protein-protein docking, only the interface residues are supposed to be packed, but the residues that make up the interface change throughout the docking process. In these cases, a TaskFactory can be used, to create a task on the fly, each time PackRotamers.apply(pose) is called. A TaskFactory’s primary job is to create a PackerTask based on a list of instructions, called TaskOperations, that given to it. Most instructions that can be given to a PackerTask, have analogous instructions for the TaskFactory. >>tf = standard_task_factory() >>tf.push_back(RestrictToRepacking()) >>tf.push_back(ReadResfile(“test.resfile”) >>tf.push_back(RestrictToInterface( jump_num ) >>packer_task = tf.create_task_and_apply_taskoperations( pose ) In the above example, the TaskFactory will generate a task that restricts all residues to repacking, puts in additional restrictions based on a user-inputted resfile, and finally restricts packing to residues at the interface defined by the Jump ‘jump_num’. Additionally, the TaskFactory can be sent to a PackRotamersMover. The Mover will then create a new PackerTask each time PackRotamers is applied: >>packer_mover = PackRotamersMover(scorefxn) >>packer_mover.task_factory(tf) >>packer_mover.apply(pose) A list of some of the TaskOperations available include: 37 TaskFactory TaskOperations tf = standard_task_factory() Creates a default TaskFactory tf.push_back(IncludeCurrent()) includes the current rotamers in the Pose to the rotamer sets used for packing. Defaulted on tf.push_back(ReadResFile(“test.resfile”)) applies instructions from the resfile when creating a PackerTask tf.push_back(NoRepackDisulfides()) holds disulfide bond cysteine side-chains fixed. Defaulted on. tf.push_back(RestrictToInterface(1)) allows repacking only at the interface defined by the jump number (in example jump# 1) pr = PreventRepacking() pr.include_residue( 5 ) tf.push_back(pr) turns off repacking for specified residues (residue #5 in example) rr = RestrictResidueToRepacking() rr.include_residue(5) tf.push_back(rr) turns on repacking for specified residues (residue #5 in example) 6.5 Other side-chain movers Besides side-chain packing there are two other side-chain movers. The first is the RotamerTrials mover. This mover acts as a ‘cheap’ version of the standard packer. It’s much faster than standard packing and quickly finds local minima in side-chain conformation space. The standard PackRotamersMover is much better at finding the global minimum in side-chain conformation space. RotamerTrialsMinMover is a variation of the RotamerTrialsMover that is uses energy minimization to minimize the torsion angles while selecting rotamers. It is the only mover capable of going off-rotamer in search of low-energy side-chain conformations. >>rot_trials = RotamerTrialsMover(scorefxn, tf) >>rot_trials.apply( pose ) >>rt_min = RotamerTrialsMinMover(scorefxn, tf) >>rt_min.apply( pose ) Finally, there are a number of wrapper functions that use the previous movers in highly specific, but commonly used ways. For example, the mutate_residue function creates a point mutation at a user-specified position on the pose. Below, residue #5 in the Pose is mutated to a Serine. >>mutate_residue( pose, 5, ‘S’) 40 FoldTree().add_edge(node1, node2, edge_type). Note that all ‘peptide’ edges have an type of -1, while the jump has an edge of type that is a positive integer, corresponding to the jump number, in this case, 1. Also note that the direction of folding (N→C or C→N) is described by the order of nodes used to define the edge. For N→C folding, node1 < node2, for C→N folding, node1 > node2. >>ft = FoldTree() >>ft.add_edge(1,i,-1) >>ft.add_edge(i,cutpoint,-1) >>ft.add_edge(i,j,1) >>ft.add_edge(cutpoint, j, -1) >>ft.add_edge(j,n) >>pose.foldtree( ft ) Finally you can check if a FoldTree is valid: >>ft.check_fold_tree() <<True In the above example, we have defined a FoldTree manually and entered it into the Pose. For most PyRosetta protocols, functions exist to define the fold tree for that particular protocol automatically. A user would only manually define a fold tree for highly customized algorithms where automatically generated, protocol-specific fold trees cannot be used. A single loop fold tree can be defined using the Loop object for a Pose: >>set_single_loop_fold_tree(pose, loop1) >>print p.foldtree() << 7.3 Loop Modeling protocol movers Cyclic-Coordinate Descent (CCD) Loop Closure During sampling, using SmallMovers, ShearMovers, and/or FragmentMovers on loop residues on a Pose with a loop-modeling FoldTree, a chain break will form at the cutpoint. Rosetta primarily uses CCD loop closure to close that chain-break to recover a continuous chain along the polypeptide, from the N to the C terminus. CCD loop closure is carried out using the CcdLoopClosureMover. It uses Cyclic-coordinate descent to sample the torsion angles of the loop residues in a way that minimizes the chain-break. On construction it requires a Loop object, and a MoveMap that allows the loop residues to move: >>movemap = MoveMap() >>movemap.set_bb_true_range(i,j) >>ccd = CcdLoopClosureMover(loop1, movemap) >>ccd.ap Low-reso The low- LoopMov and the L that is us sampling >> score >>fragse >>loop_p >>loop_p Addition >>loop_p Note: Th loop mod loops wit High-res The high mover. O under wh >>scoref >>loop_r >>loop_r ply(pose) lution Loop resolution p er_Perturb_ oops object ed by Fragm is carried o fxn = cre t = Const erturb = erturb.ap ally a numb erturb.ra e LoopMove eling examp hout this mo olution Loop -resolution p n construct ich samplin xn = crea efine = L efine.app Modeling hase of loop CCD. This . On constru ent movers ut. ate_score_ antLengthF LoopMover_ ply( pose er of options ndomize_lo r_Pertur le on the w ver. Refinemen hase of loop ion it requir g is carried te_score_f oopMover_R ly( pose ) -modeling, mover auto ction it req during samp function_ ragSet(3, Perturb_C ) can be set, op(True) b_CCD mov ebsite (http: t modeling es a Loops out. unction(‘ efine_CCD described in matically se uires a Loop ling, and th ws_patch(‘ “test_in_3 CD(loop_li including ra er does not w //www.pyro is carried ou object to de standard’, (loop_list (Rohl et al. ts up a fold s object to e ScoreFun cen_std’, mer.frag” st, score ndomizing ork in PyR setta.org/scr t using the L fine the loop ‘score12 , scorefx ) can be acc tree based o define the lo ction und ‘score4L’ ) fxn, frag_ the input loo osetta v1.0. ipts.html) fo oopMover s, and a Sco ’) n) essed using n an input p ops, a Frag er which ) set) p conforma Please see r how to mo _Refine_C reFuncti 41 the ose Set tion: the del CD on Figure 7. LoopMov the Loop 7.4 Prote Protein-p unbound the two p correspon resolution The Fold points are the jump i and j, re modeling Figure 7. The dock >>Dockin >>print <<FOLD_T EDGE By defau partners, below, pa >>Dockin 3 Five struc er_Perturb_ Mover_Refi in-protein rotein docki components artners are s ds to a near perturbatio Tree used in defined dif points corre spectively). : 4 Docking f ing Foldtr gProtocol pose.fold REE EDGE 285 301 - lt it will cre such as anti rtner 1 is co gProtocol tures genera CCD move ne_CCD mo docking ng is used t . In dockin ampled, and -native prot n/search ph docking is ferently. A spond to res The fold tr old tree ee can be s (jump_num) tree() 1 190 -1 1 ate a jump b -bodies, the nsists of ch (jump_num) ted from the r (left) after ver (right). o predict the g, different , theoretica ein complex ase and a hi identical to chain-break idues closes ee is as follo et up autom .setup_fo EDGE 190 etween the f chains for e ains ‘H’ and .setup_fo same starti randomizin structure o docked conf lly, the lowe structure. gh-resolutio the one used is set in bet t to the cent ws, and the atically usin ldtree(pos 238 -1 E irst two cha ach partner ‘L’, and pa ldtree(pos ng structure g the startin f protein com igurations, o st-energy do Like in Loop n refinemen in loop mo ween the tw er of masse refolding p g the Docki e) DGE 190 2 ins in the Po must be spe rtner 2 cons e, ‘HL_A’ using the g loop confo plexes star r rigid-body cked config Modeling, t phase. deling excep o partners ( s for each pa rocess is the ngProtoco 85 1 EDGE se. For mu cified. In th ists of chain ) rmation, an ting from th orientation uration there is a lo t that the ju A and B), an rtner, (resid same as in l(): 285 239 lti-chain e example ‘A’: 42 d eir s of w- mp d ues loop -1 45 >>scorefxn_low = create_score_function(‘interchain_cen’) >>docking_low = DockingLowRes(scorefxn_low, jump_num) >>docking_low.apply( pose ) Following the low-resolution phase of docking, the structure must be converted into a full-atom structure for the high-resolution refinement step. This can be done as described in Section 5.5, or with a DockingProtocol() function written specifically for this purpose (in the example below recovering the side-chains from a full-atom starting structure called starting_pose): >>DockingProtocol().recover_sidechains(pose, starting_pose) Finally, high-resolution refinement is carried out using the DockingHighRes mover: >>scorefxn_high = create_score_function(‘docking’) >>docking_high = DockingHighRes(scorefxn_high, jump_num) >>docking_high.apply( pose ) Additionally, there are a number of options that can be set that modify the DockingHighResMover. The MoveMap in this mover is primarily used during minimization – inputting a custom MoveMap can allow for energy minimization along additional degrees of freedom beyond rigid-body minimization, for example, backbone minimization. >>docking_high.set_move_map( movemap ) >>docking_high.set_min_type( ‘dfpmin’ ) The DockingHighRes mover uses a variation of the RigidBodyPerturbMover that uses the center of masses of the interface residues to define the reference frame for rigid body perturbations instead of the center of masses of the entire partners which is the default behavior. It also uses translation and rotation magnitudes of 0.1Ǻ and 5.0˚ respectively. >>use_interface = True >>rbmover = RigidBodyPerturbMover( jump_num, 0.1, 5.0, partner_downstream, use_interface) 7.7 Modeling small molecules in PyRosetta PyRosetta is generally set up to model proteins using the 20 standard amino acids. Small- molecules and other non-amino acids moieties, such as post-translational modifications and cofactors, are often critical to accurate modeling certain systems. In PyRosetta, these non- standard molecules are treated as additional ‘residues’ to the standard residue set. Parameters describing the chemical and atomic properties of the standard residue set are stored in the minirosetta_database directory within PyRosetta. In order to use non-standard 46 molecules, a parameter file must first be created so that PyRosetta can properly model the structure and energies of the non-standard molecule. A parameter file (called a .params file) must be generated from an MDL Molfile format (.mol file) which contains the necessary structural and connectivity information and can be created from a PDB file containing the atomic coordinates for the molecule (.pdb): 1. Isolate the atomic coordinates of the non-standard molecule into a PDB file 2. Generate a .mol file from the .pdb file. This can be done with the free web tool MN.CONVERT at http://www.molecular-networks.com/demos 3. Use the molfile_to_params.py (found in the ligand docking download at http://www.pyrosetta.org/scripts.html) on the .mol file to generate 1) a .params file and 2) a PDB file containing the atomic coordinates of the non-standard molecule using Rosetta- standardized atom-types. >>molfile_to_params.py ATP.mdl –n ATP outputs: ATP_0001.pdb ATP.params 4. Replace the coordinates of the non-standard molecule in the PDB file of the starting structure with the coordinates output from the script in the previous step. To load the nonstandard-molecule, first create a list of all nonstandard-molecule parameter files, then create a residue set that includes them, and finally, use that expanded residue set to read in the PDB file: >>params_list = Vector1[‘ATP.params’] >>res_set = generate_nonstandard_residue_set(params_list) >>pose_from_pdb(pose, res_set, “PKA.pdb”) It is important to note that non-standard molecules are currently only supported in a full-atom representation. Finally, as the small molecule is modeled as a residue in the Pose object, a residue in the Pose object can be queried as to whether it is a ligand or is one of the standard 20 amino acids. In the example below, residue 105 is a small molecule: >>pose.residue(105).is_protein() <<False >>pose.residue(105).is_ligand() <<True 47 7.8 Modeling DNA and RNA Parameters for all the standard nucleotides in DNA and RNA are already loaded into the minirosetta_database, so modeling DNA and RNA is straightforward in PyRosetta. Each nucleotide in the DNA or RNA is considered a ‘residue’ in the Pose object. Like with small molecules, DNA and RNA is only supported in the full-atom mode, parameters do not currently exist for modeling it in centroid mode. As with the small molecule, a residue can be queried as to whether it is a standard amino acid, or a nucleotide (again, in the case of residue 105): >>pose.residue(105).is_protein() <<False >>pose.residue(105).is_ligand() <<True Note that the names of the atoms in the DNA or RNA have to follow a standardized form. Please see the DNA modeling example in the scripts section of the PyRosetta website (http://www.pyrosetta.org/scripts.html) for more information. 50 Rosetta: Scoring scorefxn = create_score_function('standard') Defines a score function with standard full-atom energy terms and weights scorefxn2=core.scoring.ScoreFunction() scorefxn2.set_weight(core.scoring.fa_atr, 1.0) Defines a function called “scorefxn,” in which the energies accounted for are: The numbers are the relative weights assigned to each energy and can be set to any real value. This is not an inclusive list of energies. print scorefxn Shows score function weights and details scorefxn(pose) Returns the score of pose with the defined function “scorefxn”. scorefxn.show(pose) Returns the score of pose with the defined function “scorefxn”. pose.energies().show() pose.energies().show(resnum) Shows the breakdown of the energies by residue emap = rosetta.core.scoring.TwoBodyEMapVector () Creates an energy map object to store a vector of scores scorefxn.eval_ci_2b(rsd1,rsd2,pose,emap) Evaluates context-independent two-body energies between residues rsd1 and rsd2 and stores the energies in the energy map print emap[rosetta.core.scoring.fa_atr] Print fa_atr term from the energy map hbond_set = rosetta.core.scoring.hbonds.HBondSet() Creates an HBond set object for storing hydrogen bonding information pose.update_residue_neighbors(); rosetta.core.scoring.hbonds.fill_hbond_se t(pose,False,hbond_set) Stores H-bond info from pose in the Hbond_set object. hbond_set.show(pose) Prints H-bond info from the hbond_set calc_total_sasa(pose, 1.5) Calculates the total solvent-accessible surface area using a 1.5A probe Rosetta Full-atom Scoring Functions Van der Waals net attractive energy FA fa_atr Van der Waals net repulsive energy FA fa_rep Hydrogen bonds, short and long-range, (backbone) FA/CEN hbond_sr_bb, hbond_lr_bb Hydrogen bonds, short and long-range, (side-chain) FA hbond_sc, hbond_bb_sc Solvation (Lazaridis-Karplus) FA fa_sol Dunbrack rotamer probability FA fa_dun Statistical residue-residue pair potential FA fa_pair Intra-residue repulsive Van der Waals FA fa_intra_rep Electrostatic potential FA hack_elec Disulfide statistical energies (S-S distance, etc.) FA dslf_ss_dst, dslf_cs_ang, dslf_ss_dih, dslf_ca_dih Amino acid reference energy (chemical potential) FA/CEN ref Statistical backbone torsion potential FA/CEN rama Van der Waals “bumps” CEN vdw Statistical environment potential CEN env Statistical residue-residue pair potential (centroid) CEN pair Cb CEN cbeta 51 52 Residue Type Set Mover switch = SwitchResidueTypeSetMover('centroid') creates a mover which will change poses to the centroid residue type set (‘fa_standard’ also avail.) switch.apply(pose) changes pose to the centroid residue types MoveMap movemap = MoveMap() creates a MoveMap movemap.show(Nres) prints the MoveMap contents for residues 1-Nres movemap.set_bb(True) Allows all backbone torsion angles to vary when movemap is applied movemap.set_chi(True) Allows all side chain torsion angles (χ) to vary when movemap is applied movemap.set_bb(10,False) movemap.set_chi(10,False) Forbid residue 10’s backbone and side chain torsion angles from varying movemap.set_bb_true_range(10,20) Allows backbone torsion angles to vary in residues 10 to 20, inclusive; sets all other residues to False. movemap.set_jump(1, True) Allows jump #1 to be flexible Fragment Movers fragset = ConstantLengthFragSet(3, "aatestA03_05.200_v1_3") creates a fragment set and loads the fragments from the data file mover_3mer = ClassicFragmentMover(fragset,movemap) Creates a fragment mover using the fragset and the movemap mover_3mer.apply(pose) inserts a random 3-mer fragment from the fragset into the pose, only in positions allowed by the movemap smoothmover = SmoothFragmentMover(fragset, movemap) Fragment insertions are followed by a second, downstream fragment insertion chosen to minimize global disruption Small and Shear Movers kT = 1.0 n_moves = 1 smallmover = SmallMover(movemap,kT,n_moves) shearmover = ShearMover(movemap,kT,n_moves) creates a small or shear mover with a movemap, a temperature, and the number of moves smallmover = SmallMover() shearmover = ShearMover() Default settings are all backbone moves allowed, kT = 0.5, and n_moves = 1 smallmover.apply(pose) shearmover.apply(pose) applies the movers Minimize Mover minmover = MinMover() creates a minimize mover with default arguments minmover = MinMover(movemap, scorefxn, min_type, tolerance, True) Creates a minimize mover with a particular MoveMap,ScoreFunction, minimization type, or score tolerance minmover.movemap(movemap) Set a movemap minmover.score_function(scorefxn) Set a scorefunction minmover.min_type('linmin') Set a the minimization type to a line minimization (one direction in the space) minmover.min_type('dfpmin') Set a the minimization type to a David- Fletcher-Powell minimization (multiple iterations of linmin in conjugate directions) 55 minmover = MinMover() minmover.movemap(movemap) dock_lowres = DockingLowRes(scorefxn_low, jump_num) dock_lowres.apply(pose) low-resolution, centroid based MC search (50 RigidBodyPerturbMoves with adaptable step sizes) dock_hires = DockingHighRes(scorefxn_high, jump_num) dock_hires.apply(pose) high-resolution, all-atom based MCM search with rigid-body moves, side- chain packing, and minimization cs = ConformerSwitchMover(start,end, jump_num,scorefxn,"1aaa.pdb") cs.apply(pose) Picks a new backbone conformation from the ensemble (conformer selection docking). start and end indicate residue number range for backbone swapping. randomize1 = RigidBodyRandomizeMover(pose, jump_num, partner_upstream) When applied, globally randomizes the rotation of the upstream partner. randomize2 = RigidBodyRandomizeMover(pose, jump_num, partner_downstream) When applied, globally randomizes the rotation of the downstream partner. DockingProtocol().calc_Lrmsd(pose1, pose2) Calculates RMSD of smaller partner after superposition of larger partner Job Distributor jd = PyJobDistributor("output", 1000, scorefxn_high) Creates a job distributor which will create 1000 model structures named output_1.pdb to output1000.pdb. Files include scorefxn_high energies. Pose native_pose("1aaa.pdb") jd.native_pose = native_pose Sets the native pose (loaded from 1aaa.pdb) for rmsd comparisons jd.job_complete Boolean indicating whether all decoys have been output. jd.output_decoy(pose) Outputs the pose to a file and increments the decoy number. while (jd.job_complete == False): #[create the decoy called pose] jd.output_decoy(pose) Loop to create decoys until all have been output jd.additional_info = "Created by Andy" Sets a string to be output to the pdb file RMSD print CA_rmsd(pose1, pose2) calculates and prints the root-mean- squared deviation of the location of Cα atoms between the two poses Fold Tree ft = FoldTree() ft = pose.fold_tree() Extracts the current fold tree from the pose pose.fold_tree(ft) Attaches the fold tree ft into the pose. ft.add_edge(1,30,-1) Creates a peptide edge (code -1) from residues 1 to 30. This edge will build N-to-C. ft.add_edge(100,31,-1) Creates a peptide edge from residues 100 to 31. This edge will build C-to-N. ft.add_edge(30,100,1) Creates a jump (rigid-body connection) between 56 residues 30 and 100. ft.add_edge(100,101,2) Creates a second jump between residues 100 and 101 (note the jump number is 2. Each jump needs a unique, sequential jump number). ft.check_fold_tree() Returns True only for valid trees. print ft Prints the fold tree ft.simple_tree(100) Creates a single-peptide-edge tree for a 100-residue protein ft.new_jump(40,60,50) Creates a jump from residues 40 to 60, a cutpoint between 50 and 51, and splits up the original edges as needed to finish the tree. ft.clear() Deletes all edges in the fold tree. Loops loop1 = Loop(15,24,20) Defines a loop with stems at residues 15 and 24, and a cut point at residue 20 loops = Loops() loops.add_loop(loop1) Creates an object to contain a set of loops set_single_loop_fold_tree(pose, loop) Sets the pose’s fold tree for single-loop optimization ccd = CcdLoopClosureMover(loop1,movemap) Creates a mover which performs Canutescu & Dunbrack’s cyclic coordinate descent loop closure algorithm loop_refine = LoopMover_Refine_CCD(loops) Creates a high-resolution refinement protocol consisting of cycles of small and shear moves, side- chain packing, CCD loop closure, and minimization. Lrms = loop_rmsd(pose,reference_pose, loops, True) Calculates the rmsd of all loops in the reference frame of the fixed protein structure 57 References and Further Reading
Docsity logo



Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved