Understanding the Role of Proteins, DNA, and RNA in Gene Expression - Prof. Tompa, Lecture notes of Molecular biology

An in-depth exploration of the roles of proteins, DNA, and RNA in gene expression. It covers the structure and function of these cellular molecules, the processes of transcription and translation, and the regulation of gene expression. The document also discusses the differences between prokaryotic and eukaryotic gene structure and genome organization, and the importance of computer science and mathematics in analyzing sequence data.

Typology: Lecture notes

Pre 2010

Available from 05/02/2024


Basics of Molecular Biology

Martin Tompa
Department of Computer Science and Engineering
Department of Genome Sciences
University of Washington
Seattle, WA 98195-2350 U.S.A.

July 6, 2003
Updated December 18, 2009

We begin with a review of the basic molecules responsible for the functioning of all organisms’ cells. Much of the material here comes from the introductory textbooks by Drlica [4], Lewin [7], and Watson et al. [10]. Good short primers have been written by Hunter [6] and Brāzma et al. [2].

What sorts of molecules perform the required functions of the cells of organisms? Cells have a basic tension in the roles they need those molecules to fulfill:

1. The molecules must perform the wide variety of chemical reactions necessary for life. To perform these reactions, cells need diverse three-dimensional structures of interacting molecules.
2. The molecules must pass on the instructions for creating their constituent components to their descendants. For this purpose, a simple one-dimensional information storage medium is the most effective.

We will see that proteins provide the three-dimensional diversity required by the first role, and DNA provides the one-dimensional information storage required by the second. Another cellular molecule, RNA, is an intermediary between DNA and proteins, and plays some of each of these two roles.

1 Proteins

Proteins have a variety of roles that they must fulfill:

1. They are the enzymes that rearrange chemical bonds.
2. They carry signals to and from the outside of the cell, and within the cell.
3. They transport small molecules.
4. They form many of the cellular structures.
5. They regulate cell processes, turning them on and off and controlling their rates.

This range of roles is made possible by the diversity of proteins, which collectively can assume a wide variety of three-dimensional shapes.

A protein’s three-dimensional shape, in turn, is determined by the particular one-dimensional composition of the protein. Each protein is a linear sequence made of smaller constituent molecules called amino acids. The constituent amino acids are joined by a “backbone” composed of a regularly repeating sequence of bonds. (See [7, Figure 1.4].) There is an asymmetric orientation to this backbone imposed by its chemical structure: one end is called the N-terminus and the other end the C-terminus. This orientation imposes directionality on the amino acid sequence.

There are 20 different types of amino acids. The three-dimensional shape the protein assumes is determined by the specific linear sequence of amino acids from N-terminus to C-terminus. Different sequences of amino acids fold into different three-dimensional shapes. (See, for example, [1, Figure 1.1].)

Protein size is usually measured as the number of amino acids the protein contains. Proteins can range from fewer than 20 to more than 5000 amino acids in length, although an average protein is about 350 amino acids in length.

Each protein that an organism can produce is encoded in a piece of the DNA called a “gene” (see Section 6). To give an idea of the variety of proteins one organism can produce, the single-celled bacterium E. coli has about 4300 different genes. Humans are believed to have about 25,000 different genes (the exact number is as yet unresolved), so a human has only about 6 times as many genes as E. coli. The number of proteins that can be produced by humans greatly exceeds the number of genes, however, because a substantial fraction of the human genes can each produce many different proteins through a process called “alternative splicing”.

1.1 Classification of the Amino Acids

Each of the 20 amino acids consists of two parts:

1. a part that is identical among all 20 amino acids; this part is used to link one amino acid to another to form the backbone of the protein.
2. a unique side chain (or “R group”) that determines the distinctive physical and chemical properties of the amino acid.

Although each of the 20 different amino acids has unique properties, they can be classified into four categories based upon their major chemical properties. Below are the names of the amino acids, their three-letter abbreviations, and their standard one-letter symbols.

1. Positively charged (and therefore basic) amino acids (3).
   Arginine       Arg  R
   Histidine      His  H
   Lysine         Lys  K
2. Negatively charged (and therefore acidic) amino acids (2).
   Aspartic acid  Asp  D
   Glutamic acid  Glu  E
3. Polar amino acids (7). Though uncharged overall, these amino acids have an uneven charge distribution. Because of this uneven charge distribution, these amino acids can form hydrogen bonds with water. As a consequence, polar amino acids are often found on the outer surface of folded proteins, in contact with the watery environment of the cell, in which case they are called hydrophilic.

4 Residues

The term residue refers to either a single base constituent from a nucleotide sequence, or a single amino acid constituent from a protein. This is a useful term when one wants to speak collectively about these two types of biological sequences.

5 DNA Replication

What is the purpose of double-strandedness in DNA? One answer is that this redundancy of information is key to how the one-dimensional instructions of the cell are passed on to its descendant cells. During the cell cycle, the DNA double strand is split into its two separate strands. As it is split, each individual strand is used as a template to synthesize its complementary strand, to which it hybridizes. (See [4, Figures 5-1 and 5-2].) The result is two exact copies of the original double-stranded DNA.
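
The template-copying idea can be sketched in a few lines of Python. This is an illustration only, not part of the original notes: the names `complement_strand` and `replicate` and the example sequence are hypothetical, and the pairing rule (A with T, G with C) is the standard Watson-Crick rule, assumed here because this preview does not spell it out. Strands are written 5′ to 3′, so the complement is reversed as well as base-swapped.

```python
# Watson-Crick base pairing for DNA (assumed standard rule: A-T, G-C).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(template: str) -> str:
    """Return the strand that hybridizes to `template`.

    Both strands are written 5' to 3'. The two strands of a double helix
    are antiparallel, so the template is read in reverse while pairing.
    """
    return "".join(PAIR[base] for base in reversed(template))


def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Split a double strand and use each strand as a template."""
    strand, partner = duplex
    return [
        (strand, complement_strand(strand)),    # old strand + new partner
        (complement_strand(partner), partner),  # new strand + old partner
    ]


# Hypothetical example sequence, chosen only for illustration.
original = ("ATGCGTTA", complement_strand("ATGCGTTA"))
copies = replicate(original)
print(copies[0] == original and copies[1] == original)
```

Each copy keeps one old strand and gains one newly built strand, which is why the redundancy of the double strand is enough to reproduce the whole molecule.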

In more detail, an enzymatic protein called DNA polymerase splits the DNA double strand and synthesizes the complementary strand of DNA. It synthesizes this complementary strand by adding free nucleotides available in the cell onto the 3′ end of the new strand being synthesized [4, Figure 5-3]. The DNA polymerase will only add a nucleotide if it is complementary to the opposing base on the template strand.

Because the DNA polymerase can only add new nucleotides to the 3′ end of a DNA strand (i.e., it can only synthesize DNA in the 5′ to 3′ direction), the actual mechanism of copying both strands is somewhat more complicated. One strand can be synthesized continuously in the 5′ to 3′ direction. The other strand must be synthesized in short 5′-to-3′ fragments. Another enzymatic protein, DNA ligase, joins these synthesized fragments together into a single long DNA molecule. (See [4, Figure 5-4].)

6 Synthesis of RNA and Proteins

The one-dimensional storage of DNA contains the information needed by the cell to produce all its RNA and proteins. In this section, we describe how the information is encoded, and how these molecules are synthesized.

Proteins are synthesized in a two-step process. First, an RNA “copy” of a portion of the DNA is synthesized in a process called transcription, described in Section 6.1. Second, this RNA sequence is read and interpreted to synthesize a protein in a process called translation, described in Section 6.2. Together, these two steps are called gene expression.

A gene is a sequence of DNA that encodes a protein or an RNA molecule. Gene structure and the exact expression process are somewhat dependent on the organism in question. The prokaryotes, which consist of the bacteria and the archaea, are single-celled organisms lacking nuclei. Because prokaryotes have the simplest gene structure and gene expression process, we will start with them. The eukaryotes, which include plants and animals, have a somewhat more complex gene structure that we will discuss afterward.

6.1 Transcription in Prokaryotes

How do prokaryotes synthesize RNA from DNA? This process, called transcription, is similar to the way DNA is replicated (Section 5). An enzyme called RNA polymerase copies one strand of the DNA gene into a messenger RNA (mRNA), sometimes called the transcript. The RNA polymerase temporarily splits the double-stranded DNA, and uses one strand as a template to build the complementary strand of RNA. (See [4, Figure 4-1].) It incorporates U opposite A, A opposite T, G opposite C, and C opposite G. The RNA polymerase begins this transcription at a short DNA pattern it recognizes called the transcription start site. When the polymerase reaches another DNA sequence called the transcription stop site, signalling the end of the gene, it drops off.

6.2 Translation

How is protein synthesized from mRNA? This process, called translation, is not as simple as transcription, because it proceeds from a 4-letter alphabet to the 20-letter alphabet of proteins. Because there is not a one-to-one correspondence between the two alphabets, amino acids are encoded by consecutive sequences of 3 nucleotides, called codons. (Taking 2 nucleotides at a time would give only 4^2 = 16 possible sequences, whereas taking 3 nucleotides yields 4^3 = 64 possible sequences, more than sufficient to encode the 20 different amino acids.) The decoding table is given in Table 1, and is called the genetic code. It is rather amazing that this same code is used almost universally by all organisms.
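
The counting argument for codon length can be checked by direct enumeration. The sketch below is illustrative only; the helper name `count_words` is hypothetical and not taken from the notes.

```python
from itertools import product

RNA_BASES = "ACGU"       # the 4-letter RNA alphabet
NUM_AMINO_ACIDS = 20     # size of the protein alphabet


def count_words(length: int) -> int:
    """Count all base sequences of the given length by listing them."""
    return sum(1 for _ in product(RNA_BASES, repeat=length))


for length in (1, 2, 3):
    n = count_words(length)
    verdict = "enough" if n >= NUM_AMINO_ACIDS else "too few"
    print(f"{length} base(s) per codon: {n} sequences ({verdict} for 20 amino acids)")
```

Three is the smallest codon length for which the number of sequences reaches 20, which is the point of the parenthetical argument.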

First                      Second base                     Third
base        U            C            A            G       base

U      UUU Phe [F]  UCU Ser [S]  UAU Tyr [Y]  UGU Cys [C]    U
       UUC Phe [F]  UCC Ser [S]  UAC Tyr [Y]  UGC Cys [C]    C
       UUA Leu [L]  UCA Ser [S]  UAA STOP     UGA STOP       A
       UUG Leu [L]  UCG Ser [S]  UAG STOP     UGG Trp [W]    G

C      CUU Leu [L]  CCU Pro [P]  CAU His [H]  CGU Arg [R]    U
       CUC Leu [L]  CCC Pro [P]  CAC His [H]  CGC Arg [R]    C
       CUA Leu [L]  CCA Pro [P]  CAA Gln [Q]  CGA Arg [R]    A
       CUG Leu [L]  CCG Pro [P]  CAG Gln [Q]  CGG Arg [R]    G

A      AUU Ile [I]  ACU Thr [T]  AAU Asn [N]  AGU Ser [S]    U
       AUC Ile [I]  ACC Thr [T]  AAC Asn [N]  AGC Ser [S]    C
       AUA Ile [I]  ACA Thr [T]  AAA Lys [K]  AGA Arg [R]    A
       AUG Met [M]  ACG Thr [T]  AAG Lys [K]  AGG Arg [R]    G

G      GUU Val [V]  GCU Ala [A]  GAU Asp [D]  GGU Gly [G]    U
       GUC Val [V]  GCC Ala [A]  GAC Asp [D]  GGC Gly [G]    C
       GUA Val [V]  GCA Ala [A]  GAA Glu [E]  GGA Gly [G]    A
       GUG Val [V]  GCG Ala [A]  GAG Glu [E]  GGG Gly [G]    G

Table 1: The Genetic Code

There is a necessary redundancy in the code, since there are 64 possible codons and only 20 amino acids. Thus each amino acid (with the exceptions of Met and Trp) is encoded by several synonymous codons, which are interchangeable in the sense of producing the same amino acid. Only 61 of the 64 codons are used to encode amino acids. The remaining 3, called STOP codons, signify the end of the protein.

Ribosomes are the molecular structures that read mRNA and produce the encoded protein according to the genetic code. Ribosomes are large complexes consisting of both proteins and a type of RNA called ribosomal RNA (rRNA).

The process by which ribosomes translate mRNA into protein makes use of yet a third type of RNA called transfer RNA (tRNA). There are 61 different transfer RNAs, one for each nontermination codon. Each tRNA folds (see Section 3) to form a cloverleaf-shaped structure. This structure produces a pocket that complexes uniquely with the amino acid encoded by the tRNA’s associated codon, according to Table 1. The unique fit is accomplished analogously to a lock-and-key mechanism. Elsewhere on the tRNA is the anticodon, three consecutive bases that are complementary and antiparallel to the associated codon, and exposed for use by the ribosome. The ribosome brings together each codon of the mRNA with its corresponding anticodon on some tRNA, and hence its encoded amino acid. (See [4, Figure 4-4].)

In prokaryotes, which have no cell nucleus, translation begins while transcription is still in progress, the 5′ end of the transcript being translated before the RNA polymerase has transcribed the 3′ end. (See Drlica [4, Figure 4-4].) In eukaryotes, the DNA is inside the nucleus, whereas the ribosomes are in the cytoplasm outside the nucleus. Hence, transcription takes place in the nucleus, the completed transcript is exported from the nucleus, and translation then takes place in the cytoplasm.

The ribosome forms a complex near the 5′ end of the mRNA, binding around the start codon, also called the translation start site. The start codon is most often 5′-AUG-3′, and the corresponding anticodon is 5′-CAU-3′. (Less often, the start codon is 5′-GUG-3′ or 5′-UUG-3′.) The ribosome now brings together this start codon on the mRNA and its exposed anticodon on the corresponding tRNA, which hybridize to each other. (See [4, Figure 4-4].) The tRNA brings with it the encoded amino acid; in the case of the usual start codon 5′-AUG-3′, this is methionine.

Having incorporated the first amino acid of the synthesized protein, the ribosome shifts the mRNA three bases to the next codon. A second tRNA complexed with its specific amino acid hybridizes to the second codon via its anticodon, and the ribosome bonds this second amino acid to the first. At this point the ribosome releases the first tRNA, moves on to the third codon, and repeats. (See [4, Figure 4-5].) This process continues until the ribosome detects one of the STOP codons, at which point it releases the mRNA and the completed protein.
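
The two steps of gene expression described in Sections 6.1 and 6.2 can be sketched together in Python. This is a simplified illustration under stated assumptions, not the notes' own method: all function names and the example template sequence are hypothetical, translation is assumed to start at the first AUG in the transcript (the less common start codons GUG and UUG and the details of how the ribosome locates the start site are ignored), and the genetic code of Table 1 is packed into a 64-character string.

```python
# Pairing used by RNA polymerase (Section 6.1): U opposite A, A opposite T,
# G opposite C, C opposite G.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Table 1 packed into a string. Codons are enumerated with the first base
# varying slowest and each position running through U, C, A, G.
# "*" marks a STOP codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(template: str) -> str:
    """Build the mRNA (5' to 3') complementary to a DNA template (5' to 3')."""
    return "".join(DNA_TO_RNA[base] for base in reversed(template))


def anticodon(codon: str) -> str:
    """Return the tRNA anticodon (5' to 3'): complementary and antiparallel."""
    return "".join(RNA_PAIR[base] for base in reversed(codon))


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a STOP codon (assumed start rule)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = GENETIC_CODE[mrna[pos:pos + 3]]
        if residue == "*":  # STOP codon: release the protein
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical template strand, chosen only for illustration.
mrna = transcribe("GGTTACCATTTCATCC")
print(mrna)             # GGAUGAAAUGGUAACC
print(translate(mrna))  # MKW
print(anticodon("AUG")) # CAU
```

In the example, the two bases before AUG and the bases after the STOP codon UAA are transcribed but never translated, and `anticodon("AUG")` reproduces the 5′-CAU-3′ anticodon mentioned in the text.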

7 Prokaryotic Gene Structure

Recall from Section 6 that a gene is a relatively short sequence of DNA that encodes a protein or RNA molecule. In this section we restrict our attention to protein-coding genes in prokaryotes.

The portion of the gene containing the codons that ultimately will be translated into the protein is called the coding region, or open reading frame. The transcription start site (see Section 6.1) is somewhat upstream from the start codon, where “upstream” means “in the 5′ direction”. Similarly, the transcription stop site is somewhat downstream from the stop codon, where “downstream” means “in the 3′ direction”. That is, the mRNA transcript contains sequence at both its ends that has been transcribed, but will not be translated. The sequence between the transcription start site and the start codon is called the 5′ untranslated region. The sequence between the stop codon and the transcription stop site is called the 3′ untranslated region.

Upstream from the transcription start site is a relatively short sequence of DNA called the regulatory region or promoter region. It contains regulatory elements, which are specific DNA sites where certain regulatory proteins bind and regulate expression of the gene. These proteins are called transcription factors, since they regulate the transcription process. A common way in which transcription factors regulate expression is to bind to the DNA at a promoter and from there affect the ability (either positively or negatively) of RNA polymerase to perform its task of transcription. (There is also the analogous possibility of translational regulation, in which regulatory factors bind to the mRNA and affect the ability of the ribosome to perform its task of translation.)

8 Prokaryotic Genome Organization

The genome of an organism is the entire complement of DNA in any of its cells. In prokaryotes, the genome typically consists of a single chromosome of double-stranded DNA, and it is often circularized (its 5′ and 3′ ends attached) as opposed to being linear. A typical prokaryotic genome size would be in the millions of base pairs.

Typically 85% of the prokaryotic genome consists of protein-coding regions. For instance, the E. coli genome is about 5 Mb in size and has approximately 4300 coding regions, each of average length around 1000 bp (4300 × 1000 bp = 4.3 Mb, or about 86% of 5 Mb). The genes are relatively densely and uniformly distributed throughout the genome.

9 Eukaryotic Gene Structure

An important difference between prokaryotic and eukaryotic genes is that the latter may contain “introns”. In more detail, the transcribed sequence of a general eukaryotic gene is an alternation between DNA sequences called exons and introns, where the introns are sequences that ultimately will be spliced out of the mRNA before it leaves the nucleus. Transcription in the nucleus produces an RNA molecule called pre-mRNA, produced as described in Section 6.1, that contains both the exons and introns. The introns are spliced out of the pre-mRNA by structures called spliceosomes to produce the mature mRNA that will be transported out of the nucleus for translation. A eukaryotic