Introduction to Bioinformatics: Accessing the NCBI Databases | BI 203, Assignments of Microbiology

Material Type: Assignment; Professor: Lester; Class: MICROBIOLOGY; Subject: Biological Sciences; University: Montgomery College; Term: Unknown 1989;

Uploaded on 09/17/2009

Introduction to Bioinformatics: Accessing the NCBI Databases
By Shawn Lester
(12 points)

The purpose of this exercise is to familiarize you with the National Center for Biotechnology Information (NCBI) web sites. These sites host nucleotide and protein databases containing a vast amount of sequence data that grows every day. As scientists all over the world sequence more genes, genomes, and proteins, these databases serve as a central repository for the data, which anyone on the planet can access. You will be one of these people: you will search the databases for homologous amino acid and nucleotide sequences in order to identify an unknown bacterium.

To start, your instructor will provide you with an unknown sequence (either amino acids or nucleotides) so that you can become familiar with how to search the databases. Later, you may actually be isolating and sequencing genes from living organisms! (To be decided later)

The first thing to do is go to the NCBI web site: http://www.ncbi.nlm.nih.gov/sites/entrez . As you will see, a tremendous amount of information is located here, and it is not easy to interpret. We are not trying to make you sequencing experts. We want you to become familiar with the research tools that are available, and hopefully you will begin to see the potential of what can be learned from these genetic data.

When we search for homologous sequences, we will use the default settings. An expert could change a variety of criteria when performing a search. You should also know that the alignment algorithms used here are not the only ones available, but they are the most commonly used. When you go to the web site, take some time to look around.

Part 1:

You will be given either a nucleotide sequence or an amino acid sequence. It will either be emailed to you or handed to you. If you have to type in your sequence, don’t worry. Typos are not that critical.

NOTE: Mozilla Firefox web browser does not work well for making distance trees. Use Internet Explorer instead.

1. Go to the NCBI web site: http://www.ncbi.nlm.nih.gov/sites/entrez

2. We are going to start practicing with an amino acid sequence for an unknown protein. The letters used may not be familiar to you. See the reference at the end of this handout for an explanation of amino acid abbreviations.

   Sequence:
   DLEEEIYMEQPEGFKVPGKEGLVCHLTKSLYGLKQAPRQWYKKFDAFMAEHDFKKTESDHCVFIKRYVSGDFLILLLY

3. Near the top of the web page there is a black bar that starts with “All Databases”. Click on the word “Protein”. On the next screen, notice that if you were searching for a particular protein, you could enter its name in the search box at the top. This would produce a list of the sequences, and the organisms carrying that protein, that have been entered into the database so far.

4. On the left-hand side is a blue column. Find the word BLAST and click on it.

   The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families. (NCBI web site, 2008)

5. Scroll down the page and click on “protein Blast”. You are now on the Blast page where you can perform your alignment searches. Scroll down near the bottom and check the box that says “Show results in a new window.” This makes things easier when performing repeated searches.

6. On the BLAST page, type or ‘copy and paste’ your amino acid sequence into the box under “Enter Query Sequence”.
   Scroll to the bottom and click on the BLAST button. Your alignment results will appear in a new window. This may take several minutes or longer depending on the complexity of the sequence, user traffic, and the time of day.

7. As you scroll down the page, you will see a great deal of information. Answer the following questions: (5 points)
   (The answers to b, c, and e require you to look them up elsewhere.)

   a. How many ‘Blast Hits’ were found in response to your search query? _______________
   b. What enzyme was encoded by the amino acid sequence you searched? _____________
   c. What is the function of this enzyme? _________________________________________
   d. What is the first organism listed that matches your search? _______________________
   e. What is the common name for this genus and species? (If you don’t know, look it up on the internet.) ________________________________

8. Click on the “Descriptions” hyperlink. This expands the information for each organism listed in the search results.

9. Check the little box beside the names of 5 different organisms in the list.

   Example: > emb|CAA11068.1| reverse transcriptase [Allium cepa]

10. Scroll up a little, find the hyperlink “Distance tree results”, and click on it. This produces a phylogenetic tree showing the distance (i.e. relatedness) of the five organisms you selected. Note that clicking on “[Distance tree results]” at the top of the page instead automatically selects all 100 organisms, which are displayed in a very complicated tree.

11. On the right-hand side of the tree, next to “Collapse Mode”, select ‘Show All’. There are several ways to view your tree. Choosing “slanted” from the layout options at the top of the tree (rectangle, slanted, radial, force) produces a tree that may be easier to interpret.

12. Right click on the phylogenetic tree, select “Save Picture As…” and save it somewhere convenient, such as the Desktop. Open the picture on your Desktop, then either print it or ‘copy and paste’ it into the document.
    (You may do this by copying and pasting the icon on the Desktop.)

13. See the example tree on the next page.

You now have the basic skills necessary to search for homologous protein or gene sequences. You can apply these skills to help you identify your unknown bacterium. (To be decided later)

References/Supplemental Material

Amino Acid Abbreviations:

   A  alanine                   P  proline
   B  aspartate/asparagine      Q  glutamine
   C  cysteine                  R  arginine
   D  aspartate                 S  serine
   E  glutamate                 T  threonine
   F  phenylalanine             U  selenocysteine
   G  glycine                   V  valine
   H  histidine                 W  tryptophan
   I  isoleucine                Y  tyrosine
   K  lysine                    Z  glutamate/glutamine
   L  leucine                   X  any
   M  methionine                *  translation stop
   N  asparagine                -  gap of indeterminate length

Values:

The E-value (expect value) = the number of matches with an equal or better score that would be expected to occur by chance in a database of the same size. Lower E-values represent more significant matches.

Terms:

Homology = Similarity attributed to descent from a common ancestor.

Orthology and Paralogy = Orthologs and paralogs are two types of homologous sequences. Orthology describes genes in different species that derive from a common ancestor. Orthologous genes may or may not have the same function. Paralogy describes homologous genes within a single species that diverged by gene duplication. (NCBI web site, 2008)

Understanding phylogenies (borrowed from evolution.berkeley.edu/evolibrary, 2008)

Understanding a phylogeny is a lot like reading a family tree. The root of the tree represents the ancestral lineage, and the tips of the branches represent the descendants of that ancestor. As you move from the root to the tips, you are moving forward in time.

When a lineage splits (speciation), it is represented as branching on a phylogeny. When a speciation event occurs, a single ancestral lineage gives rise to two or more daughter lineages.

Phylogenies trace patterns of shared ancestry between lineages.
Each lineage has a part of its history that is unique to it alone and parts that are shared with other lineages. Similarly, each lineage has ancestors that are unique to that lineage and ancestors that are shared with other lineages: common ancestors.

A clade is a grouping that includes a common ancestor and all the descendants (living and extinct) of that ancestor. Using a phylogeny, it is easy to tell if a group of lineages forms a clade. Imagine clipping a single branch off the phylogeny: all of the organisms on that pruned branch make up a clade.
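
The branch-clipping test for a clade can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree below is a made-up toy stored as nested dictionaries, and every node and species name in it is hypothetical rather than taken from any BLAST result. For brevity the functions report only the tips of a clipped branch, although the clade itself also includes the ancestor at the cut and any internal nodes.

```python
# Toy rooted tree: each key is a node, each value is a dict of its children.
# Tips have an empty dict. All names are invented for illustration only.
TREE = {
    "root": {
        "ancestor_A": {
            "species_1": {},
            "species_2": {},
        },
        "ancestor_B": {
            "species_3": {},
            "ancestor_C": {
                "species_4": {},
                "species_5": {},
            },
        },
    }
}


def find_subtree(tree, name):
    """Return the children dict of the node called `name`, or None if absent."""
    for node, children in tree.items():
        if node == name:
            return children
        found = find_subtree(children, name)
        if found is not None:
            return found
    return None


def tips(children):
    """Collect the names of all tips below a node, left to right."""
    result = []
    for node, grandchildren in children.items():
        if grandchildren:
            result.extend(tips(grandchildren))
        else:
            result.append(node)
    return result


def clade(tree, ancestor):
    """'Clip' the branch at `ancestor` and return every tip on it."""
    subtree = find_subtree(tree, ancestor)
    if subtree is None:
        raise KeyError(f"no node named {ancestor!r} in the tree")
    if not subtree:
        return [ancestor]  # a single tip is a clade by itself
    return tips(subtree)


def all_nodes(tree):
    """Yield every node name in the tree, internal nodes and tips alike."""
    for node, children in tree.items():
        yield node
        yield from all_nodes(children)


def is_clade(tree, group):
    """True if `group` is exactly the set of tips on one clipped branch."""
    target = set(group)
    return any(set(clade(tree, node)) == target for node in all_nodes(tree))


print(clade(TREE, "ancestor_B"))                    # ['species_3', 'species_4', 'species_5']
print(is_clade(TREE, ["species_4", "species_5"]))   # True
print(is_clade(TREE, ["species_2", "species_3"]))   # False
```

In this toy tree, species_2 and species_3 do not form a clade: the only branch that carries both of them is the one at the root, and clipping there also takes the other three species.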