Database Searching - Lab 3 - Fall 2007 | BCB 444, Lab Reports of Bioinformatics

Material Type: Lab; Class: INTRO BIOINFORMATICS; Subject: BIOINFORMATICS AND COMPUTATIONAL BIOL; University: Iowa State University; Term: Fall 2007;

Typology: Lab Reports

Pre 2010

Uploaded on 09/02/2009

BCB 444/544 Fall 07 Sep 6 Lab 3

BCB 444/544 Lab 3
Database Searching
Due Mon Sept 10 by 5 PM - email to terrible@iastate.edu

Objectives

1. Use BLAST and Smith-Waterman programs to retrieve related sequences from a database
2. Use PSI-BLAST to detect distantly related sequences

Introduction

This lab has been designed to introduce you to many aspects of sequence alignment. You will become familiar with topics such as finding similar sequences in a database and assessing whether the sequence "hits" found in a database are relevant. After completing this lab and the related reading assignments, you should be able to answer the question: Why do we align sequences anyway?

This lab deals with several topics that we have not yet covered in lecture. I will attempt to introduce you to some of the basics before we start the exercises. If you are a biologist, it is likely that you have already used many of the programs introduced in this lab without knowing exactly what they are doing. Unfortunately, we will not cover the details of the algorithms in the lab; we will save that task for lecture. The lab is designed to give you some practice with using the programs and interpreting the results.

Exercises

Part I

We will use a query sequence to search against a database, using both the Smith-Waterman algorithm and BLAST. Both programs perform local alignments, but Smith-Waterman performs an exhaustive search, while BLAST uses heuristics (basically, shortcuts) to reduce the time required to perform the search.

Go to Biology Workbench and log in. If you do not have an account yet, take a minute and set one up. Click on the Protein Tools button, then run the Add New Protein Sequence tool. Enter this sequence:

NICKECPIIGFRYRSLKHFNYDICQSCFF

and click on the Save button. You should now see your newly entered sequence identifier with a checkbox next to it.
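As background on the exhaustive search mentioned above, here is a minimal sketch of Smith-Waterman local alignment. It is not what SSEARCH actually runs: the simple match/mismatch scores and the linear gap penalty are assumptions chosen for readability (real protein searches use a substitution matrix such as BLOSUM and affine gap penalties), and the function name and the subject sequence are hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not SSEARCH defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill every cell of the matrix: this is the "exhaustive" part.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# The query from Part I against a made-up (hypothetical) subject sequence.
query = "NICKECPIIGFRYRSLKHFNYDICQSCFF"
subject = "MKNICKECPIIGYRYRSLKHFNYDVCQSCFFAG"
score, q_aln, s_aln = smith_waterman(query, subject)
print(score)
print(q_aln)
print(s_aln)
```

Every cell of the matrix is computed, so the work grows with the product of the two sequence lengths. Repeating that for every database entry is what makes the exhaustive search slow, and it is the cost that BLAST's heuristics are meant to avoid.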
Select the checkbox and run the SSEARCH program, which implements the Smith-Waterman algorithm. This program may take a while to run, so be sure to check the Run as batch checkbox near the top of the page. Select the Non-Redundant Protein Database (SDSC) line for the database to search against. Leave all other parameters in their default settings and click on the Submit button. You will have to check back later for the results of this search. To see if the results are available, go to the Protein Tools page, choose Retrieve BATCH Output, and click on the Run button. Save your results.

Next, run a BLAST search against the same database with the same sequence as the query. Make sure the checkbox next to the sequence is checked, choose the BLASTP program, and click on the Run button. Select the Non-Redundant Protein Database (SDSC) line and accept the default settings. Click on the Submit button to start your search. Save your results.

1a. How do the results of the SSEARCH and BLAST searches compare?
b. Did they find the same hits in the database?
c. Are the alignments the same?
d. Which program would you use more often and why?

Part II

Go to the NCBI website and go through the BLAST and PSI-BLAST tutorials. Even if you have used BLAST before, you will probably learn something new by doing the tutorials. They contain a lot of good information about how to formulate a query, what options are available, how to format output, and how to analyze your search results. Here are the links to the tutorials:

BLAST Tutorial
PSI-BLAST Tutorial
Statistics of Sequence Similarity Scores

2. What does an E-value of 2 mean?

Our first BLAST search will be to determine the type of protein represented by the sequence below. This sequence was generated by translating a 5 exon gene from Drosophila. To determine the nature of this protein, go to NCBI and run a blastp search against the Swissprot database. Use the default parameters for the search.
> test protein 1
MSQICKRGLLISNRLAPAALRCKSTWFSEVQMGPPDAILGVTEAFKKDTNPKKINLGAGAYRDDNTQPFVLPSVREAEKRVVSRSLDKEY
ATIIGIPEFYNKAIELALGKGSKRLAAKHNVTAQSISGTGALRIGAAFLAKFWQGNREIYIPSPSWGNHVAIFEHAGLPVNRYRYYDKDT
CALDFGGLIEDLKKIPEKSIVLLHACAHNPTGVDPTLEQWREISALVKKRNLYPFIDMAYQGFATGDIDRDAQAVRTFEADGHDFCLAQS
FAKNMGLYGERAGAFTVLCSDEEEAARVMSQVKILIRGLYSNPPVHGARIAAEILNNEDLRAQWLKDVKLMADRIIDVRTKLKDNLIKLG
SSQNWDHIVNQIGMFCFTGLKPEQVQKLIKDHSVYLTNDGRVSMAGVTSKNVEYLAESIHKVTK

3. What is this protein?

One of the problems with BLAST these days is that it is just too darn good. The databases contain so many sequences that often your BLAST results are just a huge collection of identical or nearly identical sequences (you may have noticed this from your BLAST and SSEARCH results in the previous section). This exercise has been designed to challenge BLAST with a difficult problem. We will try to find a bacterial match for the following nucleotide sequence:

>gi|76828014|gb|BC107078.1| Homo sapiens G protein-coupled receptor, family C, group 5, member D, mRNA (cDNA clone MGC:129714 IMAGE:40027066), complete cds
ATGTACAAGGACTGCATCGAGTCCACTGGAGACTATTTTCTTCTCTGTGACGCCGAGGGGCCATGGGGCA
TCATTCTGGAGTCCCTGGCCATACTTGGCATCGTGGTCACAATTCTGCTACTCTTAGCATTTCTCTTCCT
CATGCGAAAGATCCAAGACTGCAGCCAGTGGAATGTCCTCCCCACCCAGCTCCTCTTCCTCCTGAGTGTC
CTGGGGCTCTTCGGACTCGCTTTTGCCTTCATCATCGAGCTCAATCAACAAACTGCCCCCGTACGCTACT
TTCTCTTTGGGGTTCTCTTTGCTCTCTGTTTCTCATGCCTCTTAGCTCATGCCTCCAATCTAGTGAAGCT
GGTTCGGGGTTGTGTCTCCTTCTCCTGGACGACAATTCTGTGCATTGCTATTGGTTGCAGTCTGTTGCAA
ATCATTATTGCCACTGAGTATGTGACTCTCATCATGACCAGAGGTATGATGTTTGTGAATATGACACCCT
GCCAGCTCAATGTGGACTTTGTTGTACTCCTGGTCTATGTCCTCTTCCTGATGGCCCTCACATTCTTCGT
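
As background for judging whether hits in Part II are relevant, the sketch below shows how an expect value relates to a normalized bit score and the size of the search space, using the standard relation E = m * n * 2^(-S'). The function names and the example numbers are hypothetical, and the sketch ignores the length corrections that BLAST applies, so it will not reproduce the values in a real BLAST report.

```python
import math


def expect_value(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S'), where m is the query length, n the total
    database length, and S' the normalized bit score. Edge-effect
    corrections to m and n are deliberately left out (an assumption).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def p_value(e_value):
    """Probability of seeing at least one chance hit, given an E-value."""
    return -math.expm1(-e_value)  # equals 1 - exp(-E), stable for small E


# Hypothetical numbers: a 29-residue query against 10 million residues.
for bits in (20, 30, 40):
    e = expect_value(bits, 29, 10_000_000)
    print(bits, e, p_value(e))
```

Because the E-value scales with the size of the search space, the same alignment score is less significant in a large database than in a small one.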