BTEC 3301 Spring 2009 HW01: Sequence Retrieval & Gene Finding, Assignments of Biotechnology

Homework assignment for the BTEC 3301 course, Spring 2009. The assignment covers gene and sequence retrieval from biomolecular databases and prokaryotic gene finding. Students use NCBI Entrez and SRS to find specific information, then perform gene finding with ORF Finder, EasyGene, GeneMark, Glimmer, and FGENESB. The assignment also includes finding promoters and terminators with BPROM and FindTerm.

Typology: Assignments

Pre 2010

Uploaded on 08/19/2009

koofers-user-5sw-2

BTEC 3301, Spring 2009
HOMEWORK 01 (Assigned on 2/26/09)
Due: 3/05/09 in class at 3 pm (LATE HW will NOT be accepted)

Sequence & Information Retrieval & Prokaryotic Gene Finding

This assignment will assess your understanding of searching the biomolecular databases to retrieve information and sequences, and of finding genes in a prokaryotic genome.

What to submit: Hard copy of your answers; include your name and student ID.

IMP Note: Please include the DATE and TIME at which you performed the searches to get your answers. Points will be deducted if you fail to do so.

1. Using NCBI ENTREZ, select the GENOME database, and find out the general molecular function of aroE in the species Agrobacterium tumefaciens str. C58. What is the function?

2. Using NCBI ENTREZ, find out on which chromosome in Drosophila melanogaster the genes amnesiac (amn) and dunce (dnc) lie. In which biological processes are they involved?

3. Using NCBI ENTREZ, find the list of extinct organisms archived. Go to extinct Insects and select 'Libanorhinus succinus' (a beetle from Lebanese amber, 120-135 Mya). What is the ancestry of this organism? How many nucleotide sequences are listed? What is the name of the gene on the sequence?

4. Using SRS, answer the following. Explain the search terms and combinations you use for each question.
   a. How many Bacillus subtilis sequences does SwissProt contain?
   b. How many of these are not hypothetical?
   c. How many from (a) are signal peptidases from Bacillus subtilis in SwissProt?
   d. How many from (c) are Bacillus subtilis signal peptidase I sequences?

5. Prokaryotic Gene Finding: The objective of this exercise is to develop a critical attitude towards annotations and gene finders. Even though gene finding in prokaryotes is very simple compared to eukaryotic gene finding, there are multiple things that can go wrong.
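To make the gene-finding objective concrete, here is a minimal sketch of what an ORF scan does in principle. It is not the NCBI ORF Finder: it assumes ATG as the only start codon and the standard stop codons, and the function names and the default length threshold are illustrative choices, not part of the assignment. It reports every in-frame ATG-to-stop stretch on both strands, which shows why an ORF list by itself says nothing about whether a stretch is a real gene.

```python
# Toy six-frame ORF scan. A simplified stand-in, NOT the NCBI ORF Finder.
# Assumptions: ATG is the only start codon; TAA/TAG/TGA are the stops.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300):
    """Return (strand, start, end, length_nt) for each ATG..stop ORF.

    Only ORFs of at least min_len nucleotides (stop codon included) are
    kept. Coordinates are 1-based and refer to the strand that was scanned.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    length = i + 3 - start
                    if length >= min_len:
                        orfs.append((strand, start + 1, i + 3, length))
                    start = None
    return orfs
```

Calling `find_orfs(sequence)` on the plasmid sequence (read from the FASTA file with the header line removed) gives a candidate list to compare against the web tool. The two lists may differ, for example if the web tool is set to accept alternative start codons.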
Use the sequence from an E. coli plasmid available on the class web page (seqHW02.fasta). Generally you expect genes in E. coli to be in the range 100 - 500 amino acids.

I. Run the NCBI ORF Finder
(a) How many ORFs can you find that are > 300 nt in length?
(b) What are their start coordinates? Do they all start with Methionine (Met, M)?
(c) Can you identify any long ORFs?
(d) How do you tell which of these ORFs are real protein-coding genes?

II. Gene Finding
You are going to use four prokaryotic gene finders: EasyGene, GeneMark, Glimmer and FGENESB. These gene finders use different Hidden Markov Models to predict the location of the genes. All programs are trained on specific organisms. This helps in finding the correct amino acid frequencies for the organism. A number of well-defined genes are used to train the codon statistics for the gene finder. The coding models capture the different amino acid distributions found during training. You will need to compare the results from the ORF Finder and all four gene-finding programs. Use the table (GeneTablePro.doc) provided on the course webpage to record your results.

(A) Run EasyGene: http://www.cbs.dtu.dk/services/EasyGene/
Pick the appropriate organism and use all default settings. The results file format is known as GFF (Gene Finder Format). The last column gives the logOdds score, which is calculated directly from the Markov Model; a high score represents a high probability for this sequence in the given model. The logOdds score is highly dependent on the length of the predicted gene: the longer the gene, the better the score. The score is therefore ranked by statistical significance to avoid the length skew. The R-score is the ranked score, which should be as low as possible for potential coding regions.
How many genes were found? Which do you think are most likely to be real protein-coding genes?

(B) Run GeneMark: http://opal.biology.gatech.edu/GeneMark/gmhmm2_prok.cgi
Pick the appropriate organism and settings.
The output shows the position of the genes and states a class. The class shows which model (Typical/Atypical) has been used to identify the gene. The "Typical model" and "Atypical model" define different length distributions of the genes. Generally you expect genes in E. coli to be in the range 100 - 500 amino acids. The atypical distribution is likely to model short genes that would not be found with the typical model, since long genes tend to score higher in an HMM. That one model has been used does not necessarily mean that the gene would not have been found by the other model; it might just get a lower score.
How many genes were found? Which do you think are most likely to be real protein-coding genes?

(C) Run Glimmer: http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi
Pick the appropriate organism and use all default settings. For more information on Glimmer visit http://egg.isu.edu/biocourses/bios599/projects/Eric_html
How many genes were found? Which do you think are most likely to be real protein-coding genes?

(D) Run FGENESB: http://linux1.softberry.com/berry.phtml?topic=fgenesb&group=programs&subgroup=gfindb
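The length dependence of the logOdds score noted under EasyGene, and the remark that long genes tend to score higher in an HMM, can be illustrated with a toy calculation. This is only a sketch under strong assumptions: it uses a simple per-codon model with a uniform background rather than any gene finder's actual HMM, and the codon probabilities are invented for illustration. It shows that a summed log-odds score grows with the number of codons even when the per-codon evidence is identical.

```python
import math

# Hypothetical per-codon probabilities under a "coding" model.
# These numbers are invented for illustration, not trained on any genome.
CODING = {"GCT": 0.03, "AAA": 0.04}
BACKGROUND = 1 / 64  # assumed uniform background over the 64 codons


def log_odds(codons, coding=CODING, background=BACKGROUND):
    """Sum of log2(P_coding / P_background) over a list of codons."""
    return sum(math.log2(coding[c] / background) for c in codons)


# Two sequences with the same composition but different lengths.
short_gene = ["GCT", "AAA"] * 50    # 100 codons
long_gene = ["GCT", "AAA"] * 250    # 500 codons

print("short total:", log_odds(short_gene))
print("long total: ", log_odds(long_gene))
print("short per codon:", log_odds(short_gene) / len(short_gene))
print("long per codon: ", log_odds(long_gene) / len(long_gene))
```

The totals differ by the length ratio while the per-codon values match, which is why a raw score cannot be compared across predictions of different lengths without some form of significance ranking.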