Bioinformatics: Techniques & Tools for Sequence Similarity Searches in Databases, Study notes of Bioinformatics

George Mason University (GMU), Bioinformatics

These notes cover techniques and tools for database searching in bioinformatics, with a focus on sequence similarity searches. Topics include the different types of searches, the efficacy of protein searches, historical background, and specific search tools such as FASTA and BLAST. The notes also cover the significance of search results and the various versions of FASTA and BLAST.

Typology: Study notes

Pre 2010

Uploaded on 02/12/2009

koofers-user-r2v 🇺🇸

Partial preview of the text

Lecture 9
Database Searching

Database Searching for Similar Sequences
• Database searching for similar sequences is ubiquitous in bioinformatics.
• Databases are large and getting larger.
• Fast methods are needed.

Types of Searches
• Sequence similarity search with a query sequence
• Alignment search with a profile (scoring matrix with gap penalties)
• Search with a position-specific scoring matrix representing an ungapped sequence alignment
• Iterative alignment search for similar sequences that starts with a query sequence, builds a multiple alignment, and then uses the alignment to augment the search
• Search of the query sequence for patterns representative of protein families
From Bioinformatics by Mount

DNA vs Protein Searches
• DNA sequences consist of 4 characters (nucleotides).
• Protein sequences consist of 20 characters (amino acids).
• Hence, it is easier to detect patterns in protein sequences than in DNA sequences.
• It is better to translate DNA sequences into protein sequences for searches.

Search Tools
• Similarity Search Tools
  – Smith-Waterman Searching
• Heuristic Search Tools
  – FASTA
  – BLAST

Dynamic Programming
• Uses the Smith-Waterman algorithm, or an improvement thereof, for local alignment.
• Compares individual characters in the full-length sequences.
• Slower but more sensitive than FASTA or BLAST.
• Finds the optimal alignment.

FASTA
• Fast alignment of pairs of protein or DNA sequences
• Searches for matching sequence patterns or words, called k-tuples, corresponding to k consecutive matches in both sequences
• Local alignments are built based on these matches.
• Better for DNA searches than BLAST (the k-tuple can be smaller than the minimum of 7 for BLAST)
• No guarantee of finding the exactly optimal alignment

FASTA Algorithm
• FASTA searches for regions of similarity by hashing.
• In hashing, a lookup table showing the positions of each k-tuple is made for each sequence.
• The relative positions of a k-tuple in the two sequences are calculated by subtracting the positions of its first characters.
• K-tuples having the same offset are considered to be aligned.
• Adjacent regions are joined, if possible, by inserting gaps.
• The highest-scoring regions are aligned by dynamic programming.

FASTA Algorithm (continued)
• The number of comparisons increases linearly with the average sequence length.
• In contrast, for dot plots and standard dynamic programming the number of comparisons grows as the square of the sequence length; only dynamic programming with arbitrary gap-length penalties grows as the cube.

Significance of FASTA Searches
• The average score is plotted against the log of the average sequence length in each length range.
• A line is fit with linear regression, and the z-score is the number of standard deviations from the fitted line.
• A statistical distribution of alignment scores can be used to determine probabilities.

BLAST Algorithm
• Alignments are extended as long as the similarity score increases; if they overlap, they are combined.
• These high-scoring segment pairs are matched against the entire database and listed.
• The statistical significance of each is calculated.

Database Searching with a Scoring Matrix or Profile
• A combination of dynamic programming, genetic algorithms, or hidden Markov models can be used to extract patterns from a multiple sequence alignment.
• Pattern-finding and statistical methods (expectation maximization and Gibbs sampling) can also be used.
• Example: Profile HMM

Database Searching with a Position Specific Scoring Matrix
• The previous method can be used to make a position specific scoring matrix.
• The position specific scoring matrix is moved along the sequence to score every possible sequence position in the query sequence.
• The highest scoring positions are typically the best matches for the corresponding set of sequences in the database.
• Examples: EMOTIF, MOTIF, PHI-BLAST, BLOCKS, Profilesearch
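
The sliding of a position specific scoring matrix along a sequence can be illustrated with a minimal Python sketch. It is not the method used by any of the tools listed above. The function names (build_pssm, scan), the uniform background frequency, the pseudocount of 1, and the toy DNA alignment are all assumptions made for illustration.

```python
import math


def build_pssm(aligned, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds matrix from equal-length, ungapped aligned sequences."""
    background = 1.0 / len(alphabet)  # assumption: uniform background frequencies
    pssm = []
    for column in zip(*aligned):
        total = len(column) + pseudocount * len(alphabet)
        scores = {}
        for symbol in alphabet:
            freq = (column.count(symbol) + pseudocount) / total
            scores[symbol] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def scan(pssm, sequence):
    """Score every window of the sequence; return (score, start) pairs."""
    width = len(pssm)
    results = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[ch] for col, ch in zip(pssm, window))
        results.append((score, start))
    return results


# Hypothetical ungapped alignment of a short DNA motif.
motif_alignment = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pssm = build_pssm(motif_alignment)

# The highest-scoring window marks the best match to the motif.
best_score, best_start = max(scan(pssm, "GGCGTATAATGCCG"))
print(best_start)  # 4
```

The matrix has one column of scores per alignment position, so scoring a window is a sum of one lookup per column. A real search would use observed background frequencies and a significance estimate rather than the raw maximum.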

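The hashing step on the FASTA Algorithm slide can be sketched in a few lines of Python. This is an illustration of the idea only, not the FASTA program itself: the function names (ktuple_positions, diagonal_hits) and the two toy sequences are hypothetical, and the later steps (joining regions with gaps, dynamic programming on the best regions) are left out.

```python
from collections import Counter, defaultdict


def ktuple_positions(seq, k):
    """Lookup table: each k-tuple in seq -> list of its start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def diagonal_hits(query, target, k=2):
    """Count shared k-tuples per offset (query position minus target position)."""
    query_table = ktuple_positions(query, k)
    target_table = ktuple_positions(target, k)
    offsets = Counter()
    for word, q_positions in query_table.items():
        for q_pos in q_positions:
            # .get avoids adding empty entries to the target table
            for t_pos in target_table.get(word, ()):
                offsets[q_pos - t_pos] += 1
    return offsets


# Hypothetical toy sequences; k=3 is chosen only to keep the example small.
hits = diagonal_hits("ACGTACGT", "TTACGTAC", k=3)
best_offset, count = hits.most_common(1)[0]
print(best_offset, count)  # -2 4
```

K-tuples that share an offset lie on the same diagonal of a dot plot, so the offset with the most hits points to the most promising region for alignment. Each sequence is read once to build its table, which is why the work grows roughly linearly with sequence length when matches are sparse.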