Database Searching for Similar Sequences: Techniques, Algorithms, and Tools, Study notes of Bioinformatics

George Mason University (GMU), Bioinformatics

These notes cover techniques for database searching for similar sequences in bioinformatics: the different types of searches, the differences between DNA and protein searches, and the efficacy of protein searches. The history of database sequence similarity searching is also explored, focusing on dynamic programming, FASTA, and BLAST. The significance of FASTA searches and the versions of FASTA are also discussed.

Typology: Study notes

Pre 2010

Uploaded on 02/12/2009

koofers-user-xty-1 🇺🇸

Lecture 9: Database Searching

Database Searching for Similar Sequences
• Database searching for similar sequences is ubiquitous in bioinformatics.
• Databases are large and getting larger.
• Fast methods are needed.

Types of Searches
• Sequence similarity search with a query sequence
• Alignment search with a profile (scoring matrix with gap penalties)
• Search with a position-specific scoring matrix representing an ungapped sequence alignment
• Iterative alignment search for similar sequences that starts with a query sequence, builds a multiple alignment, and then uses the alignment to augment the search
• Search of the query sequence for patterns representative of protein families
From Bioinformatics by Mount

DNA vs Protein Searches
• DNA sequences consist of 4 characters (nucleotides).
• Protein sequences consist of 20 characters (amino acids).
• Hence, it is easier to detect patterns in protein sequences than in DNA sequences.
• It is better to convert DNA sequences to protein sequences for searches.

Database Searching Efficacy
• To evaluate searching methods, selectivity and sensitivity need to be considered.
• Selectivity is the ability of the method not to find members known to be of another group (i.e., to avoid false positives).
• Sensitivity is the ability of the method to find members of the same protein family as the query sequence.

Protein Searches
• It is easier to identify protein families by sequence similarity than by structural similarity (the same structure does not mean the same sequence).
• Use the appropriate gap penalty scoring.
• Evaluate results for statistical significance.

History
• Historically, dynamic programming was used for database sequence similarity searching.
• Computer memory, disk space, and CPU speed were limiting factors.
• Speed is still a factor due to the larger databases and the increased number of searches.
• FASTA and BLAST allow fast searching.

History (continued)
• The PAM250 matrix was used for a long time. It corresponds to a period of time over which only 20% of the amino acids have remained unchanged.
• BLOSUM has replaced PAM250 in most applications. BLAST uses the BLOSUM62 matrix. FASTA uses the BLOSUM50 matrix.

Dynamic Programming
• Uses the Smith-Waterman algorithm, or an improvement thereof, for local alignment
• Compares individual characters in the full-length sequence
• Slower but more sensitive than FASTA or BLAST

FASTA
• Fast alignment of pairs of protein or DNA sequences
• Searches for matching sequence patterns or words, called k-tuples, corresponding to k consecutive matches in both sequences
• Local alignments are built based on these matches.
• Better for DNA searches than BLAST (the k-tuple can be smaller than the minimum of 7 for BLAST)

FASTA Algorithm
• FASTA searches for regions of similarity by hashing.
• In hashing, a lookup table showing the positions of each k-tuple is made for each sequence.
• The relative positions of a k-tuple in the two sequences are calculated by subtracting the positions of the first characters.
• K-tuples having the same offset are considered to be aligned.

FASTA Algorithm (continued)
• The number of comparisons increases linearly with the average sequence length.
• In dynamic programming and dot plots, the number of comparisons increases as the cube or square of the length, respectively.
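The hashing step described in the FASTA Algorithm slides can be illustrated with a minimal Python sketch. It is not the FASTA program's actual code: the function names `ktuple_positions` and `offset_counts`, the example sequences, and the choice of k = 2 are all assumptions made for illustration. The sketch builds a lookup table of k-tuple positions for each sequence, subtracts the positions of shared k-tuples to obtain an offset, and counts how many k-tuples fall on each offset.

```python
from collections import Counter, defaultdict


def ktuple_positions(seq, k):
    """Build a lookup table: k-tuple -> list of start positions in seq."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def offset_counts(query, target, k=2):
    """Count shared k-tuples per offset (query position minus target position).

    K-tuples that share an offset lie on the same diagonal, i.e. they are
    consistent with one ungapped alignment of the two sequences.
    """
    q_table = ktuple_positions(query, k)
    t_table = ktuple_positions(target, k)
    counts = Counter()
    for word, q_positions in q_table.items():
        # .get avoids adding empty entries to the target's table
        for qi in q_positions:
            for ti in t_table.get(word, []):
                counts[qi - ti] += 1
    return counts


# Hypothetical example sequences, chosen only to show the bookkeeping.
query = "HARFYAAQIVL"
target = "VDMAAQIA"
counts = offset_counts(query, target, k=2)
best_offset, hits = counts.most_common(1)[0]
print(best_offset, hits)  # prints "2 3": AA, AQ and QI all have offset 2
```

Each sequence is scanned once to build its table, which is where the roughly linear growth in comparisons comes from. The sketch covers only this word-matching stage; scoring the matched regions and building the local alignments are not shown.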
