CS174 Problem Set 3: Sequence Alignment in Bioinformatics, Assignments of Environmental Science

Problem Set 3 for the CS 174 Bioinformatics course, Spring 2009. Students are required to write a Python script that finds the optimal global alignment between two nucleotide sequences using dynamic programming. The assignment involves reading sequences from a FASTA file, computing alignment scores, and determining similarity measures.

Typology: Assignments

Pre 2010

Uploaded on 09/17/2009

koofers-user-f0o

Problem Set 3
“Of Mice and Men – and dogs”
CS 174 Bioinformatics, Spring 2009
Due May 28th before class

This problem set is about sequence alignment, as discussed in class. Your task is to implement a dynamic programming procedure to find the optimal global alignment between two nucleotide sequences. You will use your code to compare a gene from the human genome to the same gene from the mouse and dog genomes and see which is more “similar”.

Assignment

1. Create a file ps3.py which will contain your code for this assignment. Add some comment lines at the beginning of the file, indicating “CS174 Problem Set 3” and your name.

2. Write a function alignSequences that takes as input two nucleotide sequences (as strings) and computes an optimal global alignment. Employ the scoring function used in class, which assigns +2 for matching letters and −1 for mismatched letters (or spaces). Your implementation should follow the simple dynamic programming algorithm (i.e., not Hirschberg’s algorithm) and have time and space complexity O(m · n), where m and n are the lengths of the two input sequences. The function should output the optimal alignment score as well as the alignment itself. If several different alignments give the same optimal score, it is sufficient to output just one of them. For instance, for the two sequences CTGAT and GTCGA the output should be similar to the following:

    Optimal alignment score: 3
    CT-GAT
    GTCGA-

(It is probably a good idea to make up a handful of simple examples, solve them on paper, and compare to your code’s output.)
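The simple O(m · n) dynamic programming procedure described in step 2 could be sketched roughly as follows. This is only one possible approach, not the course's reference solution. The function name alignSequences and the +2/−1 scores come from the assignment; the parameter names, the choice to return the result as a tuple rather than print it inside the function, and the tie-breaking order in the traceback are assumptions made for illustration. Because ties are broken arbitrarily, the alignment it reports may differ from the one shown in the assignment while still having the same score.

```python
def alignSequences(a, b, match=2, mismatch=-1, gap=-1):
    """Global alignment of strings a and b by plain dynamic programming.

    Returns (score, aligned_a, aligned_b). The keyword parameters are
    illustrative assumptions; their defaults follow the assignment's scoring.
    """
    m, n = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap          # a[:i] aligned against spaces only
    for j in range(1, n + 1):
        score[0][j] = j * gap          # b[:j] aligned against spaces only

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,   # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # pair a[i-1] with a space
                score[i][j - 1] + gap,       # pair b[j-1] with a space
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    # Assumed tie-breaking order: diagonal, then up, then left.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[m][n], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    best, line1, line2 = alignSequences("CTGAT", "GTCGA")
    print("Optimal alignment score:", best)
    print(line1)
    print(line2)
```

The table holds (m + 1) · (n + 1) entries and each is filled in constant time, which gives the required O(m · n) time and space. A quick sanity check is to rescore the returned alignment column by column and confirm that it sums to the reported score.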