CMSC423: Bioinformatic Algorithms, Databases and Tools
Lecture 4: Writing bioinformatics software; Biological databases
CMSC423 Fall 2008

Writing bioinformatics software

Bio::Perl
• Homework question #5

    use Bio::Perl;

    # read_sequence() returns only the first record in the file,
    # so read_all_sequences() is used to visit every record
    foreach my $seq (read_all_sequences("test.fa", 'fasta')) {
        if ($seq->length() > 500) {
            print $seq->primary_id(), "\n";
        }
    }

Note: you still need to write your own version...

Bio::Perl
• Other useful stuff

    use Bio::SeqIO;
    use Bio::DB::GenBank;

    # stream sequences from a large FASTA file
    $seqio = new Bio::SeqIO(-format => 'largefasta',
                            -file   => 't/data/genomic-seq.fasta');
    $pseq = $seqio->next_seq();

    # fetch a record directly from GenBank
    $gb = new Bio::DB::GenBank;
    $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');

    etc...

BioJava
• http://www.biojava.org

    import java.io.*;
    import org.biojava.bio.*;
    import org.biojava.bio.seq.db.SequenceDB;
    import org.biojava.bio.seq.io.SeqIOTools;
    import org.biojava.bio.symbol.*;

    String filename = args[0];
    BufferedInputStream is =
        new BufferedInputStream(new FileInputStream(filename));

    // get the appropriate Alphabet
    Alphabet alpha = AlphabetManager.alphabetForName(args[1]);

    // get a SequenceDB of all sequences in the file
    SequenceDB db = SeqIOTools.readFasta(is, alpha);

BioPython
• http://www.biopython.org

    from Bio import SeqIO

    handle = open("file.fasta")
    seq_records = SeqIO.parse(handle, "fasta")   # iterator over the records

    # my_records and handle2: records and an output handle defined elsewhere
    SeqIO.write(my_records, handle2, "fasta")

BioPython
• Question 5

    from Bio import SeqIO

    handle = open("test.fasta")
    for seq_record in SeqIO.parse(handle, "fasta"):
        if len(seq_record) > 500:
            print(seq_record.id)
    handle.close()

BioPython...more
• Same as Bio::Perl:
  – can directly connect to databases
  – various sequence manipulations (reverse complement, translate, etc.)
  – basic bioinformatics algorithms
  – etc.

BioRuby...more
• Same as Bio::Perl:
  – can directly connect to databases
  – various sequence manipulations (reverse complement, translate, etc.)
  – basic bioinformatics algorithms
  – etc.
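The Bio::Perl slide notes that you still need to write your own version of homework question #5. A minimal sketch of what that might look like in plain Python, without any of the Bio* toolkits, is given here. The function names read_fasta and ids_longer_than are hypothetical, and the sketch assumes the sequence identifier is the first whitespace-delimited token after the ">" on each header line.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from an open FASTA handle.

    Assumption: the identifier is the first token after '>'.
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # emit the previous record
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)  # emit the last record


def ids_longer_than(handle, min_length=500):
    """Return identifiers of all sequences longer than min_length."""
    return [name for name, seq in read_fasta(handle) if len(seq) > min_length]


if __name__ == "__main__":
    # Toy input standing in for test.fa: one 4-base and one 600-base record.
    demo = io.StringIO(">short example\nACGT\n>long\n" + "ACGT" * 150 + "\n")
    for seq_id in ids_longer_than(demo):
        print(seq_id)
```

To run it on a real file, pass an open file object instead of the in-memory demo, for example ids_longer_than(open("test.fa")).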
SeqAn
• http://www.seqan.de

    #include <fstream>
    #include <iostream>
    #include <seqan/sequence.h>
    #include <seqan/file.h>

    using namespace seqan;
    using namespace std;

    String<Dna> seq;
    String<char> name;
    fstream f;
    f.open("test.fasta");
    readMeta(f, name, Fasta());   // header line of the record
    read(f, seq, Fasta());        // sequence of the record

SeqAn
• Question 5

    String<Dna> seq;
    String<char> name;
    fstream f;
    f.open("test.fasta");
    while (!f.eof()) {
        readMeta(f, name, Fasta());   // header line
        read(f, seq, Fasta());        // sequence
        if (length(seq) > 500) {
            cout << name << endl;
        }
    }

R/BioConductor
• Book has lots of examples
• Worth learning more about it
  – easy to do various cool things
• example... if time

Chado
• http://www.gmod.org
• Relational schema for storing biological data in a relational database (e.g. MySQL, Oracle, Sybase, ...)

    SELECT o.organism_id, o.abbreviation, o.genus, o.species,
           o.common_name, count(f.feature_id) AS n_features, o.comment
    FROM organism o LEFT JOIN feature f USING (organism_id)
    GROUP BY o.organism_id, o.abbreviation, o.genus, o.species,
             o.common_name, o.comment
    ORDER BY o.genus, o.species

[Figure: Chado schema diagram. Tables and columns recoverable from the slide:
  organism (organism_id)
  cvterm (cvterm_id)
  feature (feature_id, name, uniquename, type_id, organism_id, residues, seqlen, md5checksum)
  feature_relationship (feature_relationship_id, subject_id, object_id, type_id, rank)
  featureloc (featureloc_id, feature_id, srcfeature_id, fmin, fmax, strand, rank, locgroup)
  feature_cvterm (feature_cvterm_id, feature_id, cvterm_id, is_not)
  featureprop (featureprop_id, feature_id, type_id, value)]
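
The organism/feature query on the Chado slide can be tried without a full Chado installation. The sketch below runs the same query against an in-memory SQLite database. The two tables are cut down to the columns the query touches (plus name and uniquename on feature), the rows are made-up toy data, and the function name feature_counts is hypothetical. A real Chado schema has more columns and constraints.

```python
import sqlite3

# Assumption: heavily simplified stand-ins for the Chado organism and
# feature tables, holding only what the query needs.
SCHEMA = """
CREATE TABLE organism (
    organism_id  INTEGER PRIMARY KEY,
    abbreviation TEXT,
    genus        TEXT,
    species      TEXT,
    common_name  TEXT,
    comment      TEXT
);
CREATE TABLE feature (
    feature_id  INTEGER PRIMARY KEY,
    organism_id INTEGER REFERENCES organism (organism_id),
    name        TEXT,
    uniquename  TEXT
);
"""

# The query from the slide: one row per organism with its feature count.
QUERY = """
SELECT o.organism_id, o.abbreviation, o.genus, o.species, o.common_name,
       count(f.feature_id) AS n_features, o.comment
FROM organism o LEFT JOIN feature f USING (organism_id)
GROUP BY o.organism_id, o.abbreviation, o.genus, o.species,
         o.common_name, o.comment
ORDER BY o.genus, o.species
"""


def feature_counts():
    """Load toy data and return (genus, species, n_features) per organism."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO organism VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "H.sapiens", "Homo", "sapiens", "human", None),
            (2, "D.melanogaster", "Drosophila", "melanogaster", "fruit fly", None),
        ],
    )
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?, ?)",
        [
            (1, 1, "geneA", "toy:geneA"),
            (2, 1, "geneB", "toy:geneB"),
        ],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return [(row[2], row[3], row[5]) for row in rows]


if __name__ == "__main__":
    for genus, species, n_features in feature_counts():
        print(genus, species, n_features)
```

Because the query uses a LEFT JOIN and counts f.feature_id rather than rows, an organism with no features still appears in the result, with a count of 0.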