Notes for Homework 2 - Bioinformatics Algorithms, Databases, and Tools | CMSC 423, Assignments of Computer Science

Material Type: Assignment; Class: BIOINFO ALGS, DB, TOOLS; Subject: Computer Science; University: University of Maryland; Term: Unknown 1989;

Typology: Assignments

Pre 2010

Uploaded on 02/13/2009

koofers-user-0fm

CMSC423 Homework 2

Handed out: 9/16/2008
Due: 9/23/2008

This is one of the few assignments that require you to write code in Perl. Your assignment is to write a Perl script that can help retrieve data from the NCBI Trace Archive using the query_tracedb script provided by NCBI. This script is available on the glue machines in the public/ directory.

The script you write must fulfill the following specifications:

1. Must accept two parameters on the command line, using the following invocation

       my_query_tracedb <organism> <max records>

   where <organism> is the name of an organism (species code) stored in the Trace Archive, and <max records> indicates the maximum number of records to be retrieved. If more than <max records> are available in the database for this organism, your script will just return the first <max records> from among them. If <max records> is not provided, your script should retrieve all records for the given query.

2. The Trace Archive sets an upper limit on the number of records you are allowed to retrieve at a time using query_tracedb. Your script should obscure this limit from the user, i.e. your script should retrieve all records requested using as many independent requests to the Trace Archive as necessary. Note: for this assignment please set the upper limit of records/chunk to 5,000.

3. Your program should retrieve just fasta and quality information in .tar.gz format.

Additional details:

1. A sample query, to get you started, is:

       SPECIES_CODE = "WOLBACHIA ENDOSYMBIONT OF DROSOPHILA MELANOGASTER"

   e.g. your script should accept the command

       my_query_tracedb "WOLBACHIA ENDOSYMBIONT OF DROSOPHILA MELANOGASTER" <nn>

   where <nn> is the number of records you want to retrieve.

2. It is OK if your script generates multiple output files, corresponding to individual chunks of 5,000 records.

3. Please provide us with a simple README file that indicates how to run your program. Make sure you include a sample invocation that you know for sure will work.

(see next page)
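
The chunking requirement in specification 2 comes down to a small piece of arithmetic, which can be sketched in Perl as below. This is only an illustration under stated assumptions, not a reference solution: plan_chunks is a hypothetical helper name, and the number of available records is assumed to come from a separate count query that is not shown here. The handout does not give the query_tracedb request syntax, so the sketch stops at planning the requests and does not issue them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Upper limit of records per request, as set by the assignment.
use constant CHUNK_SIZE => 5000;

# plan_chunks($available, $max_records) is a hypothetical helper.
#   $available   - number of records the archive holds for the organism
#                  (assumed to come from a separate count query, not shown)
#   $max_records - optional cap from the command line; undef means "all"
# Returns a list of [offset, count] pairs, one pair per request.
sub plan_chunks {
    my ($available, $max_records) = @_;

    # Never ask for more than the archive has, nor more than the user wants.
    my $wanted = $available;
    if (defined $max_records && $max_records < $available) {
        $wanted = $max_records;
    }

    my @chunks;
    for (my $offset = 0; $offset < $wanted; $offset += CHUNK_SIZE) {
        my $remaining = $wanted - $offset;
        my $count = $remaining < CHUNK_SIZE ? $remaining : CHUNK_SIZE;
        push @chunks, [$offset, $count];
    }
    return @chunks;
}

# Example: 12,000 records requested out of 20,000 available.
for my $chunk (plan_chunks(20000, 12000)) {
    my ($offset, $count) = @$chunk;
    printf "request %d records starting at offset %d\n", $count, $offset;
}
```

With these example numbers the plan has three requests: two full chunks of 5,000 records and a final chunk of 2,000. Each pair would then be turned into one independent request to the Trace Archive, with one output file per chunk being acceptable under additional detail 2.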