Perl Primer - Computer Structural Bioinformatics - Lecture Notes | ECS 129, Exams of Computer Science

Material Type: Exam; Professor: Koehl; Class: Comp Structural Bioinfo; Subject: Engineering Computer Science; University: University of California - Davis; Term: Unknown 1989;

Uploaded on 09/17/2009

Perl Primer

Patrice Koehl
Department of Computer Sciences, University of California, Davis.

Acknowledgments: This primer is mostly a compilation of information found in three books that I highly recommend:
- “Beginning Perl” by Simon Cozens, which can be found online at http://www.perl.org/books/beginning-perl/. This is a free book, and a great introduction to Perl programming.
- “Perl Programming for Biologists” by D. Curtis Jamison (Wiley-Liss publisher). This is a small book with many examples and problem sets related to biology. I like the way it is structured.
- “Beginning Perl for Bioinformatics” by James Tisdall (O’Reilly publisher).

Introduction

1. Why Perl?

“Perl” is an interpreted computer language. The name “Perl” is not really an acronym, but it is usually taken to mean “Practical Extraction and Report Language”. More importantly, Perl is a language for doing what you want to do. It is important to understand that there is always more than one way to solve a problem, and Perl focuses on getting the job done. One Perl program may be faster than another, or more concise, or easier to understand, but if both do the same thing, there will be no judgment that defines which one is “better”. This also means that you do not need to know every detail of the language to do what you want with it.

Perl has strengths that make it an ideal language to learn and use:
- It is completely free, and available on all operating systems.
- It is very easy to learn.
- Perl was designed to be easy for humans to write, rather than easy for computers to understand. Perl syntax is more like English than that of other programming languages.
- Perl is very portable. Perl programs can be run on any computer, as long as Perl is installed on it.
- Perl “talks” text. It works with words and sentences, instead of characters. Files are series of lines, instead of individual bytes.
- Perl is a “high-level” language: you do not have to worry about the computer’s operation (such as allocation and de-allocation of memory, …).

It is only fair to mention that these strengths can also translate into weaknesses:
- Perl takes care of all “low-level” operations for you: this may not always lead to efficient code.
- Perl is interpreted, and loses the efficiency of compiled languages.
- Perl users tend to write programs for small, specific jobs. These programs are usually for the programmer’s eye only, and as such are often incomprehensible to everyone but the original programmer. In that respect, I can only emphasize the need for clarity, as well as for useful comments in your source files!
- Perl was designed to be easy for humans. As a consequence, it is relatively lenient about the style you use. This can lead to bad programming habits. As an analogy, think of what would happen to your English writing style if nobody had ever cared about how you write, as long as they understood what you had written. To avoid this, the key is to first develop a method to solve your problem that is independent of Perl or any other language, and then to adapt this method to Perl.

2. What is Perl used for?

Perl is widely used for extracting data from one source and translating it to another format. This includes manipulating databases, formatting and reformatting text files, search-and-replace operations, and organizing and manipulating experimental data. Perl is used to manage the data from the Human Genome Project. Perl is a very useful programming language for system administrators, as it allows them to automate administration tasks: tidying up the systems, processing logs, producing reports on system usage and watching for security issues. The most popular use of Perl, however, is CGI programming, that is, dynamically generating web pages.

3. How do I get Perl?
Perl has been ported to many platforms, and will certainly run on the standard operating systems such as UNIX, Linux, Solaris, FreeBSD, all flavors of Windows, and Apple MacOS.

Where to get Perl:
- You can get the source of the latest stable release of Perl from http://www.perl.com/CPAN-local/src/README.html
- Binary distributions for some ports are available at http://www.perl.com/CPAN-local/ports/index.html
- You can get binary packages of Perl for Linux, Solaris, Mac OS and Windows from ActiveState at http://www.activestate.com/ActivePerl (free for download)

    perl -e 'print "Hello World!\n";'

- you create a file that contains all your statements, and execute it:

    perl hello.plx

The file hello.plx contains:

    #! /usr/bin/perl
    use warnings;    # Set warning flags to detect errors
    #
    print "Hello World!\n";

The different elements of a Perl script:

- The first line (not required): #! /usr/bin/perl indicates that the file contains source code that should be executed by Perl. This is not a Perl command per se: it only signals to the shell how it should interpret the file.
- Directives: the statement use warnings tells the interpreter to output “meaningful” warnings about questionable constructs in the file (syntax errors are always reported, with or without it).
- Documenting the program: any line (except the first) starting with a sharp (#) is treated as a comment line and ignored. This allows you to provide comments on what your program is doing: this is extremely useful, so use it! More generally, a line in a Perl script may contain some Perl code, and be followed by a comment. This means that we can document the program “inline”.
- Keywords: Instructions that Perl recognizes and understands. The word print in the program above is one example. There are two types of keywords:
  • Functions (such as the print keyword); these are the verbs of the programming language and they tell Perl what to do.
  • Control keywords, such as if and else.
  It is a good idea to respect keywords, and not use them as names in your programs, even though Perl tolerates this.
- Statements: Statements are the sentences of the program. Instead of a full stop, a statement in Perl usually ends with a semicolon, as in the program above. There are times when you can get away without using the semicolon (when it is absolutely clear to Perl that the statement is finished, for example after the last statement in a program), but I strongly advise against this. It is good practice to always end a statement with a semicolon. Statements can be grouped together into a block (just like a paragraph in a text), by surrounding them with braces {…}:

    {
        print "This is";
        print "a block of simple";
        print "statements";
    }

- Escape sequences: Perl provides a mechanism called “escape sequences” to output special characters/actions: the sequence \n in the program above tells Perl to start a new line. Here is a list of the more common escape sequences (also called “metacharacters”):

    Escape Sequence    Meaning
    \t                 Tab
    \n                 Start a new line
    \r                 Carriage return
    \s                 White space (includes \t, \n and \r); recognized in regular expressions only
    \u                 Uppercase for next character
    \l                 Lowercase for next character
    \U                 Uppercase until \E
    \L                 Lowercase until \E
    \E                 End case modification
    \b                 Back up one character (“backspace”)
    \a                 Alarm (rings the system bell)

- White space: White space is the name given to tabs, spaces, and new lines. Perl is very flexible about where you put white space in your program. For example, we have seen that we can use indentation to help show the block structure. You do not, however, need to use any white space at all if you do not want to. You could rewrite the program above as:

    use warnings;print"Hello World!\n";

  I think it is a bad idea: white space is a nice tool to make programs more understandable.

Simple exercises:

1) Write a program printline.plx that prints the sentence “This is my second program”:
   a. As a single line
   b. With a single word on each line.
2) Find an online manual for Perl.
3) Which of the following statements are likely to cause problems:
   a. print "This is a valid statement\n";
   b. print "This is a valid statement"\n;
   c. print "This is a valid statement"
   d. printx "This is a valid statement\n";

Chapter 1: Scalar Variables and Data Types

1. Perl Variables

A variable is a named reference to a memory location. Variables provide an easy handle to keep track of data stored in memory. Most often, we do not know the exact value of what is in a particular memory location; rather, we know the type of data that is stored there. Perl has three main types of variables:
- Scalar variables hold the basic building blocks of data: numbers and characters.
- Array variables hold lists referenced by numbers (indices).
- Hash variables hold lists referenced by labels.

The three types are distinguished by the first character in the variable name: ‘$’, ‘@’, and ‘%’ for scalar, array and hash variables, respectively. Following the type symbol, the name can be practically any combination of characters, of arbitrary length. Creating a variable is as simple as making up a variable name and assigning a value to it.

There are a few rules regarding variable names that you need to be aware of:
- The second character of a name should be either a letter (A to Z or a to z), a digit (0 to 9) or an underscore (_). Most variable names that consist of a single character have a predefined significance, and should therefore be avoided.
- If you have a digit in the second position, the name can only contain more digits: $100 is valid, but $10a is not.
- Spaces are one type of character that is not allowed: use an underscore instead.
- Variables are case sensitive: this means that $dna refers to a different location in memory than $DNA.

Assigning a value to a variable is easy: all you have to do is write an equation, with the variable name on the left, an = sign, and the value on the right. The = sign is called the assignment operator.

2. Special variables

Perl has many predefined special variables that contain default values designed to make life easier for programmers. Since Perl variables can be reassigned, there is a danger that you modify one of these variables accidentally. These variables are usually a combination of obscure characters, and if you give meaningful names to your variables, you should not have this problem. Here is a list of the most common ones:

The autoincrement and autodecrement operators:

There are two more operators on numbers that are useful, ++ and --. They add one to, or subtract one from, the variable, but their behavior is a little unusual: it depends on where they are placed with respect to the variable. Try and run the following code:

    #! /usr/bin/perl
    # Program: testop.plx
    #
    use warnings;
    $a = 10;
    $b = 6;
    print "The original variables are: a = ",$a, " and b = ", $b,"\n";
    $b = $a++;
    print "After incrementing : a = ",$a, " and b = ", $b,"\n";
    $b = ++$a*2;
    print "Now we have : a = ",$a, " and b = ", $b,"\n";
    $a=--$b+4;
    print "Finally, we have: a = ",$a, " and b = ", $b,"\n";

You should see the following output:

    perl testop.plx
    The original variables are: a = 10 and b = 6
    After incrementing : a = 11 and b = 10
    Now we have : a = 12 and b = 24
    Finally, we have: a = 27 and b = 23

How does it work? Let us go through the program step by step:

- First, we set up the variables $a and $b:

    $a = 10;
    $b = 6;
    print "The original variables are: a = ",$a, " and b = ", $b,"\n";

- In the following line, the assignment happens before the increment. $b is set to the current value of $a, 10, and then $a is autoincremented, becoming 11:

    $b = $a++;
    print "After incrementing : a = ",$a, " and b = ", $b,"\n";

- In the next line, the incrementing takes place first. $a is now 12, and $b is set to twice this value, i.e. 24:

    $b = ++$a*2;
    print "Now we have : a = ",$a, " and b = ", $b,"\n";

- Finally, $b is decremented first, and becomes 23. $a is set to $b plus 4, which is 27:

    $a=--$b+4;
    print "Finally, we have: a = ",$a, " and b = ", $b,"\n";

3.2 Strings

A string is a group of characters attached together, enclosed by quotation marks. For now, we will only consider double quotes. Just like with numbers, many operations can be performed on strings: the most common ones are listed in the table below.

    String operator          Meaning
    . (dot)                  Concatenation
    reverse                  Reverse the characters in the string
    =~                       Binding
    m/Pattern/               Matching operator
    s/Pattern1/Pattern2/     Substitution operator
    tr/Pattern1/Pattern2/    Translate operator

Concatenating strings:

The dot operator, when placed between two strings, creates a single string that concatenates the two original strings. In the following program:

    #! /usr/bin/perl
    use warnings;
    $DNA1="ATTGC";
    $DNA2="GGCCT";
    $DNA=$DNA1.$DNA2;

the variable $DNA contains the string “ATTGCGGCCT”. Note that the concatenation operator can be attached to an assignment: $DNA.="G"; would add a “G” at the end of the string contained in $DNA.

Reversing a string

Though not strictly an operator, the function reverse is especially useful as it allows you to reverse the characters in a string. The program:

    $DNA="ATGATG";
    $DNARev= reverse $DNA;

would produce $DNARev = “GTAGTA”.

The binding operator: =~

The binding operator means “apply the operation on the right to the string contained in the variable on the left”. It is mostly used with the regular expression operators listed below (see figure).

Regular expression operators:

These operators perform pattern matching on a string using regular expressions, and can be used to search, substitute, and transform strings of any length. There are only three regular expression operators. They are:

    s/PATTERN1/PATTERN2/[g][i][e][o]       (substitution)
    tr/PATTERNLIST1/PATTERNLIST2/[c][d]    (translate)
    m/PATTERN/[g][i][o]                    (match)

The s operator is the substitute operator: it finds the first occurrence of PATTERN1, and replaces it with PATTERN2. The tr operator is the translate operator: it takes each character from PATTERNLIST1 and replaces it with the corresponding character from PATTERNLIST2. The m operator is the match operator, looking for the PATTERN in the string. The behavior of the operators can be modified by one or more optional switches:

    [c]omplement
    [d]elete
    [e]valuate
    [g]lobal
    [i]nsensitive to case
    [o]nly evaluate once

In practice, the g and i switches are the most useful.

c) Match

The match operator searches for a given PATTERN in a string. It returns 1 if it finds it, and a false value (the empty string) otherwise. Later we will see how this operator can be used to count the number of instances of the PATTERN, as well as their positions.

4. Interpolation and Escapes

When working with strings, the type of quotation mark around the string makes a difference as to how Perl treats it. A string enclosed in double quotes undergoes a process called interpolation, and anything that Perl recognizes as a variable gets replaced by the value of the variable. A string enclosed in single quotes is not interpolated, and any character in it is used exactly as is. Let us see an example. The program below:

    #!/usr/bin/perl
    use warnings;
    $name="Patrice";
    print "My name is $name\n";
    print 'My name is $name\n';

would produce the following output:

    My name is Patrice
    My name is $name\n

i.e. the second print would interpolate neither the variable nor the escape character \n.

One obvious difficulty with variable interpolation is how to embed special characters into an output. For example, we might want to produce exactly the line: Today’s price for the book “cannery row” is $7.99. If we naively write:

    print "Today's price for the book "cannery row" is $7.99.\n";

we will get an error message: Unmatched ‘. To deal with this, we can “hide” a character from Perl using the backslash character. The proper syntax is:

    print "Today\'s price for the book \"cannery row\" is \$7.99.\n";
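The difference between the two kinds of quotes, and the effect of the backslash, can be checked with a short script. This is only a sketch under stated assumptions: the variable names $double, $single and $escaped and the sample sentence are made up for the illustration, and the use of use strict with my declarations is a choice of this example, not something the text above requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical value, chosen only for this illustration.
my $name = "Patrice";

# Double quotes: $name is interpolated and \n becomes a newline.
my $double = "My name is $name\n";

# Single quotes: nothing is interpolated; \n stays as a backslash and an n.
my $single = 'My name is $name\n';

# Inside double quotes, a backslash hides ", $ and \ from Perl.
my $escaped = "Price of \"Cannery Row\": \$7.99, backslash: \\\n";

print $double;
print $single, "\n";   # add a real newline after the literal \n
print $escaped;
```

Running the script shows the literal text $name\n on the second line, which makes it easy to see that single quotes switch off both interpolation and escape sequences.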
Note that we need to put a backslash in front of the dollar sign, otherwise Perl will interpret $7 as a variable. Finally, if the string we want to print contains a backslash (\), we need to hide it from Perl by putting a backslash in front! To print \C, we would use:

    print "\\C\n";

5. Introducing STDIN

Often when we write a Perl script, we need to be able to ask the user for additional data when (s)he runs the program. The way to do this is with the construct <STDIN>. I will explain this in more detail when we look at files. Basically, <STDIN> reads a line from the file called standard input. Usually, the standard input is not really a file, but the user’s keyboard. Similarly, the print function writes by default to the file called standard output, which is usually the user’s screen. In order to ask the user for a line of text (say a DNA sequence), we write something like:

    print "Enter your DNA sequence : ";
    $DNA = <STDIN>;

Note that the DNA sequence read in will have one extra character, corresponding to the newline entered when the user presses Return. To remove it, we can use the command:

    chomp($DNA);

Exercises:

1. Without the aid of a computer, work out the order in which each of the following expressions would be computed, and their value.
   i. 2 + 6/4-3*5+1
   ii. 17 + -3**3/2
   iii. 26+3**4*2
   iv. 2*2**2+2
   Verify your answer by writing a small Perl program that implements these expressions.
2. Without the aid of a computer, work out these successive expressions and give the values of $a, $b, $c and $d upon completion. Then check your answer using a Perl script:

    $a=4;
    $b=9;
    $c=5;
    $d=++$a*2+$b++*3;
    $c+=--$d/3;
    $b%=$a;
    $a=--$b;

3. Write a Perl program that:
   i. Reads a DNA sequence from standard input
   ii. Writes it on standard output all in lower case
   iii. Writes it on standard output with the purines in upper case and the pyrimidines in lower case
   iv. Writes the sequence in reverse order
4. Write a Perl program that:
   i. Reads a DNA sequence from standard input
   ii. Counts the number of purines and pyrimidines
   iii. Counts the number of A, T, G and C.
5. Write a Perl program that:
   i. Reads a DNA sequence from standard input (5’→3’)
   ii. Writes the corresponding RNA sequence
   iii. Writes the sequence of the complementary strand (5’→3’)
6. Write a Perl program that:
   i. Reads a DNA sequence from standard input (5’→3’)
   ii. Removes all Guanines
   iii. Replaces all Thymines with Uracil
   iv. Counts the number of remaining nucleotides
   v. Writes the resulting sequence on standard output

not complain about the fact that we forgot to put the elements of the list between parentheses. The correct syntax for this command should have been:

    print("Finally, we have: a = ",$a, " and b = ", $b,"\n");

Let us look now at a list of numbers: (1,4,6). If we print it using:

    print(1,4,6);

we would get: 146! Perl does not automatically put spaces between the elements of a list for us, nor does it put a new line at the end. If we want separators when we print them, we need to put them in ourselves. Note that lists in Perl can be mixed: you can include strings, numbers, scalar variables and even lists in a single list!

2.2 Accessing list values

We have seen how to build a list. Another thing that is useful is to be able to access a specific element or set of elements from a list. The way to do this is to place the number of the element we want in square brackets after the list. Let us look at an example:

    use warnings;
    #
    # print element 2 of the list of days of the week:
    #
    print(('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday')[2],"\n");

Which day do you think will be printed? If you try this program, you will get:

    → Wednesday

Why didn’t we get “Tuesday”, the second element of the list? This is because Perl starts counting from 0 and not 1! This is something important to remember. You should also notice that we use two sets of parentheses after the command print: the inner parentheses define the list of days of the week. When we choose element 2 of this list, we have generated an element, which we pair with "\n" to form a second list. The index of the element you want does not have to be a literal: it can be a variable as well. As an exercise, write a small program that reads in a number between 1 and 7, and outputs the corresponding day of the week. The answer is given below.

    use warnings;
    # Read in day considered
    print "Enter a number from 1 to 7 : ";
    $day=<STDIN>;
    $day--;
    # print element $day of the list of days of the week:
    #
    print(('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday')[$day],"\n");

The examples above show you how to single out one element of a list; if we want more than one element, we give their respective positions within the brackets, separated with commas. For example, if we wanted to print out the first and last day of the week, we would write:

    print(('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday')[0,6],"\n");

Notice again that the first and last days correspond to positions 0 and 6, and not 1 and 7.

2.3 Special lists: ranges

Often the lists we use have a simple structure: the numbers from 1 to 10, or the letters from a to z. We do not need to write these lists explicitly: Perl has the option to specify a range of numbers or letters. The two examples cited would be written in Perl as:

    (1..10) and ('a'..'z')

This will give us lists of 10 and 26 elements, respectively. Note that the notation is inclusive, i.e. the elements written are part of the list.

3. Arrays

There is not much we can do with lists, except print them. Even when you print them, the statements can become cumbersome. Also, there is no way to directly manipulate a list: if we wanted to create a new list from an existing list by removing its last element, we could not. The solution offered by Perl is to store lists into arrays.

3.1 Assigning arrays

Remember that names for arrays start with the character @, and follow the same rules as those defined for scalar variables. We store a list into an array the same way we store a scalar into a scalar variable, by assigning it with =:

    @days=('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday');
    @numbers=(1..6);

The first array @days contains the names of the days of the week, while the second array @numbers contains the integers from 1 to 6. Once we have assigned a list to an array, we can use it where we would use a list. For example,

    print @days, "\n";

will print:

    → MondayTuesdayWednesdayThursdayFridaySaturdaySunday

Again, Perl does not add space between the elements of the list when it prints them. You have to deal with it yourself! (Exercise: how would you change the definition of @days such that the command print @days; would print one element per line? We will see how to do this in section 3.6.)

3.2 Accessing one element in a list

We can access one element of an array by using the index of the drawer it has been assigned to. Remember how we access an element in a list:

    ('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday')[0]

Figure 2.1: Scalar variables and arrays. A scalar variable is like a single box, while an array behaves like a chest of drawers. Each of the drawers is assigned a number, or index, which starts at 0.

    splice(ARRAY, OFFSET, LENGTH, LIST)

The splice function takes an ARRAY, moves to the OFFSET position, removes LENGTH entries, and replaces them with the entries from the LIST. The LENGTH and LIST are optional parameters, and the presence or absence of these two parameters determines the behavior of splice. Let us look at the behavior of the full function.
After these lines of Perl:

    @days=('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday');
    #
    splice(@days,2,2,'January','February','March');

the array @days contains: ('Monday', 'Tuesday', 'January', 'February', 'March', 'Friday', 'Saturday', 'Sunday'), i.e. splice removed the 2 (LENGTH=2) elements 'Wednesday' and 'Thursday' that are at positions OFFSET (2) and OFFSET+1, and replaced them with ('January', 'February', 'March'). Note that the elements that are replaced are not discarded; they are returned as the result of the function:

    @array= splice(@days,2,2,'January','February','March');

In this case, @array contains ('Wednesday','Thursday').

If we leave out the replacement LIST, splice will simply delete the LENGTH entries starting from OFFSET. If we set LENGTH to 0, splice will just insert LIST starting at position OFFSET, without removing anything:

    @days=('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday');
    #
    splice(@days,2,0,'January','February','March');

The array @days now contains: ('Monday', 'Tuesday', 'January', 'February', 'March', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'). Finally, if we leave out both the LENGTH and the LIST, splice truncates the ARRAY and only leaves positions 0 to OFFSET-1.

3.5 Other useful array functions

Reverse: reverse takes an array and returns an array in reverse order:

    @array=(1,2,3,4);
    @revarray=reverse(@array);

The array @revarray contains the list (4,3,2,1).

Sort: sort takes an array and returns its elements sorted in alphabetical (string) order. By default numbers are also compared as strings, so that 10 comes before 9; to sort in increasing numerical order, give sort a comparison block: sort { $a <=> $b } @array.

Counting the number of elements in an array

There are two ways to do this:
- When you assign an array to a scalar, Perl assumes you want to find the number of elements in the array: $n=@days puts in $n the number of elements in @days (7).
- We know that the first index in an array is 0, and it would be useful to know the last index in an array: Perl provides this information with the special prefix $#: $iend=$#days sets $iend to the last index of @days, i.e. 6. The number of elements in @days is therefore $iend+1.

3.6 From arrays to string and back

join: from array to string

Remember that the command:

    print @array, "\n";

prints the elements of the array on a single line, without separator. This is usually not what we want, as we would prefer separators between the elements. One way to get that is to rely on Perl to interpolate the array, by putting it between double quotes:

    print "@array\n";

will print the elements of the array, separated with a space. To get even more control over the output, and choose the separator string, we can use the join function: join(STRING, ARRAY) concatenates the elements of ARRAY into a string, with STRING in between any two elements. For example,

    print join(',',@array),"\n";

will print the elements of the array, separated with a comma.

split: from string to array

The reverse operation to join is the split operator: split(/PATTERN/,STRING) separates the blocks of characters of STRING and puts them into an ARRAY, looking for PATTERN as delimiters between the blocks. The PATTERN is given between slashes (/ /). We can use split to separate a word into characters, a sentence into words and a paragraph into sentences:

    @chars = split(//, $word);
    @words = split(/ /, $sentence);
    @sentences = split(/\./, $paragraph);

In the first case the null string is matched between each pair of characters, and that is why the @chars array is an array of characters, i.e. an array of strings of length 1.

4. Hashes

4.1 Definition

Figure 2.2: The hash variable

A hash is a special array for which each element is indexed by a string instead of an integer. The string is referred to as a key, while the corresponding item is the value. Each element of a hash is therefore a pair (key,value). The name of a hash table starts with the symbol %.

Exercises:

1. Write a program that prints all the numbers from 1 to 100. Your program should have far fewer than 100 lines of code!
2. Starting with the gene $GENE1="ATGTTGATGTG", write a Perl program that creates the new genes $GENE2, $GENE3, $GENE4 and $GENE5 such that:
   i. $GENE2 only contains the last two nucleotides of $GENE1
   ii. $GENE3 only contains the first two nucleotides of $GENE1
   iii. $GENE4 only contains nucleotides 2, 4, 6, 8 and 10 of $GENE1
   iv. $GENE5 only contains the first 3 and last 3 nucleotides of $GENE1
3. Suppose you have a Perl program that reads in a whole page from a book into an array @PAGE, with each item of the array corresponding to a line. Add code to this program to create a new array @SENTENCES that contains the same text, but now with each element of @SENTENCES being one sentence.
4. Write a Perl program that:
   i. Reads a DNA sequence from standard input
   ii. Converts it to RNA
   iii. Adds the codon AAU in the middle of the RNA sequence
5. Write a hash that contains the genetic code.
6. Write a Perl program that switches two bases in a DNA string at specified positions.

Chapter 3: Control Structures

1. Higher order organization of Perl instructions

In the previous chapters, we have introduced the different types of variables known by Perl, as well as the operators that manipulate these variables. The programs we have studied so far have all been sequential, with each line corresponding to one instruction: this is definitely not optimal.
For example, we have introduced in the previous chapter the concept of lists and arrays, to avoid having to use many scalar variables to store data (remember that if we were to store the whole human genome, we would need either 30,000 scalar variables, one for each gene, or a single array, whose items are the individual genes); if we wanted to perform the same operation on each of these genes, we would still have to write one line for each gene. In addition, the programs we have written so far would attempt to perform all their instructions, once given the input. Again, this is not always desired: we may want to perform some instructions only if a certain condition is satisfied.

Again, Perl has thought about these issues, and offers solutions in the form of control structures: the if structure, which controls whether a block of instructions needs to be executed, and the for structure (and equivalents), which repeats a set of instructions a preset number of times. In this chapter, we will look in detail at the syntax and usage of these two structures.

Figure 3.1: The three main types of flow in a computer program: sequential, in which instructions are executed successively; conditional, in which the blocks “instructions 1” and “instructions 2” are executed if the Condition is True or False, respectively; and repeating, in which instructions are repeated over a whole list.

2. Logical operators

Most of the control structures we will see in this chapter test if a condition is true or false. For programmers, “truth” is easier to define in terms of what is not truth! In Perl, there is a short, specific list of false values:
• An empty string, "", is false.
• The number zero and the string "0" are both false.
• An empty list, (), is false.
• The undefined value is false.
Everything else is true.

2.1 Define - Undefined

Until a specific value has been stored in a variable, the variable has a special value called “undef” (short for undefined).
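The list of false values given above, including the undefined value, can be checked directly. The sketch below is an illustration only: the helper truth() and the variables $unset and @empty are hypothetical names introduced for this example, and the script simply reports how Perl evaluates each value in a condition.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reports how Perl treats a scalar used as a condition.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

my $unset;        # declared but never assigned: it holds undef
my @empty = ();   # an empty list stored in an array

print 'empty string ""  : ', truth(""),     "\n";   # false
print 'number 0         : ', truth(0),      "\n";   # false
print 'string "0"       : ', truth("0"),    "\n";   # false
print 'undefined value  : ', truth($unset), "\n";   # false
print 'empty array      : ', (@empty ? "true" : "false"), "\n";   # false
print 'string "0.0"     : ', truth("0.0"),  "\n";   # true: it is not exactly "0"
print 'string " "       : ', truth(" "),    "\n";   # true: a space is a character
```

The last two lines show the flip side of the rule "everything else is true": a string that merely looks like zero, or that contains only a space, still counts as true.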
To test if a variable has been defined, we use the function defined():

    $a = 1;
    defined($a);    # returns a TRUE value
    $b = 0;
    defined($b);    # returns a TRUE value
    defined($c);    # returns a FALSE value, unless $c was defined before.

If needed, you can remove the definition of a variable by using the function undef():

    undef($a);
    defined($a);    # returns a FALSE value

Conversely, you can declare your variables prior to using them, with my (note the parentheses, which are required when declaring several variables at once):

    my ($a,$b,$c);

Note that it is really good practice to always declare your variables!

2.2 Comparing numbers and strings

We can test whether a number is bigger than, smaller than, or the same as another. Similarly, we can test if a string comes before or after another string, based on alphabetical order. All the results of these tests are TRUE or FALSE. Table 3.1 lists the common comparison operators available in Perl.

    if (condition) {
        block code 1;
    }
    else {
        block code 2;
    }

Block code 1 is executed if the condition is true, and block code 2 is executed otherwise. Here is an example of a program asking for a password, and comparing it with a pre-stored string:

    $hidden="Mypasscode";
    print "Enter your password :";
    $password=<STDIN>;
    chomp($password);
    if($password eq $hidden) {
        print "You entered the right password\n";
    }
    else {
        die "Wrong password !!\n";
    }

Perl also provides a control structure for when there are more than two choices: the elsif structure is a combination of else and if. It is written as:

    if (CONDITION1) {
        block code 1;
    }
    elsif (CONDITION2) {
        block code 2;
    }
    else {
        block code 3;
    }

Note that any number of elsif can follow an if.

4. Loops

One of the most obvious things to do with an array is to apply a code block to every item in the array: loops allow you to do that. Every loop has three main parts:
• an entry condition that starts the loop
• the code block that serves as the “body” of the loop
• an exit condition
Obviously, all three are important.
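The three parts of a loop can be pointed out in a minimal sketch. It assumes the standard Perl while loop; the counter $count, the array @seen and the limit of 3 passes are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = 0;   # state tested by the loop condition
my @seen;        # records each pass, for illustration only

# Entry condition: tested before every pass, including the first one.
while ($count < 3) {
    # Body: the work done on each pass.
    push @seen, $count;
    print "pass $count\n";

    # The body changes $count, so the exit condition ($count >= 3)
    # is eventually reached; without this line the loop would never end.
    $count++;
}

print "done after ", scalar(@seen), " passes\n";
```

Here a single test plays both roles: the loop is entered while $count < 3 holds, and it exits as soon as the test fails.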
Without the entry condition, the loop won’t be executed; a loop without a body won’t do anything; and finally, without a proper exit condition, the program will never exit the loop (this is referred to as an infinite loop, and often results from a bug in the exit condition).

There are two types of loops: determinate and indeterminate. Determinate loops carry their end condition with them from the beginning, and repeat their code block an exact number of times. Indeterminate loops rely upon code within the body of the loop to alter the exit condition so the loop can exit. We will see two determinate loop structures, for and foreach, and one indeterminate loop structure, while.

4.1 For loop

The most basic type of determinate loop is the for loop. Its basic structure is:

    for (ENTRY; TEST; MODIFICATION) {
        code block;
    }

When Perl finds a for loop, it proceeds as follows:
- it sets up the ENTRY condition
- it examines the TEST condition
- it executes the code block if TEST is true; otherwise it exits the loop
- it executes the MODIFICATION
- it goes back to the TEST condition

For example, the Perl program:

    for ($i=0; $i < 5; $i++) {
        print "$i\n";
    }

will print out 0 1 2 3 4 (one number per line). Note that if ENTRY is set such that TEST fails right away (for example by setting $i=10 in the program above), the loop is not executed at all.

The for loop is very useful for iterating over the elements of an array. Let us write, for example, a program that counts the number of adenines in all the genes of the human genome. We will suppose that the sequences of all human genes are stored in the array @human_genome; the program is listed in Section 4.2, next to its foreach version.

4.2 Foreach loop

The use of a for loop to iterate over each item in an array is so frequent that Perl has a special structure for this: the foreach loop. The foreach loop, written as foreach $VAR (@ARRAY) { code block; }, takes each value in @ARRAY, stores it into $VAR, and executes the code block. It exits when it has explored all values in @ARRAY.
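The five steps listed for the for loop in Section 4.1 can be spelled out with the while structure mentioned in the introduction to loops. This is only an illustrative sketch: the function names count_with_for and count_with_while are hypothetical, and the two forms are equivalent only for simple loops like this one.

```perl
use strict;
use warnings;

# Determinate form: ENTRY, TEST and MODIFICATION all sit in the loop header.
sub count_with_for {
    my ($limit) = @_;
    my @seen;
    for (my $i = 0; $i < $limit; $i++) {
        push(@seen, $i);
    }
    return @seen;
}

# The same steps written out one by one with while.
sub count_with_while {
    my ($limit) = @_;
    my @seen;
    my $i = 0;               # ENTRY: set up once, before the loop
    while ($i < $limit) {    # TEST: examined before every pass
        push(@seen, $i);     # code block
        $i++;                # MODIFICATION: without it, an infinite loop
    }
    return @seen;
}

my @from_for   = count_with_for(5);
my @from_while = count_with_while(5);
print "for   : @from_for\n";
print "while : @from_while\n";

# If TEST fails right away, the body is never executed.
my @none = count_with_while(0);
print "passes when the test fails at once: ", scalar(@none), "\n";
```

Both functions collect the values 0 to 4. Removing the line marked MODIFICATION from the while version leaves the exit condition unchanged forever, which is exactly the infinite loop described at the start of Section 4.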
With a for loop, the program that counts the adenines in all the genes stored in @human_genome reads:

    # First, define the number of genes in the array @human_genome:
    #
    $ngenes = scalar(@human_genome);
    #
    # Loop over all genes, and add the number of adenines in each gene
    # to the cumulative counter $nA_tot:
    #
    $nA_tot = 0;
    for ($i=0; $i<$ngenes; $i++) {
        $nA_tot += ($human_genome[$i] =~ tr/A//);
    }
    #
    # Now print the total number of adenines:
    #
    print "The total number of adenines in the human genome is : $nA_tot\n";

The general form of the foreach loop is:

    foreach $VAR (@ARRAY) {
        code block;
    }

For example, we can rewrite the code for counting the number of adenines in the human genome as:

    $nA_tot = 0;
    foreach $gene (@human_genome) {
        $nA_tot += ($gene =~ tr/A//);
    }
    print "The total number of adenines in the human genome is : $nA_tot\n";

Exercises:

1. Write a program that reads in an integer value n and outputs n! (Reminder: n! = 1 x 2 x 3 x 4 x ... x n).
2. Write a program that reads in a DNA sequence, and writes it out with the nucleotides at even positions in uppercase, and the nucleotides at odd positions in lowercase.
3. Write a program that reads in 2 DNA sequences, and checks whether they are complementary to each other. Make sure to check that the sequences given as input are true DNA sequences.
4. Write a Perl program that reads in a DNA sequence, transcribes it into RNA, and finds all the start codons it contains.
5. Write a Perl program that reads in a DNA sequence, transcribes it into RNA, and finds the position of the gene it contains. We will suppose that the sequence contains only one gene; there might however be several AUG codons, and several stop codons. You will pick the Start / Stop pair that is in phase (i.e. separated by a multiple of 3) and that gives the longest gene.
6. Write a Perl program that reads in a DNA sequence, transcribes it into RNA, finds the gene it contains, and translates it into a protein sequence. (You will use the programs you have written for exercises 4 and 5.)

Chapter 4: Input and Output

We have started to write real programs now, but we are still limited by how we get information in and out.
At the moment, all we can do is ask the user for input using <STDIN> and print data on the screen using print. What we want to do in this chapter is to extend these techniques to reading from and writing to files, as well as to passing parameters to a program.

1. Program Parameters

Passing arguments to a Perl program is easy: you just type the name of your program, followed by the list of arguments you want to pass:

    perl myprogram.pl arg1 arg2 arg3

which passes three arguments to the program. But how do we access these arguments inside the program? This is in fact easy with Perl: all parameters you give to a program are stored in a special array variable called @ARGV (which stands for ARGument Values). @ARGV can be treated as a normal array: for example, $ARGV[0] is the first parameter, and $#ARGV is the index of the last parameter passed to the program. This is useful to check that the right number of arguments has been passed:

    if ($#ARGV != 2) {
        die "usage: myprog arg1 arg2 arg3\n";
    }

If we tried running myprog with only 2 arguments, the program would die and issue a reminder.

2. Dealing with files

2.1 Filehandle

When we are dealing with files, we need something that tells Perl which file we are talking about, something that will give us a “handle” on the file: this “something” is known as a filehandle. In practice, a filehandle is simply a name that has been associated with a particular file using the open() command:

    open(filehandle, filename)

The filehandle is just an unquoted string like INPUT or FILE. Although it is not required, most Perl programmers put filehandles in all capital letters to make them readily distinguishable from the rest of the program code. The filename is a string between quotes containing the name of the file.
At the front of the name is a symbol indicating how and why we wish to open the file:
- a < symbol opens the file for reading
- a > symbol opens the file for writing
- a >> symbol opens the file for appending

A + symbol in front of the > or < symbol opens the file for both reading and writing. It is important to recognize the difference between the symbols > and >>: if a file opened for writing already exists, the content of that file is wiped out and replaced with the output of the program. Appending, on the other hand, simply adds the program output to the end of the file. If the file does not exist, open will create the file in either mode.

2.2 Built-in Filehandles

We have actually already seen a filehandle: STDIN. This is the filehandle for the special file “standard input”. Standard input is the input provided by a user typing on the keyboard. As a counterpart to standard input, there is standard output: STDOUT. Every time we have used the function print, we have implicitly used STDOUT:

    print "Hello, world. \n";

is equivalent to:

    print STDOUT "Hello, world. \n";

There is one more built-in filehandle: standard error, or STDERR, which is where the error messages are written when we use die.

2.3 File safety

Reading from an undefined filehandle will at best result in an error message. It is usually a good idea to test whether a file is really open before reading. The command open returns true if it successfully opens a file; this can be used in a test:

    if (not open(INPUT, "<myfile")) {
        die "Cannot open myfile \n";
    }
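The three opening modes and the safety test on open can be tried together in a short sketch. It follows the two-argument form of open used in these notes; the helper names write_lines and read_lines are hypothetical, and the core module File::Temp is used here only to obtain a scratch file name, which is an assumption of this example rather than something the notes prescribe.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write lines to a file.
# $mode is ">" (overwrite) or ">>" (append).
sub write_lines {
    my ($mode, $filename, @lines) = @_;
    if (not open(OUTPUT, $mode . $filename)) {
        die "Cannot open $filename for writing\n";
    }
    foreach my $line (@lines) {
        print OUTPUT "$line\n";
    }
    close(OUTPUT);
}

# Hypothetical helper: read a whole file, one array item per line.
sub read_lines {
    my ($filename) = @_;
    if (not open(INPUT, "<$filename")) {
        die "Cannot open $filename for reading\n";
    }
    my @lines = <INPUT>;
    chomp(@lines);           # remove the newline at the end of each line
    close(INPUT);
    return @lines;
}

# Scratch file, removed automatically when the program exits.
my ($tmp_handle, $file) = tempfile(UNLINK => 1);
close($tmp_handle);

write_lines(">",  $file, "old content");   # creates the content
write_lines(">",  $file, "ATGC");          # > wipes out the old content
write_lines(">>", $file, "GGCC");          # >> adds to the end

my @content = read_lines($file);
print STDOUT "The file now holds: @content\n";
```

After the three calls the file holds only the two sequences: the second > call erased "old content", while the >> call kept "ATGC" and added "GGCC" after it.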