Introduction to Perl & Bioperl (I): Basic Perl Programming
Xuefeng Zhao
L. H. Baker Center, ISU
BCB 444/544X

Objectives
• Get an overview of basic Perl programming
• Write simple Perl scripts to manipulate DNA/RNA sequences

Outline
• Why Perl?
• Install Perl/Bioperl on Windows
• Run a Hello Perl script
• Data Types, Variables and Built-in Functions
• Control Structures
• Basic IO
• Subroutines & Functions
• More String Manipulation

Why Perl?
• Perl: Practical Extraction and Report Language
• Easy to learn
• Good for string manipulation and file I/O
• Open source and available for all major operating systems
• Many bioinformatics tools are available for it (e.g., Bioperl)

Install Perl/Bioperl
Ref: installation notes on www.bioperl.org
1. Unix/Linux: Perl is pre-installed.
2. Mac: http://www.macperl.org
3. Windows: Download the current ActivePerl from www.activeperl.com. The Windows installer package file is ActivePerl-5.8.6.811-MSWin32-x86-122208.msi.
   – Install ActivePerl with the default settings by double-clicking the .msi file, using c:\perl as the Perl home directory.
   – To install GD.pm:
       ppm> install http://theoryx5.uwinnipeg.ca/ppms/GD.ppd
   – To install Bioperl:
       ppm> rep add Bioperl http://bioperl.org/DIST
       ppm> search bioperl
       1. bioperl [1.2.3] bioinformatics tool kits
       2. bioperl [1.2.1] non …
       8. bioperl-1.4 [1.4] BioPerl 1.4 PPM3 Archive
       ….
       ppm> install 8
       …….
       Successfully installed Bioperl-1.4 version 1.4 in ActivePerl 5.8.6.811

Run a Hello Perl script
• Download all Perl scripts for the class from www.bioinformatics.iastate.edu/BBSI/ to your USB disk, and run hello.pl.
• To run: perl hello.pl
• Or, to run: hello.pl (on Windows this works when .pl files are associated with Perl; on Unix/Linux, first make the script executable with chmod +x hello.pl, then run ./hello.pl)

Script: hello.pl
    #!/usr/local/bin/perl
    # hello perl script
    use strict;
    use warnings;
    print "Hello, ISU-BCBSI\n";

Basic Syntax:
1. The first line, starting with #!, gives the location of the perl interpreter; it is used on Unix/Linux.
2. Perl is free-form (spacing and line breaks do not matter) and case-sensitive.
3. Each statement ends with a semicolon (;).
4. A comment starts with a pound sign (#) and runs to the end of the line; there is no multi-line comment marker, so you have to comment line by line.

Data Types, Variables and Built-in Functions
Data Types:
1. Scalar: a string or a number.
2. Array: list arrays (arrays) and associative arrays (hashes).
Variables:
1. Scalar variables start with $. Ex. $DNAstr, $numNT
2. Arrays start with @. Ex. @nameAA
3. Hashes start with %. Ex. %RestrictEnzyme
Some Perl built-in functions: length, substr, index, rindex, push, pop, keys, sort, open, close, die, exit, print

Scalar variables: number and string
Number: $numVar = EXPRESSION;
    my $counter = 20;
    my $Tm = 37.5;
String variables: $strVar = EXPRESSION;
    # Single quotes ('): no variable expansion takes place
    my $str1 = 'isu-bcbsi';
    my $strSupport = '$$$:NIH-NSF';    # $strSupport: $$$:NIH-NSF
    # Double quotes ("): variables are expanded (interpolated);
    # special characters such as $, @, " and \ must be escaped with a backslash
    my $str2 = "$str1, ames, iowa";    # $str2: "isu-bcbsi, ames, iowa"
    # Backticks (`): the variable is assigned the output of the command line
    # (date /t is a Windows command; on Unix/Linux use `date`)
    my $strDateNow = `date /t`;        # $strDateNow: "Mon 06/06/2005"

Operators and Functions for Scalar variables
Assignment operator: =
Arithmetic operators: +, -, *, /, % (modulus), ** (exponentiation)
Shorthand notation (the three statements on each line are equivalent):
    $counter++;    $counter = $counter + 1;    $counter += 1;
    $counter--;    $counter = $counter - 1;    $counter -= 1;

Operators and functions for strings (Operator/Function: Example → Result)
• Concatenation, dot (.): $str1 = "DNA"; $str2 = "RNA"; $str3 = $str1 . " and " . $str2; → $str3: DNA and RNA
• Appending (.=): $str1 .= $str2; → $str1: DNARNA
• length(STRING): $len = length("ACGT"); → $len: 4
• index(STRING, SEARCH): $idx = index("ACGTACGT", "C"); → $idx: 1 (position of the first match, counting from 0)
• rindex(STRING, SEARCH): $ridx = rindex("ACGTACGT", "C"); → $ridx: 5 (position of the last match)
• substr(STRING, OFFSET, LEN, REPLACEMENT): $msubstr = substr("ACGTACGT", 2, 3); → $msubstr: GTA
• chop(STRING): $str1 = "DNA"; chop($str1); → $str1: DN (removes the last character)
• chomp(STRING): $str1 = "DNA\n"; chomp($str1); → $str1: DNA (like chop, but removes only a trailing "\n"; "DNA" without a newline is left unchanged)
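The quoting rules and counter shorthands above can be checked with a short script. This is a minimal sketch; the variable $cost and its text are made up for illustration. It shows that single quotes keep $str1 literal, double quotes expand it, and \$ keeps a literal dollar sign inside double quotes.

```perl
use strict;
use warnings;

my $str1 = 'isu-bcbsi';
my $lit  = 'value: $str1';           # single quotes: $str1 is NOT expanded
my $str2 = "$str1, ames, iowa";      # double quotes: $str1 IS expanded
my $cost = "\$5 per sample";         # hypothetical text; \$ keeps a literal $

my $counter = 20;
$counter++;                          # now 21
$counter += 1;                       # now 22
$counter = $counter - 1;             # back to 21

print "$lit\n$str2\n$cost\ncounter = $counter\n";
```

Printing $lit inside double quotes does not expand the $str1 it contains: interpolation happens once, when the string literal is built.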
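The string functions above combine naturally when scanning a sequence. The sketch below, assuming a hypothetical sequence "ACGTACGTACGT" and motif "CG", finds every match with index(), using its optional third argument (the position to start searching from) to continue past each hit, and then uses rindex() and substr() on the last match.

```perl
use strict;
use warnings;

my $seq   = "ACGTACGTACGT\n";   # hypothetical sequence read with a trailing newline
chomp($seq);                     # drop the newline so length() counts only bases
my $motif = "CG";                # hypothetical motif

my @hits;
my $pos = index($seq, $motif);   # first match, or -1 if there is none
while ($pos != -1) {
    push @hits, $pos;
    $pos = index($seq, $motif, $pos + 1);   # search again, starting after this hit
}

my $last = rindex($seq, $motif);            # position of the final match
my $tail = substr($seq, $last);             # from the last match to the end

print "length: ", length($seq), "\n";       # length: 12
print "hits: @hits\n";                      # hits: 1 5 9
print "last: $last, tail: $tail\n";         # last: 9, tail: CGT
```

Without the chomp, length() would report 13 because the newline counts as a character.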
Array
    my @NTlist = ("A", "T", "G", "C");   # assign 4 elements to the array; the list is enclosed in ()
    my $nt_2 = $NTlist[1];               # array indices start at 0 and go in []; $nt_2: T
    my $last_idx = $#NTlist;             # index of the last element; $last_idx: 3
    my $num_ele = scalar @NTlist;        # number of elements; $num_ele: 4

Array functions (Operator/Function: Example → Result; each row starts from the result of the row before)
• push(@arr, LIST): push(@NTlist, ("A", "T")); → @NTlist: A, T, G, C, A, T
• pop(@arr): $m_nt = pop(@NTlist); → $m_nt: T; @NTlist: A, T, G, C, A
• shift(@arr): shift(@NTlist); → @NTlist: T, G, C, A
• unshift(@arr, LIST): unshift(@NTlist, "G"); → @NTlist: G, T, G, C, A
• splice(@arr, OFFSET, LENGTH): splice(@NTlist, 1, 1); → @NTlist: G, G, C, A
  (delete $NTlist[1] does not do this: it only sets the element to undef and leaves the array length unchanged.)

Hash
A hash is indexed by strings (keys). Braces {} enclose a key; the percent sign % refers to the entire hash.
    # assign 4 key/value pairs to the hash
    my %AAlist = (Ala => "A", Gly => "G", His => "H", Phe => "F");
    my $aa_gly = $AAlist{Gly};           # $aa_gly: G

Hash functions (Operator/Function: Example → Description)
• keys(%HASH): @aa_keys = keys(%AAlist); → @aa_keys: Ala, Gly, His, Phe (in no particular order)
• values(%HASH): @aa_vals = values(%AAlist); → @aa_vals: A, G, H, F (in no particular order, but the same order as keys)
• each(%HASH): each(%AAlist) → returns one (key, value) pair per call, e.g. (Ala, A); used in loops
• delete($HASH{KEY}): delete($AAlist{Ala}); → removes the pair (Ala, A)
• Add one pair: $AAlist{Val} = "V";

Control Structures: IF-ELSIF-ELSE
    if (condition) {
        statements; ...
    } elsif (condition) {
        statements; ...
    } else {
        statements; ...
    }

    if (condition) {
        statements; ...
    } elsif (condition) {
        statements; ...
    }
    # elsif without else: Perl (as of 5.8) has no switch statement, so chain elsif blocks instead

    if (condition) {
        statements; ...
    }
    # if alone

Comparison operators
    Comparison                          Number   String
    Greater than                        >        gt
    Greater than or equal               >=       ge
    Equal                               ==       eq
    Less than                           <        lt
    Less than or equal                  <=       le
    Not equal                           !=       ne
    Comparison, returns -1, 0, or 1     <=>      cmp
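The array functions can be run in sequence on @NTlist to reproduce each result. This sketch also contrasts splice() with delete(): to remove an element and close the gap, splice is the tool; delete on an array element only sets it to undef. The scratch array @tmp is hypothetical, added just for that comparison.

```perl
use strict;
use warnings;

my @NTlist = ("A", "T", "G", "C");
push(@NTlist, "A", "T");          # A T G C A T
my $m_nt = pop(@NTlist);          # $m_nt is T; array is A T G C A
shift(@NTlist);                   # T G C A
unshift(@NTlist, "G");            # G T G C A
splice(@NTlist, 1, 1);            # removes the element at index 1: G G C A

my @tmp = ("A", "T", "G");        # hypothetical scratch array
delete $tmp[1];                   # element 1 becomes undef; length stays 3
my $defined = defined $tmp[1] ? "yes" : "no";

print "NTlist: @NTlist (last index $#NTlist, ", scalar(@NTlist), " elements)\n";
print "after delete: ", scalar(@tmp), " elements, \$tmp[1] defined? $defined\n";
```

The first line prints "NTlist: G G C A (last index 3, 4 elements)".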
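The hash functions can be combined into a small amino-acid lookup. This is a minimal sketch built on the %AAlist example; it also uses exists (a built-in not shown on the slides) to test for a key. Because keys() returns keys in no fixed order, the loop sorts them to get repeatable output, and each() is shown walking the remaining pairs.

```perl
use strict;
use warnings;

my %AAlist = (Ala => "A", Gly => "G", His => "H", Phe => "F");
$AAlist{Val} = "V";                          # add a pair
delete $AAlist{Ala};                         # remove the (Ala, A) pair

my $has_gly = exists $AAlist{Gly} ? 1 : 0;   # exists tests whether a key is present

# keys() has no fixed order, so sort for stable output
my @sorted_keys = sort keys %AAlist;         # Gly His Phe Val
for my $name (@sorted_keys) {
    print "$name => $AAlist{$name}\n";
}

# each() returns one (key, value) pair per call until the hash is exhausted
my $pairs = 0;
while (my ($name, $code) = each %AAlist) {
    $pairs++;
}
print "$pairs pairs; Gly present: $has_gly\n";
```

Calling keys() also resets the hash's each() iterator, so the while loop starts from the first pair.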
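The if/elsif/else form and the two families of comparison operators can be exercised together. In this sketch the nucleotide list ("A", "C", "N") is an arbitrary example. It classifies bases with string comparisons (eq), then shows why numbers and strings need different operators: == and eq disagree on 10 vs "10.0", and <=> and cmp sort the same values differently.

```perl
use strict;
use warnings;

my @classes;
for my $nt ("A", "C", "N") {                 # arbitrary example bases
    if ($nt eq "A" or $nt eq "G") {
        push @classes, "purine";
    } elsif ($nt eq "C" or $nt eq "T" or $nt eq "U") {
        push @classes, "pyrimidine";
    } else {
        push @classes, "unknown";
    }
}

# == compares numbers, eq compares strings
my $num_equal = (10 == 10.0)     ? 1 : 0;    # 1: the same number
my $str_equal = ("10" eq "10.0") ? 1 : 0;    # 0: different strings

# <=> sorts numerically, cmp sorts as strings
my @by_num = sort { $a <=> $b } (10, 9, 100);   # 9 10 100
my @by_str = sort { $a cmp $b } (10, 9, 100);   # 10 100 9

print "@classes\n$num_equal $str_equal\n@by_num | @by_str\n";
```

Using the wrong family is a common bug: sorting sequence lengths with cmp puts 100 before 9.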