grep, awk and sed – three VERY useful command-line utilities

Typology: Study notes

2021/2022

Uploaded on 08/05/2022

Matt Probert, Uni of York

grep = global regular expression print

In the simplest terms, grep (global regular expression print) will search input files for a search string, and print the lines that match it. Beginning at the first line in the file, grep copies a line into a buffer, compares it against the search string, and if the comparison passes, prints the line to the screen. Grep will repeat this process until the file runs out of lines. Notice that nowhere in this process does grep store lines, change lines, or search only a part of a line.

Example data file

Please cut & paste the following data and save to a file called 'a_file':

    boot
    book
    booze
    machine
    boots
    bungie
    bark
    aardvark
    broken$tuff
    robots

A Simple Example

The simplest possible example of grep is simply:

    grep "boo" a_file

In this example, grep would loop through every line of the file "a_file" and print out every line that contains the string 'boo':

    boot
    book
    booze
    boots

Useful Options

This is nice, but if you were working with a large Fortran file or something similar, it would probably be much more useful if each matching line were labelled with its line number in the file. That way you could track down a particular string more easily if you needed to open the file in an editor to make some changes. This can be accomplished by adding the -n parameter:

    grep -n "boo" a_file

This yields a much more useful result, which shows which lines matched the search string:

    1:boot
    2:book
    3:booze
    5:boots

Another interesting switch is -v, which will print the negative result. In other words, grep will print all of the lines that do not match the search string, rather than printing the lines that match it.
In the following case, grep will print every line that does not contain the string "boo", and will display the line numbers, as in the last example:

    grep -vn "boo" a_file

In this particular case, it will print:

    4:machine
    6:bungie
    7:bark
    8:aardvark
    9:broken$tuff
    10:robots

The -c option tells grep to suppress the printing of matching lines, and only display the number of lines that match the query. For instance, the following will print the number 4, because there are 4 lines containing "boo" in a_file:

    grep -c "boo" a_file
    4

The -l option prints only the filenames of files in the query that have lines that match the search string. This is useful if you are searching through multiple files for the same string, like so:

    grep -l "boo" *

An option more useful for searching through non-code files is -i, ignore case. This option will treat upper and lower case as equivalent while matching the search string. In the following example, the lines containing "boo" will be printed out, even though the search string is uppercase:

    grep -i "BOO" a_file

The -x option looks for eXact matches only. In other words, the following command will print nothing, because there are no lines that contain only the pattern "boo":

    grep -x "boo" a_file

Finally, -A allows you to specify additional lines of context, so you get the matching line plus a number of the lines that follow it, e.g.

    grep -A2 "mach" a_file
    machine
    boots
    bungie

Regular Expressions

A regular expression is a compact way of describing complex patterns in text. With grep, you can use them to search for patterns. Other tools let you use regular expressions ("regexps") to modify the text in complex ways. The normal strings we have been using so far are in fact just very simple regular expressions. You may also come across similar patterns if you use wildcards such as '*' or '?' when listing filenames etc.
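As a quick check of the options covered so far, the following sketch rebuilds the sample file in a scratch directory and runs several of them in turn. The use of mktemp and the variable names are illustrative assumptions rather than part of the original notes. Note also that -A is an extension provided by GNU and BSD grep, not a POSIX option.

```shell
#!/bin/sh
# Work in a scratch directory so no existing file is overwritten
# (workdir is a hypothetical name chosen for this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample data file, one word per line.
printf '%s\n' boot book booze machine boots bungie bark \
    aardvark 'broken$tuff' robots > a_file

grep -n "boo" a_file            # matching lines, prefixed with line numbers
grep -vn "boo" a_file           # non-matching lines, prefixed with line numbers
count=$(grep -c "boo" a_file)   # number of matching lines
echo "$count"                   # prints 4
grep -x "boot" a_file           # whole-line match: prints boot
grep -A2 "mach" a_file          # the match plus the two lines after it
```

Quoting 'broken$tuff' in single quotes matters when creating the file, because the shell would otherwise try to expand $tuff as a variable.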
You can use grep with basic regexps, for example to search the file for lines ending with the letter e:

[...]

    -rw------- 1 mijp1 mijp1 494666 Oct 21 12:09 uniform_rand_period_4.agr
    -rw------- 1 mijp1 mijp1 376286 Oct 21 12:05 uniform_rand_period.agr

I can see the file size is reported as the 5th column of data. So if I wanted to know the total size of all the files in this directory I could do:

    [mijp1@monty RandomNumbers]$ ls -l | awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}'
    2668269

Note that 'print sum' prints the value of the variable sum, so if sum=2 then 'print sum' gives the output '2' whereas 'print $sum' will print '1' as the 2nd field contains the value '1'.

Hence it would be straightforward to write an awk command that would calculate the mean and standard deviation of a column of numbers: you accumulate 'sum_x' and 'sum_x2' inside the main part, and then use the standard formulae to calculate mean and standard deviation in the END part.

AWK provides support for loops (both 'for' and 'while') and for branching (using 'if'). So if you wanted to trim a file and only operate on every 3rd line for instance, you could do this:

    [mijp1@monty RandomNumbers]$ ls -l | awk '{for (i=1;i<3;i++) {getline}; print NR,$0}'
    3 -rw------- 1 mijp1 mijp1 6948 Oct 22 00:17 random_numbers.f90
    6 -rw------- 1 mijp1 mijp1 289936 Oct 21 11:59 uniform_rand_period_1.agr
    9 -rw------- 1 mijp1 mijp1 494666 Oct 21 12:09 uniform_rand_period_4.agr
    10 -rw------- 1 mijp1 mijp1 376286 Oct 21 12:05 uniform_rand_period.agr

where the 'for' loop uses a 'getline' command to move through the file, so that only every 3rd line is printed. Note that as the number of lines of input is 10, which is not divisible by 3, the final 'for' loop runs out of input early and so the final 'print NR,$0' command prints line 10, which you can see as we also print out the line number using the NR variable.

AWK Pattern Matching

AWK is a line-oriented language. The pattern comes first, and then the action.
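A minimal sketch of the pattern-then-action form, reusing the sample words from the grep section. The scratch directory and the variable names are illustrative assumptions, not part of the original notes. The first program uses a regular expression as its pattern, the second a relational expression.

```shell
#!/bin/sh
# Scratch directory and sample file (hypothetical names for this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' boot book booze machine boots bungie bark \
    aardvark 'broken$tuff' robots > a_file

# Pattern /boo/, then the action in braces: for each matching record,
# print the record number NR and the record itself.
numbered=$(awk '/boo/ {print NR ": " $0}' a_file)
echo "$numbered"

# Relational pattern: run the action only for records longer than 6 characters.
long_lines=$(awk 'length($0) > 6 {print $0}' a_file)
echo "$long_lines"
```

The first command gives much the same information as 'grep -n "boo" a_file', but because the action is an ordinary awk statement the output format can be changed freely.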
Action statements are enclosed in { and }. Either the pattern may be missing, or the action may be missing, but, of course, not both. If the pattern is missing, the action is executed for every single record of input. A missing action prints the entire record.

AWK patterns include regular expressions (using the same syntax as 'grep -E') and combinations using the special symbols '&&' (logical AND), '||' (logical OR) and '!' (logical NOT). You can also do relational patterns, groups of patterns, ranges, etc.

AWK control statements include:

    if (condition) statement [ else statement ]
    while (condition) statement
    do statement while (condition)
    for (expr1; expr2; expr3) statement
    for (var in array) statement
    break
    continue
    exit [ expression ]

AWK input/output statements include:

    close(file [, how])
        Close file, pipe or co-process.
    getline
        Set $0 from next input record.
    getline <file
        Set $0 from next record of file.
    getline var
        Set var from next input record.
    getline var <file
        Set var from next record of file.
    next
        Stop processing the current input record. The next input record is read and processing starts over with the first pattern in the AWK program. If the end of the input data is reached, the END block(s), if any, are executed.
    nextfile
        Stop processing the current input file. The next input record read comes from the next input file. If the end of the input data is reached, the END block(s), if any, are executed.
    print
        Prints the current record.
    print expr-list
        Prints expressions.
    print expr-list >file
        Prints expressions on file.
    printf fmt, expr-list
        Format and print.

NB The printf command lets you specify the output format more closely, using a C-like syntax. For example, you can specify an integer of a given width, a floating point number, a string, etc.

AWK numeric functions include:

    atan2(y, x)
        Returns the arctangent of y/x in radians.
    cos(expr)
        Returns the cosine of expr, which is in radians.
    exp(expr)
        The exponential function.
    int(expr)
        Truncates to integer.
    log(expr)
        The natural logarithm function.
    rand()
        Returns a random number N, between 0 and 1, such that 0 <= N < 1.
    sin(expr)
        Returns the sine of expr, which is in radians.
    sqrt(expr)
        The square root function.
    srand([expr])
        Uses expr as a new seed for the random number generator. If no expr is provided, the time of day is used.

AWK string functions include:

    gsub(r, s [, t])
        For each substring matching the regular expression r in the string t, substitute the string s, and return the number of substitutions. If t is not supplied, use $0.
    index(s, t)
        Returns the index of the string t in the string s, or 0 if t is not present.
    length([s])
        Returns the length of the string s, or the length of $0 if s is not supplied.
    match(s, r [, a])
        Returns the position in s where the regular expression r occurs, or 0 if r is not present.
    split(s, a [, r])
        Splits the string s into the array a using the regular expression r, and returns the number of fields. If r is omitted, FS is used instead.
    sprintf(fmt, expr-list)
        Prints expr-list according to fmt, and returns the resulting string.
    strtonum(str)
        Examines str, and returns its numeric value.
    sub(r, s [, t])
        Just like gsub(), but only the first matching substring is replaced.
    substr(s, i [, n])
        Returns the at most n-character substring of s starting at i. If n is omitted, the rest of s is used.
    tolower(str)
        Returns a copy of the string str, with all the upper-case characters in str translated to their corresponding lower-case counterparts. Non-alphabetic characters are left unchanged.
    toupper(str)
        Returns a copy of the string str, with all the lower-case characters in str translated to their corresponding upper-case counterparts. Non-alphabetic characters are left unchanged.

AWK command-line and usage

You can pass variables into an awk program using the '-v' flag as many times as necessary, e.g.

    awk -v skip=3 '{for (i=1;i<skip;i++) {getline}; print $0}' a_file

You can also write an awk program using an editor, and then save it as a special scripting file, e.g.
    [mijp1@monty Comp_Lab]$ cat awk_strip
    #!/usr/bin/awk -f
    #only print out every 3rd line of input file
    BEGIN {skip=3}
    {for (i=1;i<skip;i++) {getline}; print $0}

which can then be used as a new additional command

    [mijp1@monty Comp_Lab]$ chmod u+x awk_strip
    [mijp1@monty Comp_Lab]$ ./awk_strip my_file.dat
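
The '-v' flag can also be combined with the mean and standard deviation recipe mentioned earlier (accumulate 'sum_x' and 'sum_x2' in the main part, then apply the standard formulae in the END part). The following is a sketch under stated assumptions: the data file numbers.dat, its contents and the variable name col are hypothetical, and the population form of the standard deviation (dividing by n rather than n-1) is one possible choice of "standard formula".

```shell
#!/bin/sh
# Scratch directory and a hypothetical two-column data file (index, value).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '1 2' '2 4' '3 4' '4 4' '5 5' '6 5' '7 7' '8 9' > numbers.dat

# col selects which column to analyse; it is passed in with -v.
stats=$(awk -v col=2 '
    # Main part: accumulate the sum and the sum of squares for every record.
    { sum_x += $col; sum_x2 += $col * $col }
    END {
        if (NR == 0) exit 1                 # no input: avoid dividing by zero
        mean = sum_x / NR
        var = sum_x2 / NR - mean * mean     # population variance
        if (var < 0) var = 0                # guard against rounding error
        printf "mean=%.3f sd=%.3f\n", mean, sqrt(var)
    }' numbers.dat)
echo "$stats"   # prints mean=5.000 sd=2.000
```

For a sample standard deviation, the variance line could instead use (sum_x2 - NR * mean * mean) / (NR - 1), provided there are at least two records.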