Manipulating Data with Command Line Utilities - Lecture Notes | ECN 297

Material Type: Notes; Professor: Cottrell; Class: Preparing for Economic Research; Subject: Economics; University: Wake Forest University

Economics 201, Cottrell

Manipulating data with command-line utilities

In doing "real world" data analysis, it quite often happens that one can get hold of relevant data, but not in the exact format that one needs for processing with a program such as gretl or Excel. In that case we need to edit the data first. There are various options for doing this.

You are probably used to editing text using a word processor (e.g. MS Word). A first, important point to notice is that for editing raw data, a word processor is generally not appropriate. The data must remain plain text, and must not take on the format of a Word document (which includes formatting codes and a bunch of other stuff). The Windows utility known as WordPad is OK for the purpose, so long as you take care to save your edited work as plain text.

Here, though, I will talk about another option that can be very useful when the raw data differ in some systematic way from what you want, i.e. where the editing task is to recognize some pattern in the raw data and change it in some specified way. The option I'm talking about is the use of simple command-line tools, which enable you to modify a text file non-interactively. By "non-interactively" I mean that you don't have to go through the file searching and replacing; rather, you issue a single command that does all the work for you.

As a case in point, consider the wage data that we downloaded from the Bureau of Labor Statistics website, bls.gov. The file we obtained held monthly wage data from the 1960s to the present. It was just what we wanted, except that—as I discovered when the dates looked funny in gretl—after the December value (labeled M12) for each year, there was an M13 value, which represented the average value for the year as a whole. We wanted just the monthly data, so the task in this case was to strip out all lines containing M13, something that is not easy to do with standard search-and-replace tools.

The smart utility for this job is a program called grep. This program scans a file and either

§ prints only those lines in the file that match a certain pattern; or
§ if you add the -v (think reVerse) option, prints only those lines that do not match the pattern.

The command for the BLS task was then

    grep -v M13 bls.txt > new.txt

Taking this apart, the first bit is grep -v M13, that is, we're asking grep to give us lines that do not contain M13. The next bit is bls.txt, the name of the original data file we want scanned. The last bit is > new.txt, which is composed of two parts: the > symbol calls for redirection—instead of sending output to the screen, we want it sent to a file—and then we give the name of the file we want created, here new.txt. Putting it all back together, the command says: "Scan bls.txt for lines that do not contain M13 and send the output from this operation to new.txt."

Still to come:

§ where to get grep, and how to install it
§ how to open a console
§ other similar tools: sed
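To make the grep command (and the sed alternative mentioned above) concrete, here is a small sketch at a Unix-style console. The file layout and numbers below are invented purely for illustration; the real BLS file has its own columns and labels.

    $ cat bls.txt                      # made-up excerpt of a raw data file
    1965 M11 2.45
    1965 M12 2.47
    1965 M13 2.41
    1966 M01 2.50

    $ grep -v M13 bls.txt > new.txt    # keep only the lines NOT containing M13
    $ cat new.txt
    1965 M11 2.45
    1965 M12 2.47
    1966 M01 2.50

    $ sed '/M13/d' bls.txt > new.txt   # sed equivalent: delete every line matching M13

The M13 line (the annual average) disappears from new.txt, which is exactly the effect we wanted. The sed version reads as "delete every line that matches M13"; for a simple job like this, grep -v and sed '/pattern/d' are interchangeable, and which one you reach for is largely a matter of taste.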