STAT 517: Delwiche/Slaughter Chapter 2 Hitchcock
University of South Carolina

Chapter 2: Getting Data Into SAS

Data can be stored in many different forms and formats. There are four categories of ways to read data into SAS:

1. Entering data directly through the keyboard
2. Creating SAS data sets from raw data files
3. Converting other software's data files (e.g., Excel) into SAS data sets
4. Reading other software's data files directly (often requires extra SAS/ACCESS products)

1. Entering Data with the Viewtable Window

Select “Tools”, then “Table Editor”, and enter the data.
Right-click a column header to set column attributes.
Select “File”, then “Save As”, to save the data.
Select “Tools”, then “Table Editor”, again to browse or edit the data.
Use PROC PRINT to see the data.

Data Separated by Spaces

Use the INPUT statement to name variables. Include a $ after the names of character variables.

Data Arranged in Columns

Column input applies when each value of a variable is found at the same position on every data line. Character or standard numeric data can be read this way.

Advantages:
1. Don't need a space between values
2. Missing values don't need a special symbol (can be blank)
3. Character data can have blanks
4. Can skip variables you don't need to read into SAS

Example: INPUT var1 1-10 var2 11-15 var3 $ 16-30;

Data Not in Standard Format

Types of nonstandard data:
1. Numbers with commas
2. Numbers with dollar signs
3. Hexadecimal data
4. Dates (in various formats)
5. Times of day

We can read nonstandard data using codes known as informats. Informats come in three categories: character, numeric, and date. pp. 44-45 list many SAS informats.
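The column input and informat ideas above can be sketched as two small DATA steps. This is a minimal illustration, not an example from the text: the data set names, variable names, and data values are made up, and the `+1` pointer control and the FORMAT statement are additions used only to keep the sketch readable.

```sas
/* Column input: each variable sits in the same columns on every line. */
/* Values need no separating space, Age may be blank (missing), and    */
/* character values may contain blanks.                                */
DATA people;
   INPUT Name $ 1-10 Age 11-13 City $ 14-28;
   DATALINES;
Ann Lee    34New York
Bob Stone    Columbia
;
RUN;

/* Informats: COMMA8. strips the dollar sign and comma from Price,     */
/* and MMDDYY10. turns the date text into a SAS date value.            */
/* +1 skips the blank column between the price and the date.           */
DATA sales;
   INPUT Item $8. Price COMMA8. +1 SaleDate MMDDYY10.;
   FORMAT Price DOLLAR10. SaleDate DATE9.;  /* display only */
   DATALINES;
Widget    $1,250 10/31/2023
Gadget   $12,000 01/05/2024
;
RUN;

PROC PRINT DATA=people;
RUN;

PROC PRINT DATA=sales;
RUN;
```

Every informat in the second step ends with a period ($8., COMMA8., MMDDYY10.), which is what distinguishes it from a variable name.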
The period in an informat name must be included! Otherwise SAS may interpret the informat as a variable name.

If several consecutive variables are of the same type, put the variable names in parentheses and enter the informat (in separate parentheses) only once.

Example: (Score1 Score2 Score3 Score4 Score5) (4.1)

Options for the INFILE Statement

FIRSTOBS=
FIRSTOBS=5 tells SAS to begin reading data at the fifth line (useful when the data file has header information).

OBS=
OBS=100 tells SAS to stop reading data after the 100th line (not necessarily after 100 observations!).

MISSOVER
If a data line ends and there are still more variables in the INPUT statement, tells SAS to assign missing values to the remaining variables for that observation.

TRUNCOVER
Tells SAS to read data for a variable only until the end of the line, even if the variable's field extends past the end of the data line.

Reading Delimited Files

The DLM= option lets something other than spaces separate data values.

Comma delimiters: DLM=','
Tab delimiters: DLM='09'X
# delimiters: DLM='#'

By default, SAS treats two delimiters in a row as a single delimiter. What if two commas in a row indicate a missing value? What if some data values contain commas? In these cases, use the DSD option.

Note: Data values with commas in them must be in quotes.

The default delimiter with DSD is the comma, but other delimiters can be specified with the DLM= option.
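As a rough sketch of how DSD and MISSOVER work together, the step below reads comma-delimited data typed in-stream (INFILE DATALINES stands in for a raw data file). The data set name, variable names, and values are hypothetical, and the LENGTH statement is an addition so that names longer than 8 characters are not truncated.

```sas
DATA players;
   /* DSD: comma is the default delimiter, two commas in a row mean a */
   /* missing value, and quoted values may contain commas.            */
   /* MISSOVER: a short line leaves the remaining variables missing.  */
   INFILE DATALINES DSD MISSOVER;
   LENGTH Name $ 20;
   INPUT Name $ Team $ Hits;
   DATALINES;
"Smith, John",Braves,152
"Lee, Ann",,98
"Diaz, Luis",Mets
;
RUN;

PROC PRINT DATA=players;
RUN;
```

In this sketch the second line should give a missing Team and the third a missing Hits. For a tab-delimited file, adding DLM='09'X to the INFILE statement would replace the default comma.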
SAS Data Sets: Temporary and Permanent

Data sets stored in the Work library are temporary (removed upon exiting SAS).
Data sets stored in other libraries are permanent (saved upon exiting SAS).

You can specify the library when creating a data set in the DATA step.

Example: Suppose you have a library called sportlib (this is a “libref”).

DATA sportlib.baseball; creates a data set “baseball” stored in the “sportlib” library (permanent).
DATA work.baseball; stores “baseball” in the “work” library (temporary).
DATA baseball; stores “baseball” in the “work” library by default (temporary).
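A minimal sketch of the temporary versus permanent distinction follows. It assumes the sportlib libref is assigned with a LIBNAME statement, which these notes do not show; the folder path is hypothetical and must be replaced with a directory that exists on your system. The variables and data values are made up for illustration.

```sas
/* Hypothetical folder: replace with a real directory. */
LIBNAME sportlib 'C:\MySASLib';

/* Two-level name: stored in sportlib, so it is permanent. */
DATA sportlib.baseball;
   INPUT Player $ Hits;
   DATALINES;
Smith 152
Lee 98
;
RUN;

/* One-level name: same as work.baseball, so it is temporary. */
DATA baseball;
   SET sportlib.baseball;
RUN;

PROC PRINT DATA=sportlib.baseball;
RUN;
```

After SAS is closed and reopened, reassigning the libref with the same LIBNAME statement should make sportlib.baseball available again, while the copy in the Work library is gone.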