SAS <-> R :: CHEAT SHEET

Introduction

This guide aims to familiarise SAS users with R. The R examples use the tidyverse collection of packages.

Install tidyverse:
    install.packages("tidyverse")
Attach tidyverse packages for use:
    library(tidyverse)

R data here are held in "data frames", and occasionally vectors (via c( ) ). Other R structures (lists, matrices…) are not explored here.

Keyboard shortcuts:
    <-     Alt + -
    %>%    Ctrl + Shift + m


Datasets; drop, keep & rename variables

SAS:  data new_data; set old_data; run;
R:    new_data <- old_data

SAS:  data new_data (keep=id); set old_data (drop=job_title); run;
R:    new_data <- old_data %>% select(-job_title) %>% select(id)

SAS:  data new_data (drop= temp: ); set old_data; run;
R:    new_data <- old_data %>% select(-starts_with("temp"))
Note: C.f. contains( ) , ends_with( )

SAS:  data new_data; set old_data; rename old_name = new_name; run;
R:    new_data <- old_data %>% rename(new_name = old_name)
Note: the order differs (SAS: old = new; R: new = old)


Conditional filtering

SAS:  data new_data; set old_data; if Sex = "M"; run;
R:    new_data <- old_data %>% filter(Sex == "M")

SAS:  data new_data; set old_data; if year in (2010,2011,2012); run;
R:    new_data <- old_data %>% filter(year %in% c(2010,2011,2012))

SAS:  data new_data; set old_data; by id; if first.id; run;
R:    new_data <- old_data %>% group_by(id) %>% slice(1)
Note: Could use slice(n( )) for last

SAS:  data new_data; set old_data; if dob > "25APR1990"d; run;
R:    new_data <- old_data %>% filter(dob > as.Date("1990-04-25"))


New variables, conditional editing

SAS:  data new_data; set old_data; total_income = wages + benefits; run;
R:    new_data <- old_data %>% mutate(total_income = wages + benefits)

SAS:  data new_data; set old_data;
        if hours > 30 then full_time = "Y";
        else full_time = "N";
      run;
R:    new_data <- old_data %>% mutate(full_time = if_else(hours > 30, "Y", "N"))

SAS:  data new_data; set old_data;
        if temp > 20 then weather = "Warm";
        else if temp > 10 then weather = "Mild";
        else weather = "Cold";
      run;
R:    new_data <- old_data %>%
        mutate(weather = case_when(
          temp > 20 ~ "Warm",
          temp > 10 ~ "Mild",
          TRUE ~ "Cold"))


Counting and Summarising

SAS:  proc freq data = old_data; table job_type; run;
R:    old_data %>% count(job_type)
Note: For percent, add: %>% mutate(percent = n*100/sum(n))

SAS:  proc freq data = old_data; table job_type*region; run;
R:    old_data %>% count(job_type, region)

SAS:  proc summary data = old_data nway;
        class job_type region;
        output out = new_data;
      run;
R:    new_data <- old_data %>%
        group_by(job_type, region) %>%
        summarise(Count = n( ))
Note: Equivalent without nway not trivially produced

SAS:  proc summary data = old_data nway;
        class job_type region;
        var salary;
        output out = new_data sum(salary) = total_salaries;
      run;
R:    new_data <- old_data %>%
        group_by(job_type, region) %>%
        summarise(total_salaries = sum(salary), Count = n( ))
Note: Lots of summary functions in both languages
Note: Swap summarise( ) for mutate( ) to add summary data to original data


Combining datasets

SAS:  data new_data; set data_1 data_2; run;
R:    new_data <- bind_rows(data_1, data_2)
Note: C.f. rbind( ) which produces error if columns are not identical

SAS:  data new_data; merge data_1 (in= in_1) data_2; by id; if in_1; run;
R:    new_data <- left_join(data_1, data_2, by = "id")
Note: C.f. full_join( ) , right_join( ) , inner_join( )


Some plotting in R

    ggplot(my_data, aes(year, sales)) +
      geom_point( ) + geom_line( )

    ggplot(my_data, aes(year, sales)) +
      geom_point( ) + geom_line( ) +
      ylim(0, 40) + labs(x = "", y = "Sales per year")

    ggplot(my_data, aes(year, sales, colour = dept)) +
      geom_point( ) + geom_line( )

    ggplot(my_data, aes(year, sales, fill = dept)) +
      geom_col( )

    ggplot(my_data, aes(year, sales, fill = dept)) +
      geom_col(position = "dodge") + coord_flip( )

Note: "colour" for lines & points, "fill" for shapes
Note: C.f. position = "fill" for 100% stacked bars/cols


CC BY SA Brendan O'Dowd • brendanjodowd@gmail.com • Updated 2021-09
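
Two of the notes in the Counting and Summarising section (adding a percent column after count( ), and swapping summarise( ) for mutate( ) to keep the original rows) may be easier to follow with a small worked sketch. The data frame below is hypothetical and invented purely for illustration; only the column names (job_type, region, salary) come from the cheat sheet. The object names job_counts and with_totals are likewise assumptions.

```r
library(dplyr)

# Hypothetical example data (not from the cheat sheet)
old_data <- data.frame(
  job_type = c("Nurse", "Nurse", "Clerk", "Clerk", "Clerk"),
  region   = c("East", "West", "East", "East", "West"),
  salary   = c(30000, 32000, 25000, 27000, 26000)
)

# count() stores the frequencies in a column called n;
# the extra mutate() turns them into percentages of the total
job_counts <- old_data %>%
  count(job_type) %>%
  mutate(percent = n * 100 / sum(n))

# Using mutate() instead of summarise() keeps every original row
# and attaches the group-level totals to each of them
with_totals <- old_data %>%
  group_by(job_type, region) %>%
  mutate(total_salaries = sum(salary), Count = n()) %>%
  ungroup()

print(job_counts)
print(with_totals)
```

In this sketch with_totals has the same number of rows as old_data, whereas the summarise( ) version in the cheat sheet would return one row per job_type and region combination. The final ungroup( ) is an optional design choice so that later steps are not applied group by group.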