R Programming Cheat Sheet: Just the Basics

MANIPULATING STRINGS
Putting Together Strings
paste('string1', 'string2', sep = '/')  # separator ('sep') is a space by default
paste(c('1', '2'), collapse = '/')  # returns '1/2'
Split String
stringr::str_split(string = v1, pattern = '-')  # returns a list
Get Substring
stringr::str_sub(string = v1, start = 1, end = 3)
Match String
isJohnFound <- stringr::str_detect(string = df1$col1, pattern = stringr::regex('John', ignore_case = TRUE))  # TRUE/FALSE for each element: was 'John' found?
df1[isJohnFound, c('col1', ...)]
Note: older code wrote pattern = ignore.case('John'), which stringr has deprecated.

DATA TYPES
• R 3.0.0 and later support long vectors (more than 2^31 - 1 elements) on 64-bit builds; the integer type itself is still 32-bit
• R is case sensitive
• R indexing starts at 1

HELP
help(functionName) or ?functionName
Help Home Page
help.start()
Special Character Help
help('[')
Search Help
help.search(...) or ??...
Search Function with Partial Name
apropos('mea')  # names containing 'mea', e.g. 'mean'
See Example(s)
example(topic)

OBJECTS IN CURRENT ENVIRONMENT
Display Object Names
objects() or ls()
Remove Object
rm(object1, object2, ...)
Notes:
1. Names starting with a period are accessible but hidden, so ls() does not list them unless called as ls(all.names = TRUE).
2. rm() only removes the name. Calling gc() triggers garbage collection, which frees the memory and may return it to the OS; R also runs garbage collection automatically.

SYMBOL NAME ENVIRONMENT
• If multiple packages use the same function name, the function from the package loaded last is the one called (it masks the others).
• To avoid this, precede the function with the name of its package, e.g. packageName::functionName(...)

LIBRARY
Only trust reliable R packages, e.g. 'ggplot2' for plotting, 'sp' for spatial data, 'reshape2', 'survival', etc.
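The hidden-name and masking rules above can be checked with base R alone. This is a sketch: the environment e, the objects .hidden and visible, and the masking mean function are hypothetical names chosen for illustration. A definition in the global environment masks a package function the same way a later-attached package does, because the global environment comes first on the search path.

```r
# Hidden names: objects whose names start with a period are skipped by ls()
e <- new.env()                          # hypothetical environment for the demo
assign("visible", 1, envir = e)
assign(".hidden", 2, envir = e)
shown <- ls(e)                          # only "visible"
all_names <- ls(e, all.names = TRUE)    # ".hidden" and "visible"

# Masking: a user-defined `mean` (hypothetical) hides base::mean
mean <- function(x, ...) "masked mean"
masked_result <- mean(c(1, 2, 3))       # calls the masking function
base_result <- base::mean(c(1, 2, 3))   # the package prefix reaches base::mean, giving 2
rm(mean)                                # remove the mask; mean() is base::mean again
restored_result <- mean(c(1, 2, 3))     # 2
```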
Load Package
library(packageName) or require(packageName)
Unload Package
detach('package:packageName')
Note: require() returns TRUE or FALSE (invisibly) depending on whether the package could be loaded, while library() stops with an error.

VECTOR
• Group of elements of the SAME type
• R is a vectorized language: operations are applied to each element of a vector automatically
• R has no concept of column vectors or row vectors
• Special vectors: the built-in constants letters and LETTERS contain the lower-case and upper-case letters
Create Vector
v1 <- c(1, 2, 3)
Get Length
length(v1)
Check if All or Any Is TRUE
all(v1); any(v1)  # expects a logical vector, e.g. all(v1 > 0)
Integer Indexing
v1[1:3]; v1[c(1, 6)]  # positions past the end return NA
Boolean Indexing
v1[is.na(v1)] <- 0  # replace missing values with 0
Naming
c(first = 'a', ...) or names(v1) <- c('first', ...)

FACTOR
• as.factor(v1) converts v1 to a factor; levels() returns its unique values (the levels) and nlevels() how many there are
• Factors can reduce the size of a variable because each unique value is stored only once (elements are integer codes into the levels), but they can cause bugs if not used properly, e.g. as.numeric() on a factor returns the codes, not the values

LIST
Store any number of items of ANY type
Create List
list1 <- list(first = 'a', ...)
Create Empty List
vector(mode = 'list', length = 3)
Get Element
list1[[1]] or list1[['first']]
Append Using Numeric Index
list1[[6]] <- 2  # any skipped positions are filled with NULL
Append Using Name
list1[['newElement']] <- 2
Note: repeatedly appending to a list, vector, data.frame, etc. is expensive; it is best to create an object of the required size, then fill it.

DATA.FRAME
• Each column is a variable, each row is an observation
• Internally, each column is a vector
• idata.frame (from the 'plyr' package) creates a reference to a data.frame, so no copying is performed
Create Data Frame
df1 <- data.frame(col1 = v1, col2 = v2, v3)
Dimension
nrow(df1); ncol(df1); dim(df1)
Get/Set Column Names
names(df1)
names(df1) <- c(...)
Get/Set Row Names
rownames(df1)
rownames(df1) <- c(...)
Preview
head(df1, n = 10); tail(...)
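The factor pitfall, the preallocate-then-fill advice, and the basic data.frame calls above can be tried together in base R. A minimal sketch; the vectors v1 and v2, the factor f and the list squares hold made-up sample values. stringsAsFactors = FALSE is passed explicitly so that col2 stays character on R versions before 4.0.0 as well.

```r
# Factor pitfall: as.numeric() returns the integer codes, not the values
f <- factor(c("10", "20", "10"))
f_levels <- levels(f)                        # "10" "20"
f_codes <- as.numeric(f)                     # 1 2 1 (the codes)
f_values <- as.numeric(as.character(f))      # 10 20 10 (convert via character)

# Preallocate a list of known size, then fill it, instead of appending
squares <- vector(mode = "list", length = 3)
for (i in seq_along(squares)) {
  squares[[i]] <- i^2
}

# Build and inspect a small data.frame from two made-up vectors
v1 <- c(10, 20, 30)
v2 <- c("a", "b", "c")
df1 <- data.frame(col1 = v1, col2 = v2, stringsAsFactors = FALSE)
df_dim <- dim(df1)                           # 3 rows, 2 columns
names(df1) <- c("x", "y")                    # set column names
rownames(df1) <- c("r1", "r2", "r3")         # set row names
first_two <- head(df1, n = 2)                # preview the first 2 rows
```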
Get Data Type
class(df1)  # "data.frame"
Index by Column(s)
df1['col1'] or df1[1] †
df1[c('col1', 'col3')] or df1[c(1, 3)]
Index by Rows and Columns
df1[c(1, 3), 2:3]  # returns data from rows 1 and 3, columns 2 to 3
† df1$col1, df1[, 'col1'] and df1[, 1] return the column as a vector. To return a single-column data.frame while using single square brackets, use 'drop': df1[, 'col1', drop = FALSE]

FOUR BASIC DATA TYPES
Check Data Type
class(variable)
1. Numeric: includes double (floating point) and integer
is.numeric(variable)
2. Character (string)
nchar(variable)  # number of characters in each element (numbers are converted to character first)
3. Date/POSIXct
• Date: stores just a date; in numeric form, the number of days since 1970-01-01
date1 <- as.Date('2012-06-28'); as.numeric(date1)
• POSIXct: stores a date and time; in numeric form, the number of seconds since 1970-01-01 00:00:00 UTC
date2 <- as.POSIXct('2012-06-28 18:00')
Note: use the 'lubridate' and 'chron' packages to work with dates
4. Logical
• TRUE and FALSE, which convert to 1 and 0: as.numeric(TRUE) => 1
• Use == and != to test equality and inequality

DATA.TABLE
What Is a data.table
• Extends and enhances the functionality of data.frames
Differences: data.table vs. data.frame
• Before R 4.0.0, data.frame() turned character data into factors by default (stringsAsFactors = TRUE); data.table does not
• Printing a data.frame sends every row to the console (up to getOption('max.print')); a data.table with more than 100 rows prints only its first and last five rows by default
• Key difference: a data.table can be keyed, like an indexed database table. A search such as dt1[dt1$col1 == value] does a sequential (vector) scan of every row; once col1 is set as the key, lookups by col1 value use a much faster binary search
Create data.table from data.frame
data.table(df1)
Index by Column(s)*
dt1[, 'col1', with = FALSE] or dt1[, list(col1)]
Show Info for Each data.table in Memory (e.g., size)
tables()
Show Keys in data.table
key(dt1)
Create Index for col1 and Reorder Data According to col1
setkey(dt1, col1)
Use Key to Select Data
dt1[c('col1Value1', 'col1Value2'), ]
Multiple Key Select
dt1[J('1', c('2', '3')), ]  # needs a two-column key, e.g. setkey(dt1, col1, col2)
Aggregation**
dt1[, list(col1 = mean(col1)), by = col2]
dt1[, list(col1 = mean(col1), col2Sum = sum(col2)), by = list(col3, col4)]
* Columns are accessed via a list of their actual (unquoted) names, not as character strings. If the column names are character strings, set the "with" argument to FALSE (recent data.table versions also accept dt1[, 'col1'] directly).
** aggregate() and the plyr d*ply functions also work, but data.table's built-in aggregation is faster.

MATRIX
• Similar to a data.frame except every element must be the SAME type, most commonly all numeric
• Many functions that work with data.frames work with matrices as well
Create Matrix
matrix1 <- matrix(1:10, nrow = 5)  # 5 rows, filled column by column: column 1 holds 1:5 and column 2 holds 6:10
Matrix Multiplication
matrix1 %*% t(matrix2)  # where t() is transpose

ARRAY
• Multidimensional vector of the SAME type
• array1 <- array(1:12, dim = c(2, 3, 2))
• Using arrays is not recommended
• Matrices are restricted to two dimensions, while arrays can have any number of dimensions
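The column-wise fill, the transpose-then-multiply pattern, and array indexing can be checked with a short base R sketch. matrix2 is a hypothetical 3 x 2 matrix of ones, chosen so that matrix1 %*% t(matrix2) is conformable: (5 x 2) times (2 x 3) gives a 5 x 3 result.

```r
# matrix() fills column by column unless byrow = TRUE
matrix1 <- matrix(1:10, nrow = 5)               # column 1 = 1:5, column 2 = 6:10
by_row <- matrix(1:10, nrow = 5, byrow = TRUE)  # row 1 = 1, 2; row 2 = 3, 4; ...

# Hypothetical 3 x 2 matrix of ones; t() turns it into 2 x 3
matrix2 <- matrix(1, nrow = 3, ncol = 2)
product <- matrix1 %*% t(matrix2)               # (5 x 2) %*% (2 x 3) -> 5 x 3
# Each entry is a row sum of matrix1: product[i, j] = i + (i + 5)

# A 2 x 3 x 2 array; elements are also stored column-major
array1 <- array(1:12, dim = c(2, 3, 2))
slice1 <- array1[, , 1]                         # first 2 x 3 layer holds 1:6
element <- array1[2, 3, 1]                      # 6
```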