R Programming Cheat Sheet, Just the Basics, Cheat Sheet of Advanced Computer Programming

A cheat sheet for beginners to R language programming. Learn the basic functions, syntax and paradigm of R programming

Typology: Cheat Sheet

2020/2021

Uploaded on 04/27/2021

ehimay

R Programming Cheat Sheet: Just the Basics

Sections: General, Manipulating Strings, Data Types, Data Structures

MANIPULATING STRINGS

Putting Together Strings
  paste('string1', 'string2', sep = '/')    # the separator ('sep') is a space by default
  paste(c('1', '2'), collapse = '/')    # returns '1/2'
Split String
  stringr::str_split(string = v1, pattern = '-')    # returns a list
Get Substring
  stringr::str_sub(string = v1, start = 1, end = 3)
Match String
  isJohnFound <- stringr::str_detect(string = df1$col1, pattern = stringr::regex('John', ignore_case = TRUE))    # returns TRUE/FALSE for each element, depending on whether 'John' was found
  df1[isJohnFound, c('col1', ...)]

GENERAL

• R 3.0.0 and later support long vectors (more than 2^31 - 1 elements) on 64-bit builds; the integer type itself is still 32-bit
• R is case sensitive
• R indexing starts from 1

Help
  help(functionName) or ?functionName
  Help Home Page: help.start()
  Special Character Help: help('[')
  Search Help: help.search(..) or ??..
  Search Function with Partial Name: apropos('mea')
  See Example(s): example(topic)

Objects in Current Environment
  Display Object Name: objects() or ls()
  Remove Object: rm(object1, object2, ..)
  Notes:
  1. Names starting with a period (.name) are accessible but hidden, so ls() does not list them unless called as ls(all.names = TRUE).
  2. To prompt R to free memory right away, call gc(), which may return unused memory to the OS. R also runs garbage collection automatically from time to time.

Symbol Name Environment
  • If several loaded packages define a function with the same name, the function from the most recently loaded package is the one that gets called.
  • To avoid this, prefix the function with the name of its package, e.g. packageName::functionName(..)

Library
  Only rely on well-established R packages, e.g. 'ggplot2' for plotting, 'sp' for dealing with spatial data, 'reshape2', 'survival', etc.
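The name-masking rule under Symbol Name Environment can be tried without installing anything. The sketch below rests on an assumption: instead of two packages, it masks stats::sd with a hypothetical user-defined sd, because R finds objects in the current environment before it looks in attached packages. The variable names are made up for illustration.

```r
# Hypothetical user-defined function that reuses the name of stats::sd.
# It is found before the attached 'stats' package, so it masks stats::sd.
sd <- function(x) 'masked'

values <- c(2, 4, 4, 4, 5, 5, 7, 9)

unqualified <- sd(values)        # resolves to the user-defined sd above
qualified <- stats::sd(values)   # explicitly the 'stats' implementation

rm(sd)                           # remove the mask
restored <- sd(values)           # resolves to stats::sd again
```

Qualifying the call with packageName:: makes the result independent of what else happens to be defined or loaded, at the cost of slightly longer code.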
  Load Package: library(packageName) or require(packageName)
  Unload Package: detach('package:packageName', unload = TRUE)
  Note: require() returns the load status (TRUE/FALSE)

DATA STRUCTURES

Vector
  • Group of elements of the SAME type
  • R is a vectorized language: operations are applied to each element of a vector automatically
  • R has no concept of column vectors or row vectors
  • Special vectors: letters and LETTERS contain the lower-case and upper-case letters
  Create Vector: v1 <- c(1, 2, 3)
  Get Length: length(v1)
  Check if All or Any is True: all(v1); any(v1)
  Integer Indexing: v1[1:3]; v1[c(1, 6)]
  Boolean Indexing: v1[is.na(v1)] <- 0
  Naming: c(first = 'a', ..) or names(v1) <- c('first', ..)

Factor
  • as.factor(v1) converts v1 to a factor; levels() returns its unique values and nlevels() returns how many there are
  • Factors can reduce the size of a variable because each unique value is stored only once and the data are kept as integer codes, but they can cause bugs if not used carefully

List
  Stores any number of items of ANY type
  Create List: list1 <- list(first = 'a', ...)
  Create Empty List: vector(mode = 'list', length = 3)
  Get Element: list1[[1]] or list1[['first']]
  Append Using Numeric Index: list1[[6]] <- 2
  Append Using Name: list1[['newElement']] <- 2
  Note: Repeatedly appending to a list, vector, data.frame, etc. is expensive; it is better to create the object at its final size and then fill it.

data.frame
  • Each column is a variable, each row is an observation
  • Internally, each column is a vector
  • idata.frame (from the 'plyr' package) creates a reference to a data.frame, so no copying is performed
  Create Data Frame: df1 <- data.frame(col1 = v1, col2 = v2, v3)
  Dimension: nrow(df1); ncol(df1); dim(df1)
  Get/Set Column Names: names(df1); names(df1) <- c(...)
  Get/Set Row Names: rownames(df1); rownames(df1) <- c(...)
  Preview: head(df1, n = 10); tail(...)
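As a rough illustration of the vector, factor, list, and data.frame entries above, here is a minimal base R sketch. All values and the names f1, squares, and df1_dim are assumptions chosen for the example, not part of the cheat sheet.

```r
# Vector: boolean indexing replaces missing values in place.
v1 <- c(1, NA, 3)
v1[is.na(v1)] <- 0

# Factor: levels() lists the unique values, nlevels() counts them.
f1 <- as.factor(c('b', 'a', 'b'))
f1_levels <- levels(f1)
f1_count <- nlevels(f1)

# List: preallocate at the final size, then fill by index
# instead of appending one element at a time.
n <- 3
squares <- vector(mode = 'list', length = n)
for (i in seq_len(n)) {
  squares[[i]] <- i^2
}

# data.frame: each column is a vector; all columns have the same length.
df1 <- data.frame(col1 = v1, col2 = c('x', 'y', 'z'))
df1_dim <- dim(df1)   # number of rows, then number of columns
```

Preallocating with vector(mode = 'list', length = n) avoids the repeated copying that growing an object inside a loop can trigger.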
  Get Data Type: class(df1)    # is data.frame
  Index by Column(s): df1['col1'] or df1[1] †; df1[c('col1', 'col3')] or df1[c(1, 3)]
  Index by Rows and Columns: df1[c(1, 3), 2:3]    # returns data from rows 1 and 3, columns 2 to 3
  † Index method: df1$col1 or df1[, 'col1'] or df1[, 1] returns a vector. To return a single-column data.frame while using single square brackets, use 'drop': df1[, 'col1', drop = FALSE]

DATA TYPES

Check data type: class(variable)

Four Basic Data Types
  1. Numeric: includes float/double, int, etc.
     is.numeric(variable)
  2. Character (string)
     nchar(variable)    # number of characters in a string (a number is converted to a string first)
  3. Date/POSIXct
     • Date: stores just a date. In numeric form, the number of days since 1970-01-01.
       date1 <- as.Date('2012-06-28'); as.numeric(date1)
     • POSIXct: stores a date and time. In numeric form, the number of seconds since 1970-01-01 00:00:00 UTC.
       date2 <- as.POSIXct('2012-06-28 18:00')
     Note: Use the 'lubridate' and 'chron' packages to work with dates
  4. Logical
     • TRUE = 1, FALSE = 0: as.numeric(TRUE) => 1
     • Use ==/!= to test equality and inequality

data.table
  What is a data.table
  • Extends and enhances the functionality of data.frames
  Differences: data.table vs. data.frame
  • Before R 4.0.0, data.frame turned character data into factors by default (stringsAsFactors = TRUE); data.table does not
  • Printing a data.frame writes all of its rows to the console, whereas a large data.table prints only its first and last five rows
  • Key difference: data.tables are fast because they can have an index (a key), like a database table. Without a key, a search such as dt1$col1 > number does a sequential scan (vector scan). Once a key is set on the column, lookups on it use binary search and are much faster.
  Create data.table from data.frame: data.table(df1)
  Index by Column(s)*: dt1[, 'col1', with = FALSE] or dt1[, list(col1)]
  Show info for each data.table in memory (i.e., size, ...):
    tables()
  Show Keys in data.table: key(dt1)
  Create index for col1 and reorder data according to col1: setkey(dt1, col1)
  Use Key to Select Data: dt1[c('col1Value1', 'col1Value2'), ]
  Multiple Key Select: dt1[J('1', c('2', '3')), ]
  Aggregation**:
    dt1[, list(col1 = mean(col1)), by = col2]
    dt1[, list(col1 = mean(col1), col2Sum = sum(col2)), by = list(col3, col4)]
  * Columns must be referenced as a list of unquoted names, not as character strings. If column names are given as character strings, set the 'with' argument to FALSE.
  ** aggregate() and the d*ply functions will work, but the built-in aggregation of data.table is faster

Matrix
  • Similar to data.frame except every element must be the SAME type, most commonly numeric
  • Most functions that work with data.frame also work with matrix
  Create Matrix: matrix1 <- matrix(1:10, nrow = 5)    # fills column 1 (rows 1 to 5) with 1:5 and column 2 with 6:10
  Matrix Multiplication: matrix1 %*% t(matrix2)    # where t() is transpose

Array
  • Multidimensional vector of the SAME type
  • array1 <- array(1:12, dim = c(2, 3, 2))
  • Using arrays is not recommended
  • Matrices are restricted to two dimensions, while arrays can have any number of dimensions
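The matrix and array entries above can be checked with a short base R sketch. The cheat sheet does not define matrix2, so the one below is an assumption made up for the example, as are the names product and product_dim.

```r
# matrix() fills column by column: column 1 gets 1:5, column 2 gets 6:10.
matrix1 <- matrix(1:10, nrow = 5)

# Hypothetical second matrix with the same shape (5 rows, 2 columns).
matrix2 <- matrix(10:1, nrow = 5)

# A 5x2 matrix times the 2x5 transpose gives a 5x5 result.
product <- matrix1 %*% t(matrix2)
product_dim <- dim(product)

# An array generalizes this to more dimensions: here 2 rows, 3 columns, 2 slices.
array1 <- array(1:12, dim = c(2, 3, 2))
first_of_second_slice <- array1[1, 1, 2]
```

Note that %*% is matrix multiplication; a plain * would multiply the two matrices element by element instead.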

Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved