Data Transformation with dplyr Cheat Sheet

Cheat sheet on data transformation with dplyr: Manipulate Cases and Variables, Vector Functions, Row Names.

Typology: Cheat Sheet

2019/2020

Uploaded on 10/09/2020

nicoth

Data Transformation with dplyr :: CHEAT SHEET

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.7.0 • tibble 1.2.0 • Updated: 2019-08

dplyr functions work with pipes and expect tidy data. In tidy data:
    Each variable is in its own column.
    Each observation, or case, is in its own row.
With pipes, x %>% f(y) becomes f(x, y).


Summarise Cases

These apply summary functions to columns to create a new table of summary statistics. Summary functions take vectors as input and return one value (see back).

summarise(.data, …)
    Compute table of summaries.
    summarise(mtcars, avg = mean(mpg))

count(x, ..., wt = NULL, sort = FALSE)
    Count number of rows in each group defined by the variables in … Also tally().
    count(iris, Species)

VARIATIONS
    summarise_all() - Apply funs to every column.
    summarise_at() - Apply funs to specific columns.
    summarise_if() - Apply funs to all cols of one type.


Group Cases

Use group_by() to create a "grouped" copy of a table. dplyr functions will manipulate each "group" separately and then combine the results.
    mtcars %>% group_by(cyl) %>% summarise(avg = mean(mpg))

group_by(.data, ..., add = FALSE)
    Returns copy of table grouped by …
    g_iris <- group_by(iris, Species)

ungroup(x, …)
    Returns ungrouped copy of table.
    ungroup(g_iris)


Manipulate Cases

Row functions return a subset of rows as a new table.

EXTRACT CASES

filter(.data, …)
    Extract rows that meet logical criteria.
    filter(iris, Sepal.Length > 7)

distinct(.data, ..., .keep_all = FALSE)
    Remove rows with duplicate values.
    distinct(iris, Species)

sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = parent.frame())
    Randomly select fraction of rows.
    sample_frac(iris, 0.5, replace = TRUE)

sample_n(tbl, size, replace = FALSE, weight = NULL, .env = parent.frame())
    Randomly select size rows.
    sample_n(iris, 10, replace = TRUE)

slice(.data, …)
    Select rows by position.
    slice(iris, 10:15)

top_n(x, n, wt)
    Select and order top n entries (by group if grouped data).
    top_n(iris, 5, Sepal.Width)

Logical and boolean operators to use with filter()
    <    <=    is.na()     %in%    |    xor()
    >    >=    !is.na()    !       &
See ?base::Logic and ?Comparison for help.

ARRANGE CASES

arrange(.data, …)
    Order rows by values of a column or columns (low to high), use with desc() to order from high to low.
    arrange(mtcars, mpg)
    arrange(mtcars, desc(mpg))

ADD CASES

add_row(.data, ..., .before = NULL, .after = NULL)
    Add one or more rows to a table.
    add_row(faithful, eruptions = 1, waiting = 1)


Manipulate Variables

Column functions return a set of columns as a new vector or table.

EXTRACT VARIABLES

pull(.data, var = -1)
    Extract column values as a vector. Choose by name or index.
    pull(iris, Sepal.Length)

select(.data, …)
    Extract columns as a table. Also select_if().
    select(iris, Sepal.Length, Species)

Use these helpers with select(), e.g. select(iris, starts_with("Sepal"))
    contains(match)
    ends_with(match)
    matches(match)
    num_range(prefix, range)
    one_of(…)
    starts_with(match)
    :, e.g. mpg:cyl
    -, e.g. -Species

MAKE NEW VARIABLES

These apply vectorized functions to columns. Vectorized funs take vectors as input and return vectors of the same length as output (see back).

mutate(.data, …)
    Compute new column(s).
    mutate(mtcars, gpm = 1/mpg)

transmute(.data, …)
    Compute new column(s), drop others.
    transmute(mtcars, gpm = 1/mpg)

mutate_all(.tbl, .funs, …)
    Apply funs to every column. Use with funs(). Also mutate_if().
    mutate_all(faithful, funs(log(.), log2(.)))
    mutate_if(iris, is.numeric, funs(log(.)))

mutate_at(.tbl, .cols, .funs, …)
    Apply funs to specific columns. Use with funs(), vars() and the helper functions for select().
    mutate_at(iris, vars(-Species), funs(log(.)))

add_column(.data, ..., .before = NULL, .after = NULL)
    Add new column(s). Also add_count(), add_tally().
    add_column(mtcars, new = 1:32)

rename(.data, …)
    Rename columns.
    rename(iris, Length = Sepal.Length)
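
The verbs listed on this sheet are usually chained with pipes rather than called one at a time. The following is a minimal sketch of that pattern, assuming the dplyr package is installed and using the built-in mtcars and iris datasets from the sheet's own examples. The result names (avg_by_cyl, efficient, species_counts) and the mpg > 25 threshold are illustrative choices, not part of the cheat sheet.

```r
library(dplyr)  # assumption: dplyr is installed

# Group, then summarise: group_by() marks the groups and
# summarise() collapses each group to a single row.
avg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))

# Row and column verbs in one pipeline: keep high-mpg cars
# (the threshold 25 is arbitrary), add gallons per mile,
# sort from high to low mpg, and keep three columns.
efficient <- mtcars %>%
  filter(mpg > 25) %>%
  mutate(gpm = 1 / mpg) %>%
  arrange(desc(mpg)) %>%
  select(mpg, cyl, gpm)

# count() returns the number of rows per group in a column named n.
species_counts <- count(iris, Species)

print(avg_by_cyl)
print(efficient)
print(species_counts)
```

Each step receives the table produced by the previous one, so the pipeline reads top to bottom in the order the operations are applied.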