Data Transformation with dplyr : : CHEAT SHEET

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.7.0 • tibble 1.2.0 • Updated: 2019-08

dplyr functions work with pipes and expect tidy data. In tidy data, each variable is in its own column and each observation, or case, is in its own row.

pipes: x %>% f(y) becomes f(x, y)


Summarise Cases

These apply summary functions to columns to create a new table of summary statistics. Summary functions take vectors as input and return one value (see back).

summarise(.data, …)
    Compute table of summaries.
    summarise(mtcars, avg = mean(mpg))

count(x, ..., wt = NULL, sort = FALSE)
    Count number of rows in each group defined by the variables in … Also tally().
    count(iris, Species)

VARIATIONS
    summarise_all() - Apply funs to every column.
    summarise_at() - Apply funs to specific columns.
    summarise_if() - Apply funs to all cols of one type.


Group Cases

Use group_by() to create a "grouped" copy of a table. dplyr functions will manipulate each "group" separately and then combine the results.
    mtcars %>% group_by(cyl) %>% summarise(avg = mean(mpg))

group_by(.data, ..., add = FALSE)
    Returns copy of table grouped by …
    g_iris <- group_by(iris, Species)

ungroup(x, …)
    Returns ungrouped copy of table.
    ungroup(g_iris)


Manipulate Cases

EXTRACT CASES

Row functions return a subset of rows as a new table.

filter(.data, …)
    Extract rows that meet logical criteria.
    filter(iris, Sepal.Length > 7)

distinct(.data, ..., .keep_all = FALSE)
    Remove rows with duplicate values.
    distinct(iris, Species)

sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = parent.frame())
    Randomly select fraction of rows.
    sample_frac(iris, 0.5, replace = TRUE)

sample_n(tbl, size, replace = FALSE, weight = NULL, .env = parent.frame())
    Randomly select size rows.
    sample_n(iris, 10, replace = TRUE)

slice(.data, …)
    Select rows by position.
    slice(iris, 10:15)

top_n(x, n, wt)
    Select and order top n entries (by group if grouped data).
    top_n(iris, 5, Sepal.Width)

Logical and boolean operators to use with filter()
    <    <=    is.na()     %in%    |    xor()
    >    >=    !is.na()    !       &
    See ?base::Logic and ?Comparison for help.

ARRANGE CASES

arrange(.data, …)
    Order rows by values of a column or columns (low to high), use with desc() to order from high to low.
    arrange(mtcars, mpg)
    arrange(mtcars, desc(mpg))

ADD CASES

add_row(.data, ..., .before = NULL, .after = NULL)
    Add one or more rows to a table.
    add_row(faithful, eruptions = 1, waiting = 1)


Manipulate Variables

EXTRACT VARIABLES

Column functions return a set of columns as a new vector or table.

pull(.data, var = -1)
    Extract column values as a vector. Choose by name or index.
    pull(iris, Sepal.Length)

select(.data, …)
    Extract columns as a table. Also select_if().
    select(iris, Sepal.Length, Species)

Use these helpers with select(), e.g. select(iris, starts_with("Sepal"))
    contains(match)     num_range(prefix, range)    :, e.g. mpg:cyl
    ends_with(match)    one_of(…)                   -, e.g. -Species
    matches(match)      starts_with(match)

MAKE NEW VARIABLES

These apply vectorized functions to columns. Vectorized funs take vectors as input and return vectors of the same length as output (see back).

mutate(.data, …)
    Compute new column(s).
    mutate(mtcars, gpm = 1/mpg)

transmute(.data, …)
    Compute new column(s), drop others.
    transmute(mtcars, gpm = 1/mpg)

mutate_all(.tbl, .funs, …)
    Apply funs to every column. Use with funs(). Also mutate_if().
    mutate_all(faithful, funs(log(.), log2(.)))
    mutate_if(iris, is.numeric, funs(log(.)))

mutate_at(.tbl, .cols, .funs, …)
    Apply funs to specific columns. Use with funs(), vars() and the helper functions for select().
    mutate_at(iris, vars(-Species), funs(log(.)))

add_column(.data, ..., .before = NULL, .after = NULL)
    Add new column(s). Also add_count(), add_tally().
    add_column(mtcars, new = 1:32)

rename(.data, …)
    Rename columns.
    rename(iris, Length = Sepal.Length)
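
WORKED EXAMPLE

The sheet lists each verb on its own, but in practice they are usually chained with the pipe. The following is a minimal sketch of that pattern, assuming dplyr is installed and using the built-in mtcars data. The object names by_cyl and efficient and the mpg > 25 cutoff are illustrative choices, not part of the sheet, and n() is a dplyr summary function that does not appear on this page.

```r
library(dplyr)

# Group cases, then summarise: one row per cylinder count.
# n() (assumption: not listed on this page) counts the rows in each group.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg), n = n())

# Manipulate cases and variables in one pipeline:
# keep rows that meet a logical criterion, compute a new column,
# extract two columns, and order rows from high to low mpg.
efficient <- mtcars %>%
  filter(mpg > 25) %>%          # illustrative threshold
  mutate(gpm = 1 / mpg) %>%
  select(mpg, gpm) %>%
  arrange(desc(mpg))

print(by_cyl)
print(efficient)
```

Each step receives the table produced by the previous one, as in x %>% f(y) becoming f(x, y), so the pipeline reads top to bottom in the order the operations are applied.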