Tidyverse, dplyr R Cheat Sheet, Cheat Sheet of Advanced Computer Programming

Palomar College Advanced Computer Programming

R For Data Science Cheat Sheet, Tidyverse for Beginners

Typology: Cheat Sheet

2020/2021

Uploaded on 04/26/2021

oliver97 🇺🇸

Tidyverse

The tidyverse is a collection of R packages for transforming and visualizing data. All tidyverse packages share an underlying design philosophy and common APIs. The core packages are:

• ggplot2 implements the grammar of graphics. You can use it to visualize your data.
• dplyr is a grammar of data manipulation. You can use it to solve the most common data manipulation challenges.
• tidyr helps you create tidy data: data where each variable is a column, each observation is a row, and each value is a cell.
• readr is a fast and friendly way to read rectangular data.
• purrr enhances R's functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
• tibble is a modern reimagining of the data frame.
• stringr provides a cohesive set of functions designed to make working with strings as easy as possible.
• forcats provides a suite of useful tools that solve common problems with factors.

You can install the complete tidyverse with:

> install.packages("tidyverse")

Then, load the core tidyverse and make it available in your current R session by running:

> library(tidyverse)

Note: there are many other tidyverse packages with more specialised usage. They are not loaded automatically with library(tidyverse), so you'll need to load each one with its own call to library().

Useful Functions

> tidyverse_conflicts()   # Conflicts between tidyverse and other packages
> tidyverse_deps()        # List all tidyverse dependencies
> tidyverse_logo()        # Get tidyverse logo, using ASCII or unicode characters
> tidyverse_packages()    # List all tidyverse packages
> tidyverse_update()      # Update tidyverse packages

Loading in the data

> library(datasets)    # Load the datasets package
> library(gapminder)   # Load the gapminder package
> attach(iris)         # Attach iris data to the R search path

dplyr

Filter

filter() allows you to select a subset of rows in a data frame.

> iris %>% filter(Species == "virginica")                      # Select iris data of species "virginica"
> iris %>% filter(Species == "virginica", Sepal.Length > 6)    # Select species "virginica" with sepal length greater than 6

Arrange

arrange() sorts the observations in a dataset in ascending or descending order based on one of its variables.

> iris %>% arrange(Sepal.Length)          # Sort in ascending order of sepal length
> iris %>% arrange(desc(Sepal.Length))    # Sort in descending order of sepal length

Combine multiple dplyr verbs in a row with the pipe operator %>%:

> iris %>%
    filter(Species == "virginica") %>%    # Filter for species "virginica",
    arrange(desc(Sepal.Length))           # then arrange in descending order of sepal length

Mutate

mutate() allows you to update or create new columns of a data frame.

> iris %>% mutate(Sepal.Length = Sepal.Length * 10)    # Change Sepal.Length from centimeters to millimeters
> iris %>% mutate(SLMm = Sepal.Length * 10)            # Create a new column called SLMm

Combine the verbs filter(), arrange(), and mutate():

> iris %>%
    filter(Species == "virginica") %>%
    mutate(SLMm = Sepal.Length * 10) %>%
    arrange(desc(SLMm))

Summarize

summarize() allows you to turn many observations into a single data point.

> iris %>% summarize(medianSL = median(Sepal.Length))    # Find the median sepal length
> iris %>%
    filter(Species == "virginica") %>%                   # Filter for virginica,
    summarize(medianSL = median(Sepal.Length))           # then summarize the median sepal length

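Separating conditions with commas in filter(), as in the second Filter example, keeps only rows where every condition is TRUE, so it should give the same result as joining them with &. The sketch below checks that on iris; it assumes dplyr is installed and loads it directly rather than the whole tidyverse, and the object names are illustrative only. It also shows that string matches are case-sensitive, since iris stores the level as lowercase "virginica".

```r
library(dplyr)

# Comma-separated conditions: a row is kept only if every condition holds
by_commas <- iris %>%
  filter(Species == "virginica", Sepal.Length > 6)

# The same conditions joined explicitly with the logical AND operator
by_and <- iris %>%
  filter(Species == "virginica" & Sepal.Length > 6)

# Base R equivalent using logical indexing (keeps original row names)
by_base <- iris[iris$Species == "virginica" & iris$Sepal.Length > 6, ]

identical(by_commas, by_and)          # TRUE
nrow(by_commas) == nrow(by_base)      # TRUE

# Comparisons are case-sensitive: no level is spelled "Virginica"
nrow(filter(iris, Species == "Virginica"))   # 0
```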
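The pipe passes the result on its left as the first argument of the call on its right, so x %>% f(y) is equivalent to f(x, y). A minimal sketch, assuming dplyr is installed, that compares the piped filter()/arrange() chain from the Arrange section with its nested form and checks that desc() really produces a descending order:

```r
library(dplyr)

# Piped form: read left to right, one verb per step
piped <- iris %>%
  filter(Species == "virginica") %>%
  arrange(desc(Sepal.Length))

# Nested form: the same calls, read inside out
nested <- arrange(filter(iris, Species == "virginica"), desc(Sepal.Length))

identical(piped, nested)                 # TRUE

# With desc(), Sepal.Length never increases going down the rows,
# so the reversed column is sorted in ascending order
!is.unsorted(rev(piped$Sepal.Length))    # TRUE
```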
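To see the difference between updating and creating a column with mutate(), compare the column counts of the two Mutate examples. This is a small sketch assuming dplyr is installed; the object names overwritten and added are illustrative only.

```r
library(dplyr)

# Assigning to an existing name replaces that column in place
overwritten <- iris %>% mutate(Sepal.Length = Sepal.Length * 10)

# Assigning to a new name appends a column at the end by default
added <- iris %>% mutate(SLMm = Sepal.Length * 10)

ncol(iris)           # 5
ncol(overwritten)    # 5
ncol(added)          # 6
names(added)[6]      # "SLMm"
```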
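Because summarize() collapses many observations into one, each Summarize example returns a single row. A quick sketch, assuming dplyr is installed, that confirms this and cross-checks the value against base R's median() on the same subset (the object names are illustrative):

```r
library(dplyr)

# Summary over the whole dataset
overall <- iris %>% summarize(medianSL = median(Sepal.Length))

# Summary over the virginica rows only
virginica_median <- iris %>%
  filter(Species == "virginica") %>%
  summarize(medianSL = median(Sepal.Length))

nrow(overall)             # 1
nrow(virginica_median)    # 1

# Same statistic computed with base R subsetting
virginica_median$medianSL == median(iris$Sepal.Length[iris$Species == "virginica"])   # TRUE
```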
You can also summarize multiple variables at once:

> iris %>%
    filter(Species == "virginica") %>%
    summarize(medianSL = median(Sepal.Length), maxSL = max(Sepal.Length))

group_by() allows you to summarize within groups instead of summarizing the entire dataset:

> iris %>%
    group_by(Species) %>%                 # Find median and max sepal length of each species
    summarize(medianSL = median(Sepal.Length), maxSL = max(Sepal.Length))
> iris %>%
    filter(Sepal.Length > 6) %>%          # Find median and max petal length of each species
    group_by(Species) %>%                 # with sepal length > 6
    summarize(medianPL = median(Petal.Length), maxPL = max(Petal.Length))

ggplot2

Scatter plot

Scatter plots allow you to compare two variables within your data. To do this with ggplot2, you use geom_point().

> iris_small <- iris %>% filter(Sepal.Length > 5)
> ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width)) +    # Compare petal width and length
    geom_point()

Additional Aesthetics

• Color
> ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
    geom_point()

• Size
> ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width, color = Species, size = Sepal.Length)) +
    geom_point()

Faceting

> ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width)) +
    geom_point() +
    facet_wrap(~Species)

Line Plots

> by_year <- gapminder %>%
    group_by(year) %>%
    summarize(medianGdpPerCap = median(gdpPercap))
> ggplot(by_year, aes(x = year, y = medianGdpPerCap)) +
    geom_line() +
    expand_limits(y = 0)

Bar Plots

> by_species <- iris %>%
    filter(Sepal.Length > 6) %>%
    group_by(Species) %>%
    summarize(medianPL = median(Petal.Length))
> ggplot(by_species, aes(x = Species, y = medianPL)) +
    geom_col()

Histograms

> ggplot(iris_small, aes(x = Petal.Length)) +
    geom_histogram()

Box Plots

> ggplot(iris_small, aes(x = Species, y = Sepal.Width)) +
    geom_boxplot()
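In the group_by() examples, summarize() returns one row per group rather than one row overall. As a rough cross-check (a sketch assuming dplyr is installed, not part of the original sheet), the per-species median and max sepal lengths should match what base R's tapply() computes for the same grouping:

```r
library(dplyr)

by_group <- iris %>%
  group_by(Species) %>%
  summarize(medianSL = median(Sepal.Length), maxSL = max(Sepal.Length))

# One row per level of the grouping variable
nrow(by_group)    # 3

# Base R computes the same per-group statistics with tapply()
base_median <- tapply(iris$Sepal.Length, iris$Species, median)
base_max    <- tapply(iris$Sepal.Length, iris$Species, max)

# Look up the base R results by species name so the row order cannot matter
species <- as.character(by_group$Species)
all(by_group$medianSL == as.vector(base_median[species]))    # TRUE
all(by_group$maxSL == as.vector(base_max[species]))          # TRUE
```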
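By default geom_histogram() uses 30 bins and prints a message suggesting you pick a better value. A sketch, assuming dplyr and ggplot2 are installed: set binwidth explicitly (0.25 here is an arbitrary illustrative choice, in the same units as Petal.Length) and keep the plot in an object, so you can add layers such as facet_wrap() before printing it.

```r
library(dplyr)
library(ggplot2)

iris_small <- iris %>% filter(Sepal.Length > 5)

# binwidth is given in data units; 0.25 is illustrative, not a recommendation
p <- ggplot(iris_small, aes(x = Petal.Length)) +
  geom_histogram(binwidth = 0.25)

# Plots are ordinary R objects: add a facet layer for one histogram per species
p_by_species <- p + facet_wrap(~Species)

print(p_by_species)
```

Every row of iris_small falls into exactly one bin, so the bin counts computed by ggplot2 add up to the number of rows.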
