Data analysis with R - Some Simple Commands | STATS 0110A, Study notes of Statistics

Material Type: Notes; Class: APPLIED STATISTICS; Subject: Statistics; University: University of California - Los Angeles; Term: Unknown 1989;


Uploaded on 08/31/2009

University of California, Los Angeles
Department of Statistics

Statistics 110A
Instructor: Nicolas Christou

Data analysis with R - Some simple commands

When you are in R, the command line begins with
>

To read data from a website:
> site="http://www.stat.ucla.edu/~nchristo/body_fat.txt"
> data <- read.table(file=site, header=TRUE)

Another way to read data from a website is the following:
> data <- read.table("http://www.stat.ucla.edu/~nchristo/body_fat.txt", header=TRUE)

This file contains data on percentage of body fat determined by underwater weighing and various body circumference measurements for 251 men. Here is the variable description:

Variable   Description
x1         Density determined from underwater weighing
x2         Percent body fat from Siri's (1956) equation
x3         Age (years)
x4         Weight (lbs)
x5         Height (inches)
x6         Neck circumference (cm)
x7         Chest circumference (cm)
x8         Abdomen 2 circumference (cm)
x9         Hip circumference (cm)
x10        Thigh circumference (cm)
x11        Knee circumference (cm)
x12        Ankle circumference (cm)
x13        Biceps (extended) circumference (cm)
x14        Forearm circumference (cm)
x15        Wrist circumference (cm)

If the data file is on your computer (e.g. on your desktop), first change the working directory by clicking on Misc at the top of your screen, and then read the data as follows:
> data <- read.table("filename.txt", header=TRUE)

Note: the expression <- is an assignment operator. The result of read.table is a data frame (it looks like a matrix).

Useful commands:

• Extracting one variable from data (e.g. the second variable):
> data[,2]

• Another way to extract one variable:
> data$x2

• Similarly, if we want to access a particular row in our data (e.g.
first row):
> data[1,]

• To list all the data simply type:
> data

• To compute the mean of all the variables in the data set (mean() applied to a whole data frame returns NA in current versions of R, so colMeans() is used here):
> colMeans(data)

• To compute the mean of just one variable:
> mean(data$x2)

• To compute the mean of variables 2 and 3:
> colMeans(data[,c(2,3)])

• To compute the variance of one variable:
> var(data$x2)

• To compute summary statistics for all the variables:
> summary(data)

• To construct a stem-and-leaf plot, histogram, and boxplot:
> stem(data$x2)
> boxplot(data$x2)
> hist(data$x2)

• To plot variable x2 against variable x10:
> plot(data$x2, data$x10)

• And you can give names to the axes and to your plot:
> plot(data$x2, data$x10, main="Scatterplot of percent body fat against thigh circumference", xlab="Percent body fat", ylab="Thigh circumference")

• To save a plot as a pdf file under the working directory (e.g. your desktop):
> pdf("box_x2.pdf")
> boxplot(data$x2)
> dev.off()

If you want to read more about a specific command (for example the histogram or the boxplot), type the following at the command line:
> ?hist
> ?boxplot

Exercise:

a. You can access these data at
> soil <- read.table("http://www.stat.ucla.edu/~nchristo/statistics13/soil.txt", header=TRUE)

b. Construct the stem-and-leaf plot, histogram, and boxplot for each one of the two variables (lead and zinc), and compute the summary statistics. What do you observe?

c. Transform the data in order to produce a symmetrical histogram. Here is what you can do:
> log_lead <- log(soil$lead)
> log_zinc <- log(soil$zinc)
Construct the stem-and-leaf plot, histogram, and boxplot for each one of the new variables (log lead and log zinc), and compute the summary statistics. What do you observe now?

Here is a side by side boxplot of the variables lead and zinc.
First create a new data frame with only the variables lead and zinc:
> soil_1 <- soil[,3:4]
Then you can construct side by side boxplots of lead and zinc using:
> boxplot(soil_1)

[Figure: side by side boxplots of lead and zinc]

Other useful commands in R:

• To create variables in R use <- or the equal sign =. Here are some examples:
> x <- c(1,2,3,4,5)
> y <- c(10,20,30,40,50)
> q <- cbind(x,y)
And here is what you get:
> x
[1] 1 2 3 4 5
> q
     x  y
[1,] 1 10
[2,] 2 20
[3,] 3 30
[4,] 4 40
[5,] 5 50

• To rename variables (q is first converted to a data frame, because names() sets column names only for data frames, not for the matrix returned by cbind):
> q <- as.data.frame(q)
> names(q) <- c("a", "b")
> q
  a  b
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
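
The renaming step above behaves differently for a matrix and for a data frame, which is easy to overlook. The following is a minimal sketch, assuming the same two vectors of values used above. The object names m and d are illustrative only and are not part of the handout.

```r
# Two numeric vectors with the same values as in the example above.
x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)

# cbind() returns a matrix; its column labels are set with colnames().
m <- cbind(x, y)
colnames(m) <- c("a", "b")

# data.frame() returns a data frame; names() sets its column labels.
d <- data.frame(x, y)
names(d) <- c("a", "b")

is.matrix(m)      # TRUE
is.data.frame(d)  # TRUE

# Column means work the same way on both objects.
colMeans(m)
colMeans(d)
```

Either object can be used for the column summaries, but only the data frame supports the d$a style of access used throughout these notes.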