DATA ANALYSIS AND BIG DATA LAB: formula sheet, Distributed Databases

Università degli Studi di Brescia (UNIBS), Distributed Databases

Prof. Enrico Ripamonti

Formula sheet for preparing the DATA ANALYSIS AND BIG DATA LAB exam (academic year 2022) using RStudio

Type: Formula sheet

2020/2021

On sale since 28/06/2023

klevisa.ba5 🇮🇹

FORMULAS

Import a data set
# .txt file (whitespace-separated; row.names = name of the column holding the row names)
data <- read.delim("data.txt", header = TRUE, sep = "", row.names = "text")
# .csv file
data <- read.csv("data.csv", header = TRUE)
# .tsv file
library(readr)   # provides read_tsv()
data <- read_tsv("data.tsv")

Create a tibble
library(tidyverse)   # also attaches tibble, dplyr, ggplot2 and readr
str(data)
# Use as_tibble() when the data come from an external file and you want to work on them with the tidyverse
data.t <- as_tibble(data)
str(data.t)
# print() explicitly to control the number of rows (n) and the width (Inf = show all columns)
nycflights13::flights %>% print(n = 15, width = Inf)

Subsetting
df <- tibble(x = runif(5), y = rnorm(5))
str(df)
print(df)
df$x          # extract a column by name
df %>% .$x    # with the pipe, use the special placeholder .
x <- subset(citydat, Year == 1, select = c(Population))   # creates a subset
x

Merge two tibbles
data.tt <- data %>% left_join(data2, by = "key")   # "key": name of the column shared by data and data2
data.tt
data.tt <- as.data.frame(data.tt)

Provide a graphical representation of the distribution:

Regression models
# On keyboards without a ~ key, type ~ with Alt+126 (Windows)
model.lm <- lm(x ~ y, data = boston)
model.lm <- lm(x ~ y, data = filter(reg.ttt, ID >= 40))   # fit only on the rows with ID >= 40
summary(model.lm)
model.lm <- lm(x ~ ., data = boston)   # regresses x on all the remaining variables

Compare models
fit1 <- lm(y ~ x1 + x2 + x3 + x4, data = mydata)   # multiple regression
fit2 <- lm(y ~ x1 + x2, data = mydata)             # same data, fewer regressors
anova(fit1, fit2)

Find missing values
is.na(data$variable)                               # TRUE where the value is missing
data["variable"][is.na(data["variable"])] <- 0     # replace the NAs with a new value (here 0)

ggplot
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = class))   # or size = class, or shape = class
You can split your plot into facets, subplots that each display one subset of the data. To facet your plot by a single variable, use facet_wrap():
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + facet_wrap(~ class, nrow = 2)
To change the geom in your plot, change the geom function:
ggplot(data = mpg) + geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv), show.legend = FALSE)
To display multiple geoms in the same plot, add multiple geom functions to ggplot():
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + geom_smooth(mapping = aes(x = displ, y = hwy))
# A bar chart can be coloured with either the colour aesthetic or fill:
# colour colours the edge of the bars, fill colours their inside
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, colour = cut))
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = cut))
# Other geoms: line chart geom_line(), boxplot geom_boxplot(), histogram geom_histogram(), area chart geom_area()

Data transformation
filter()      # pick observations by their values
arrange()     # reorder the rows
select()      # pick variables by their names
mutate()      # create new variables with functions of existing variables
summarise()   # collapse many values down to a single summary
group_by()    # changes the scope of each function from the entire data set to group-by-group

filter()
library(nycflights13)                          # provides the flights data set
filter(flights, month == 1, day == 1)          # selects all flights on January 1st
jan1 <- filter(flights, month == 1, day == 1)
print(jan1)
filter(flights, month == 11 | month == 12)     # all flights that departed in November or December
# filter() drops NAs: to keep missing values, ask for them explicitly
df <- tibble(x = c(1, NA, 3))
filter(df, x > 1)
filter(df, is.na(x) | x > 1)

arrange()
# Works like filter(), except that instead of selecting rows it changes their order
arrange(flights, year, month, day)
arrange(flights, desc(dep_delay))              # desc() reorders by a column in descending order
# Missing values are always sorted at the end
df <- tibble(x = c(5, 2, NA))
arrange(df, x)
arrange(df, desc(x))

select()
# Zooms in on a useful subset of columns using operations based on the variable names
select(flights, year:day)                      # all columns between year and day (inclusive)
rename(flights, tail_num = tailnum)            # rename(): a variant of select()

mutate()
# Adds new columns that are functions of existing columns
mutate(flights, gain = dep_delay - arr_delay)

summarise()
summarise(data, variable = mean(variable2))    # collapses a data frame to a single row (a summary)

Loops
# for loop
b <- 0
for (i in 1:10) {
  b <- b + i
}
b   # 55

# while loop
while (test_expression) {
  statement
}
# Example: prints 5 4 3 2 1, then a is 0
a <- 5
while (a > 0) {
  print(a)
  a <- a - 1
}
a

# next: skips the rest of the current iteration
if (test_condition) {
  next
}
# Example: prints 1 2 4 5
x <- 1:5
for (val in x) {
  if (val == 3) {
    next
  }
  print(val)
}

# break: exits the loop
if (test_expression) {
  break
}
# Example: a is 4 at the end
a <- 5
for (i in 1:10) {
  if (i == 2) break
  a <- a - i
}
a

# repeat loop (it stops only at break)
repeat {
  statement
}
# Example: prints 1 to 5
x <- 1
repeat {
  print(x)
  x <- x + 1
  if (x == 6) {
    break
  }
}

Functions
Functions are used to logically break our code into simpler parts, which become easier to maintain and understand.
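A function can also take arguments, possibly with default values, and return a result. As a minimal sketch, the for, while and repeat loops of this sheet are wrapped below as functions of n that compute 1 + 2 + ... + n. The names sum_for(), sum_while() and sum_repeat() are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper functions (names chosen for this example only):
# each returns 1 + 2 + ... + n using one of the loop types.

sum_for <- function(n = 10) {       # n has a default value of 10
  b <- 0
  for (i in seq_len(n)) {           # seq_len(0) is empty, whereas 1:0 would be c(1, 0)
    b <- b + i
  }
  b                                 # the last evaluated expression is returned
}

sum_while <- function(n = 10) {
  b <- 0
  i <- 1
  while (i <= n) {                  # the test is checked before every iteration
    b <- b + i
    i <- i + 1
  }
  return(b)                         # an explicit return() is also allowed
}

sum_repeat <- function(n = 10) {
  b <- 0
  i <- 0
  repeat {                          # repeat has no test of its own: it stops only at break
    if (i >= n) break
    i <- i + 1
    b <- b + i
  }
  b
}

sum_for()          # uses the default n = 10
sum_while(10)
sum_repeat(n = 10)
sum(1:10)          # vectorised built-in, for comparison
```

All four calls return 55. In everyday R code the vectorised sum(1:n) is preferred; the loops are shown only to connect functions with the loop syntax.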
Example:
my_func <- function() {
  print('hello')
}
my_func()   # the parentheses call the function; my_func alone only prints its code

Graphs review
# Two lattice graphs can share a page through the split argument of print():
# split = c(x, y, nx, ny) places the plot at position (x, y) in a grid of nx by ny plots

# Plots
barplot(table(X))        # bar plot
plot(table(Z))           # stick plot
pie(table(Y))            # pie chart
bubbleplot(table(X, Y))  # bubble plot (not a base R function)
plot(Z, W)               # scatter plot
text(8, 40, expression(y[i] == 8.87 - 0.093*x[i]))   # writes the formula on the plot at (8, 40)

# Boxplot
boxplot(Sepal.Length ~ Species, data = iris, main = "Iris", las = 1)

# Q-Q plot
qqnorm(x, col = 2)
qqline(x, col = 4, lwd = 2)
qqplot(x, rt(200, df = 1))   # quantiles of x against a sample of 200 values from a t distribution with 1 df

# Histograms
hist(iris$Sepal.Length, main = "", density = FALSE, freq = TRUE)   # absolute frequencies
hist(W, breaks = 3)
# Density line: draw the histogram with freq = FALSE so that the scales match
lines(density(iris$Sepal.Length), col = "blue", lwd = 2)

# Barplot
barplot(VADeaths, beside = TRUE, las = 1, col = c("lightblue", "mistyrose", "lightcyan"), ylim = c(0, 120))

# Pairs: scatter plots of multiple variables taken in pairs
pairs(iris[1:4], main = "", cex = 1.5, cex.labels = 2, font.labels = 2, pch = 21)

Sequences
seq(-3, 6, by = 2)                    # from -3 to 6 in steps of 2: -3 -1 1 3 5
seq(-3, 6, length.out = 11)           # 11 equally spaced values from -3 to 6
index <- which((x >= -1) & (x < 5))   # & means AND
which((x < -2) | (x > 1))             # | means OR
x[index]                              # extracts the values

Vectors
cmp <- complex(real = 1:10, imaginary = -1:9)
cmp       # a complex vector
string <- c("gianni", "luca", "fabio")
string    # a character vector
bool <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
bool      # a logical (Boolean) vector

Lists
elle <- list(CPLX = cmp, NOMI = string, BOOL = bool, matrice = A)   # A: a matrix defined earlier
elle["NOMI"]   # returns a list
# How to see the structure of an object
str(cmp)
str(bool)
str(string)
str(A)
str(elle)      # shows the structure of "elle"

The linear model
model <- lm(dist ~ speed, data = cars)
summary(model)
# Regression graph
plot(cars)
abline(model)
# Q-Q plot of the residuals: if their quantiles are not Gaussian, the p-values are essentially not valid
qqnorm(residuals(model))
qqline(residuals(model))
str(model)
plot(model$residuals)                              # plots the residuals
predict(model, newdata = data.frame(speed = 9))    # value predicted by the regression line for speed = 9
coplot(Gas ~ Temp | Insul, data = MASS::whiteside) # relation of Gas to Temp, conditional on Insul

Random number generation
x1 <- runif(50)            # 50 obs. from a uniform r.v. on (0, 1)
x3 <- rnorm(50, sd = 45)   # 50 obs. from a normal r.v. with mean 0 and SD 45
x4 <- rchisq(50, df = 1)   # 50 obs. from a chi-squared r.v. with 1 df
x5 <- rt(50, df = 3)       # 50 obs. from a t r.v. with 3 df
# Create the data frame (y and x2 must already exist)
z <- data.frame(y, x1, x2, x3, x4, x5)
lm(y ~ ., data = z)              # regresses y on all the other columns of z
lm(y ~ x1 + x3 + x5, data = z)   # multiple regression on a subset of the regressors

Statistical tables and quantiles
dnorm(0.4)                   # density of N(0,1) at 0.4
qnorm(0.96)                  # quantile function: 96th percentile of N(0,1)
dnorm(x, mean = 0, sd = 1)   # density
pnorm(q, mean = 0, sd = 1)   # cumulative distribution function
qnorm(p, mean = 0, sd = 1)   # quantile function (p = probability)
rnorm(n, mean = 0, sd = 1)   # n random values

Descriptive statistics and graphs
# Qualitative variables
attach(data)
table(X)                   # absolute frequencies
table(X)/length(X)         # relative frequencies
table(X)/length(X)*100     # percentage frequencies
cumsum(table(X))           # cumulative frequencies
# Continuous variables
table(cut(W, breaks = c(40, 50, 58, 70, 95)))   # data organized in classes with suitable breakpoints
classes <- c(40, 50, 58, 70, 95)
hist(W, breaks = classes, plot = FALSE)         # returns breaks, counts, densities and midpoints without drawing
# Contingency tables
table(X, Y)                          # two-way distribution
table(X, cut(W, breaks = classes))
table(X, Y, Z)                       # three-way contingency table
ftable(X, Y, Z)                      # flat contingency table
tab <- table(X, Y)
margin.table(tab, 1)                 # marginal distribution of X
prop.table(tab)                      # joint relative frequency distribution
prop.table(tab, 1)                   # conditional (relative) distribution of Y given X
# Kernel density estimation
plot(density(W))                     # default bandwidth (Silverman's rule of thumb)
plot(density(W, bw = 3))             # fixed bandwidth
density(W)                           # prints a summary of the estimate
# Descriptive statistics
summary(X)                           # frequency distribution of a qualitative variable (factor)
summary(Z)                           # quartiles, range, mean
summary(table(X, Y))                 # chi-square test of independence
median(W)                            # median
quantile(W, c(0.1, 0.3, 0.7, 0.93, 0.98))   # quantiles
var(W)                               # sample variance of W
cor(Z, W, method = 'pearson')        # Pearson correlation coefficient
cor(Z, W, method = 'spearman')       # Spearman correlation coefficient

Operations
sqrt(16)              # square root
log(100, base = 10)   # base-10 logarithm
exp(2)                # exponential
cos(0)                # cosine
sin(pi/2)             # sine
tan(pi/4)             # tangent
abs(-3)               # absolute value
factorial(5)          # factorial
choose(10, 4)         # binomial coefficient: ways to choose 4 items out of 10
curve(5 - 3*x^2 + 6*x, add = TRUE, lty = 2)    # adds a dashed parabola (it has a maximum) to the current plot
par(mfrow = c(2, 2))                           # splits the graphics window into a 2x2 grid
curve(sqrt, 0, 100)
title("Square root")                           # square-root curve with a title
curve(cos, -2*pi, 2*pi, add = TRUE, lty = 2)   # adds a dashed cosine curve on [-2*pi, 2*pi]

In an urn there are 7 white balls and 3 black balls. We draw 4 balls "in block" (i.e., without replacement). The experiment consists of counting how many white balls have been drawn. Use a for loop to repeat this experiment 1,000 times. Evaluate your results.
1. ballot <- c(rep("White",7),rep("Black",3))
2. nrep <- 1000
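The solution to the urn exercise is cut off after its first two lines. One possible way to complete it, a sketch and not the author's code, is to run the draws in a for loop with sample(..., replace = FALSE) and compare the simulated frequencies with the exact hypergeometric probabilities from dhyper(). The seed value and the names white, simulated, exact and result are assumptions made for this example.

```r
# A sketch of one possible completion (not the original solution).
set.seed(1)                                    # reproducible draws (seed chosen arbitrarily)
ballot <- c(rep("White", 7), rep("Black", 3))  # urn: 7 white, 3 black
nrep <- 1000

white <- numeric(nrep)                         # number of white balls in each repetition
for (r in 1:nrep) {
  draw <- sample(ballot, size = 4, replace = FALSE)   # "in block": without replacement
  white[r] <- sum(draw == "White")
}

# Simulated relative frequencies of 0, 1, ..., 4 white balls
simulated <- as.numeric(table(factor(white, levels = 0:4))) / nrep

# Exact probabilities: hypergeometric with m = 7 white, n = 3 black, k = 4 drawn
exact <- dhyper(0:4, m = 7, n = 3, k = 4)

result <- rbind(simulated, exact)
colnames(result) <- 0:4
round(result, 3)

mean(white)   # exact expected value: k * m / (m + n) = 4 * 7 / 10 = 2.8
```

Since at most 3 of the 4 balls drawn can be black, at least one white ball is always drawn, so the column for 0 is zero in both rows; the simulated frequencies should be close to the exact ones (7, 63, 105 and 35 out of 210 for 1 to 4 white balls).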