
Statistics and Probability: Concepts and Formulas

This study guide covers various statistical concepts and formulas, including measures of central tendency (mean, mode, median), spread of a distribution, comparing distributions, correlation, regression, probability theory, and sampling distributions. It also discusses the difference between discrete and continuous random variables and the expected value and variance of random variables.


MATH-11 STATISTICS MEGA STUDY
A guide of answered solutions to pass the exam

Table of Contents:

Midterm 1
Ch 3: Welcome
Ch 4: Comparing distributions
Ch 6: Scatterplots, Association, Correlation (2 variables)
Ch 7: Linear Regression
Ch 8 and 9: More things about Regression
Ch 13: Probability
Ch 14: More Probability Theory
Ch 15: Random Variables

Midterm 2
Ch 16: Modeling
Handout: Continuous Random Variables
Ch 5: Z-scores, the normal model, the standard normal model
Ch 17: Sampling Distributions
Ch 18: Confidence Intervals for Proportions
Ch 19: Testing Hypotheses About Proportions

Final
Ch 20: Inferences for Means
Ch 21: Types of Errors and 21 Questions
Ch 22: Two-Sample Proportion Inference
Ch 22 and 23: Paired Data, Two Sample Means
Ch 25 P1: Inference About the Regression Coefficients
Ch 25 P2: Prediction Intervals/Confidence Intervals
Ch 24: Chi-Squared Tests

Ch 3 - Welcome

Categorical/Qualitative Data - ANS; data that falls into categories or labels; often text ideas; tends not to have units
Numeric/Quantitative Data - ANS; numerical data with units
Uniform model histogram - ANS; no peaks
Unimodal histogram - ANS; one peak
Bimodal histogram - ANS; two peaks
Multimodal histogram - ANS; 3+ peaks
Histogram symmetry - ANS; when the left and right halves of a histogram look similar/the same
Tails of distribution - ANS; left and right sides of a graph
Skewed left - ANS; longer tail on the left
Skewed right - ANS; longer tail on the right
Outliers - ANS; data that stands apart from the distribution
Median - ANS; data value in the middle of the list of data
Mean - ANS; the center of a distribution must take into account the data values themselves, not just the order they're in. It is the calculated average value (sum of all terms / number of terms)
Mode - ANS; value that occurs most often in a set of data
Median on a histogram - ANS; same amount of area on both sides
Mean on a histogram - ANS; balance point of the histogram; the torque is the same on both sides of the mean
The median is resistant to - ANS; outliers and skew
Center used for asymmetric (skewed) distributions - ANS; median
Center used for symmetric distributions without outliers - ANS; mean
Spread of distribution - ANS; where does most of the data lie?
Three ideas for measuring spread - ANS; range, interquartile range, the five-number summary
Range - ANS; (max value) - (min value)
Pros of range - ANS; easy to calculate, gives a sense of the span of the data

Ch 7 - Linear Regression

The least-squares line minimizes - ANS; the sum of (residuals)^2
The line of best fit is determined by - ANS; slope and y-intercept
Regression line equation - ANS; ŷ = b0 + b1 * x
b0 - ANS; ȳ - b1(x̄); this is the y-intercept (value of ŷ when x = 0)
b1 - ANS; r(stdev y / stdev x); this is the slope (value that ŷ increases by for every unit that x increases by)
Regression to the mean - ANS; people far from the mean are pulled towards it in subsequent trials because it is easier to score near the mean than far from it
Conditions for creating a regression model - ANS; since correlations are involved, we need our three conditions from before, plus one more: 1) quantitative variables 2) straight enough 3) no outliers 4) residual noise
Using noise to determine whether a regression model is appropriate - ANS; the residual plot should show "noise", or no observable patterns in the plot. If a pattern is seen, regression is not appropriate.
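To make the slope and intercept formulas above concrete, here is a minimal Python sketch (the x and y data are invented for illustration) that computes b1 = r * (stdev y / stdev x) and b0 = ȳ - b1 * x̄, then inspects the residuals for the "noise" the conditions require:

import numpy as np

# invented example data: any paired quantitative variables would do
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

r = np.corrcoef(x, y)[0, 1]                    # correlation coefficient
b1 = r * (y.std(ddof=1) / x.std(ddof=1))       # slope: r * (stdev y / stdev x)
b0 = y.mean() - b1 * x.mean()                  # y-intercept: ybar - b1 * xbar

y_hat = b0 + b1 * x                            # predictions from the line
residuals = y - y_hat                          # should look like pure noise

print(f"yhat = {b0:.3f} + {b1:.3f} x, r^2 = {r**2:.3f}")
print("residuals:", np.round(residuals, 3))    # scan for curvature or fans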
Ch 8 and 9 - ANS; More things about Regression

Percent variance explained (R^2 or r^2) - ANS; for a given linear model, r^2 (the correlation coefficient squared) is the proportion of the variation in the y-variable that is accounted for (or explained) by the variation in the x-variable
Incorrect uses of linear regression - ANS; 1. failing to look at the residuals to make sure the model is reasonable 2. extrapolating without caution 3. not considering outliers carefully enough 4. building a model from data that isn't straight enough
If the residuals show any type of pattern - ANS; your current linear model is not appropriate
Subgroups can be identified in original data or residuals - ANS; split your data into different parts and do several linear regressions instead of one clunky regression
Subgroups may not be visible unless - ANS; you think about them
A high r^2 value is - ANS; not an indicator that a linear model is appropriate
Don't assume your data are all part of - ANS; one homogeneous population; think about possible subgroups to make the analysis better
Interpolation - ANS; using your model to predict a new y value for an x value that is within the span of x data in your model
Extrapolation - ANS; using your model to predict a new y value for an x value that is outside the span of x data in your model
Which is more accurate: interpolation or extrapolation? - ANS; interpolation is more accurate because the pattern you built applies to the data within range
Extrapolation is dangerous because - ANS; it assumes the relationship holds beyond the data range you have seen and used for the model
High leverage point - ANS; outlier whose x is far from the mean of the x values
High influence point - ANS; gives a significantly different slope for the regression line if it is included, versus excluded, from an analysis
Do not create a regression when what type of outlier is present? - ANS; a high influence outlier
Good way to inflate r^2 - ANS; dividing data into subgroups that are more homogeneous
Neutral way to inflate r^2 - ANS; tossing outliers and doing the analysis without them (good or bad depending on the situation): if outliers are trolls, tossing them is fine; if outliers are valid, observed data, you cannot toss them
Bad way to inflate r^2 - ANS; using summarized data rather than unsummarized data
When you average things, you are eliminating - ANS; most of the variation that happens
Tower of power - ANS; when the original data or the residuals convince you that the data are not straight enough, apply a mathematical function to the values
Quadrant 1 or 2 curve - ANS; apply a function higher on the tower of power than is currently used
Quadrant 3 or 4 curve - ANS; apply a function lower on the tower of power than is currently used

Ch 13 - ANS; Probability

Empirical probability - ANS; you determine how likely something is by trying it over and over and looking at tons of data. e.g. determining if a coin is fair by flipping it 100,000 times and recording the number of heads and tails
Theoretical probability - ANS; you build a mathematical model to describe a situation and use the axioms of probability to determine the likelihood of some events. e.g. the chance of rolling an even number on a die is 1/2 because 3/6 of the possible outcomes are even numbers
Subjective probability - ANS; consider a number of factors important to the situation, personally decide how important they are, and use these to come up with an answer. e.g. I have a 60% chance of getting an A because I do all readings and HW, come to classes, and am an A/B student in other math classes
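A quick simulation sketch of the empirical approach (the flip count is an arbitrary demo value): estimate P(heads) for a fair coin by flipping it many times and compare with the theoretical value of 1/2. This also previews the Law of Large Numbers below.

import random

random.seed(1)                      # fixed seed so the demo is reproducible
flips = 100_000
heads = sum(random.random() < 0.5 for _ in range(flips))
print(f"empirical P(heads) after {flips:,} flips: {heads / flips:.4f}")
print("theoretical P(heads): 0.5 (1 of 2 equally likely outcomes)")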
Does empirical probability make sense? - ANS; yes, you tend to get what you expect
Law of Large Numbers (LLN) - ANS; as a random process is repeated more and more, the proportion of times an event occurs converges to a number (the probability of that event)
The Law of Averages - ANS; (Gambler's fallacy) an incorrect use of the LLN; the false way of thinking that says if the current situation is out of whack, then it must correct itself in the short term
Trial - ANS; an action that creates data
Outcome - ANS; the data created from a trial
Event - ANS; some set of outcomes you might care about
Sample space - ANS; the set of ALL possible outcomes
If all the outcomes in a sample space are equally likely, we define the probability of an event A to be - ANS; P(A) = (# of outcomes in event A)/(# of outcomes in the sample space), where 0 <= P(A) <= 1
If an event can never occur, then P(A) - ANS; = 0
If an event must occur, then P(A) - ANS; = 1
Complement rule - ANS; P(A) = 1 - P(A^c)
Parameter, μ (or E(X)) - ANS; a value that helps summarize a probability model; the expected value is E(X) = sum over all x of x * P(x)
Spread - ANS; another parameter we might care about. Big spread is exciting for people in Vegas because they focus on the bigger wins
Variance (σ^2) - ANS; Var(X) = sum over all x of (x - μ)^2 * P(x)
The units on variance will always be - ANS; the square of the units in the problem. This can make variance difficult to interpret
Standard Deviation (σ) - ANS; SD(X) = sqrt(Var(X))
In general, the size of the SD gives a sense for how - ANS; closely your experience playing the game will hug the mean
Adding constants to random variables - ANS; E(X±c) = E(X) ± c; Var(X±c) = Var(X); SD(X±c) = SD(X)
Scaling random variables - ANS; E(aX) = aE(X); Var(aX) = a^2 * Var(X); SD(aX) = |a| * SD(X)
Adding random variables (no constants) - ANS; E(X±Y) = E(X) ± E(Y); if X and Y are independent variables: Var(X±Y) = Var(X) + Var(Y) (always +, never -!) and SD(X±Y) = sqrt(Var(X) + Var(Y)) (always +, never -!)
Two random variables are independent if - ANS; knowing the outcome of one has no effect on the outcome of the other
E(X±Y) = E(X) ± E(Y) is true even if - ANS; X and Y are dependent
Why doesn't X+X = 2X? - ANS; in the X+X scenario, we often add winning and losing situations which diminish the influence of one another (win + win, loss + loss, win + loss, loss + win are all possible); in the 2X scenario, you either win twice or you lose twice

Midterm 2 material

Ch 16 - ANS; Modeling

Bernoulli trial - ANS; random variable with precisely 2 outcomes, where the trials are independent. P(x) = p if x = success, or q = 1 - p if x = failure
Common Geometric Model questions - ANS; What is the probability that it takes exactly k <Bernoulli trials> to get the first <success>? On average, how many <Bernoulli trials> will it take to get the first <success>?
Geometric model - ANS; X = Geom(p), where p is the probability of success and X is the number of trials needed to get a success. Assume we are doing a Bernoulli trial with success probability p (and failure probability q = 1 - p) over and over until we get a success. The probability of getting the first success on trial x is: P(x) = [q^(x-1)] * p; E(X) = 1/p; SD(X) = sqrt(q/p^2) = sqrt(q)/p
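A sketch checking the geometric-model formulas against simulation (p = 0.3 is an arbitrary choice): P(x) = q^(x-1) * p gives the chance the first success lands on trial x, and the simulated average number of trials should come out near E(X) = 1/p.

import random

random.seed(2)
p = 0.3                      # success probability (arbitrary demo value)
q = 1 - p

# theoretical P(first success on trial x) = q^(x-1) * p
for x in range(1, 5):
    print(f"P(first success on trial {x}) = {q ** (x - 1) * p:.4f}")

def trials_until_success() -> int:
    """Run Bernoulli trials until one succeeds; return how many were needed."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

sims = [trials_until_success() for _ in range(100_000)]
print(f"simulated E(X) = {sum(sims) / len(sims):.3f}, theory 1/p = {1 / p:.3f}")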
The expected value of the geometric model answers - ANS; how many trials are needed to get the first success, on average
Common Binomial Model questions - ANS; What's the probability of getting exactly k <successes> in n <Bernoulli trials>? On average, how many <successes> will I get if I do n <Bernoulli trials>?
Binomial Model - ANS; X = Binom(n,p), where n is the number of trials, p is the probability of success, and X is the number of successes in n trials. The probability of getting k successes in n Bernoulli trials is: P(k) = (n nCr k) * (p^k) * (q^(n-k)); E(X) = np; SD(X) = sqrt(npq)
The choose symbol (n nCr k) - ANS; helps you calculate how many ways there are to arrange k successes among n attempts. Formula: (n!)/[k! * (n-k)!]
Common question for the Poisson distribution - ANS; in general, <some behavior> is average. How likely am I to see <some specific behavior>? e.g. You get 12.5 emails per day and X% are spam; how likely are you to see 5 spam emails in a day? e.g. There is an average of 2.5 goals scored in each soccer game; how likely is it for a game to have 9 goals?
The Poisson Distribution - ANS; P(x) = (λ^x)(e^-λ)/(x!), where λ = average value and x = value whose probability you are trying to predict; E(X) = λ; SD(X) = sqrt(λ)
The Poisson model is a good approximation of the Binomial model when - ANS; n >= 20 and p <= 0.05, or n >= 100 and p <= 0.10. This is helpful because the Binomial model becomes unusable when n gets really big or p gets really small
If you have a situation modelled by Binom(n,p) in which n is large and p is small, then use a Poisson model instead where - ANS; λ = np, provided [n >= 20 and p <= 0.05, or n >= 100 and p <= 0.10] and [np <= 20]

Handout Lectures - ANS; Continuous Random Variables

Continuous random variable - ANS; random quantity that can take on any value on a continuous scale ("a smooth interval of possibilities"). e.g. the amount of water you drink in a day, how long you wait for a bus, how far you live from the nearest grocery store
Visualize the probability table on a graph - ANS; an outcome is more likely if there is more area in the bar for that value on the graph. We also know that the sum of the areas of the bars must be 1, and heights must be at least 0 (no negative bars)
Density function - ANS; only area under the graph is linked to probability
For a continuous random variable X which takes on any real number, we model it through a density function f(x) which has 2 properties: - ANS; 1) f(x) >= 0 for all x; 2) the integral from -∞ to ∞ of f(x) equals 1
If f(x) is a density function for the continuous random variable X, then P(a < X < b) equals - ANS; the integral from a to b of f(x)
The probability of any particular outcome happening is - ANS; 0, because the integral from a to a of f(x) = 0
The density graph is NOT P(X) - ANS; it is a function that helps you figure out probabilities by examining the area under it. Its shape suggests what values are more likely (relatively), but the probability of any particular outcome occurring is still 0

The 68-95-99.7 rule for the normal model - ANS; about 68% of the data values are within 1 SD of the mean; about 95% of the data values are within 2 SDs of the mean; about 99.7% of the data values are within 3 SDs of the mean
For any x-value (or z-score, if you convert to a standard normal model, N(0,1)), the percentile is - ANS; simply the area to the left of this value
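A sketch of the percentile idea above: the percentile of a z-score is the area to the left of it under the N(0,1) density. The standard normal CDF can be written with math.erf, so no external libraries are needed; the printed values also echo the 68-95-99.7 rule.

import math

def normal_cdf(z: float) -> float:
    """Area under the standard normal density to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

print(f"percentile of z = 2: {100 * normal_cdf(2.0):.1f}%")         # ~97.7th
print(f"P(-1 < Z < 1) = {normal_cdf(1.0) - normal_cdf(-1.0):.4f}")  # ~0.68
print(f"P(-2 < Z < 2) = {normal_cdf(2.0) - normal_cdf(-2.0):.4f}")  # ~0.95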
Ch 11 - ANS; Populations and Samples

Population - ANS; everything you want to study. e.g. a huge pot of soup
Parameter - ANS; some value summarizing the population
Sample - ANS; a (hopefully) representative subset of your population. e.g. a spoonful of soup from the top of the pot
Statistic - ANS; some value summarizing the sample
Bias - ANS; the sample is not representative of the population in some way; good sampling is about reducing as much bias as possible
One of the best ways to avoid bias is by introducing - ANS; random elements into the sampling process. e.g. stir the pot before tasting the soup
The sample size does not need to be - ANS; some percentage of the population size; larger samples are better irrespective of the population size. e.g. tasting a spoonful from a small pot of soup gives the same amount of info as tasting a spoonful from a large pot. However, tasting 3 spoons of soup is better than tasting 1 spoon
Sampling Frame - ANS; the universe you will be picking from
Simple Random Sample (SRS) - ANS; imagine each point in a box as a person; we just pick a certain number of random points
Stratified Random Sampling - ANS; e.g. what is the average GPA of UCSD students? Since grads and undergrads have much different average GPAs, you split the sample into 2 groups, do an SRS on each, then combine the results. Pieces are homogeneous in relation to the parameter you are measuring (undergrads have lower GPAs, grads have higher GPAs)
Cluster Sampling - ANS; sampling in which elements are selected in two or more stages, with the first stage being the random selection of naturally occurring clusters and the last stage being the random selection of elements within clusters. e.g. asking people their GPAs as they walk into various gyms on campus; 3 different gyms can each have both grads and undergrads. Pieces are chosen just because it's more convenient; pieces are heterogeneous in relation to the parameter you're measuring (the gyms all have the same mix of undergrads and grads)
Systematic Sampling - ANS; sample elements are selected from a list or from sequential files. e.g. asking every 10th person you see
Multistage Sampling - ANS; uses 2 or more of the previous methods (excluding SRS). e.g. you focus on undergrads today and ask every 4th one you see; you do grads the next day and ask every 4th one you see
Volunteer bias - ANS; those who are willing to take their own time to voluntarily complete something like a survey usually look different from those who don't. It does not represent those who opt out of volunteering
Bad Sample Frame Bias - ANS; the sample is not representative of the population. e.g. you want to determine if people in the US like facebook, so you study facebook users in the US. You completely underrepresent people who don't use facebook; maybe they don't use facebook because they hate it!
Convenience Sample Bias - ANS; a form of bad sample frame: the easiest sample to take is not representative of the population. e.g. you work at facebook and survey 5000 coworkers on whether they love FB. This is convenience sample bias because you are likely friends with your coworkers, who also work at facebook and are more likely to either love it or hate it (depending on how working there affects them)
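A sketch contrasting an SRS with stratified sampling, in the spirit of the GPA example above (the population sizes and GPA distributions are invented): each stratum is homogeneous, so sampling the strata separately and combining proportionally tends to give a steadier estimate.

import random

random.seed(3)
# invented population: undergrads (lower GPAs) and grads (higher GPAs)
undergrads = [random.gauss(3.0, 0.4) for _ in range(25_000)]
grads = [random.gauss(3.6, 0.3) for _ in range(5_000)]
population = undergrads + grads

srs = random.sample(population, 300)    # simple random sample of 300 students
# stratified: sample each homogeneous group separately, proportionally to size
strat = random.sample(undergrads, 250) + random.sample(grads, 50)

mean = lambda xs: sum(xs) / len(xs)
print(f"population mean GPA: {mean(population):.3f}")
print(f"SRS estimate:        {mean(srs):.3f}")
print(f"stratified estimate: {mean(strat):.3f}")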
Ch 17 - ANS; Sampling Distributions

Know the symbols for both Statistic and Parameter: - ANS; proportion, mean, SD, correlation, regression coefficient
Proportions and means - ANS; when populations are big, we must draw a random sample and estimate these parameters using statistics
Because of randomness, there is - ANS; variation in this statistic
Sampling distribution - ANS; making a histogram of all the means (or proportions) from all our different samples
The center of the sampling distribution is at - ANS; the mean, μ
The standard deviation of the sampling distribution of a proportion is: - ANS; sqrt(pq/n) (square root of [probability of success] * [probability of failure] divided by the sample size)
The spread of the sampling distribution of a mean is: - ANS; σ/sqrt(n) (population standard deviation over the square root of the sample size)
The sampling distribution is a normal curve with model - ANS; N(μ, σ/sqrt(n))
As long as the conditions are met, it does not matter what distribution you start with. If you keep taking samples, you'll eventually get a - ANS; normal distribution
Do we always get a normal model for the sampling distribution of a mean? - ANS; we do if 2 conditions are met: 1. Independence Assumption: the items in each sample must be independent of one another. Typically it is better to check two conditions (which effectively mean independence): 1.A. Randomness Condition: the items in your sample must be randomly chosen; 1.B. <10% Condition: your sample size needs to be <10% of the population size. 2. Nearly Normal Condition (sample size condition): the population histogram should look nearly normal. If this histogram shows skew, the sample size needs to be large for the sampling distribution to be normal, e.g. n > 30 for moderate skew, n > 60 for large skew
As the size of a sample grows, the sampling distribution tends to look - ANS; more and more normal
To get a normal sampling distribution from samples of a population: the greater the skew in the population, - ANS; the higher n must be to get a normal sampling distribution
The Central Limit Theorem (CLT) - ANS; proves the sampling distribution for a proportion statistic or mean statistic will be a normal distribution, regardless of the population distribution (assuming we have met the 2 conditions: Independence and Nearly Normal)
Summary of Sampling Statistics: to estimate a population parameter p, we can - ANS; draw a random sample of size n; this sample will have a statistic p̂ ≈ p. If we drew many samples, each would have its own statistic p̂ and we could make a histogram of these values. This histogram, the sampling distribution, is approximately normal: N(p, sqrt(pq/n))

Ch 18 - ANS; Confidence Intervals for Proportions

Ch 19 - ANS; Testing Hypotheses About Proportions

3. Draw a sample and consider it assuming the null hypothesis H0 is true. Find the mean and SD of this data and make a plot. (You use p and q instead of hats because if you are assuming that H0 is true, then you are assuming you know the values for p and q.) Calculate the p-value: the probability/chance of seeing our result or something more extreme if our universe is "H0: the drug works as well as the placebo".
4. If p-value <= 0.05, reject the null hypothesis; if p-value > 0.05, fail to reject the null hypothesis.
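A sketch of steps 3-4 for a one-proportion test (the null value p0 and the sample counts are invented). Note the SE uses p0 and q0 rather than the hatted versions, because everything is computed inside the H0 universe:

import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

p0 = 0.50                    # hypothetical null value: H0 says p = 0.50
n, successes = 400, 222      # invented sample
p_hat = successes / n

se = math.sqrt(p0 * (1 - p0) / n)       # SD of p_hat assuming H0 is true
z = (p_hat - p0) / se
p_value = 2 * (1 - normal_cdf(abs(z)))  # two-tailed

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= 0.05 else "fail to reject H0")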
p-value - ANS; the probability under the curve beyond the test statistic z (recall that z = [(statistic) - (hypothesized value)] / SE, e.g. z = (x̄ - μ0)/SE). For a one-tailed test: p-value = P(Z > z0) if HA is on the right tail, or p-value = P(Z < z0) if HA is on the left tail. For a two-tailed test (most common): p-value = 2 * P(Z > |z0|) = 2 * P(Z < -|z0|)
Understanding p-value - ANS; if the p-value is below 0.05, there is less than a 5% chance of observing a result at least that extreme, given that the center of the distribution is the mean specified by the null hypothesis
p-value main points - ANS; 1. p-values can indicate how incompatible the data are with a specified statistical model; 2. p-values do not measure the probability that the study's hypothesis is true; 3. a p-value (statistical significance) does not measure the size of an effect or the importance of a result (practical significance); 4. scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold
By assuming H0, build a universe where p is in accordance with H0 - ANS; you must first make sure that the sampling distribution is approximately normal: make sure the sample is <10% of the total population, and that np >= 10 successes and nq >= 10 failures
P overload - ANS; 1. p: the proportion of some trait in a population; it is a parameter. 2. p̂: the proportion of some trait in a sample; it is a statistic. 3. P(A): the probability of some event A happening. 4. P-value: a conditional probability; it is the probability of getting the value p̂ (or something more extreme) in a universe where H0 (and thus its value of p) is true

Ch 20 - ANS; Inferences for Means

For smaller sample sizes (n < 30) or populations where you don't know σ (and must approximate it using s_x), there is a better approximation of the sampling distribution than the normal model - ANS; it is called the t-distribution
When to use Z vs. T - ANS; if you know sigma (almost never true): use the z-distribution; in all other cases: use the t-distribution
The t-distribution gives more - ANS; precise results
There are many curves in the t-distribution family - ANS; with n data points in your sample, you use the t-distribution with df (degrees of freedom) = n - 1
As df becomes larger, the t-distribution becomes - ANS; more standard/normal. The center does not change; the spread becomes narrower
Confidence interval formula for the t-distribution - ANS; x̄ +/- t*(n-1) * SE(x̄), where SE(x̄) = σ/sqrt(n), approximated by s_x/sqrt(n)
Assumptions made for statistical inference when using the t-distribution - ANS; independence of data (randomization condition, <10% condition); population distribution must be nearly normal (look for near-normality in a histogram of your sample; more skew is OK as n gets larger)

Ch 21 - ANS; Types of Errors

Type I Error - ANS; rejecting the null hypothesis when it is actually true
Type II Error - ANS; failing to reject the null hypothesis when it is false; its probability is β
Power of a test - ANS; the power of any test of statistical significance is defined as the probability that it will reject a false null hypothesis. Power = P(reject H0 | H0 is false) = 1 - β = 1 - P(fail to reject H0 | HA is true)
P(making a Type I error) = - ANS; P(reject H0 | H0 is true) = α
In any given situation, the higher the risk of Type I error, - ANS; the lower the risk of Type II error
What are the effects on error of increasing alpha (α)? - ANS; the risk of a Type I error is increased and the risk of a Type II error is decreased
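A simulation sketch of the fact that P(Type I error) = α: generate data from a universe where H0 really is true, test at α = 0.05 many times, and count the false rejections (all numbers are invented demo values):

import math
import random

def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

random.seed(4)
alpha, p0, n, trials = 0.05, 0.5, 500, 2_000
rejections = 0
for _ in range(trials):
    successes = sum(random.random() < p0 for _ in range(n))  # H0 is true here
    z = (successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    if 2 * (1 - normal_cdf(abs(z))) <= alpha:
        rejections += 1                                      # a Type I error

print(f"simulated Type I error rate: {rejections / trials:.3f} (alpha = {alpha})")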
How do you increase the power of a test? - ANS; raise the cutoff value (α), which makes it easier to reject H0 (at the cost of a higher Type I error risk); increasing the sample size also increases power

Ch 22 - ANS; Two-Sample Proportion Inference

For a two-sample proportion test comparing p1 and p2, you would think about - ANS; p1 - p2. e.g. p1 - p2 > 0
Two ways to use the sampling distribution of p̂1 - p̂2: - ANS; 1. estimate p1 - p2 using a confidence interval about p̂1 - p̂2; 2. run a hypothesis test with H0: p1 - p2 = 0
If X and Y are 2 independent random variables with normal distributions, then - ANS; X - Y is also normal. Also, since X and Y are independent, Var(X-Y) = Var(X) + Var(Y), and thus SD(X-Y) = sqrt(Var(X-Y)) = sqrt(Var(X) + Var(Y)) = sqrt(SD(X)^2 + SD(Y)^2)
So if p̂1 ~ N(p1, sqrt(p1q1/n1)) and p̂2 ~ N(p2, sqrt(p2q2/n2)), then - ANS; p̂1 - p̂2 ~ N(p1 - p2, sqrt(p1q1/n1 + p2q2/n2))
Confidence interval for 2 sample proportions - ANS; (p̂1 - p̂2) +/- z* * SE(p̂1 - p̂2); the samples must be independent from each other, and the at-least-10-successes/failures condition must be met for each sample
p̂ pooled - ANS; (successes1 + successes2)/(n1 + n2); used when you are doing a hypothesis test. If we assume H0 is true (p1 - p2 = 0), then the populations are the same and pooling p̂1 and p̂2 gives better approximations than using both of them separately
Finding z using pooling - ANS; z = [(p̂1 - p̂2) - 0] / SEpooled, where SEpooled = sqrt((p̂pooled * q̂pooled)/n1 + (p̂pooled * q̂pooled)/n2)

Ch 22 and 23 - ANS; Paired Data, Two Sample Means

All confidence intervals work the same way, with slight changes:
A CI for the mean of one sample - ANS; x̄ +/- t*(df) * SE(x̄), where SE = s/sqrt(n) and df = n - 1
CI for the mean difference of paired samples - ANS; d̄ +/- t*(df) * SE(d̄), where d̄ is the mean of the paired differences, SE = s_d/sqrt(n), and df = n - 1
CI for the mean difference in two samples - ANS; (x̄1 - x̄2) +/- t*(df) * SE(x̄1 - x̄2), where SE(x̄1 - x̄2) = sqrt(s1^2/n1 + s2^2/n2)
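Finally, a sketch of the paired-data confidence interval (the before/after scores are invented, and t* is the table value for df = 9 at 95% confidence):

import math

# invented paired data: before/after scores for 10 subjects
before = [72, 85, 68, 90, 77, 81, 64, 88, 79, 73]
after = [75, 88, 70, 94, 78, 85, 66, 91, 84, 76]

d = [a - b for a, b in zip(after, before)]    # paired differences
n = len(d)
d_bar = sum(d) / n                            # mean difference
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
se = s_d / math.sqrt(n)

t_star = 2.262    # t* for df = n - 1 = 9, 95% confidence (from a t-table)
lo, hi = d_bar - t_star * se, d_bar + t_star * se
print(f"d_bar = {d_bar:.2f}, 95% CI for the mean difference: ({lo:.2f}, {hi:.2f})")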