
STATISTICS (DIS) - Notes for the third midterm / midterm 3, Statistics notes

Summary of the study content of the third midterm of the Statistics course (BA in Diplomatic and International Sciences, University of Bologna), combining notes taken in class, additional examples from the book (Making Sense of Data through Statistics: An Introduction, by D. Nevo), and useful graphs. Topics: inferential statistics; hypothesis testing; sampling distributions; point estimation, bias, and mean squared error; confidence intervals for the mean of a Gaussian population; the t-distribution; approximate confidence interval for a probability; hypothesis testing on the mean of a Gaussian population; the p-value; inference on a proportion.

Type: Notes
2022/2023
On sale since 20/05/2023
Author: ariannaw02 🇮🇹


STATISTICS midterm 3

INFERENTIAL STATISTICS - hypothesis testing

○ We often use statistics to test theories → theory: a prediction (or a group of predictions) about how people, physical entities, and built devices behave
○ Theories begin as predictions, which are then repeatedly tested in various settings to either strengthen or refute them. Such testing often involves statistical inference, defined as the drawing of conclusions about a population of interest based on findings from samples obtained from that population - e.g. test the link between expectation confirmation and customer satisfaction using a sample of 50 shoppers who have purchased something. If I find sufficient statistical evidence supporting the existence of this link, I can infer that the link exists for the population of shoppers of this product
○ With statistical inference, we aim to test specific expectations that we have about the population's parameters using sample statistics. These expectations are called hypotheses → hypothesis: a specific claim we wish to test
○ We study the probability of our sample's outcome given the hypothesised distribution of the population

We differentiate between the research hypothesis and the null hypothesis.

Example: A bank manager argues that, on average, people carry $50 or more in their wallets. This claim is the null hypothesis. The research hypothesis contains the other side of this claim - that is, that people carry less than $50. We can write this as:
➢ H₀: average amount of money ≥ $50 → null hypothesis: the existing status quo
➢ H₁: average amount of money < $50 → research/alternative hypothesis: the claim you believe to be true

Example: A professor wants to know if her section's grade average is different from that of other sections, which is 75.
To learn if her section's grade average is different from 75, we set up the following hypotheses:
➢ H₀: section's grade average = 75
➢ H₁: section's grade average ≠ 75

3 different sets of hypotheses:
1. Lower-tail test: tests whether the average is lower than a specific value
2. Upper-tail test: tests whether the average is higher than a specific value
3. Two-tail test: tests whether the average is different from a specific value (either higher or lower)
+ The 'equal' sign (=, ≥, ≤) always goes in the null hypothesis, the one you're looking to dispute

Hypotheses must always be exhaustive and mutually exclusive:
○ Exhaustive means that the hypotheses should cover every possible option (e.g. H₀ ≥ $50 and H₁ < $50, and not H₀ = $50 and H₁ < $50)
○ Mutually exclusive means there is no overlap between the hypotheses

HYPOTHESIS TESTING

○ It concerns the validity of a statistical statement, i.e. the validity of the null hypothesis
○ If the null hypothesis is:
➢ Statistically valid: we do not reject it
➢ Not statistically valid: we reject it

Bookshop Example: We are interested in determining the amount of money that customers are likely to spend when shopping at a bookshop. Based on historical data provided by the current owner, the average customer making purchases at the store spent $100, with a standard deviation of $35.

We are concerned that sales may have decreased over the last year, so we hypothesise that:
➢ H₀: Average spending (µ) ≥ $100
➢ H₁: Average spending (µ) < $100
→ µ is the hypothesised average spending (the parameter of interest) for the population (all customers who have made purchases at the store)

Data from 20 random customers shows that the mean spending for our sample is $93.27 → our sample mean is indeed lower than the hypothesised mean, but we could attribute this to sampling error. Should we reject the null hypothesis based on this one sample?
We should ask ourselves: how likely is it that we take a random sample of size n=20 out of a population with a mean of $100 and a standard deviation of $35 and obtain a sample mean of $93.27?
➢ If this outcome is reasonably possible: there is no reason to reject the null hypothesis or conclude that the average is lower than $100; we can attribute the difference to sampling error
➢ If the chances of finding this sample mean are extremely low, we should instead conclude that the null hypothesis is likely false, and we should reject it - as we have confirmed our belief that the true average spending is likely lower than $100

THE SAMPLING DISTRIBUTION OF THE MEAN

The sampling distribution is the probability distribution of a sample statistic: the range of values the statistic may take, and their corresponding probabilities. The sampling distribution of the mean describes the probabilities attached to all values of the mean of samples that are repeatedly taken from the same population.

Bookshop example: we repeat the above sampling procedure 200 times, each time taking a different sample of 20 customers and computing the mean spending for each sample. At the bottom we compute the mean of means; that is, we average the means of the 200 samples, and also compute the standard deviation of these 200 sample means. Here the mean of means is $99.04.

CENTRAL LIMIT THEOREM: the sampling distribution of the mean of a random sample of any size (n) drawn from a normally distributed population also follows a normal distribution, with a mean of µ and a standard deviation of σ/√n. This standard deviation (σ/√n) is called the standard error of the mean.

Bookshop Example: X~N(100, 35).
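The repeated-sampling idea can be sketched with a short simulation (an illustration assuming individual spending is N(100, 35) as in the example; the 10,000 repetitions are an arbitrary choice, not from the notes):

```python
import random
import statistics

# Sketch: simulating the bookshop sampling distribution, assuming
# individual spending ~ N(mu=100, sigma=35) and samples of size n=20.
random.seed(1)
mu, sigma, n = 100, 35, 20

# Draw many samples of size n and record each sample's mean
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(10_000)
]

# The central limit theorem predicts a mean of 100 and a standard
# error of sigma / sqrt(n) = 35 / sqrt(20) ≈ 7.83 for these means.
print(round(statistics.fmean(sample_means), 1))
print(round(statistics.stdev(sample_means), 1))
```

With the number of repetitions increased, the two printed values settle ever closer to the CLT predictions of 100 and 7.83.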
Applying the central limit theorem, we know that the mean (X̄) of a sample of size n taken from this population of customers is also normally distributed, with µ=100 and a standard deviation of 35/√n. In our example with n=20, X̄~N(100, 35/√20), i.e. X̄~N(100, 7.83).

The distribution illustrated in the figure, obtained by repeating the sampling procedure 1000 times, has a mean of $99.83 and a standard deviation of $7.73. These numbers are close to the mean of 100 and standard deviation of 7.83 obtained by using the central limit theorem. As we both increase the number of samples we take and reduce the class size of the frequency distribution, the shape of the sampling distribution will move gradually closer to a continuous line, and will eventually converge to the one defined by the central limit theorem.

What if our p-value was not as low, and was instead 0.05, 0.1 or even 0.5? Where should we draw the line? The answer is determined by the person conducting the test, based on a desired significance level (α).

SIGNIFICANCE LEVEL:
○ A finding is statistically significant when it is unlikely to have occurred by chance. Our level of significance is the maximum chance probability we are willing to tolerate.
○ Hence, we form the following decision rule for our hypothesis testing context: for any p-value smaller than α, we reject the null hypothesis and conclude it is false.

Customer Calls Example → lower-tail test (the alternative hypothesis was about the mean being lower than some hypothesised value). Given the probability obtained, we can draw one of two conclusions:
1) The hypothesised mean is true, and we were just very lucky to have found such an unlikely sample, or
2) The hypothesised mean is not the true mean, which is likely lower than 4.4 minutes and closer to the value of 4.0 that we observed in our sample.
Our significance level is the probability at which we would switch from conclusion 1 to conclusion 2.

In the social sciences, the most commonly used threshold for α is 5% (P of 0.05), with marginal significance often considered to be 10% (P of 0.1). Any probability of occurrence greater than 10% is generally considered non-significant. Since in our example the p-value of 0.0008 is far smaller than any α value likely to be selected, we reject the null hypothesis.

Baseball Example → upper-tail test
A baseball player claims that the average speed of his fastball pitch is greater than that of his rival, who averages 90mph. He collects data on the speed of 100 fastballs and finds that his average speed is 90.8mph. Assume that we know the standard deviation of fastball pitch speeds to be 3.85mph and that the speeds are normally distributed. What can we conclude about his claim?

H₀: µ ≤ 90; H₁: µ > 90 (the claim he would like to prove)

Let X be a variable representing pitching speed (in mph). Assuming the null hypothesis to be true and based on the central limit theorem, we know that the sampling distribution of X̄ (the average pitching speed) is: X̄~N(90, 3.85/√100)

To obtain the test statistic, we convert our sample mean of 90.8 to a Z score using the hypothesised distribution: Z = (90.8-90)/(3.85/√100) = 2.078

Next we determine the p-value for this test statistic.
Since it is an upper-tail test, we are looking for the probability of finding a value at least as high as our test statistic → P(Z ≥ 2.078) = 0.0188 (shown in the red shaded area).

Now we apply our decision rule. Here, α = 1% → we reject the null hypothesis if the derived p-value is smaller than 1%. Since the derived p-value of 1.88% is greater than α, we do not reject the null hypothesis. Therefore, we conclude the player's fastball pitches are not faster than his rival's. Had we used α = 5%, then p-value < α, so we would reject the null hypothesis and conclude he is faster.

→ Because of situations like this, we often use the p-value to state the maximum significance of the test. In this baseball example, our test is significant at the 5% level (we were able to reject the null hypothesis), but it is not significant at the 1% level (we cannot reject the null hypothesis). The smallest α for which we would still reject the null hypothesis is 1.88% - that is, the p-value itself.

Statistics Grades Example → two-tail test
The average grade for all other course sections is 75, with a standard deviation of 4. The professor collected test scores from 25 of her students and found a sample average of 72. She wishes to conduct a hypothesis test at the 5% level of significance (α = 0.05). Assume that grades follow a normal distribution.

Hypotheses: H₀: µ = 75; H₁: µ ≠ 75 → this is a two-tail test in which we would reject the null hypothesis if the sample mean is significantly smaller or significantly larger than the hypothesised mean.

X̄~N(75, 4/√25) → test statistic: Z = (72-75)/(4/√25) = -3.75 → p-value: P(Z ≤ -3.75) = 0.00009

Being a two-tail test, the p-value needs to be doubled before we compare it to α → p-value = 2·P(Z ≤ -|test statistic|) = 2·0.00009 = 0.00018

Since 0.00018 < α (0.05), at α = 0.05 we can reject the null hypothesis and conclude that the average grade is different from 75.

FIVE STEPS IN HYPOTHESIS TESTING:
1. Formulate the hypotheses, for an upper-, lower-, or two-tail test
2.
Compute the test statistic and p-value (from the Z distribution)
3. Formulate the decision rule → reject the null hypothesis if the p-value is < α
4. Apply the decision rule → compare the derived p-value to α
5. Draw and interpret your conclusion: decide whether or not to reject the null hypothesis, and then answer the original research question

TYPE I AND TYPE II ERRORS

When testing hypotheses, we can make two kinds of errors:
I. Rejecting a true null hypothesis (a type I error)
II. Not rejecting a false null hypothesis (a type II error)

Average Spam Messages Example:
Claim: the average phone customer receives no more than 8 spam messages a day, with a standard deviation of 0.8. You collect data from a sample of 60 and get a sample average of 8.2.
→ hypotheses: H₀: µ ≤ 8; H₁: µ > 8
→ test statistic: Z = (8.2-8)/(0.8/√60) = 1.94 → P(Z ≥ 1.94) = 0.026
Since 0.026 < 0.05, the null hypothesis should be rejected → the average customer receives more than 8 spam messages a day.

However, you assumed the null hypothesis to be true and computed the probability of finding the sample mean of 8.2. Assuming the null hypothesis is true, the probability of finding a sample of n=60 with a mean of 8.2 is quite low: P(Z ≥ 1.94) = 0.026. Given this probability, you can draw one of two conclusions: either (1) the distribution is as hypothesised and you were extremely lucky to have found such a sample, or (2) the true distribution is not the one described by the null hypothesis.

If the null hypothesis is, in fact, true and you drew the second conclusion above, then you have committed a type I error. Because our rejection rule is based on α, α is also the probability of committing a type I error → if we wish to reduce the chances of a type I error, we need to use smaller values of α (e.g. using α=0.01 would not have led you to reject the null hypothesis, thus avoiding a type I error!).

Why not use the lowest α possible?
Because we also have to consider the probability of committing a type II error (not rejecting a false null hypothesis). This probability is called β, and it is inversely related to α. Therefore, in determining α, we need to consider which type of error is costlier: if a type I error is costlier, then we should choose a low value for α to avoid making that error; if a type II error is costlier, then we should choose a higher value of α to ensure β is low.

Diabetes Example → type I error
People who suffer from diabetes are concerned with the Glycaemic Index (GI) of foods, which measures the effect of a food on blood glucose. A new energy bar manufacturer claims that its bar has a GI of 101 (which is fairly high). A type I error in this case would mean concluding that the Glycaemic Index is not 101 when, in fact, it is (i.e., rejecting a true null hypothesis) → because of the possible health consequences, we would try to minimise the probability of committing a type I error → we should select a low α value (like 0.01)!

Airline Example → type II error
An airline is concerned with the average weight of carry-on bags for the purpose of fuel consumption calculations. The airline needs to know whether the average weight of carry-on bags is more than 22lbs (H₀: µ ≤ 22). A type II error would occur if the airline does not conclude that, on average, the bags weigh more than 22lbs when, in fact, they do (not rejecting a false null hypothesis) → such an error would result in underestimating fuel requirements, which is quite a grave error. In this example, the airline would try to minimise the probability of a type II error by increasing α (e.g. to 0.1).

t-DISTRIBUTION

We will no longer use the standard deviation of the population (σ), but instead use s (the sample standard deviation). This is more realistic, as we frequently know very little about a population's σ.
Hence, while we hypothesise about the true value of the mean (μ) of some variable of interest, we conduct our test based solely on the sample data we collect, using the mean and standard deviation obtained from our sample. As a result, we no longer use the standard normal distribution; instead, we use the t-distribution. The main difference is that the shape of the t-distribution depends on the number of degrees of freedom associated with the analysis being performed. Because we are working with sample-estimated values, we lose a degree of freedom - we thus work with a t-distribution with n-1 degrees of freedom. As the number of degrees of freedom increases, the shape of the t-distribution gets closer and closer to that of the normal distribution.

The t-distribution table
The top row of the t-table provides specific significance level (α) values, with each α value corresponding to a column. The notation t₀.₀₅ thus indicates that this column can be used to find the specific value of t (denoted t*) such that: P(t ≥ t*) = α; with α=0.05, this becomes P(t ≥ t*) = 0.05. For example, given five degrees of freedom and an α of 0.05, the table returns a value of 2.015, meaning that P(t ≥ 2.015) = 0.05. Because the t-distribution is symmetric, we use negative values when looking at the distribution's left tail. This is expressed as P(t ≤ -t*) = α; for example, P(t ≤ -2.015) = 0.05.

Hypothesis Test of the Population Mean when the Population Standard Deviation is Unknown
○ You can use the p-value approach (using the standard error estimated from the sample standard deviation), or…
○ the critical value approach: based on whether or not the sample mean lies more than a threshold distance (defined in terms of a number of standard errors) beyond the hypothesised population mean.
Such a threshold distance is known as the critical value → regardless of which approach you decide to follow, the conclusion reached will be the same.

College Loan Debt Example
We want to determine whether the mean college loan debt upon graduation for students at your college is lower than the national average of $35,000.
Hypotheses:
➢ H₀: μ ≥ 35,000
➢ H₁: μ < 35,000
We use α = 0.05 as the specified level of significance in conducting the test. Our sample data revealed a mean debt of $32,743.85 for students at your college. We also computed a test statistic using the sample mean and standard deviation, with the computed value of the test statistic being -2.26.

CONFIDENCE INTERVALS

We estimate with a confidence level of 1-α (α% is the percentage of samples of size n that produce interval estimates that do not contain the unknown parameter μ). We can define an interval under the normal distribution curve that captures 1-α percent of the area under the curve: X̄ ± z(α/2)·(σ/√n), where ±z(α/2) are the cut-off values at the two tails of the distribution.

In our Customer Hold-Time Example, we know x̄=4.0, σ=0.9, n=50, and 1-α=0.90. Finding z(α/2):
a) Look for z(α/2) such that: P(-z(α/2) < Z < z(α/2)) = 0.90
b) Easiest: look up the lower-tail probability in the normal table: P(Z < -z(α/2)) = 0.05 → therefore z(α/2) = 1.645 (the value to use for a confidence level of 0.90)

Now we can compute:
➢ A lower confidence limit (LCL): 4 - 1.645·(0.9/√50) = 3.79
➢ An upper confidence limit (UCL): 4 + 1.645·(0.9/√50) = 4.21

Hence, the 90% confidence interval for the mean hold time of customer service calls is [3.79 … 4.21], and we are able to state, with a confidence level of 90%, that this interval captures the true population mean.

So how likely are we to believe the claim from the consumer group that the true mean call time is 4.4 minutes? At 90% confidence, we are not likely to believe this claim, given that the UCL is less than 4.4.

[A page of scanned handwritten notes appears here (point estimation, bias of estimators, and a worked hypothesis-test example); the text is not legible in this extraction.]
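A minimal sketch reproducing the hold-time interval with Python's standard library:

```python
from statistics import NormalDist
from math import sqrt

# Sketch of the hold-time example: 90% z-based confidence interval
# for the mean, with sigma known.
xbar, sigma, n, conf = 4.0, 0.9, 50, 0.90

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z(alpha/2) ≈ 1.645
half_width = z * sigma / sqrt(n)                # margin of error

lcl, ucl = xbar - half_width, xbar + half_width
print(round(lcl, 2), round(ucl, 2))             # 3.79 4.21
```

Swapping `conf` for 0.95 or 0.99 widens the interval, since a larger z(α/2) is needed to capture more of the area under the curve.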
Confidence Interval for the Population Mean when Sigma is Unknown

There is only one difference compared to the procedure above: rather than looking up a value in the Z distribution, we look up a value in the t-distribution. Hence, the formula is: X̄ ± t(α/2)·(s/√n), where X̄ is the sample mean (the point estimator), s/√n is the estimated standard error of the mean, and t(α/2) is the number of standard errors we need to add to and subtract from the sample mean.

House Selling Price Example: We will estimate, at a 99% level of confidence, the current mean selling price of homes in the neighbourhood.
We know that X̄=$216,790, s=$19,393, n=10, and 1-α=0.99. According to the book, with n-1 = 9 degrees of freedom: P(-3.25 < t < 3.25) = 0.99 → hence, the 99% confidence interval for the mean selling price of homes is [$196,859 … $236,721], and you can be 99% confident that this interval captures the true population mean.

HYPOTHESIS TESTING FOR A POPULATION PROPORTION

Regardless of the population parameter for which a hypothesis test is being performed, the basic logic of the testing procedure remains the same (the 5 steps previously outlined).

Test of a Single Population Proportion
Proportions (p) are computed as the number of successes in n trials. For example, when we say that 60% of the students in a particular class are female, we are first defining a success as a student being female, then adding up the number of females (successes) in the class, then adding up the total number of students (trials), and finally dividing the number of successes by the number of trials.

In order to determine whether the null hypothesis can be rejected, we need to go through each of the steps of hypothesis testing. To conduct this test we need to know the sampling distribution of proportions.

Sampling Distribution of Proportions
It describes the probabilities attached to all possible sample proportions (for samples of size n) that are repeatedly taken from the same population. Applying the central limit theorem, and given that our sample size is sufficiently large (n·p ≥ 5 and n·(1-p) ≥ 5), the sample proportion p̂ of samples of size n taken from the same population approximately follows a normal distribution with a mean of p (the population proportion) and a standard deviation of √(p(1-p)/n), called the standard error of the proportion.
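As a sketch of how such a proportion test would run (all numbers below are made up for illustration; they are not from the notes):

```python
from statistics import NormalDist
from math import sqrt

# Hypothetical sketch: two-tail test of H0: p = 0.5 vs H1: p != 0.5
# for a single proportion, using the normal approximation
# p_hat ~ N(p, sqrt(p(1-p)/n)).
p0, n, successes = 0.5, 200, 116

# Large-sample condition: n*p0 = 100 and n*(1-p0) = 100, both >= 5
p_hat = successes / n                          # 0.58
se = sqrt(p0 * (1 - p0) / n)                   # standard error under H0
z = (p_hat - p0) / se                          # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tail p-value

print(round(z, 2), round(p_value, 4))
```

Note that the standard error here is computed from the hypothesised p₀, since the test proceeds under the assumption that the null hypothesis is true.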
CHI-SQUARE TEST OF INDEPENDENCE (Children's Weight and Screen Time Example)

1. Hypotheses
We begin by assuming the variables to be independent → no relationship between the two:
H₀: Children's weight is independent of screen time
H₁: Children's weight is not independent of screen time

2. Test Statistic
We use the following formula to compute the test statistic:
χ² = Σᵢ Σⱼ (o(ij) - e(ij))² / e(ij)
where r is the number of rows, c the number of columns, o(ij) the observed cell frequency, and e(ij) the expected cell frequency; we use (r-1)·(c-1) degrees of freedom.
Summing the 9 components yields the test statistic:
χ² = 7.41 + 1.31 + 2.33 + 0.01 + 0.02 + 0.11 + 5.33 + 1.25 + 1.13 = 18.89

3. Formulate a Decision Rule
The test of independence is an upper-tail test; thus, we reject the null hypothesis when the test statistic is sufficiently large (i.e., when the differences between observed and expected values are large). Hence, our decision rule is: reject the null hypothesis if the test statistic is greater than the critical value. Since the contingency table has 3 rows and 3 columns, the number of degrees of freedom is: df = (r-1)·(c-1) = (3-1)·(3-1) = 4. Applying the specified significance level of 1% and looking up the critical value in the table, we obtain χ²₀.₀₁,₄ = 13.27

4. Apply the Decision Rule
As the test statistic (18.89) is greater than the critical value (13.27), we reject the null hypothesis

5. Make a Conclusion
By rejecting the null hypothesis, we conclude, at the 1% level of significance, that children's weight is related to screen time

Figure C.6: (a) Chi-square distribution with 3 degrees of freedom, area above 6.25 shaded. (b) 2 degrees of freedom, area above 4.3 shaded. (c) 5 degrees of freedom, area above 5.1 shaded. (d) 7 degrees of freedom, area above 11.7 shaded.
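A minimal sketch of this computation (the contingency table below is hypothetical; the notes do not reproduce the actual weight/screen-time counts):

```python
# Sketch: chi-square statistic for a test of independence, deriving
# expected cell frequencies from the row and column totals.
def chi_square(observed):
    rows, cols = len(observed), len(observed[0])
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(rows)) for j in range(cols)]
    stat = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    df = (rows - 1) * (cols - 1)
    return stat, df

# Hypothetical 2x2 contingency table of observed counts
stat, df = chi_square([[30, 10], [20, 40]])
print(round(stat, 3), df)   # 16.667 1
```

The statistic is then compared to the chi-square critical value for the computed df at the chosen α, exactly as in steps 3 and 4 above.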
Chi-square table: critical values by upper-tail probability and degrees of freedom (excerpt for df = 1-10; the original table continues to higher df):

Upper tail |  0.3    0.2    0.1    0.05   0.02   0.01   0.005  0.001
df  1      |  1.07   1.64   2.71   3.84   5.41   6.63   7.88  10.83
df  2      |  2.41   3.22   4.61   5.99   7.82   9.21  10.60  13.82
df  3      |  3.66   4.64   6.25   7.81   9.84  11.34  12.84  16.27
df  4      |  4.88   5.99   7.78   9.49  11.67  13.28  14.86  18.47
df  5      |  6.06   7.29   9.24  11.07  13.39  15.09  16.75  20.52
df  6      |  7.23   8.56  10.64  12.59  15.03  16.81  18.55  22.46
df  7      |  8.38   9.80  12.02  14.07  16.62  18.48  20.28  24.32
df  8      |  9.52  11.03  13.36  15.51  18.17  20.09  21.95  26.12
df  9      | 10.66  12.24  14.68  16.92  19.68  21.67  23.59  27.88
df 10      | 11.78  13.44  15.99  18.31  21.16  23.21  25.19  29.59