Calculating Probabilities & Confidence Intervals for Normal Distributions & Sampling

Examples and formulas for calculating probabilities and confidence intervals for normal random variables and sampling distributions. Topics covered include normal distributions, z-scores, standard deviations, sampling distributions, confidence intervals, and large sample confidence intervals for population means and proportions.

Typology: Exams (Pre 2010). Uploaded on 09/17/2009 by koofers-user-9kh.

Partial preview of the text

Chapter 10: Continuous Probability Distributions

10.1: Basic Concepts

Recall that a continuous random variable is one that can assume any value within some interval or intervals [e.g. between the heights 6' and 6'1", there is an infinite (uncountable) number of other heights]. The graphical form of the probability distribution for a continuous random variable x is a smooth curve (the original notes show a smooth density curve over an interval from a to b). This curve, a function of x, is denoted by the symbol f(x) and is called a probability density function. In continuous distributions, probabilities correspond to areas under the curve. Therefore, the total area under the curve of any probability density function must equal 1.

10.2: The Normal Distribution

One of the most common types of continuous random variables is a Normal Random Variable. It has a bell-shaped probability distribution called a Normal Distribution. The Normal Distribution is perfectly symmetric about its mean µ, and its spread is determined by its standard deviation σ. (The original notes show four example normal curves, with µ=0, σ=1; µ=5, σ=1; µ=0, σ=2; and µ=5, σ=2.)

For the standard normal variable Z, the following rules follow from symmetry about 0:

2) P(Z > a) = .5 - P(0 < Z < a)

3) P(-a < Z < a) = 2 P(0 < Z < a)

These rules are helpful since we often need to manipulate what we are looking for when it is not immediately obvious from the table. Also, after a while, try to get out of the habit of thinking in terms of "what the rules are" and just get used to thinking about "what transformations make sense". Realize too that, once you have standardized your original value into a value of Z, your new variable will be in units of "standard deviations from the mean". For example, a value of Z = 1.25 is 1.25 standard deviations away from the mean.
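The symmetry rules above are easy to check numerically. This is a small sketch (not part of the original notes) using the standard normal CDF Φ(z) = ½(1 + erf(z/√2)):

```python
import math

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

a = 1.25
area_0_to_a = phi(a) - 0.5           # P(0 < Z < a), the table value (.3944)

# Rule 2: P(Z > a) = .5 - P(0 < Z < a)
p_right_tail = 1.0 - phi(a)

# Rule 3: P(-a < Z < a) = 2 P(0 < Z < a)
p_symmetric = phi(a) - phi(-a)
```

Both rules hold exactly, and `area_0_to_a` matches the tabulated value .3944 for a = 1.25.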
Now consider how much of our data is between Z = -2 and Z = 2. By symmetry, this is twice the amount between 0 and 2:

P(-2 < Z < 2) = 2 P(0 < Z < 2) = 2(.4772) = .9544

So we expect about 95% of our data to fall within two standard deviations of the mean. This matches what the Empirical Rule told us earlier in the course.

Example
The life of a car headlight has a normal distribution with a mean of 1000 hours and a standard deviation of 80 hours.

Q. What is the probability that a new headlight will last more than 1200 hours?

P(X > 1200) = P((X - µ)/σ > (1200 - 1000)/80) = P(Z > 2.5)
P(Z > 2.5) = 1 - P(Z < 2.5) = .5 - P(0 < Z < 2.5) = .5 - .4938 = .0062

Q. What is the probability that a new headlight will last between 900 and 1400 hours?

P(900 < X < 1400) = P((900 - 1000)/80 < Z < (1400 - 1000)/80) = P(-1.25 < Z < 5)
P(-1.25 < Z < 5) = P(-1.25 < Z < 0) + P(0 < Z < 5)
                 = P(0 < Z < 1.25) + P(0 < Z < 5)
                 = .3944 + .5 = .8944

Q. What is the probability that a new headlight will last exactly 1100 hours?

P(X = 1100) = 0, since for a continuous random variable the probability of any single exact value is zero.

The Exponential Distribution

… in a given half-hour period than in the time they have to wait before being cleared for the approach. Whenever the number of occurrences of an event is determined by a Poisson process, the likelihood of encountering specified intervals of time or space between consecutive occurrences can be described by the exponential probability distribution. The distribution applies:

• only to positive values of the random variable, x
• only in situations in which smaller values of x are more likely than larger ones

As was the case with the normal curve, the total area under the exponential probability density function equals 1, and various probabilities can be found by focusing on areas under this curve for different ranges of the exponential random variable.
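The headlight calculations can be reproduced without tables. A sketch (not part of the original notes) using the exact normal CDF; small differences from the worked answers come from table rounding:

```python
import math

def phi(z):
    # Standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 1000.0, 80.0   # headlight life: mean 1000 h, SD 80 h

# P(X > 1200) = P(Z > 2.5)
p_more_1200 = 1.0 - phi((1200 - mu) / sigma)

# P(900 < X < 1400) = P(-1.25 < Z < 5)
p_between = phi((1400 - mu) / sigma) - phi((900 - mu) / sigma)
```

`p_more_1200` agrees with the table answer .0062, and `p_between` is within table-rounding error of .8944.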
We can calculate probabilities for exponential random variables using either of the following formulas:

P(X > x) = e^(-λx)    or    P(X < x) = 1 - e^(-λx)

where x > 0, λ > 0 and e ≈ 2.71828.

Example
The arrival of claims at an insurance company can be described as a Poisson process occurring at a rate of 2 claims per day.

Q. What is the probability that the next claim will be made within 4 days?

A. Since the number of claims follows a Poisson distribution, the times between claims will follow an exponential distribution. We also have λ = 2.

P(X < 4) = 1 - e^(-λx) = 1 - e^(-(2)(4)) = 1 - e^(-8) = .9997

Q. What is the probability that the next claim will be made after the passage of 2 days?

A. P(X > 2) = e^(-λx) = e^(-(2)(2)) = e^(-4) = .0183

Q. What is the probability that the next claim will be made at some time between 3 and 5 days hence?

A. P(3 < X < 5) = P(X > 3) - P(X > 5) = e^(-(2)(3)) - e^(-(2)(5)) = e^(-6) - e^(-10) = .0024

10.10: The Uniform Distribution

Continuous random variables that have equally likely outcomes over the range of possible values possess a uniform probability distribution.

Example
Say a variable X is uniformly distributed between the values 15 and 25. If we want to find the probability that X is greater than 22, we could first look at the probability density function: the area we are interested in is just the area over our interval divided by the area of the larger rectangle. We could also calculate this as:

P(X ≥ 22) = P(22 ≤ X ≤ 25)/P(15 ≤ X ≤ 25) = (25 - 22)/(25 - 15) = 3/10 = .3

Similarly, the probability that X is between 18 and 22 is just:

P(18 ≤ X ≤ 22) = (22 - 18)/(25 - 15) = 4/10 = .4

The population variance σ² is equal to:

Var(X) = [Σ x² p(x)] - µ²
       = [0²(1/4) + 3²(1/2) + 12²(1/4)] - (4.5)²
       = [0 + 9/2 + 144/4] - 20.25
       = 40.5 - 20.25
       = 20.25

Therefore the population standard deviation σ is equal to:

SD(X) = √20.25 = 4.5

A random sample of n = 2 measurements is selected from the population.
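The three insurance-claim probabilities follow directly from the two exponential formulas. A minimal sketch (not from the original notes):

```python
import math

lam = 2.0  # claims arrive at a rate of 2 per day

def p_greater(x, lam):
    # P(X > x) = e^(-λx)
    return math.exp(-lam * x)

def p_less(x, lam):
    # P(X < x) = 1 - e^(-λx)
    return 1.0 - math.exp(-lam * x)

within_4_days = p_less(4, lam)                            # 1 - e^-8
after_2_days = p_greater(2, lam)                          # e^-4
between_3_and_5 = p_greater(3, lam) - p_greater(5, lam)   # e^-6 - e^-10
```

Rounded to four places these give .9997, .0183 and .0024, matching the worked answers.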
A list of all possible samples is:

Possible Samples   x̄      Probability
(0,0)              0       (1/4)(1/4) = 1/16
(0,3)              1.5     (1/4)(1/2) = 2/16
(0,12)             6       (1/4)(1/4) = 1/16
(3,0)              1.5     (1/2)(1/4) = 2/16
(3,3)              3       (1/2)(1/2) = 4/16
(3,12)             7.5     (1/2)(1/4) = 2/16
(12,0)             6       (1/4)(1/4) = 1/16
(12,3)             7.5     (1/4)(1/2) = 2/16
(12,12)            12      (1/4)(1/4) = 1/16

The sampling distribution for x̄ is given by:

x̄      0      1.5    3      6      7.5    12
p(x̄)   1/16   4/16   4/16   2/16   4/16   1/16

Thus the expectation of the sample mean is:

E(x̄) = Σ x̄ p(x̄) = 0(1/16) + 1.5(4/16) + 3(4/16) + 6(2/16) + 7.5(4/16) + 12(1/16) = 4.5

And so the expectation of the sample mean (if we take all possible samples) is equal to the population mean. Also, the variance of the sample mean will be:

Var(x̄) = [Σ x̄² p(x̄)] - µ²
        = [0²(1/16) + (1.5)²(4/16) + ... + 12²(1/16)] - (4.5)²
        = [0 + 9/16 + 36/16 + 72/16 + 225/16 + 144/16] - 20.25
        = 30.375 - 20.25
        = 10.125

So we can see that the variance of the sample mean is different from the variance of the individual values.

SD(x̄) = √10.125 = 3.182

Properties of the Sampling Distribution of x̄

1. The mean of the sampling distribution equals the mean of the sampled population, i.e. µ_x̄ = µ.
2. The standard deviation of the sampling distribution equals the standard deviation of the sampled population divided by the square root of the sample size, i.e. σ_x̄ = σ/√n.

Note: Check these properties against the previous worked example.
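The table of all nine samples can be generated, rather than listed, by enumerating ordered pairs. A sketch (not from the original notes) that rebuilds the sampling distribution and confirms E(x̄) = µ and Var(x̄) = σ²/n:

```python
from itertools import product

# Population from the worked example: values 0, 3, 12 with
# probabilities 1/4, 1/2, 1/4 (so µ = 4.5 and σ² = 20.25).
pop = {0: 0.25, 3: 0.5, 12: 0.25}
mu = sum(x * p for x, p in pop.items())

# Enumerate every ordered sample of size n = 2 with its probability,
# accumulating the probability of each possible sample mean.
dist = {}
for (x1, p1), (x2, p2) in product(pop.items(), repeat=2):
    xbar = (x1 + x2) / 2
    dist[xbar] = dist.get(xbar, 0.0) + p1 * p2

e_xbar = sum(xbar * p for xbar, p in dist.items())
var_xbar = sum(xbar**2 * p for xbar, p in dist.items()) - e_xbar**2
```

This reproduces E(x̄) = 4.5 and Var(x̄) = 10.125, which is exactly σ²/n = 20.25/2.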
The Central Limit Theorem gives us the following two rules:

Rule 1 (when X is Normal): If x is a normal random variable with mean µ and standard deviation σ, and x̄ is the sample mean for a sample of size n taken from that population, then the sampling distribution of x̄ will be normal with mean µ_x̄ = µ and standard deviation σ_x̄ = σ/√n.

Rule 2 (when X is not necessarily Normal): If x is any random variable with mean µ and standard deviation σ, and x̄ is the sample mean for a sample of size n taken from that population, then the sampling distribution of x̄ will be approximately normal with mean µ_x̄ = µ and standard deviation σ_x̄ = σ/√n, provided n is large (30 or greater).

Note that the standard deviation of x̄, σ_x̄ = σ/√n, is often referred to as the standard error of the mean.

The Central Limit Theorem therefore tells us that if we take a sample of size n from a distribution with mean µ and standard deviation σ, then the distribution of the sample mean x̄ will be (at least approximately) normal if at least one of the following is true:

• the original distribution from which we are sampling is normally distributed, or
• the sample size from which we are calculating the sample mean is larger than 30.

If either of the above is true, then x̄ will have a Normal distribution with mean µ and standard deviation σ/√n.

… $1,000,000 quite easily if they had it to gamble), whereas the mean is much more stable and covers a much smaller area. At the end of each day, the casino almost always wins.

Chapter 12: Estimation

The process of estimation begins by sampling an already existing, present population. It must not be confused with forecasting (discussed in Chapter 19), which seeks to make statements about parameters of future populations.

12.2: Defining a Good Estimator

Three major criteria are commonly employed when considering what makes a good estimator: unbiasedness, efficiency and consistency.
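Rule 2 can be seen in a quick simulation. This sketch (not from the original notes) draws many sample means of size n = 36 from a decidedly non-normal Uniform(0, 1) population, for which µ = 0.5 and σ = 1/√12, and compares the spread of those means with the predicted standard error σ/√n:

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible

# 2000 sample means, each from n = 36 draws of Uniform(0, 1)
n, reps = 36, 2000
means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(reps)]

predicted_se = (1 / math.sqrt(12)) / math.sqrt(n)  # σ/√n ≈ 0.0481
observed_se = statistics.pstdev(means)             # spread of the sample means
```

The empirical mean of the sample means lands on µ = 0.5 and their standard deviation is close to σ/√n, as the Central Limit Theorem predicts.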
Unbiasedness: A sample statistic is an unbiased estimator if the mean of all possible values of that statistic equals the population parameter the statistic seeks to estimate.

Efficiency: Among all the available unbiased estimators, the sample statistic that has the smallest variance for a given sample size is the efficient estimator.

Consistency: A sample statistic is a consistent estimator if its value gets ever closer to the parameter being estimated as the sample size increases.

If we wished to estimate a true unknown mean µ by taking a sample of size 20, say, both the sample mean and the sample median would be unbiased estimators. The difference between the two estimators lies in their variances: since the sample mean is the more stable estimator, with smaller variance, we say it is the more efficient estimator of the two. The sample mean is also consistent.

12.3: Types of Estimators

A point estimator of a population parameter is a rule or formula that tells us how to use the sample data to calculate a single number that can be used as an estimate of the population parameter.

An interval estimator (or confidence interval) is a formula that tells us how to use sample data to calculate an interval that estimates a population parameter. For example, a 95% confidence interval for the population mean tells us that, in the long run, 95% of our sample confidence intervals will contain µ and 5% will not. Whether our particular interval contains µ depends on how well our sample represents the population as a whole.

For a better idea of what it means to be 95% confident, consider the following diagram. Firstly, notice that all the intervals that we are using to estimate µ are centered at x̄. It makes sense to base your interval estimate around your single best estimate from each sample, which is your point estimate. Also realize that we have many different intervals because we will get different data from one sample to the next.
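The mean-versus-median comparison can be demonstrated by simulation. This sketch (not from the original notes) repeatedly samples n = 20 values from a normal population and compares the variability of the two estimators; for a normal population, both are unbiased but the mean has the smaller variance:

```python
import random
import statistics

random.seed(0)  # fixed seed for reproducibility

# Many samples of size 20 from a Normal(µ = 10, σ = 4) population
reps, n = 2000, 20
means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(10, 4) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

var_mean = statistics.pvariance(means)      # spread of the sample mean
var_median = statistics.pvariance(medians)  # spread of the sample median
```

Both collections of estimates center on µ = 10 (unbiasedness), but `var_mean` comes out smaller than `var_median`, which is what efficiency means here.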
Each interval represents the results from a particular sample.

When we say we are "95% confident", we mean that we would expect 95% of intervals constructed in this way to contain our unknown population mean µ. This does not mean, however, that if we constructed one interval, it would contain µ with probability .95. In fact, once an interval has been constructed, the probability that it contains µ is either 0 or 1 (it does or it doesn't). This may seem like a minor technicality, but how we phrase any statements about "confidence" is always vitally important.

12.5: Large Sample Confidence Interval for a Population Mean

A large sample confidence interval for µ is given by:

x̄ ± Z σ_x̄ = x̄ ± Z (σ/√n)

Example
If a random sample of 10 documents is selected, and the sample mean number of words per document is computed to be 12,335, find a 90% confidence interval for the mean number of words per document.

A. Since X is normal (assumed) with σ = 1500, x̄ will be normal with σ_x̄ = σ/√n = 1500/√10 = 474.3416.

To find a 90% confidence interval we need the value of Z that leaves 5% in each tail. Looking up the probability .4500 in our Normal Tables gives an upper bound of Z = 1.645. Therefore our 90% confidence interval is:

x̄ ± Z (σ/√n) = 12335 ± (1.645)(474.3416) = 12335 ± 780.2920 = (11554.71, 13115.29)

So we are 90% confident that the interval (11554.71, 13115.29) contains our population mean µ.

Note: The bound that we add to or subtract from x̄ (given as Z(σ/√n) in the formula) is often called the margin of error, since it is the maximum difference allowed between our sample and population means before our interval fails to contain µ.

Sometimes we may be asked to recover other pieces of information from a particular interval.

Example
Q. A 90% confidence interval for the mean percentage of airline reservations being canceled on the day of the flight is (4.8, 11.2).
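The words-per-document interval above is a one-line computation once the pieces are named. A sketch (not from the original notes):

```python
import math

# 90% CI for mean words per document: x̄ = 12,335, σ = 1500 assumed known,
# n = 10, and z = 1.645 read from the normal table.
xbar, sigma, n, z = 12335, 1500, 10, 1.645

se = sigma / math.sqrt(n)   # standard error σ/√n ≈ 474.3416
margin = z * se             # margin of error ≈ 780.2920
ci = (xbar - margin, xbar + margin)
```

This reproduces the interval (11554.71, 13115.29) from the worked example.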
What is the point estimate of the mean percentage of reservations that are canceled on the day of the flight?

A. The point estimate of the population mean is just the sample mean x̄. We know that x̄ is at the center of the interval, so we just take the average of the lower and upper bounds:

x̄ = (4.8 + 11.2)/2 = 16/2 = 8

If we wanted to find the margin of error, we would just realize that it is half the length of the interval:

Margin of Error = (11.2 - 4.8)/2 = 6.4/2 = 3.2

Since this is a 90% confidence interval, the value of Z for the interval is 1.645, since P(0 < Z < 1.645) = .45.

If we wanted to find σ_x̄, we would just realize that our margin of error (which from now on will be denoted B, for bound) is equal to Z(σ/√n). This gives us:

B = Z σ_x̄, which means σ_x̄ = B/Z = 3.2/1.645 = 1.9453

If we were told that the sample size taken was 64, then we could find the population standard deviation by using the fact that:

σ_x̄ = σ/√n, which means that σ = √n (σ_x̄) = √64 (1.9453) = 15.5623

And if we were asked to find µ? We would not be able to answer, since µ is unknown. All we can do is estimate it or construct a confidence interval for it.

… (or approximately normal), so our first problem is easily solved. To get over our second problem we need to use a t statistic. The distribution of the t statistic is very similar to the normal distribution, the difference being that we use it when n < 30 and σ is unknown. Our formula to calculate the t statistic is:

t = (x̄ - µ)/s_x̄ = (x̄ - µ)/(s/√n)

which differs from the Z statistic only in that it allows us to approximate σ with s, even though we realize the approximation may not be that great. Similarly, if we wish to calculate a confidence interval for µ when x̄ follows a t distribution, our formula is just:

x̄ ± t s_x̄ = x̄ ± t (s/√n)

Before we can look up a value in a t-table, we need to know how many degrees of freedom our t statistic has.
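Working backwards from a reported interval, as in the airline example, uses the same relationships in reverse. A sketch (not from the original notes):

```python
import math

# Reported 90% CI: (4.8, 11.2); z = 1.645 for 90% confidence
lo, hi = 4.8, 11.2
z = 1.645

xbar = (lo + hi) / 2        # point estimate: the centre of the interval
margin = (hi - lo) / 2      # margin of error B: half the interval's width
se = margin / z             # σ_x̄ = B / Z

# If we are also told the sample size was n = 64:
n = 64
sigma = math.sqrt(n) * se   # σ = √n · σ_x̄
```

This recovers x̄ = 8, B = 3.2, σ_x̄ ≈ 1.9453 and σ ≈ 15.5623, as in the worked example.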
For any sample size n, our degrees of freedom for t will be (n - 1).

Example
A new cigarette has recently been marketed. FDA tests on this cigarette gave a mean nicotine content of 26.4 milligrams and a standard deviation of 2.0 milligrams for a sample of n = 9 cigarettes. Assume that the distribution of the amount of nicotine found in this brand is normal.

Q. Construct a 90% confidence interval for the mean nicotine content of this brand of cigarette.

A. Since n = 9 (< 30) and we don't know σ (only s), we need to use the t distribution with n - 1 = 8 degrees of freedom. We want a 90% confidence interval, so we need 5% of our data in each tail: looking in the column for t.05 and the row for 8 degrees of freedom gives t = 1.86 [so that P(t > 1.86) = .05].

Therefore our 90% confidence interval is given by:

x̄ ± t (s/√n) = 26.4 ± 1.86 (2.0/√9) = 26.4 ± 1.24 = (25.16, 27.64)

So we are 90% confident that the interval (25.16, 27.64) contains our population mean µ.

Similarly, if we wanted a 99% confidence interval for µ, we need the t statistic with 8 degrees of freedom and .005 in each tail. This gives us t = 3.355, and so our confidence interval is:

x̄ ± t (s/√n) = 26.4 ± 3.355 (2.0/√9) = 26.4 ± 2.24 = (24.16, 28.64)

So we are 99% confident that the interval (24.16, 28.64) contains our population mean µ.

Note: A small sample does not always mean we need to use the t distribution. If you know σ, you should use the normal distribution.

12.6: Large Sample Confidence Interval for a Population Proportion

The formula to determine the sample proportion is:

p̂ = x/n

where x is the "number of successes" and n is the total number of observations. For example, say we took a sample of 1,000 people and found that 720 people approved of the job the president was doing.
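Both nicotine intervals differ only in the t critical value. A sketch (not from the original notes; the critical values 1.86 and 3.355 are the 8-degrees-of-freedom table values quoted above):

```python
import math

# n = 9 cigarettes, x̄ = 26.4 mg, s = 2.0 mg, 8 degrees of freedom
xbar, s, n = 26.4, 2.0, 9
se = s / math.sqrt(n)   # s/√n = 2/3

ci_90 = (xbar - 1.86 * se, xbar + 1.86 * se)    # t.05, 8 df
ci_99 = (xbar - 3.355 * se, xbar + 3.355 * se)  # t.005, 8 df
```

This reproduces (25.16, 27.64) and (24.16, 28.64); note how demanding 99% confidence widens the interval.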
To estimate the population proportion who approved, we would just have p̂ = 720/1000 = .72.

Example
Q. A pilot sample of 200 students found 118 receiving financial aid. To estimate the proportion of students receiving financial aid to within 3% with 99% reliability, how many students would need to be sampled?

A. Firstly, p̂ = 118/200 = .59, q̂ = 1 - .59 = .41 and B = .03 (remember, this is an interval for a proportion). For 99% reliability, Z = 2.575. So:

n = Z² pq / B² ≈ Z² p̂q̂ / B² = (2.575)²(.59)(.41)/(.03)² = 1782.16

So the smallest sample size that would be good enough is 1783.

What if we have no estimates of p and q in advance? If this happens, we must make sure we have a sample size that will be large enough regardless of what p and q are. We do this by maximizing pq in our sample-size formula. Consider the following values of p and q:

p     q     pq
.1    .9    .09
.2    .8    .16
.4    .6    .24
.49   .51   .2499
.5    .5    .25
.51   .49   .2499

So pq is maximized when p = .5 and q = .5. Note, though, that if you are given an estimate of what p and q are, you don't need to use this process.

Comparing Two Population Means – Independent Sampling

Many of the same procedures that are used to estimate and test hypotheses about a single parameter can be modified to make inferences about two parameters. Both the z and t statistics can be adapted to make inferences about the difference between two population means.

When comparing two population means we are interested in the quantity µ1 - µ2. To estimate this value we use the sample statistic x̄1 - x̄2.

Properties of the Sampling Distribution of x̄1 - x̄2

1. The mean of the sampling distribution of (x̄1 - x̄2) is (µ1 - µ2).
2. If the two samples are independent, the standard deviation of the sampling distribution is

σ_(x̄1 - x̄2) = √(σ1²/n1 + σ2²/n2)

where σ1² and σ2² are the variances of the two populations being sampled and n1 and n2 are the respective sample sizes. We also refer to σ_(x̄1 - x̄2) as the standard error of the statistic (x̄1 - x̄2).

3. The sampling distribution of (x̄1 - x̄2) is approximately normal for large samples by the Central Limit Theorem.
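The sample-size calculation, including the round-up step and the conservative p = q = .5 fallback, can be sketched as follows (not from the original notes):

```python
import math

# Estimate a proportion to within B = .03 with 99% reliability (z = 2.575),
# using the pilot estimates p̂ = .59 and q̂ = .41.
p_hat, q_hat, B, z = 0.59, 0.41, 0.03, 2.575

n_exact = z**2 * p_hat * q_hat / B**2   # ≈ 1782.16
n_required = math.ceil(n_exact)         # always round UP to be safe

# With no pilot estimate, maximise pq by taking p = q = .5:
n_conservative = math.ceil(z**2 * 0.25 / B**2)
```

This gives n_required = 1783, as in the worked example; with no prior estimate of p, the conservative answer rises to 1842.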
The sample sizes are large when both n1 and n2 are greater than 30.

A large sample confidence interval for (µ1 - µ2) is given by:

(x̄1 - x̄2) ± Z σ_(x̄1 - x̄2) = (x̄1 - x̄2) ± Z √(σ1²/n1 + σ2²/n2)

If we do not have large sample sizes (n1 < 30 or n2 < 30), we need to use the t distribution. A small sample confidence interval for (µ1 - µ2) is given by:

(x̄1 - x̄2) ± t √(sp²(1/n1 + 1/n2))

where

sp² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)

and t is based on (n1 + n2 - 2) degrees of freedom.

Note: sp is called the pooled standard deviation since it combines the standard deviations of both samples. When dealing with small samples we must make the following assumptions:

• Both sampled populations have relative frequency distributions that are approximately normal.

Example
[Setup from an earlier page: two aspirins, M and L, are each tested on 15 headache sufferers, giving x̄M = 8.9, x̄L = 8.4, sL² = 2.2 and sM² = 2.6.]

A. sp² = [(nL - 1)sL² + (nM - 1)sM²] / (nL + nM - 2)
       = [(14)(2.2) + (14)(2.6)] / (15 + 15 - 2)
       = 2.4

sp = √2.4 = 1.549

Q. Construct a 99% confidence interval for the true mean difference in the time taken to relieve headaches (µM - µL).

A. (x̄M - x̄L) ± t √(sp²(1/nM + 1/nL)) = (8.9 - 8.4) ± 2.763 √(2.4 (1/15 + 1/15))
                                      = .5 ± 1.563
                                      = (-1.063, 2.063)

Since this interval contains 0, we cannot conclude that a difference exists between the two aspirins in terms of the time taken to relieve headaches.

Comparing Two Population Means – Large Matched Pairs Sample

Now let us turn to an alternative approach to making interval estimates for the difference between two means. Consider taking a sample after each elementary unit in population A has been matched with a "twin" from population B, so that any sample observation about a unit in population A automatically yields an associated observation about a unit in population B. This procedure is referred to as taking a matched-pairs sample.
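Returning to the aspirin example above, the pooled variance and the 99% interval can be checked numerically. A sketch (not from the original notes; 2.763 is the t.005 table value for 28 degrees of freedom quoted there):

```python
import math

# Aspirin example: nM = nL = 15, x̄M = 8.9, x̄L = 8.4, sL² = 2.2, sM² = 2.6
n_m = n_l = 15
xbar_m, xbar_l = 8.9, 8.4
var_l, var_m = 2.2, 2.6
t = 2.763  # t.005 with nM + nL - 2 = 28 degrees of freedom

# Pooled variance combines the two sample variances
sp2 = ((n_l - 1) * var_l + (n_m - 1) * var_m) / (n_l + n_m - 2)

margin = t * math.sqrt(sp2 * (1 / n_m + 1 / n_l))
ci = (xbar_m - xbar_l - margin, xbar_m - xbar_l + margin)
```

This reproduces sp² = 2.4 and the interval (-1.063, 2.063), which straddles 0.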
For example, when testing the effectiveness of a new drug compared to a traditional one, each patient in an experimental group might be matched with a partner in a control group of the same age, weight, height, sex, occupation, medical history, lifestyle, and so on. The individual differences in response to the experimental stimulus between each pair are then used to estimate population differences.

The procedure can be summarized as follows. Let the sample observation for the ith pair equal XAi or XBi, depending on whether it refers to the partner from population A or B. Then the matched-pair difference equals Di = XAi - XBi. From all the matched-pair differences involving n pairs, a mean and standard deviation can be calculated in the usual fashion as:

D̄ = (Σ Di)/n    and    sD = √[(Σ Di² - n D̄²)/(n - 1)]

Now we can treat D̄ and sD as equivalent to x̄ and s in our one-sample confidence intervals for large sample sizes. The same is also true in small sample cases; however, we must then be sure that the two samples are from normally distributed populations with equal but unknown variances.

Example
You wish to compare the mean daily sales of two restaurants located in the same city. You record the restaurants' total sales for each of 12 randomly selected days during a six-month period. The results are as shown:

Day        Restaurant 1   Restaurant 2
1 (Wed)    $1005          $918
2 (Sat)    2073           1971
3 (Tue)    873            825
4 (Wed)    1074           999
5 (Fri)    1932           1827
6 (Thur)   1338           1281
7 (Thur)   1449           1302
8 (Mon)    759            678
9 (Fri)    1905           1782
10 (Mon)   693            639
11 (Sat)   2106           2049
12 (Tue)   981            933

Q. Construct a 95% confidence interval for the difference between the mean daily sales of the two restaurants.

A. We must first work out the differences (as well as the squared differences for our variance formula).
Pair     Restaurant 1 (XAi)   Restaurant 2 (XBi)   Difference Di = XAi - XBi   Squared Difference Di²
1        $1005                $918                 87                          7569
2        2073                 1971                 102                         10404
3        873                  825                  48                          2304
4        1074                 999                  75                          5625
5        1932                 1827                 105                         11025
6        1338                 1281                 57                          3249
7        1449                 1302                 147                         21609
8        759                  678                  81                          6561
9        1905                 1782                 123                         15129
10       693                  639                  54                          2916
11       2106                 2049                 57                          3249
12       981                  933                  48                          2304
Totals                                             984                         91944
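The preview ends before the interval is computed. As a sketch of how the calculation would finish (not from the original notes; the 95% critical value 2.201 for 11 degrees of freedom is an assumed t-table value, not one quoted in the text):

```python
import math

r1 = [1005, 2073, 873, 1074, 1932, 1338, 1449, 759, 1905, 693, 2106, 981]
r2 = [918, 1971, 825, 999, 1827, 1281, 1302, 678, 1782, 639, 2049, 933]

# Matched-pair differences Di = XAi - XBi
diffs = [a - b for a, b in zip(r1, r2)]
n = len(diffs)

d_bar = sum(diffs) / n  # D̄ = Σ Di / n
s_d = math.sqrt((sum(d**2 for d in diffs) - n * d_bar**2) / (n - 1))

# 95% CI with n - 1 = 11 df; t.025 = 2.201 (assumed table value)
t = 2.201
margin = t * s_d / math.sqrt(n)
ci = (d_bar - margin, d_bar + margin)
```

This reproduces the table's totals (Σ Di = 984, Σ Di² = 91944), giving D̄ = 82 and sD ≈ 31.99, and an interval of roughly (61.7, 102.3) dollars per day in favour of Restaurant 1.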