Goodness of Fit, Composite Hypothesis | STAT 371

Goodness of fit

We observe data like those in the following table:

              RR    RW    WW
observed      35    43    22
expected      25    50    25

We want to know: do these data correspond reasonably to the proportions 1:2:1?

$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(35-25)^2}{25} + \frac{(43-50)^2}{50} + \frac{(22-25)^2}{25} = 5.34$

1 - pchisq(5.34, 2) ≈ 6.9%

Or: chisq.test(c(35, 43, 22), p = c(0.25, 0.5, 0.25))

Composite hypotheses

Sometimes we ask not whether

$p_{AA} = 0.25,\quad p_{AB} = 0.5,\quad p_{BB} = 0.25$

but rather whether, for some $f$,

$p_{AA} = f^2,\quad p_{AB} = 2f(1-f),\quad p_{BB} = (1-f)^2$

For example: genotypes of a random sample of individuals at a diallelic locus.

Question: Is the locus in Hardy-Weinberg equilibrium (as expected in the case of random mating)?

Example data:

AA    AB    BB
 5    20    75

Another example

ABO blood groups; 3 alleles A, B, O.

Phenotype A  = genotype AA or AO
Phenotype B  = genotype BB or BO
Phenotype AB = genotype AB
Phenotype O  = genotype OO

Allele frequencies: $f_A, f_B, f_O$ (note that $f_A + f_B + f_O = 1$)

Under Hardy-Weinberg equilibrium, we expect:

$p_A = f_A^2 + 2 f_A f_O$
$p_B = f_B^2 + 2 f_B f_O$
$p_{AB} = 2 f_A f_B$
$p_O = f_O^2$

Example data:

  O     A     B    AB
104    91    36    19

Results, example 2

$H_0$: $p_A = f_A^2 + 2 f_A f_O$, $p_B = f_B^2 + 2 f_B f_O$, $p_{AB} = 2 f_A f_B$, $p_O = f_O^2$, for some $f_A, f_B, f_O$

MLE: $\hat{f}_O \approx 63.4\%$, $\hat{f}_A \approx 25.0\%$, $\hat{f}_B \approx 11.6\%$

                O       A       B      AB
observed      104      91      36      19
expected    100.5    94.9    40.1    14.5

Test statistic: $X^2 = 2.10$

Asymptotic χ²(df = 1) approximation (4 categories − 1 − 2 estimated allele frequencies = 1 df): P ≈ 15%
10,000 computer simulations: P ≈ 15%

[Figure: estimated null distribution of the chi-square statistic X², with the observed value (P = 15%) and the 95th percentile (3.86) marked.]

Example 3

A scientist applied a dose of DDT to groups of 10 spider mites and counted the number of mites (out of ten) that survived. A total of 50 groups of mites were considered.

survivors    0    1    2    3    4    5    6    7    8    9   10
count        6   10   15    7    8    1    3    0    0    0    0

Q: Does this look like a binomial distribution?
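The X² calculations from the first two examples (the 1:2:1 test and the ABO test) can be checked with a short Python sketch that uses only the standard library. This is a minimal sketch, not the course's code: the names `pearson_x2` and `chisq_sf` are hypothetical, `chisq_sf` plays the role of R's `1 - pchisq(x, df)` by using closed-form expressions for the χ² upper tail at integer df, and the ABO calculation plugs in the rounded expected counts reported on the slides rather than re-deriving the MLE.

```python
import math


def pearson_x2(observed, expected):
    """Pearson statistic X^2 = sum((observed - expected)^2 / expected)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chisq_sf(x, df):
    """Upper tail P(chi^2_df > x) for a positive integer df (like R's 1 - pchisq(x, df)).

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2).
    """
    h = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, h) = exp(-h) * sum_{k=0}^{m-1} h^k / k!, with m = df/2
        m = df // 2
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(m))
    # Odd df: Q(m + 1/2, h) = erfc(sqrt(h)) + exp(-h) * sum_{k=1}^{m} h^(k - 1/2) / Gamma(k + 1/2)
    m = (df - 1) // 2
    tail = sum(h ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, m + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


# Example 1: simple hypothesis 1:2:1 with 100 observations; df = 3 - 1 = 2
obs_1 = [35, 43, 22]
exp_1 = [100 * p for p in (0.25, 0.5, 0.25)]
x2_1 = pearson_x2(obs_1, exp_1)
p_1 = chisq_sf(x2_1, df=2)

# Example 2 (ABO, order O, A, B, AB): expected counts as reported (rounded to 0.1);
# df = 4 categories - 1 - 2 estimated allele frequencies = 1
obs_2 = [104, 91, 36, 19]
exp_2 = [100.5, 94.9, 40.1, 14.5]
x2_2 = pearson_x2(obs_2, exp_2)
p_2 = chisq_sf(x2_2, df=1)

print(f"Example 1: X^2 = {x2_1:.2f}, P = {p_1:.3f}")  # X^2 = 5.34, P = 0.069
print(f"Example 2: X^2 = {x2_2:.2f}, P = {p_2:.2f}")  # X^2 = 2.10, P = 0.15
```

The only distribution-specific piece is the χ² tail; for a composite hypothesis, the sole change is the df, which drops by one for each parameter estimated from the data.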
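If X ∼ binomial(n = 10, p), then $\Pr(X = k) = \binom{10}{k} p^k (1-p)^{10-k}$ for some p.

χ² test

MLE: $\hat{p} = \frac{0 \times 6 + 1 \times 10 + 2 \times 15 + \cdots + 10 \times 0}{50 \times 10} = 0.232$

survivors     0     1     2     3     4     5     6     7     8     9    10
observed      6    10    15     7     8     1     3     0     0     0     0
expected    3.6  10.8  14.7  11.8   6.2   2.3   0.6   0.1  ≈0.0  ≈0.0  ≈0.0

$X^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(6-3.6)^2}{3.6} + \frac{(10-10.8)^2}{10.8} + \frac{(15-14.7)^2}{14.7} + \cdots = 15.4$

(The expected counts for 8 to 10 survivors round to 0.0 but are not exactly zero, so every term is defined.)

Compare to χ²(df = 11 − 1 − 1 = 9; 11 bins, minus 1 because the counts sum to 50, minus 1 for estimating p) → p-value = 0.082.
By computer simulation: p-value = 0.045.

Null simulation results

[Figure: simulated null distribution of the χ² statistic. Panels: "Full distribution (by simulation)", with the χ² statistic axis running to about 2500; "Focus on the left part", χ² statistic from 0 to 25, with the χ²(df = 9) density and the observed statistic marked.]

Combine the rare bins

survivors     0     1     2     3     4    ≥5
observed      6    10    15     7     8     4
expected    3.6  10.8  14.7  11.8   6.2   2.9

$X^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(6-3.6)^2}{3.6} + \frac{(10-10.8)^2}{10.8} + \frac{(15-14.7)^2}{14.7} + \cdots + \frac{(4-2.9)^2}{2.9} = 4.55$

Compare to χ²(df = 6 − 1 − 1 = 4) → p-value = 0.34.
By computer simulation: p-value = 0.34.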
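The binomial fit for the mite data and its simulation-based P-value can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: the helper names (`binom_pmf`, `x2_binomial`, `simulate_p_value`) are hypothetical, and we assume the slides' "computer simulation" is a parametric bootstrap: draw 50 groups from binomial(10, p̂), re-estimate p from each simulated data set, recompute X², and report the fraction of simulated statistics at least as large as the observed one. Simulated P-values will vary somewhat with the seed and the number of simulations.

```python
import math
import random

# Example 3 data: number of groups (of 10 mites each) with k = 0..10 survivors
COUNTS = [6, 10, 15, 7, 8, 1, 3, 0, 0, 0, 0]
N_MITES = 10


def binom_pmf(k, n, p):
    """Binomial probability Pr(X = k) for X ~ binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def x2_binomial(counts, n=N_MITES, pool_from=None):
    """Fit binomial(n, p) by maximum likelihood; return (p_hat, X^2).

    If pool_from is given, bins k >= pool_from are merged into one bin,
    summing both the observed and the expected counts.
    """
    groups = sum(counts)
    # MLE: total survivors divided by total mites
    p_hat = sum(k * c for k, c in enumerate(counts)) / (groups * n)
    expected = [groups * binom_pmf(k, n, p_hat) for k in range(n + 1)]
    observed = list(counts)
    if pool_from is not None:
        observed = observed[:pool_from] + [sum(observed[pool_from:])]
        expected = expected[:pool_from] + [sum(expected[pool_from:])]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p_hat, x2


def simulate_p_value(counts, n=N_MITES, pool_from=None, n_sim=10_000, seed=1):
    """Assumed scheme: simulate from the fitted binomial, re-estimate p, recompute X^2."""
    rng = random.Random(seed)
    groups = sum(counts)
    p_hat, x2_obs = x2_binomial(counts, n, pool_from)
    n_extreme = 0
    for _ in range(n_sim):
        # Draw `groups` binomial(n, p_hat) values and tabulate them into bins 0..n
        sim = [0] * (n + 1)
        for _ in range(groups):
            sim[sum(rng.random() < p_hat for _ in range(n))] += 1
        # X^2 for the simulated data, with p re-estimated from those data
        _, x2_sim = x2_binomial(sim, n, pool_from)
        if x2_sim >= x2_obs:
            n_extreme += 1
    return n_extreme / n_sim


if __name__ == "__main__":
    for label, pool in [("all 11 bins", None), ("bins >= 5 pooled", 5)]:
        p_hat, x2 = x2_binomial(COUNTS, pool_from=pool)
        p_sim = simulate_p_value(COUNTS, pool_from=pool)
        print(f"{label}: p_hat = {p_hat:.3f}, X^2 = {x2:.2f}, simulated P = {p_sim:.3f}")
```

With all 11 bins, a single simulated group landing in a bin whose expected count is close to zero makes X² very large, which is why the simulated null distribution has such a long right tail and the χ²(df = 9) approximation is unreliable. Pooling the bins for 5 or more survivors removes those tiny expected counts, and the asymptotic and simulated P-values then agree.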