Statistics Cheat Sheet
1. Fundamentals
a. Population – everybody to be analysed
   Parameter – # summarizing the population
b. Sample – subset of the population we collect data on
   Statistic – # summarizing the sample
c. Quantitative Variables – a number
   Discrete – countable (# cars in family)
   Continuous – measurements; there is always another value between any two
d. Qualitative Variables
   Nominal – just a name
   Ordinal – order matters (low, mid, high)
Choosing a Sample
   Sample Frame – list of the population we choose the sample from
   Biased – sample differs from population characteristics
   Volunteer Sample – any of the three types below may end up as a volunteer sample if people choose whether to respond
Sample Designs
e. Judgement Sample: choose what we think is representative
   Convenience Sample – easily accessed people
f. Probability Sample: elements selected by probability
   Simple random sample – every element has an equal chance
   Systematic sample – almost random, but we choose by a method
g. Census – data on everyone/everything in the population
Stratified Sampling
   Divide the population into subpopulations based upon characteristics
h. Proportional: in proportion to total population
i. Stratified Random: select at random within each stratum
j. Cluster: selection within representative clusters
Collect the Data
k. Experiment: control the environment
l. Observation:
2. Single Variable Data - Distributions
m. Graphing categorical data: pie & bar charts
n. Histogram (classes, count within each class)
o. – shape, center, spread. Symmetric, skewed right, skewed left
p. Stemplots
   0 | 11222       0 | 112233
   1 | 011333      0 | 56677
   2 | etc         1 |
q. Mean: $\bar{x} = \sum x_i / n$
r. Median M: if n is odd, the center value; if even, the mean of the 2 center values
s. Boxplot: Min, Q1, M, Q3, Max
t. Variance: $s^2 = \sum (x - \bar{x})^2 / (n - 1) = SS_x / (n - 1)$
u. p. 78: standard deviation, $s = \sqrt{s^2}$
v. $SS_x = \sum (x - \bar{x})^2 = \sum x^2 - (\sum x)^2 / n$
w. Density curve – relative proportion within classes – area under curve = 1
x. Normal Distribution: 68%, 95%, 99.7% of values within 1, 2, 3 standard deviations of the mean
y. p. 98: z-score $z = (x - \bar{x}) / s$ or $z = (x - \mu) / \sigma$
z. Standard Normal: z is N(0,1) when x is N(μ,σ)
3.
Bivariate - Scatterplots & Correlation
a. Explanatory – independent variable
b. Response – dependent variable
c. Scatterplot: form, direction, strength, outliers
d. – form is linear negative, …
e. – to add a categorical variable, use a different color/symbol
f. p. 147: Linear Correlation – direction & strength of linear relationship
g. Pearson's Coefficient: -1 ≤ r ≤ 1; 1 is perfectly linear with + slope, -1 is perfectly linear with – slope
h. $r = \frac{1}{n-1} \sum \frac{(x - \bar{x})}{s_x} \cdot \frac{(y - \bar{y})}{s_y} = \frac{SS_{xy}}{\sqrt{SS_x SS_y}}$
i. $r = \sum z_x z_y / (n - 1)$
j. $SS_{xy} = \sum xy - (\sum x)(\sum y) / n$
4. Regression
k. Least squares – sum of squares of vertical errors minimized
l. p. 154: $\hat{y} = b_0 + b_1 x$, or $\hat{y} = a + bx$
m. (same as y = mx + b)
n. $b_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2 = SS_{xy} / SS_x = r (s_y / s_x)$
o. Then solve for the intercept, knowing the line passes through the centroid $(\bar{x}, \bar{y})$: $a = \bar{y} - b\bar{x}$
p. $b_0 = (\sum y - b_1 \sum x) / n$
q. r^2 is the proportion of variation described by the linear relationship
r. Residual = $y - \hat{y}$ = observed – predicted
s. Outliers: in y direction -> large residuals; in x direction -> often influential to the least squares line
t. Extrapolation – predicting beyond the domain studied
u. Lurking variable
v. Association doesn't imply causation
5. Data – Sampling
a. Population: entire group
b. Sample: part of population we examine
c. Observation: measures but does not influence response
d. Experiment: treatments controlled & responses observed
e. Confounded variables (explanatory or lurking): their effects on the response variable cannot be distinguished
f. Sampling types: Voluntary response – biased toward the opinionated; Convenience – easiest to reach
g. Bias: systematically favors certain outcomes
h. Simple Random Sample (SRS): every set of n individuals has an equal chance of being chosen
i. Probability sample: chosen by known probability
j. Stratified random: SRS within strata divisions
k. Response bias – lying/behavioral influence
6. Experiments
a. Subjects: individuals in experiment
b. Factors: explanatory variables in experiment
c. Treatment: combination of specific values for each factor
d. Placebo: treatment to nullify confounding factors
e.
Double-blind: treatments unknown to subjects & individual investigators
f. Control Group: controls effects of lurking variables
g. Completely Randomized design: subjects allocated randomly among treatments
h. Randomized comparative experiments: similar groups – nontreatment influences operate equally
i. Experimental design: control effects of lurking variables, randomize assignments, use enough subjects to reduce chance variation
j. Statistical significance: observations that would rarely occur by chance
k. Block design: randomization within a block of individuals with similarity (men vs women)
7. Probability & odds
a. 2 definitions:
b. 1) Experimental: observed likelihood of a given outcome within an experiment
c. 2) Theoretical: relative frequency/proportion of a given event given all possible outcomes (Sample Space)
d. Event: outcome of random phenomenon
e. n(S) – number of points in sample space
f. n(A) – number of points that belong to A
g. p. 183: Empirical: P'(A) = n(A)/n = # observed / # attempted
h. p. 185: Law of large numbers – Experimental -> Theoretical as the number of trials grows
i. p. 194: Theoretical: P(A) = n(A)/n(S) = favorable/possible
j. 0 ≤ P(A) ≤ 1, ∑ (all outcomes) P(A) = 1
k. p. 189: S = Sample space, n(S) = # sample points. Represented as a listing {(, ), …}, tree diagram, or grid
l. p. 197: Complementary Events: $P(A) + P(\bar{A}) = 1$
m. p. 200: Mutually exclusive events: both can't happen at the same time
n. p. 203: Addition Rule: P(A or B) = P(A) + P(B) – P(A and B) [P(A and B) = 0 if mutually exclusive]
o. p. 207: Independent Events: occurrence (or not) of A does not impact P(B) & vice versa
p. Conditional Probability: P(A|B) – probability of A given that B has occurred. P(B|A) – probability of B given that A has occurred.
q. Independent Events iff P(A|B) = P(A) and P(B|A) = P(B)
r. Special Multiplication Rule (independent events): P(A and B) = P(A)*P(B)
s. General Multiplication Rule: P(A and B) = P(A)*P(B|A) = P(B)*P(A|B)
t. Odds / Permutations
u. Order important vs not (probability of picking four numbers)
v. Permutations: nPr, n!/(n – r)!
, number of ways to pick r item(s) from n items if order is important. Note: with repetitions, p alike and q alike: n!/(p! q!)
w. Combinations: nCr, n!/((n – r)! r!), number of ways to pick r item(s) from n items if order is NOT important
x. Replacement vs not (AAKKKQQJJJJ10): (a) Pick an A, replace it, then pick a K. (b) Pick a K, keep it, pick another.
y. Fair odds – if odds are 1/1000 and the payout is 1000. May take 3000 plays to win, may win after 200.
8. Probability Distribution
a. Refresher: number of heads from tossing 3 coins. Do grid {HHH, …, TTT}, then # Heads vs frequency chart {(0,1), (1,3), (2,3), (3,1)} – note Pascal's triangle
b. Random variable – circle # Heads on graph above. "Assumes unique numerical value for each outcome in sample space of probability experiment".
c. Discrete – countable number of values
d. Continuous – infinite possible values
e. Probability Distribution: next to the coins frequency chart, add a P(x) column with values 1/8, 3/8, 3/8, 1/8
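
The three-coin distribution can be checked by brute force. The Python sketch below is only an illustration and assumes fair coins; the function name heads_distribution is hypothetical, not something from the sheet. It lists the sample space, counts heads in each outcome (the random variable x), and turns the frequencies into P(x).

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins=3):
    """Return {number of heads: P(x)} for n_coins fair coins (illustrative helper)."""
    # Sample space: every ordered sequence of H/T, e.g. ('H', 'H', 'T').
    sample_space = list(product("HT", repeat=n_coins))
    # Random variable x = number of heads in each outcome; Counter gives the frequencies.
    frequency = Counter(outcome.count("H") for outcome in sample_space)
    # Each outcome is equally likely, so P(x) = frequency / n(S).
    total = len(sample_space)
    return {x: Fraction(frequency[x], total) for x in sorted(frequency)}


dist = heads_distribution(3)
for x, p in dist.items():
    print(x, p)
```

For 3 coins this prints 1/8, 3/8, 3/8, 1/8 for x = 0, 1, 2, 3, and the probabilities sum to 1. The frequencies 1, 3, 3, 1 are the corresponding row of Pascal's triangle.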