Stats Cheat Sheet 2004, Cheat Sheet of Statistics

Fundamentals, Bivariate - Scatterplots & Correlation, Single Variable Data - Distributions and other topics

Typology: Cheat Sheet

2020/2021

Uploaded on 04/26/2021

alannis

Statistics Cheat Sheet

1. Fundamentals
   a. Population – everybody (or everything) to be analysed
      • Parameter – a number summarizing the population
   b. Sample – subset of the population we collect data on
      • Statistic – a number summarizing the sample
   c. Quantitative variables – a number
      • Discrete – countable (# cars in family)
      • Continuous – measurements; there is always another value between any two
   d. Qualitative variables
      • Nominal – just a name
      • Ordinal – order matters (low, mid, high)
   Choosing a Sample
      • Sample frame – list of the population we choose the sample from
      • Biased – sample differs from the population's characteristics
      • Volunteer sample – any of the three types below may end up as a volunteer sample if people choose whether to respond
   Sample Designs
   e. Judgement sample: choose what we think represents the population
      • Convenience sample – easily accessed people
   f. Probability sample: elements selected by probability
      • Simple random sample – every element has an equal chance
      • Systematic sample – almost random, but we choose by a method
   g. Census – data on everyone/everything in the population
   Stratified Sampling: divide the population into subpopulations based upon characteristics
   h. Proportional: in proportion to the total population
   i. Stratified random: select at random within each stratum
   j. Cluster: selection within representative clusters
   Collect the Data
   k. Experiment: control the environment
   l. Observation:

2. Single Variable Data - Distributions
   m. Graphing categorical data: pie & bar charts
   n. Histogram (classes, count within each class)
   o. Describe a distribution by shape, center, spread. Shape: symmetric, skewed right, skewed left
   p. Stemplots (plain stems on the left, split stems on the right):
         0 | 11222        0 | 112233
         1 | 011333       0 | 56677
         2 | etc          1 |
   q. Mean: $\bar{x} = \sum x_i / n$
   r. Median M: if n is odd – the center value; if n is even – the mean of the 2 center values
   s. Boxplot: Min, Q1, M, Q3, Max
   t. Variance: $s^2 = \sum (x - \bar{x})^2 / (n - 1) = SS(x) / (n - 1)$
   u. p78: standard deviation, $s = \sqrt{s^2}$
   v. $SS(x) = \sum (x - \bar{x})^2 = \sum x^2 - (\sum x)^2 / n$
   w. Density curve – relative proportion within classes; area under curve = 1
   x. Normal distribution: 68, 95, 99.7 % of values lie within 1, 2, 3 standard deviations of the mean
   y. p98: z-score $z = (x - \bar{x}) / s$ or $z = (x - \mu) / \sigma$
   z. Standard Normal: $z \sim N(0, 1)$ when $x \sim N(\mu, \sigma)$

3. Bivariate - Scatterplots & Correlation
   a. Explanatory – independent variable
   b. Response – dependent variable
   c. Scatterplot: form, direction, strength, outliers
   d. – form is linear negative, …
   e. – to add a categorical variable, use a different color/symbol
   f. p147: Linear correlation – direction & strength of the linear relationship
   g. Pearson's coefficient: $-1 \le r \le 1$; 1 is perfectly linear with positive slope, -1 is perfectly linear with negative slope
   h. $r = \frac{1}{n - 1} \sum \frac{(x - \bar{x})}{s_x} \cdot \frac{(y - \bar{y})}{s_y} = \frac{SS(xy)}{\sqrt{SS(x)\,SS(y)}}$
   i. $r = \sum z_x z_y / (n - 1)$
   j. $SS(xy) = \sum xy - (\sum x)(\sum y) / n$

4. Regression
   k. Least squares – the sum of squares of the vertical errors is minimized
   l. p154: $\hat{y} = b_0 + b_1 x$, or $\hat{y} = a + bx$
   m. (same as y = mx + b)
   n. $b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{SS(xy)}{SS(x)} = r\,(s_y / s_x)$
   o. Then solve using the fact that the line passes through the centroid $(\bar{x}, \bar{y})$: $a = \bar{y} - b\bar{x}$
   p. $b_0 = (\sum y - b_1 \sum x) / n$
   q. $r^2$ is the proportion of variation described by the linear relationship
   r. Residual = $y - \hat{y}$ = observed – predicted
   s. Outliers: in the y direction -> large residuals; in the x direction -> often influential on the least squares line
   t. Extrapolation – predicting beyond the domain studied
   u. Lurking variable
   v. Association doesn't imply causation

5. Data – Sampling
   a. Population: entire group
   b. Sample: part of the population we examine
   c. Observation: measures but does not influence the response
   d. Experiment: treatments controlled & responses observed
   e. Confounded variables (explanatory or lurking): their effects on the response variable cannot be distinguished
   f. Sampling types: voluntary response – biased toward the opinionated; convenience – easiest to reach
   g. Bias: systematically favors certain outcomes
   h. Simple Random Sample (SRS): every set of n individuals has an equal chance of being chosen
   i. Probability sample: chosen by known probability
   j. Stratified random: SRS within strata divisions
   k. Response bias – lying/behavioral influence

6. Experiments
   a. Subjects: individuals in the experiment
   b. Factors: explanatory variables in the experiment
   c. Treatment: combination of specific values for each factor
   d. Placebo: treatment to nullify confounding factors
   e. Double-blind: treatments unknown to subjects & individual investigators
   f. Control group: controls the effects of lurking variables
   g. Completely randomized design: subjects allocated randomly among treatments
   h. Randomized comparative experiments: similar groups, so non-treatment influences operate equally
   i. Experimental design: control the effects of lurking variables, randomize assignments, use enough subjects to reduce chance variation
   j. Statistical significance: an observed effect that would rarely occur by chance
   k. Block design: randomization within a block of similar individuals (men vs women)

7. Probability & odds
   a. 2 definitions:
   b. 1) Experimental: observed likelihood of a given outcome within an experiment
   c. 2) Theoretical: relative frequency/proportion of a given event among all possible outcomes (sample space)
   d. Event: outcome of a random phenomenon
   e. n(S) – number of points in the sample space
   f. n(A) – number of points that belong to A
   g. p183: Empirical: P'(A) = n(A)/n = # observed / # attempted
   h. p185: Law of large numbers – as the number of trials grows, the experimental probability approaches the theoretical probability
   i. p194: Theoretical: P(A) = n(A)/n(S) = favorable/possible
   j. 0 ≤ P(A) ≤ 1, ∑ (all outcomes) P(A) = 1
   k. p189: S = sample space, n(S) = # sample points. Represented as a listing {(, ), …}, a tree diagram, or a grid
   l. p197: Complementary events: $P(A) + P(\bar{A}) = 1$
   m. p200: Mutually exclusive events: both can't happen at the same time
   n. p203: Addition rule: P(A or B) = P(A) + P(B) – P(A and B) [where P(A and B) = 0 if exclusive]
   o. p207: Independent events: occurrence (or not) of A does not impact P(B), and vice versa
   p. Conditional probability: P(A|B) – probability of A given that B has occurred. P(B|A) – probability of B given that A has occurred
   q. Independent events iff P(A|B) = P(A) and P(B|A) = P(B)
   r. Special multiplication rule (independent events): P(A and B) = P(A)*P(B)
   s. General multiplication rule: P(A and B) = P(A)*P(B|A) = P(B)*P(A|B)
   t. Odds / Permutations
   u. Order important vs not (probability of picking four numbers)
   v. Permutations: nPr = n!/(n – r)!, the number of ways to pick r item(s) from n items if order is important. Note: arrangements of n items with p alike and q alike = n!/(p! q!)
   w. Combinations: nCr = n!/((n – r)! r!), the number of ways to pick r item(s) from n items if order is NOT important
   x. Replacement vs not (AAKKKQQJJJJ10): (a) pick an A, replace it, then pick a K. (b) Pick a K, keep it, pick another.
   y. Fair odds – if the odds are 1/1000 and the payout is 1000. It may take 3000 plays to win, or you may win after 200.

8. Probability Distribution
   a. Refresh on the number of heads from tossing 3 coins. Do the grid {HHH, …, TTT}, then the #Heads vs frequency chart {(0,1), (1,3), (2,3), (3,1)} – note Pascal's triangle
   b. Random variable – circle #Heads on the graph above. "Assumes unique numerical value for each outcome in sample space of probability experiment".
   c. Discrete – countable number
   d. Continuous – infinite possible values
   e. Probability distribution: next to the coins frequency chart, add a P(x) column with the values 1/8, 3/8, 3/8, 1/8
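The Section 2 formulas for the mean, SS(x), variance and standard deviation can be checked numerically. The following is a minimal sketch, not part of the original sheet: the function names and the small data list are illustrative assumptions.

```python
import math

def sum_of_squares(xs):
    """SS(x) via the shortcut form: sum(x^2) - (sum(x))^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n

def sample_variance(xs):
    """s^2 = SS(x) / (n - 1)."""
    return sum_of_squares(xs) / (len(xs) - 1)

def sample_std(xs):
    """s = sqrt(s^2)."""
    return math.sqrt(sample_variance(xs))

# Illustrative data (an assumption, chosen so the mean is a whole number).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)

# Definition form of SS(x): sum of squared deviations from the mean.
ss_by_definition = sum((x - mean) ** 2 for x in data)

print("mean      =", mean)
print("SS(x)     =", sum_of_squares(data), "(shortcut),", ss_by_definition, "(definition)")
print("variance  =", sample_variance(data))
print("std. dev. =", sample_std(data))
```

Both forms of SS(x) should agree up to floating-point rounding, which is a quick way to catch arithmetic slips when working the shortcut formula by hand.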
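The z-score and the 68/95/99.7 rule from Section 2 can be illustrated with Python's standard library. This is a sketch under stated assumptions: `z_score`, `share_within` and the example numbers (a value of 85 with mean 70 and standard deviation 10) are hypothetical and not from the sheet; `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

def z_score(x, mean, std):
    """z = (x - mean) / std: how many standard deviations x lies from the mean."""
    return (x - mean) / std

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Hypothetical example values.
z = z_score(85, mean=70, std=10)
print("z-score:", z)

for k in (1, 2, 3):
    print(f"within {k} std dev: {share_within(k):.4f}")
```

The printed proportions are the more precise values behind the rounded 68, 95 and 99.7 % figures.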
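The correlation and regression formulas in Sections 3 and 4 all build on SS(x), SS(y) and SS(xy), so they can be sketched together. The helper names and the five data points below are illustrative assumptions, not taken from the sheet.

```python
import math

def ss(xs):
    """SS(x) = sum(x^2) - (sum(x))^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)

def ss_xy(xs, ys):
    """SS(xy) = sum(xy) - sum(x) * sum(y) / n."""
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / len(xs)

def pearson_r(xs, ys):
    """r = SS(xy) / sqrt(SS(x) * SS(y))."""
    return ss_xy(xs, ys) / math.sqrt(ss(xs) * ss(ys))

def least_squares(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1 * x."""
    b1 = ss_xy(xs, ys) / ss(xs)
    b0 = (sum(ys) - b1 * sum(xs)) / len(xs)
    return b0, b1

# Illustrative data (an assumption).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
b0, b1 = least_squares(xs, ys)

# Residual = observed - predicted, one per data point.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print("r   =", round(r, 4))
print("r^2 =", round(r * r, 4))
print("line: y-hat =", round(b0, 4), "+", round(b1, 4), "* x")
print("residuals:", [round(e, 4) for e in residuals])
```

For a least squares line the residuals sum to zero (up to rounding), and the line passes through the centroid of the data, which gives two easy sanity checks on a hand calculation.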
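The addition rule, conditional probability and the replacement example from Section 7 can be worked with exact fractions. This is a sketch under assumptions: the die events A and B are illustrative choices, and part (b) of the card example is read here as "pick a K, keep it, then pick another K", which the sheet does not spell out.

```python
from fractions import Fraction

# Sample space for one roll of a fair die, with two illustrative events.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3

def prob(event, space=S):
    """Theoretical probability n(A)/n(S) for equally likely outcomes."""
    return Fraction(len(event & space), len(space))

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# Conditional probability, rearranged from P(A and B) = P(B) * P(A|B).
p_a_given_b = prob(A & B) / prob(B)

# A and B are independent only if P(A|B) equals P(A).
independent = p_a_given_b == prob(A)

# The hand from the sheet: AAKKKQQJJJJ10 (12 cards).
hand = ["A"] * 2 + ["K"] * 3 + ["Q"] * 2 + ["J"] * 4 + ["10"]
n = len(hand)
aces, kings = hand.count("A"), hand.count("K")

# (a) With replacement: the second draw sees the full hand again.
p_ace_then_king = Fraction(aces, n) * Fraction(kings, n)

# (b) Without replacement: one K and one card are gone for the second draw.
p_two_kings = Fraction(kings, n) * Fraction(kings - 1, n - 1)

print("P(A or B) =", p_a_or_b)
print("P(A|B)    =", p_a_given_b, "independent:", independent)
print("(a) A then K, with replacement:", p_ace_then_king)
print("(b) K then K, no replacement:  ", p_two_kings)
```

Using `Fraction` keeps every result exact, so the answers can be compared directly with hand-computed fractions.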
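The counting formulas in Section 7 (permutations, combinations, and arrangements with alike items) translate directly into factorials. A minimal sketch follows; the function names and the example values n = 5, r = 2, p = 2, q = 3 are illustrative assumptions.

```python
import math

def permutations_count(n, r):
    """nPr = n! / (n - r)!: ordered picks of r items from n."""
    return math.factorial(n) // math.factorial(n - r)

def combinations_count(n, r):
    """nCr = n! / ((n - r)! * r!): unordered picks of r items from n."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

def arrangements_with_alike(n, p, q):
    """n! / (p! * q!): arrangements of n items where p are alike and q are alike."""
    return math.factorial(n) // (math.factorial(p) * math.factorial(q))

print("5P2 =", permutations_count(5, 2))
print("5C2 =", combinations_count(5, 2))
print("5 items, 2 alike and 3 alike:", arrangements_with_alike(5, 2, 3))
```

Each combination of r items corresponds to r! permutations, which is why nPr = nCr * r!.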
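The three-coin example in Section 8 can be reproduced by listing the sample space and counting heads. This is a small sketch; the variable names are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for tossing 3 coins: HHH, HHT, ..., TTT (8 outcomes).
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# Random variable x = number of heads in each outcome.
frequency = Counter(outcome.count("H") for outcome in sample_space)

# Probability distribution P(x) = frequency / n(S).
distribution = {
    heads: Fraction(count, len(sample_space))
    for heads, count in sorted(frequency.items())
}

for heads, p in distribution.items():
    print(f"x = {heads}: frequency {frequency[heads]}, P(x) = {p}")
```

The frequencies 1, 3, 3, 1 are the fourth row of Pascal's triangle, and the probabilities sum to 1 as any probability distribution must.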