Review Sheet for Exam - Business Statistics | BMGT 230B
• Statistics: Way of reasoning to help understand the world
• Data: The values of the variables
• Relational Database: When 2 or more separate data tables are linked together so that information can be merged across them.
• Qualitative/Categorical Variables: When a variable names categories and answers questions about how cases fall into those categories. (Sex, year in school, major)
• Quantitative/Numerical Variables: When a variable has measured numerical values with units and tells us about the quantity of what is measured. (Age, height, miles traveled)
• Discrete variables (numerical): Natural gap between values. (# of kids, # of credit cards)
• Continuous variables (numerical): Values can be arbitrarily close together. (Weight, height, age)
• Ordinal variables (categorical): Categories that have a natural ordering. (Yr in school, grade, preference)
• Nominal variables (categorical): No natural ordering. (Major, eye color)
• Sampling Frame: List of individuals from which the sample is drawn. (Phonebook, membership list)
• Sampling Error: Natural variability from sample to sample (expected, not a mistake)
• Respondents: Individuals who answer a survey
• Subjects/Participants: People on whom we experiment
• Variable: Aspect/characteristic that differs from subject to subject, individual to individual. (Age, sex, major…)
• Selection bias: Systematic tendency to exclude one kind of individual from the survey, so the sample is not representative of the population
• Non-response bias: Subjects don't answer the question(s)
• Response bias: Subjects give inaccurate answers (e.g., they lie)
• Undercoverage: Certain groups are underrepresented
• Sample vs. Population
o Sample: The part of the population we actually examine
o Population: Entire group of individuals in which we are interested but usually can't assess directly. (All voters in US, all packages at UPS center, etc.)
o Statistic: Number describing a characteristic of the sample
o Parameter: Number describing a characteristic of the population (usually unknown)
• Non-statistical vs. statistical (random) sampling
o Convenience Sample (non-statistical): Collected in the most convenient manner for the researcher
o Voluntary Sample (non-statistical): Individuals choose to be involved.
o Simple Random Sample (statistical): Every individual has an equal chance of being selected. Draw names from a hat.
o Stratified Random Sample (statistical): Divide population into subgroups (strata) according to a common characteristic, then take a simple random sample from each subgroup. The strata themselves are not chosen at random.
o Cluster Sample (statistical): Divide population into several "clusters," each representative of the population, then take a simple random sample of clusters. The clusters are chosen at random.
o Systematic Sample (statistical): Decide on sample size n, divide the ordered frame of N individuals into groups of k (k = N/n). Randomly select one individual from the 1st group, then select every kth individual after it.
• *What matters is that the sample is representative of the population, not what fraction of the population it covers*
• Close-ended Qs: Select from a short list of defined choices
• Open-ended Qs: Respond w/ any value, words, or statement
• Demographic Qs: About personal characteristics
• Marginal Distribution: Distribution of the row or column TOTALS of a contingency table, given as counts OR as %.
• Conditional Distribution: Distribution of one variable within a single category of the other. Compare these to see whether or not 2 variables are related.
• Frequency Table: Shows # of cases for each category
• Contingency Table: Shows how individuals are distributed along each variable
• Pie Charts: Use to show how one categorical variable splits a whole into parts
• Bar Graphs: When comparing groups, if the bar heights are close to equal, the variables are likely independent
• Box Plot: Allows you to compare different populations. (Good for comparing over months, seasons, etc.)
• Histograms: Focused on frequencies. Show the distribution of the data points
• Shape
o Symmetric: When right and left sides are mirror images of each other. Mean and median close to each other.
o Skewed to R: When the right side of the histogram (side w/ larger values) extends much farther out than the left side.
o Skewed to L: When the left side of the histogram extends much farther out than the right side.
• Modes
o Bimodal Distribution: 2 humps (goes up and down twice)
o Uniform Distribution: Roughly equal frequencies across the board.
• Numerical Summaries
o IQR: [Tells you the spread of the middle 50% of the data. Not influenced by extremes]
o Mean: Add up the data and divide by the number of observations (average). [Use for symmetric data] [Moves towards extreme values]
o Median: Middle value once the data are ordered. On a histogram, the area to the left of that point equals the area to the right. [Use for skewed data] [Resistant to extreme values]
o Standard Deviation: Measure of spread about the mean
o SD, Mean: Use for symmetric data
o IQR, Median: Use for skewed data
• Scatter plot: Each axis represents one variable. Each case is plotted as a point on the graph
• Response Variable: Measures/records the outcome of a study. [on side (y) axis]
• Explanatory Variable: Explains changes in the response variable. [on bottom (x) axis]
• Association: Described by direction, form, strength
o Form: Linear, curved, clusters, no pattern
o Direction: + (x goes up and y goes up), - (x goes up and y goes down), no direction
o Strength: How closely the points fit the form
• CORRELATION & REGRESSION
• Correlation: Measures the strength of the linear association between two quantitative variables.
• Regression line: aka line of best fit. Goes through the point (mean of x's, mean of y's).
• Correlation Conditions (must be true in order to use correlation):
o Quantitative: Only applies to quantitative variables
o Linearity: Only applies to linear associations.
o Outlier: When an outlier is present, report the correlation both with and without that point
• Correlation Properties:
o Sign of the correlation coefficient gives the direction of the association.
o Always between -1 and +1. If equal to -1 or +1, the data points fall exactly on a straight line.
o X and Y are treated symmetrically.
o Has no units.
o Not affected by changes in center or scale.
o Sensitive to unusual observations.
o Measures linear association between the 2 variables.
• Residual: Difference between observed y and predicted y (observed minus predicted)
• Se: Standard deviation of the residuals
• r²: Fraction of the data's variation accounted for by the model.
• RANDOMNESS & PROBABILITY
• Probability: Long-run relative frequency of an event. Relative frequency is a fraction, so it can be written as a decimal or %
• Event: Collection of outcomes. Denoted with bold capital letters (ex: A, B, C)
• Joint Probability: Probability that two events both occur.
• Marginal Probability: In a joint probability table, the probability distribution of either variable separately, usually found in the rightmost column or bottom row of the table
• Theoretical Probability: When it comes from a mathematical model (such as equally likely outcomes).
• Empirical Probability: When it comes from the long-run relative frequency of the event's occurrence.
• Law of Large Numbers: Long-run relative frequency of repeated, independent events settles down to the true probability as the number of trials increases.
• Sample Space: Collection of all possible outcome values. Has probability of 1
• Tree Diagram (probability tree): Display of conditional events or probabilities that is helpful in thinking through conditioning.
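Worked example, systematic sampling: the rule from the sampling section (k = N/n, random start in the 1st group, then every kth individual) can be sketched in Python. This is only an illustration, not part of the original notes. The helper name `systematic_sample` and the frame of 100 numbered individuals are hypothetical, and the sketch assumes N is an exact multiple of n.

```python
import random

def systematic_sample(frame, n, rng=None):
    """Systematic sample of size n from an ordered frame.

    Hypothetical helper for illustration; assumes len(frame) is a multiple of n.
    """
    rng = rng or random.Random()
    N = len(frame)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch assumes N is a positive multiple of n")
    k = N // n                  # group size, k = N / n
    start = rng.randrange(k)    # random position within the 1st group
    return frame[start::k]      # that individual, then every kth one after it

frame = list(range(1, 101))     # made-up ordered frame of N = 100 individuals
sample = systematic_sample(frame, n=10, rng=random.Random(0))
print(sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is fixed by the ordering of the frame.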
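Worked example, numerical summaries: the notes say the mean moves toward extreme values while the median and IQR resist them. A small Python sketch with made-up data shows the effect of adding one extreme value. The helper `summarize` is hypothetical, and quartile conventions differ between textbooks and software, so the IQR here (Python's default method) may not match a hand calculation exactly.

```python
import statistics

def summarize(values):
    """Mean, median, sample SD, and IQR for a list of numbers (illustrative helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),             # sample standard deviation
        "iqr": q3 - q1,
    }

data = [2, 3, 3, 4, 5, 5, 6]      # made-up, roughly symmetric values
with_outlier = data + [60]        # same values plus one extreme value

for label, values in [("original", data), ("with outlier", with_outlier)]:
    s = summarize(values)
    print(f"{label}: mean={s['mean']:.2f} median={s['median']} "
          f"sd={s['sd']:.2f} iqr={s['iqr']:.2f}")
```

The mean goes from 4 to 11 and the SD grows sharply, while the median only moves from 4 to 4.5, which is why median and IQR are the usual choice for skewed data.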
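Worked example, correlation and regression: the quantities from the CORRELATION & REGRESSION section (correlation, line of best fit, residuals, Se, r²) can be computed by hand in a few lines of Python. The data below are made up for illustration. Dividing by n - 2 when computing Se is one common textbook convention and is an assumption here, since the notes do not specify it.

```python
import math

# Made-up paired data (x = explanatory variable, y = response variable)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)         # correlation: unitless, between -1 and +1
slope = sxy / sxx                      # least-squares slope
intercept = mean_y - slope * mean_x    # makes the line pass through (mean_x, mean_y)

predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]       # observed minus predicted
se = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))    # SD of residuals (n - 2 convention)
r_squared = r ** 2                     # fraction of variation in y accounted for

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
print(f"predicted y = {intercept:.3f} + {slope:.3f} x, Se = {se:.3f}")
```

Swapping x and y leaves r unchanged (x and y are treated symmetrically) but changes the regression line, since the line depends on which variable is the response.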
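Worked example, Law of Large Numbers: a short simulation can illustrate how the relative frequency of an event settles toward its probability as trials accumulate. The coin-flip setup, the function name `running_relative_frequency`, the checkpoints, and the seed are all assumptions chosen for illustration.

```python
import random

def running_relative_frequency(num_flips, p_heads=0.5, seed=1):
    """Simulate independent coin flips; record the relative frequency of heads
    at a few checkpoints (illustrative helper, not from the notes)."""
    rng = random.Random(seed)
    heads = 0
    checkpoints = {}
    for trial in range(1, num_flips + 1):
        if rng.random() < p_heads:     # one independent trial
            heads += 1
        if trial in (10, 100, 1_000, 10_000, 100_000):
            checkpoints[trial] = heads / trial
    return checkpoints

for trials, freq in running_relative_frequency(100_000).items():
    print(f"{trials:>7} flips: relative frequency of heads = {freq:.4f}")
```

Early checkpoints can be far from 0.5; the law only describes the long-run behavior, not what happens in any short run of trials.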