Lecture 6: Linear regression and hypothesis testing
CSI 5v93: Introduction to machine learning
Baylor University Computer Science Department
Dr. Greg Hamerly
http://cs.baylor.edu/~hamerly/
CSI 5v93: Introduction to machine learning, Lecture 6 – p. 1/22

Announcements
• Homework 2 due February 8th – extension

Questions?

Chapter 3: Linear methods for regression
• 3.1 – Introduction
• 3.2 – Linear regression models and least squares
• 3.3 – Multiple regression from simple univariate regression
• 3.4 – Subset selection and coefficient shrinkage
• 3.5 – Computational considerations

The hypothesis test for βj = 0
Hypotheses:
• H0 (null hypothesis): βj = 0
• H1 (alternative hypothesis): βj ≠ 0
The test statistic is:
    zj = β̂j / (σ̂ √vj)
where vj is the jth diagonal element of (X^T X)^{-1}. Under H0, zj ∼ t_{N−d−1}. The t distribution is like the Gaussian distribution, but with fatter tails.

The t and Gaussian distributions
[Figure: tail probabilities of the t and Gaussian distributions, plotted for z between 2.0 and 3.0.]
The two are closely related, but the t distribution arises because we do not know the true σ² of the data, only an estimate of it. Note that the t distribution depends on the number of samples; the Gaussian does not.
The test: if the zj score is large enough that it falls outside the acceptance region and lies in the rejection region, we reject H0.
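The z-statistic above can be computed directly from the least-squares fit. A minimal sketch in Python with numpy/scipy (the lecture's examples use Matlab); the data, noise level σ = 5, and sample size are illustrative assumptions, using the true and assumed models from the example that follows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data from the true model f(X) = 30 + 50X - 10X^2
N = 100
x = rng.uniform(0.0, 10.0, N)
y = 30 + 50 * x - 10 * x**2 + rng.normal(0.0, 5.0, N)

# Design matrix for the assumed model g(X) = b0 + b1*x + b2*x^2 + b3*sqrt(x)
X = np.column_stack([np.ones(N), x, x**2, np.sqrt(x)])
d = X.shape[1] - 1  # number of predictors (excluding the intercept)

# Least-squares fit: beta_hat = (X^T X)^{-1} X^T y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Unbiased estimate of sigma^2 from the residuals
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - d - 1)

# z_j = beta_hat_j / (sigma_hat * sqrt(v_j)), v_j = jth diagonal of (X^T X)^{-1}
v = np.diag(XtX_inv)
z = beta_hat / np.sqrt(sigma2_hat * v)

# Two-sided p-values under the t distribution with N - d - 1 degrees of freedom
p = 2 * stats.t.sf(np.abs(z), df=N - d - 1)
```

Each p[j] is then compared against α; small values reject H0 for that coefficient.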
Example: Testing the hypothesis
• True model: f(X) = 30 + 50X − 10X²
• Assumed model: g(X) = β0 + β1X + β2X² + β3√X + ε
• Generate noisy data: yi = f(xi) + εi
• Compute β̂ = (X^T X)^{-1} X^T y (under the assumed model g(X))
Question: Is β̂3 significant?
Applying the test:
• H0: β3 = 0
• H1: β3 ≠ 0
• Compute z3 = β̂3 / (σ̂ √v3)
• Under H0, z3 ∼ t_{N−d−1}
• Set α = 0.05
• If Pr(|Z| > |z3|) < α, then reject H0

Matlab example
[Figure: noisy data and the learned model, plotted for x from 0 to 10.]
• True model: f(X) = 30 + 50X − 10X²
• Assumed model: g(X) = β0 + β1X + β2X² + β3√X + ε
• β̂ = [38.5, 57.8, −10.3, −16.8]
• z = [6.4, 12.3, −47.8, −1.6]
• Critical value: Pr(|Z| > 1.9721) = 0.05, i.e. Pr(|Z| ≤ 1.9721) = 0.95
• Therefore β̂0, β̂1, and β̂2 are all significant (by themselves), and β̂3 is NOT significant (by itself).

Eliminating multiple variables
Note that this z-test for significance applies to only one variable at a time, not to multiple variables at once. To eliminate multiple variables with it, eliminate one variable at a time, re-running the test after each elimination.

Eliminating multiple variables
The F-test allows multiple variables to be tested for significance at once:
    F = [(RSS0 − RSS1) / (d1 − d0)] / [RSS1 / (N − d1 − 1)]
• RSS0 and d0 refer to the smaller model (with d0 + 1 parameters)
• RSS1 and d1 refer to the larger model (with d1 + 1 parameters)
Under the null hypothesis that the smaller model is correct, F is distributed as F_{d1−d0, N−d1−1}.

Choosing the subset
The RSS will always be smallest for the model with the most parameters (all d of them), so minimizing RSS alone is not a useful criterion for choosing a subset.
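The F-test above can be sketched in code. A Python sketch (the course examples use Matlab), comparing a smaller model against a larger nested model on synthetic data from the lecture's true model; the sample size and noise level are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic data from the true model f(X) = 30 + 50X - 10X^2
N = 200
x = rng.uniform(0.0, 10.0, N)
y = 30 + 50 * x - 10 * x**2 + rng.normal(0.0, 5.0, N)

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

# Smaller model (d0 = 2 predictors): 1, x, x^2
X0 = np.column_stack([np.ones(N), x, x**2])
# Larger model (d1 = 3 predictors): adds the sqrt(x) term
X1 = np.column_stack([X0, np.sqrt(x)])

d0, d1 = X0.shape[1] - 1, X1.shape[1] - 1
rss0, rss1 = rss(X0, y), rss(X1, y)

# F = [(RSS0 - RSS1)/(d1 - d0)] / [RSS1/(N - d1 - 1)]
F = ((rss0 - rss1) / (d1 - d0)) / (rss1 / (N - d1 - 1))

# p-value under the F_{d1-d0, N-d1-1} distribution;
# a large p-value means the extra sqrt(x) term is not significant
p = stats.f.sf(F, d1 - d0, N - d1 - 1)
```

Since the true model contains no √X term, we expect the test to usually retain the smaller model here.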
RSS for different subsets for the cancer-prediction problem:
[Figure: residual sum-of-squares versus subset size k (k = 0 to 8), one point per possible subset; the best attainable RSS decreases as k grows.]

Stepwise selection
Rather than considering every possible subset, stepwise selection adds one variable at a time.
Forward stepwise selection:
• start with one parameter (the intercept)
• add the next best parameter (using the F statistic to determine the best)
• repeat until the change in the F statistic is not significant (at some level, e.g. 95%)
Backward stepwise selection is similar, but works in reverse: start with all parameters and remove the least useful one at each step.

Issues with stepwise selection
What are some drawbacks of stepwise selection? Is forward or backward easier? Do you expect they will produce the same results?

2-minute journal
Please write a response to the following on a piece of paper and hand it in immediately. Please make it anonymous (no names). Write about:
• major points you learned today
• areas not understood or requiring clarification
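The forward stepwise procedure described above can be sketched as follows. This is a Python sketch, not the course's Matlab code; the F-to-enter stopping rule at level α and the synthetic example are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def forward_stepwise(X, y, alpha=0.05):
    """Greedy forward selection: start with just the intercept, repeatedly add
    the candidate column with the largest F-to-enter statistic, and stop when
    no remaining candidate is significant at level alpha."""
    N, d = X.shape
    selected = []                 # indices of chosen columns of X
    remaining = list(range(d))

    def rss(cols):
        Z = np.column_stack([np.ones(N)] + [X[:, j] for j in cols])
        beta = np.linalg.lstsq(Z, y, rcond=None)[0]
        r = y - Z @ beta
        return r @ r

    current_rss = rss(selected)
    while remaining:
        best = None
        for j in remaining:
            new_rss = rss(selected + [j])
            df2 = N - (len(selected) + 1) - 1   # residual degrees of freedom
            F = (current_rss - new_rss) / (new_rss / df2)
            if best is None or F > best[1]:
                best = (j, F, new_rss, df2)
        j, F, new_rss, df2 = best
        # Stop when even the best candidate's improvement is not significant
        if stats.f.sf(F, 1, df2) >= alpha:
            break
        selected.append(j)
        remaining.remove(j)
        current_rss = new_rss
    return selected

# Example: y depends on columns 0 and 1 only; columns 2-4 are pure noise
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0.0, 0.5, 300)
chosen = forward_stepwise(X, y)
```

Backward stepwise selection would run the same loop in reverse, dropping the variable with the smallest F statistic at each step.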