Exam 8: Model Selection in Multiple Linear Regression Analysis
Inclusion of irrelevant variables is a potential problem because
C
When would you use the RESET test? What is the null hypothesis for the test? What is the intuition for why it works? Explain.
The RESET test is used to test whether higher-order polynomials of the independent variables, such as x^2 and x^3, belong in the model. To perform the RESET test, you first estimate the original regression model and then obtain the predicted values squared, the predicted values cubed, and the predicted values raised to the 4th power. You then regress y on the original independent variables and the three additional functions of the predicted values. The null hypothesis for the test is that the coefficients on the higher-order terms are all equal to 0, so the polynomials do not belong in the regression. Perform an F-test of whether the new parameters are jointly equal to 0. The intuition behind this test is that higher-order functions of the predicted values contain higher-order functions of the independent variables; therefore, if the predicted values raised to the 2nd, 3rd, and 4th powers are jointly statistically significant, then higher-order functions of the independent variables are needed.
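The steps in this answer can be sketched in Python. This is a minimal illustration, not a reference implementation: the simulated data, the function names `ols_ssr` and `reset_f_stat`, and the choice of NumPy are all assumptions made for the example. It computes the F statistic for the joint null that the coefficients on the squared, cubed, and 4th-power predicted values are all 0.

```python
import numpy as np

def ols_ssr(X, y):
    """Fit OLS by least squares; return coefficients and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def reset_f_stat(X, y, max_power=4):
    """RESET F statistic. X must already contain a constant column (hypothetical helper)."""
    n, k = X.shape
    # Step 1: estimate the original model and obtain the predicted values.
    beta, ssr_restricted = ols_ssr(X, y)
    y_hat = X @ beta
    # Step 2: add y_hat^2, y_hat^3, ..., y_hat^max_power as extra regressors.
    powers = np.column_stack([y_hat ** p for p in range(2, max_power + 1)])
    _, ssr_unrestricted = ols_ssr(np.column_stack([X, powers]), y)
    # Step 3: F-test that the coefficients on the added terms are jointly 0.
    q = powers.shape[1]
    df_denom = n - k - q
    f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_denom)
    return f_stat, q, df_denom

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])

# Case 1: the true relationship is quadratic, but the fitted model is linear in x.
y_curved = 1.0 + 2.0 * x + 3.0 * x ** 2 + rng.normal(size=n)
f_curved, q, df_denom = reset_f_stat(X, y_curved)

# Case 2: the true relationship is linear, so the fitted model is correctly specified.
y_linear = 1.0 + 2.0 * x + rng.normal(size=n)
f_linear, _, _ = reset_f_stat(X, y_linear)

print(f"F (quadratic truth): {f_curved:.2f} with ({q}, {df_denom}) degrees of freedom")
print(f"F (linear truth):    {f_linear:.2f} with ({q}, {df_denom}) degrees of freedom")
```

The F statistic is compared with a critical value from the F distribution with q numerator and n - k - q denominator degrees of freedom. A large value, as expected in the first case, leads to rejecting the null and suggests that higher-order terms are needed.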
What is the potential shortcoming of using a strict cutoff to determine statistical significance? Explain.
A strict cutoff leads to potentially very different interpretations of which estimates are statistically significant, marginally significant, and statistically insignificant for variables whose estimated p-values are actually very similar. For instance, if one variable has a p-value of .099 and another has a p-value of .101, we would conclude that the first is marginally significant while the second is statistically insignificant, while in reality the difference between these two p-values is likely insignificant.
Suppose you are a prospective college student who is interested in determining whether it is worthwhile to declare a certain major. In an effort to find the answer, you collect data on 1,247 college graduates who majored in HUMANITIES, SCIENCE, ENGINEERING, or ENGLISH and you estimate the sample regression function (standard errors in parentheses) log\nobreakspacear = 13.81 + 0.03 + 0.02 + 0.01 + 0.08 + 0.02 (2.97) (0.0005) (0.00003) (0.02) (0.025) (0.001)
a) Do you think omitted variable bias is a potential problem in this case? Why? Explain.
b) What is the problem associated with omitted variable bias? Explain.
c) How might you control for the potential omitted variable bias in this case? Explain.
Suppose you are interested in estimating how the test scores of elementary schools are related to average class size, parents' education level (in years), and percent of English learners at the school. To do so, you collect a sample of 200 California public schools and specify the following model:
Suppose you are concerned that the proposed model needs higher order polynomials.
a) Initially you believe that class size is the only variable that should be entered as a higher-order polynomial. Propose a new model and explain how you would test for this possibility.
b) After thinking about it, you believe that all of the independent variables may need to be entered as higher-order polynomials. What type of test would you perform? Describe the steps you would take to implement this test.
c) Now instead you believe that the model estimated above is not appropriate and a better model would be
How would you test between the initial model and this model? Be as specific as possible.
When would you use the Davidson-MacKinnon test? What is the null hypothesis for the test? What is the intuition for why it works? Explain.
If you had to either include an irrelevant variable or omit a relevant variable, you would prefer
Why is missing data a potential problem? What are two ways to deal with it? Which approach do you prefer and why?
Suppose that you are performing the Davidson-MacKinnon test for choosing among non-nested alternatives and that in the second stage you estimate the sample regression function and find that the predicted values are not statistically significant. You decide that
Suppose that you estimate the sample regression function
Employing the "eye test" you might suspect that the marginal effect of
What is an outlier? Why are outliers a potential problem? What can you do to deal with outliers? Why does this approach work? Explain.
What is the inclusion of an irrelevant variable? Why is it a problem? How do you try to prevent it? Explain.
Suppose that you are performing the RESET test for the inclusion of higher-order polynomials and that in the second stage you estimate the sample regression function and the predicted value terms are statistically significant. You decide that
Suppose you are interested in the factors that explain GDP all over the world. You gather data for the most recent year on GDP, Education Level, Household Consumption, and Number of People Employed for every country in the world and you specify the following model. (GDP) = β0 + β1 Education Level + β2 Household Consumption + β3 Number of People Employed + ε
a) You are able to find data for GDP, Household Consumption, and Number of People Employed for every country, but Education Level is missing for 40% of the countries. What are your options when estimating this model?
b) What types of countries do you think have missing data for Education Level? Why?
c) If you ran a regression with only the 60% of countries that don't have missing values, do you think your results would be different than if you had been able to gather data on all countries? Why or why not?
Suppose that you estimate the sample regression function
You might be concerned that
Is a biased estimate of the marginal effect of experience on salary because