An economist is analysing the incomes of professionals (physicians, dentists and lawyers). He realises that an important factor is the number of years of experience. However, he wants to know if there are differences among the three professional groups. He takes a random sample of 125 professionals and estimates the multiple regression model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$, where $y$ = annual income (in \$1000), $x_1$ = years of experience, $x_2$ = 1 if physician, 0 if not, and $x_3$ = 1 if dentist, 0 if not. The computer output is shown below.

THE REGRESSION EQUATION IS $y = 71.65 + 2.07 x_1 + 10.16 x_2 - 7.44 x_3$.

\[\begin{array}{|c|rrr|} \hline \text{Predictor} & \text{Coef} & \text{SDev} & T \\ \hline \text{Constant} & 71.65 & 18.56 & 3.860 \\ x_1 & 2.07 & 0.81 & 2.556 \\ x_2 & 10.16 & 3.16 & 3.215 \\ x_3 & -7.44 & 2.85 & -2.611 \\ \hline \end{array}\]

S = 42.6, R-Sq = 30.9%.

ANALYSIS OF VARIANCE

\[\begin{array}{|l|rrrr|} \hline \text{Source of Variation} & df & SS & MS & F \\ \hline \text{Regression} & 3 & 98008 & 32669.333 & 18.008 \\ \text{Error} & 121 & 219508 & 1814.116 & \\ \hline \text{Total} & 124 & 317516 & & \\ \hline \end{array}\]

Do these results allow us to conclude at the 1% significance level that the model is useful in predicting the income of professionals?

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$. $H_1$: At least one $\beta_i$ is not equal to zero.

An economist is in the process of developing a model to predict the price of gold. She believes that the two most important variables are the price of a barrel of oil $(x_1)$ and the interest rate $(x_2)$. She proposes the first-order model with interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$. A random sample of 20 daily observations was taken. The computer output is shown below.

THE REGRESSION EQUATION IS $y = 115.6 + 22.3 x_1 + 14.7 x_2 - 1.36 x_1 x_2$.

\[\begin{array}{|c|rrr|} \hline \text{Predictor} & \text{Coef} & \text{SDev} & T \\ \hline \text{Constant} & 115.6 & 78.1 & 1.480 \\ x_1 & 22.3 & 7.1 & 3.141 \\ x_2 & 14.7 & 6.3 & 2.333 \\ x_1 x_2 & -1.36 & 0.52 & -2.615 \\ \hline \end{array}\]

S = 20.9, R-Sq = 55.4%.

ANALYSIS OF VARIANCE

\[\begin{array}{|l|rrrr|} \hline \text{Source of Variation} & df & SS & MS & F \\ \hline \text{Regression} & 3 & 8661 & 2887.0 & 6.626 \\ \text{Error} & 16 & 6971 & 435.7 & \\ \hline \text{Total} & 19 & 15632 & & \\ \hline \end{array}\]

Is there sufficient evidence at the 1% significance level to conclude that the price of a barrel of oil and the price of gold are linearly related?

$H_0: \beta_1 = 0$. $H_1: \beta_1 \neq 0$. Rejection region: $|t| > t_{0.005,16} = 2.921$.

Exam 20: Model Building

A professor of accounting wanted to develop a multiple regression model to predict the students' grades in her fourth-year accounting course. She decides that the two most important factors are the student's grade point average (GPA) in the first three years and the student's major. She proposes the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$, where $y$ = fourth-year accounting course mark (out of 100), $x_1$ = GPA in first three years (range 0 to 12), $x_2$ = 1 if the student's major is accounting, 0 if not, and $x_3$ = 1 if the student's major is finance, 0 if not. The computer output is shown below.

THE REGRESSION EQUATION IS $y = 9.14 + 6.73 x_1 + 10.42 x_2 + 5.16 x_3$.

\[\begin{array}{|c|rrr|} \hline \text{Predictor} & \text{Coef} & \text{SDev} & T \\ \hline \text{Constant} & 9.14 & 7.10 & 1.287 \\ x_1 & 6.73 & 1.91 & 3.524 \\ x_2 & 10.42 & 4.16 & 2.505 \\ x_3 & 5.16 & 3.93 & 1.313 \\ \hline \end{array}\]

S = 15.0, R-Sq = 44.2%.

ANALYSIS OF VARIANCE

\[\begin{array}{|l|rrrr|} \hline \text{Source of Variation} & df & SS & MS & F \\ \hline \text{Regression} & 3 & 17098 & 5699.333 & 25.386 \\ \text{Error} & 96 & 21553 & 224.510 & \\ \hline \text{Total} & 99 & 38651 & & \\ \hline \end{array}\]

Do these results allow us to conclude at the 1% significance level that the model is useful in predicting the fourth-year accounting course mark?

(Essay)

Question 1

Correct Answer:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$. $H_1$: At least one $\beta_i$ is not equal to zero.
Rejection region: $F > F_{0.01,3,96} \approx 3.95$.
Test statistic: F = 25.386.
Conclusion: Reject the null hypothesis. Yes, the model is useful in predicting the fourth-year accounting course mark.
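The test statistic in this answer can be re-derived from the ANOVA table alone. The sketch below is only a check of the arithmetic: the variable names are illustrative, and the critical value 3.95 is taken from the answer above rather than recomputed.

```python
# Overall F-test recomputed from the ANOVA table quoted in the question.
ss_regression, df_regression = 17098, 3
ss_error, df_error = 21553, 96

ms_regression = ss_regression / df_regression   # mean square for regression
ms_error = ss_error / df_error                  # mean square for error
f_stat = ms_regression / ms_error               # F = MSR / MSE
r_squared = ss_regression / (ss_regression + ss_error)

# Critical value as quoted in the answer above (assumed, not recomputed here).
f_critical = 3.95
reject_h0 = f_stat > f_critical

print(f"F = {f_stat:.3f}, R-Sq = {r_squared:.1%}, reject H0: {reject_h0}")
```

The computed F and R-Sq agree with the printed output (25.386 and 44.2%), which is a quick way to confirm that a flattened table has been read correctly.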

In the first-order regression model ŷ = 12 + 6x₁ + 8x₂ + 4x₁x₂, a unit increase in x₁ increases the value of $y$ on average by 6 units.

(True/False)

Question 2

Correct Answer:

False
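A short derivation suggests why the statement fails. Because the model contains the interaction term $4x_1x_2$, the effect of $x_1$ is not a single number but depends on the value at which $x_2$ is held:

```latex
\[
\hat{y} = 12 + 6x_1 + 8x_2 + 4x_1x_2
\quad\Longrightarrow\quad
\frac{\partial \hat{y}}{\partial x_1} = 6 + 4x_2 .
\]
```

The increase equals 6 units only when $x_2 = 0$; at $x_2 = 1$, for example, it is 10 units.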

In the first-order model ŷ = 60 + 40x₁ - 10x₂ + 5x₁x₂, a unit increase in x₁, while holding x₂ constant at 1, increases the value of $y$ on average by 45 units.

(True/False)

Question 3

Correct Answer:

True
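The value 45 can be checked numerically. The following is a minimal sketch, assuming the fitted equation is ŷ = 60 + 40x₁ - 10x₂ + 5x₁x₂; the helper names `predict` and `marginal_effect_x1` are hypothetical and exist only for this illustration.

```python
def predict(x1: float, x2: float) -> float:
    """Fitted value from the equation quoted in the question."""
    return 60 + 40 * x1 - 10 * x2 + 5 * x1 * x2


def marginal_effect_x1(b1: float, b3: float, x2: float) -> float:
    """Change in the fitted value per unit increase in x1, with x2 held fixed.

    For y-hat = b0 + b1*x1 + b2*x2 + b3*x1*x2 this is b1 + b3*x2.
    """
    return b1 + b3 * x2


effect = marginal_effect_x1(b1=40, b3=5, x2=1)
# Cross-check by differencing two predictions one unit of x1 apart.
difference = predict(3, 1) - predict(2, 1)
print(effect, difference)
```

Both routes give the same number, and changing `x2` shows how the effect of x₁ shifts with the level of x₂.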

We interpret the coefficients in a multiple regression model by holding all variables in the model constant.

(True/False)

Question 4

Stepwise regression is especially useful when there are many independent variables.

(True/False)

Question 5

The graph of the model $\hat{y} = \beta_0 + \beta_1 x_i + \beta_2 x_i^2$ is shaped like a straight line going upwards.

(True/False)

Question 6

An economist is analysing the incomes of professionals (physicians, dentists and lawyers). He realises that an important factor is the number of years of experience. However, he wants to know if there are differences among the three professional groups. He takes a random sample of 125 professionals and estimates the multiple regression model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$, where $y$ = annual income (in \$1000), $x_1$ = years of experience, $x_2$ = 1 if physician, 0 if not, and $x_3$ = 1 if dentist, 0 if not. The computer output is shown below.

THE REGRESSION EQUATION IS $y = 71.65 + 2.07 x_1 + 10.16 x_2 - 7.44 x_3$.

\[\begin{array}{|c|rrr|} \hline \text{Predictor} & \text{Coef} & \text{SDev} & T \\ \hline \text{Constant} & 71.65 & 18.56 & 3.860 \\ x_1 & 2.07 & 0.81 & 2.556 \\ x_2 & 10.16 & 3.16 & 3.215 \\ x_3 & -7.44 & 2.85 & -2.611 \\ \hline \end{array}\]

S = 42.6, R-Sq = 30.9%.

ANALYSIS OF VARIANCE

\[\begin{array}{|l|rrrr|} \hline \text{Source of Variation} & df & SS & MS & F \\ \hline \text{Regression} & 3 & 98008 & 32669.333 & 18.008 \\ \text{Error} & 121 & 219508 & 1814.116 & \\ \hline \text{Total} & 124 & 317516 & & \\ \hline \end{array}\]

Do these results allow us to conclude at the 1% significance level that the model is useful in predicting the income of professionals?

(Essay)

Question 7

A traffic consultant has analysed the factors that affect the number of traffic fatalities. She has come to the conclusion that two important variables are the number of cars and the number of tractor-trailer trucks. She proposed the second-order model with interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$, where $y$ = number of annual fatalities per shire, $x_1$ = number of cars registered in the shire (in units of 10 000), and $x_2$ = number of trucks registered in the shire (in units of 1000). The computer output (based on a random sample of 35 shires) is shown below.

THE REGRESSION EQUATION IS $y = 69.7 + 11.3 x_1 + 7.61 x_2 - 1.15 x_1^2 - 0.51 x_2^2 - 0.13 x_1 x_2$.

\[\begin{array}{|c|rrr|} \hline \text{Predictor} & \text{Coef} & \text{SDev} & T \\ \hline \text{Constant} & 69.7 & 41.3 & 1.688 \\ x_1 & 11.3 & 5.1 & 2.216 \\ x_2 & 7.61 & 2.55 & 2.984 \\ x_1^2 & -1.15 & 0.64 & -1.797 \\ x_2^2 & -0.51 & 0.20 & -2.55 \\ x_1 x_2 & -0.13 & 0.10 & -1.30 \\ \hline \end{array}\]

S = 15.2, R-Sq = 47.2%.

ANALYSIS OF VARIANCE

\[\begin{array}{|l|rrrr|} \hline \text{Source of Variation} & df & SS & MS & F \\ \hline \text{Regression} & 5 & 5959 & 1191.800 & 5.181 \\ \text{Error} & 29 & 6671 & 230.034 & \\ \hline \text{Total} & 34 & 12630 & & \\ \hline \end{array}\]

Test at the 1% significance level to determine whether the $x_2^2$ term should be retained in the model.

(Essay)

Question 8
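For the traffic-fatalities question above, the t-test on a single coefficient can be sketched as follows. This is an illustration rather than an official answer: the coefficient and standard deviation come from the printout, while the critical value 2.756 is assumed from a standard t table ($t_{0.005,29}$ for a two-tailed test at the 1% level).

```python
# Two-tailed t-test for the x2-squared term, H0: beta_4 = 0 vs H1: beta_4 != 0.
n, k = 35, 5                 # sample size and number of predictors
coef, sdev = -0.51, 0.20     # estimate and its standard deviation from the printout

t_stat = coef / sdev         # matches the T column of the output (-2.55)
df = n - k - 1               # error degrees of freedom

# Assumed table value for t_{0.005, 29}; not computed by this script.
t_critical = 2.756
retain_term = abs(t_stat) > t_critical

print(f"t = {t_stat:.2f}, df = {df}, retain x2^2 at 1%: {retain_term}")
```

Under these assumptions |t| falls short of the critical value, so at the 1% level there is not enough evidence to retain the term, even though it would be retained at a less strict level such as 5%.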

Which of the following is another name for a dummy variable?

(Multiple Choice)

Question 9

The model $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ is used whenever the statistician believes that, on average, $y$ is linearly related to $x_1$ and $x_2$, and the predictor variables do not interact.

(True/False)

Question 10

A first-order model was used in a regression analysis involving 25 observations to study the relationship between a dependent variable $y$ and three independent variables, $x_1$, $x_2$ and $x_3$. The analysis showed that the mean square for regression is 160 and the sum of squares for error is 1050. In addition, the following is a partial computer printout.

\[\begin{array}{|c|rr|} \hline \text{Predictor} & \text{Coef} & \text{StDev} \\ \hline \text{Constant} & 25 & 4 \\ x_1 & 18 & 6 \\ x_2 & -12 & 4.8 \\ x_3 & 6 & 5 \\ \hline \end{array}\]

Is there enough evidence at the 5% significance level to conclude that the model is useful in predicting the value of $y$?

(Essay)

Question 11
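For the 25-observation question above, the F statistic has to be assembled from the two quantities given. A minimal sketch, assuming the critical value $F_{0.05,3,21} = 3.07$ from a standard F table (the script does not compute it):

```python
# Overall F-test built from the mean square for regression and the SSE.
n, k = 25, 3                 # observations and independent variables
ms_regression = 160          # given in the question
ss_error = 1050              # given in the question

df_error = n - k - 1         # 21 error degrees of freedom
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error

# Assumed table value for F_{0.05, 3, 21}.
f_critical = 3.07
model_useful = f_stat > f_critical

print(f"MSE = {ms_error:.1f}, F = {f_stat:.2f}, reject H0: {model_useful}")
```

With these inputs the statistic only narrowly exceeds the assumed critical value, so the conclusion that the model is useful holds at the 5% level but would not survive a 1% test.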

A traffic consultant has analysed the factors that affect the number of traffic fatalities. She has come to the conclusion that two important variables are the number of cars and the number of tractor-trailer trucks. She proposed the second-order model with interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$, where $y$ = number of annual fatalities per shire, $x_1$ = number of cars registered in the shire (in units of 10 000), and $x_2$ = number of trucks registered in the shire (in units of 1000). The computer output (based on a random sample of 35 shires) is shown below.

THE REGRESSION EQUATION IS $y = 69.7 + 11.3 x_1 + 7.61 x_2 - 1.15 x_1^2 - 0.51 x_2^2 - 0.13 x_1 x_2$.

\[\begin{array}{|c|rrr|} \hline \text{Predictor} & \text{Coef} & \text{SDev} & T \\ \hline \text{Constant} & 69.7 & 41.3 & 1.688 \\ x_1 & 11.3 & 5.1 & 2.216 \\ x_2 & 7.61 & 2.55 & 2.984 \\ x_1^2 & -1.15 & 0.64 & -1.797 \\ x_2^2 & -0.51 & 0.20 & -2.55 \\ x_1 x_2 & -0.13 & 0.10 & -1.30 \\ \hline \end{array}\]

S = 15.2, R-Sq = 47.2%.

ANALYSIS OF VARIANCE

\[\begin{array}{|l|rrrr|} \hline \text{Source of Variation} & df & SS & MS & F \\ \hline \text{Regression} & 5 & 5959 & 1191.800 & 5.181 \\ \text{Error} & 29 & 6671 & 230.034 & \\ \hline \text{Total} & 34 & 12630 & & \\ \hline \end{array}\]

Is there enough evidence at the 5% significance level to conclude that the model is useful in predicting the number of fatalities?

(Essay)

Question 12

Suppose that the sample regression line of a first-order model is $\hat{y} = 4 + 3 x_1 + 2 x_2$. If we examine the relationship between $y$ and $x_1$ for three different values of $x_2$, we observe that the effect of $x_1$ on $y$ remains the same no matter what the value of $x_2$.

(True/False)

Question 13

An avid football fan was in the process of examining the factors that determine the success or failure of football teams. He noticed that teams with many rookies and teams with many veterans seem to do quite poorly. To further analyse his beliefs, he took a random sample of 20 teams and proposed a second-order model with one independent variable. The selected model is $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$, where $y$ = team's winning percentage and $x$ = average years of professional experience. The computer output is shown below.

THE REGRESSION EQUATION IS $y = 32.6 + 5.96 x - 0.48 x^2$.

\[\begin{array}{|c|rrr|} \hline \text{Predictor} & \text{Coef} & \text{SDev} & T \\ \hline \text{Constant} & 32.6 & 19.3 & 1.689 \\ x & 5.96 & 2.41 & 2.473 \\ x^2 & -0.48 & 0.22 & -2.182 \\ \hline \end{array}\]

S = 16.1, R-Sq = 43.9%.

ANALYSIS OF VARIANCE

\[\begin{array}{|l|rrrr|} \hline \text{Source of Variation} & df & SS & MS & F \\ \hline \text{Regression} & 2 & 3452 & 1726 & 6.663 \\ \text{Error} & 17 & 4404 & 259.059 & \\ \hline \text{Total} & 19 & 7856 & & \\ \hline \end{array}\]

Test to determine at the 10% significance level whether the $x^2$ term should be retained.

(Essay)

Question 14
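The football question above rests on the idea that winning percentage first rises and then falls with experience. The sketch below does not perform the requested t-test; it only illustrates the shape implied by the fitted equation ŷ = 32.6 + 5.96x - 0.48x², by locating the turning point of the parabola. The function name `fitted` is hypothetical.

```python
# Fitted second-order model from the printout.
b0, b1, b2 = 32.6, 5.96, -0.48


def fitted(x: float) -> float:
    """Predicted winning percentage for average experience x (years)."""
    return b0 + b1 * x + b2 * x ** 2


# A parabola b0 + b1*x + b2*x^2 turns at x = -b1 / (2*b2);
# with b2 < 0 that turning point is a maximum.
x_peak = -b1 / (2 * b2)
y_peak = fitted(x_peak)

print(f"peak at x = {x_peak:.2f} years, fitted value = {y_peak:.1f}%")
```

A negative coefficient on x² is what produces the inverted U the fan expects, which is why the significance of that term is the focus of the question.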

In general, to represent a nominal independent variable that has n possible categories, we would create n dummy variables.

(True/False)

Question 15
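The dummy-variable statement above can be checked against the professionals example on this page, where three groups (physician, dentist, lawyer) are represented by only two indicators, $x_2$ and $x_3$, with lawyers as the omitted baseline. In a model with an intercept, n categories need n - 1 dummy variables. A minimal sketch, with the helper `dummy_code` being a hypothetical name used only here:

```python
def dummy_code(values, baseline):
    """Encode a nominal variable as 0/1 columns, omitting the baseline category."""
    categories = sorted(set(values) - {baseline})
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


professions = ["physician", "dentist", "lawyer", "dentist"]
columns, rows = dummy_code(professions, baseline="lawyer")

print(columns)   # one column per non-baseline category
for profession, row in zip(professions, rows):
    print(profession, row)
```

The baseline group is identified by all indicators being 0, so a third column would be redundant alongside the intercept.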

In regression analysis, indicator variables may be used as independent variables.

(True/False)

Question 16

Suppose that the sample regression equation of a model is $\hat{y} = 4.7 + 2.2 x_1 + 2.6 x_2 - x_1 x_2$. If we examine the relationship between $y$ and $x_2$ for $x_1$ = 1, 2 and 3, we observe that the three equations produced not only differ in the intercept term, but the coefficient of $x_2$ also varies.

(True/False)

Question 17
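The three equations mentioned in the question above can be written out by substituting each value of x₁ into ŷ = 4.7 + 2.2x₁ + 2.6x₂ - x₁x₂ and collecting terms in x₂. A minimal sketch, with `conditional_line` as a hypothetical helper name:

```python
def conditional_line(x1: float) -> tuple[float, float]:
    """Intercept and x2-slope of the fitted equation once x1 is fixed.

    y-hat = (4.7 + 2.2*x1) + (2.6 - x1) * x2
    """
    intercept = 4.7 + 2.2 * x1
    slope = 2.6 - 1.0 * x1
    return intercept, slope


lines = {x1: conditional_line(x1) for x1 in (1, 2, 3)}
for x1, (intercept, slope) in lines.items():
    print(f"x1 = {x1}: y-hat = {intercept:.1f} + ({slope:.1f}) x2")
```

Both the intercept and the coefficient of x₂ change with x₁, and the slope even changes sign between x₁ = 2 and x₁ = 3, which is the signature of an interaction term.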

In the first-order model $\hat{y} = 75 - 12 x_1 + 5 x_2 - 3 x_1 x_2$, a unit increase in $x_1$, while holding $x_2$ constant at a value of 2, decreases the value of $y$ on average by 8 units.

(True/False)

Question 18

Suppose that the estimated regression equation for 200 business graduates is ŷ = 20 000 + 2000x + 1500I, where y is the starting salary, x is the grade point average and I is an indicator variable that takes the value of 1 if the student is a computer information systems major and 0 if not. A business administration major graduate with a grade point average of 4 would have an average starting salary of:

(Multiple Choice)

Question 19
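For the starting-salary question above, the prediction follows from plugging values into ŷ = 20 000 + 2000x + 1500I. The answer options are not shown on this page, so the sketch below only works through the arithmetic; `predicted_salary` is a hypothetical helper name.

```python
def predicted_salary(gpa: float, is_cis_major: bool) -> float:
    """Estimated starting salary from y-hat = 20000 + 2000*x + 1500*I."""
    indicator = 1 if is_cis_major else 0   # I = 1 only for CIS majors
    return 20_000 + 2_000 * gpa + 1_500 * indicator


# A business administration major is not a CIS major, so I = 0.
business_admin = predicted_salary(gpa=4, is_cis_major=False)
cis_major = predicted_salary(gpa=4, is_cis_major=True)

print(business_admin, cis_major)
```

The two predictions differ by exactly the indicator's coefficient, which is how a dummy variable shifts the intercept without changing the slope on grade point average.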

An economist is in the process of developing a model to predict the price of gold. She believes that the two most important variables are the price of a barrel of oil $(x_1)$ and the interest rate $(x_2)$. She proposes the first-order model with interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$. A random sample of 20 daily observations was taken. The computer output is shown below.

THE REGRESSION EQUATION IS $y = 115.6 + 22.3 x_1 + 14.7 x_2 - 1.36 x_1 x_2$.

\[\begin{array}{|c|rrr|} \hline \text{Predictor} & \text{Coef} & \text{SDev} & T \\ \hline \text{Constant} & 115.6 & 78.1 & 1.480 \\ x_1 & 22.3 & 7.1 & 3.141 \\ x_2 & 14.7 & 6.3 & 2.333 \\ x_1 x_2 & -1.36 & 0.52 & -2.615 \\ \hline \end{array}\]

S = 20.9, R-Sq = 55.4%.

ANALYSIS OF VARIANCE

\[\begin{array}{|l|rrrr|} \hline \text{Source of Variation} & df & SS & MS & F \\ \hline \text{Regression} & 3 & 8661 & 2887.0 & 6.626 \\ \text{Error} & 16 & 6971 & 435.7 & \\ \hline \text{Total} & 19 & 15632 & & \\ \hline \end{array}\]

Is there sufficient evidence at the 1% significance level to conclude that the price of a barrel of oil and the price of gold are linearly related?

(Essay)

Question 20
