Exam 5: Regression Analysis
Which of the following is true of validation data?
C
In feature selection, ________ begins by creating a separate regression model for each predictor.
B
Describe the backward elimination, forward selection, and stepwise selection regression models of feature selection.
Backward elimination, forward selection, and stepwise selection are all methods used in regression analysis for feature selection.
Backward elimination starts with a model that includes all candidate features and then systematically removes the least significant feature one at a time until the model's performance stops improving. This method is useful for identifying the most important features in a dataset and can help simplify the model by removing irrelevant variables.
Forward selection, on the other hand, begins with a model that includes no features and then adds the most significant remaining feature one at a time until adding more features no longer improves the model's performance. This method is useful for identifying the best subset of features to include in the model and can help prevent overfitting by only adding relevant variables.
Stepwise selection is a combination of backward elimination and forward selection, where features are added or removed from the model at each step based on their significance. This method is more flexible and can be used to identify the best combination of features for the model.
Overall, these methods are used to identify the most relevant features for a regression model and can help improve the model's performance and interpretability. Each method has its own advantages and disadvantages, and the choice of method depends on the specific dataset and research question.
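The following is a minimal sketch of how forward selection could be implemented in Python, using AIC as the criterion for adding features. The synthetic data, the column names x1 through x4, and the aic_for helper are assumptions made for illustration; they are not drawn from the case study.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic data: only x1 and x3 actually influence the response.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame(rng.normal(size=(n, 4)), columns=["x1", "x2", "x3", "x4"])
y = 2.0 * X["x1"] - 1.5 * X["x3"] + rng.normal(size=n)

def aic_for(features):
    # Fit an ordinary least squares model with an intercept plus the
    # given features and return its Akaike information criterion.
    if features:
        design = sm.add_constant(X[list(features)])
    else:
        design = np.ones((n, 1))  # intercept-only starting model
    return sm.OLS(y, design).fit().aic

selected, remaining = [], list(X.columns)
best_aic = aic_for(selected)
while remaining:
    # Score every candidate addition and keep the one with the lowest AIC.
    scores = {f: aic_for(selected + [f]) for f in remaining}
    candidate, candidate_aic = min(scores.items(), key=lambda kv: kv[1])
    if candidate_aic >= best_aic:
        break  # no remaining feature improves the model, so stop
    selected.append(candidate)
    remaining.remove(candidate)
    best_aic = candidate_aic

print("Selected features:", selected)  # with this data, likely ['x1', 'x3']

Backward elimination would run the same loop in reverse, starting from all the predictors and dropping the feature whose removal lowers AIC the most; stepwise selection would add a removal check after each addition.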
The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are measures to determine the ________.
Compare and contrast simple bivariate linear regression with multiple linear regression.
In the ridesharing case study, the variable representing the likelihood of rain for a specific forecast period and location is ________.
Identify a true statement about the N-fold cross-validation method of model evaluation.
In the N-fold cross-validation model evaluation method, it is typical to use 45 folds (data subsets).
When high levels of accuracy on a training dataset do not carry over to predictions on new data, the phenomenon is termed ________.
Which of the following datasets is used as an optional dataset dedicated for model validation?
KDnuggets identified ________ as one of the software tools most often used for data analysis.
In the ridesharing case study, the variable source refers to the ________.
Feature selection is a qualitative method used to reduce the impact of dummy coding.
In feature selection, ________ follows forward selection by adding a variable at each stage, but also includes removing variables that no longer meet the threshold.
An essential practice before starting with any modeling process is to first ________.
In regression analysis, the variable being predicted is referred to as the ________.
In the ridesharing case study, the variable distance refers to ________.
In the context of modeling categorical values, dummy coding is ________.