Question 1

______________ is NOT a step of the Data Mining process.

A)Data sampling
B)Data partitioning
C)Model construction
D)Supervised learning

Accepted Answer

Supervised learning

Question 2

Determine a freshman's likely first-year grade point average from the student's Scholastic Aptitude Test (SAT) score, high school grade point average, and number of extra-curricular activities. This is an example of

A)classification of a categorical outcome.
B)estimation of a continuous outcome.
C)prediction of a categorical outcome.
D)unsupervised learning.

Accepted Answer

estimation of a continuous outcome.

Question 3

___________ is dividing the sample data into three sets for training, validation, and testing of the data-mining algorithm performance.

A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment

Accepted Answer

Data partitioning
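
The three-way split described in Question 3 can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `partition`, the 50/30/20 proportions, and the fixed seed are all assumptions chosen for the example.

```python
import random


def partition(records, train_frac=0.5, valid_frac=0.3, seed=0):
    """Shuffle records and split them into training, validation, and test sets.

    The fractions and seed are illustrative assumptions; whatever is left
    after the training and validation slices becomes the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, reproducibly
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Each record lands in exactly one of the three sets, so the test set stays unseen while models are built and compared.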

Question 4

__________ is one minus the Class 0 error rate.

A)Sensitivity
B)Specificity
C)Accuracy
D)Cutoff value

Accepted Answer

The answer of __________is one minus the Class 0 error...

Question 5

Estimation methods are also referred to as

A)prediction methods.
B)clustering methods.
C)association methods.
D)supervised methods.

Accepted Answer

The answer of Estimation methods are also referred to as&#10;A)prediction...

Question 6

A characteristic or quantity of interest that can take on different values is a(n)

A)variable.
B)observation.
C)record.
D)quality.

Accepted Answer

The answer of A characteristic or quantity of interest that...

Question 7

A(n) _______________ is often displayed as a row of values in a spreadsheet or database in which the columns correspond to the variables.

A)record
B)data point
C)classification
D)location

Accepted Answer

The answer of A(n)_______________ is often displayed as a row...

Question 8

____________ is a category of data-mining techniques in which an algorithm learns how to predict or classify an outcome variable of interest.

A)Supervised Learning
B)Unsupervised Learning
C)Dimension Reduction
D)Data Sampling

Accepted Answer

The answer of ____________ is a category of data-mining techniques...

Question 9

Data used to build a data mining model is called

A)validation data.
B)training data.
C)test data.
D)exploration data.

Accepted Answer

The answer of Data used to build a data mining...

Question 10

____________ is a method of extracting data relevant to the business problem under consideration. It is the first step in the Data Mining process.

A)Data sampling
B)Data partitioning
C)Model construction
D)Model assessment

Accepted Answer

The answer of ____________is a method of extracting data relevant...

Question 11

As we increase the cutoff value, _______ error will decrease and _________ error will rise.

A)Class 0, Class 1
B)Class 1, Class 0
C)error, accuracy
D)false, true

Accepted Answer

The answer of As we increase the cutoff value, _______...
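
The effect of the cutoff value in Question 11 can be explored with a small sketch. The labels and probabilities below are hypothetical, invented for illustration only, and the rule "predict Class 1 when the estimated probability is at least the cutoff" is an assumed convention.

```python
def class_error_rates(actual, prob_class1, cutoff):
    """Return (class0_error, class1_error) for a given cutoff.

    A record is predicted Class 1 when its estimated probability of
    being in Class 1 is >= cutoff (an assumed convention).
    """
    predicted = [1 if p >= cutoff else 0 for p in prob_class1]
    n0 = sum(1 for a in actual if a == 0)
    n1 = sum(1 for a in actual if a == 1)
    # Class 0 error: actual Class 0 records predicted as Class 1.
    false_pos = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    # Class 1 error: actual Class 1 records predicted as Class 0.
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return false_pos / n0, false_neg / n1


# Hypothetical data: actual classes and estimated Class 1 probabilities.
actual = [1, 1, 1, 0, 1, 0, 0, 0]
prob_class1 = [0.9, 0.8, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1]

for cutoff in (0.3, 0.5, 0.7):
    e0, e1 = class_error_rates(actual, prob_class1, cutoff)
    print(f"cutoff={cutoff}: Class 0 error={e0:.2f}, Class 1 error={e1:.2f}")
```

Raising the cutoff makes the model label fewer records as Class 1, which is why the two error rates move in opposite directions across the three runs.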

Question 12

____________ is the manipulation of the data with the goal of putting it in a form suitable for formal modeling.

A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment

Accepted Answer

The answer of ____________is the manipulation of the data with...

Question 13

Data-mining methods for predicting an outcome based on a set of input variables are referred to as

A)supervised learning.
B)unsupervised learning.
C)dimension reduction.
D)data sampling.

Accepted Answer

The answer of Data-mining methods for predicting an outcome based...

Question 14

Classifying a record as belonging to one class when it belongs to another class is referred to as a(n)

A)overall error rate.
B)error.
C)accuracy.
D)class.

Accepted Answer

The answer of Classifying a record as belonging to one...

Question 15

Misclassifying an actual ______ observation as a(n) ______ observation is known as a false positive.

A)Class 0, Class 1
B)Class 1, Class 0
C)error, accuracy
D)false, true

Accepted Answer

The answer of Misclassifying an actual ______ observation as a(n)...

Question 16

The set of recorded values of variables associated with a single entity is a(n)

A)observation.
B)data point.
C)classification.
D)location.

Accepted Answer

The answer of The set of recorded values of variables...

Question 17

_____________ is the step in data-mining which includes addressing missing and erroneous data, reducing the number of variables, defining new variables, and data exploration.

A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment

Accepted Answer

The answer of _____________is the step in data-mining which includes...

Question 18

______________ involves descriptive statistics, data visualization, and clustering.

A)Data exploration
B)Data partitioning
C)Data preparation
D)Model assessment

Accepted Answer

The answer of ______________ involves descriptive statistics, data visualization, and...

Question 19

Applying descriptive statistics and data visualization to the training set to understand the data and assist in the selection of an appropriate technique is a part of

A)data exploration.
B)data partitioning.
C)data preparation.
D)model assessment.

Accepted Answer

The answer of Applying descriptive statistics and data visualization to...

Question 20

The percent of misclassified records out of the total records in the validation data is known as the

A)overall error rate.
B)error.
C)accuracy.
D)class.

Accepted Answer

The answer of The percent of misclassified records out of...

Question 21

___________ is a generalization of linear regression for predicting a categorical outcome variable.

A)Multiple linear regression
B)Logistic regression
C)Discriminant analysis
D)Cluster analysis

Accepted Answer

The answer of ___________ is a generalization of linear regression...

Question 22

Separate error rates with respect to the false negative and false positive cases are computed to take into account the

A)asymmetric costs in misclassification.
B)symmetric weights of these two cases.
C)distortions due to outliers.
D)effect of sampling error.

Accepted Answer

The answer of Separate error rates with respect to the...

Question 23

The X axis of a lift chart shows

A)number of actual Class 1 records identified.
B)ratio of decile mean to overall mean.
C)the number of actual Class 1 records.
D)the ratio of the overall mean to the decile mean.

Accepted Answer

The answer of The X axis of a lift chart...

Question 24

Given the following classification confusion matrix, what is the accuracy?

Accepted Answer

The answer of Given the following classification confusion matrix, what...

Question 25

Given the following classification confusion matrix, what is the overall error rate?

Accepted Answer

The answer of Given the following classification confusion matrix, what...

Question 26

A _____ classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules.

A)regression tree
B)scatter chart
C)classification tree
D)classification confusion matrix

Accepted Answer

The answer of A _____ classifies a categorical outcome variable...

Question 27

The impurity of a group of observations is based on the variance of the outcome value for the observations in the group for

A)regression trees.
B)time-series plots.
C)classification trees.
D)cumulative lift charts.

Accepted Answer

The answer of The impurity of a group of observations...

Question 28

_______ compares the number of actual Class 1 observations identified when observations are considered in decreasing order of their estimated probability of being in Class 1 with the number identified if observations are randomly selected.

A)Cumulative lift
B)Classification confusion
C)Decile-wise lift chart
D)ROC curve

Accepted Answer

The answer of _______compares the number of actual Class 1...

Question 29

Which of the following is a commonly used supervised learning method?

A)k-means clustering
B)k-nearest neighbors
C)hierarchical clustering
D)association rule development

Accepted Answer

The answer of Which of the following is a commonly...

Question 30

_____ is a measure of the heterogeneity of observations in a classification tree.

A)Sensitivity
B)Specificity
C)Accuracy
D)Impurity

Accepted Answer

The answer of _____ is a measure of the heterogeneity...

Question 31

One minus the overall error rate is often referred to as the _____ of the model.

A)sensitivity
B)accuracy
C)specificity
D)cutoff value

Accepted Answer

The answer of One minus the overall error rate is...

Question 32

How many Class 1's are correctly classified as Class 1 in the table below?

$\begin{array}{|l|c|c|} \hline \text{Classification Confusion Matrix} \\ \hline & \text{Predicted Class} \\ \hline \text{Actual Class} & 1 & 0 \\ \hline 1 & 221 & 100 \\ \hline 0 & 30 & 3{,}000 \\ \hline \end{array}$

A)221
B)100
C)30
D)3,000

Accepted Answer

The answer of How many Class 1's are correctly classified...

Question 33

An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed a(n)

A)false negative.
B)false positive.
C)residual.
D)outlier.

Accepted Answer

The answer of An observation classified as part of a...

Question 34

The Y axis of a decile chart shows

A)number of important class records identified.
B)ratio of decile mean to overall mean.
C)the number of actual Class 1 records.
D)the ratio of the overall mean to the decile mean.

Accepted Answer

The answer of The Y axis of a decile chart...

Question 35

In the k-nearest neighbors method, when the value of k is set to 1

A)the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
B)the new observation's class is naïvely assigned to the most common class in the training set.
C)the new observation's prediction is used to estimate the anticipated error rate on future data over the entire training set.
D)the classification or prediction of a new observation is subject to the smallest possible classification error.

Accepted Answer

The answer of In the k-nearest neighbors method, when the...
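
A minimal sketch of k-nearest neighbors with k = 1, as in Question 35. The training records, the two-feature layout, and the use of Euclidean distance as the similarity measure are assumptions made for this example.

```python
import math


def nearest_neighbor_predict(training, new_point):
    """Return the outcome of the training record closest to new_point (k = 1).

    Each training record is a (features, outcome) pair; similarity is
    measured with Euclidean distance, which is an assumed choice.
    """
    _, closest_outcome = min(
        training, key=lambda record: math.dist(record[0], new_point)
    )
    return closest_outcome


# Hypothetical training set: (features, class label).
training = [((1.0, 1.0), "A"), ((5.0, 5.0), "B"), ((9.0, 1.0), "A")]

print(nearest_neighbor_predict(training, (4.0, 4.5)))
```

With larger k the prediction would instead combine the outcomes of several neighbors, for example by majority vote.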

Question 36

_______ attempts to classify a categorical outcome as a linear function of explanatory variables.

A)Linear regression
B)Logistic regression
C)Classification model
D)Supervised learning

Accepted Answer

The answer of _______ attempts to classify a categorical outcome...

Question 37

How many Class 1's are incorrectly classified as Class 0?

$\begin{array}{|l|c|c|} \hline \text{Classification Confusion Matrix} \\ \hline & \text{Predicted Class} \\ \hline \text{Actual Class} & 1 & 0 \\ \hline 1 & 221 & 100 \\ \hline 0 & 30 & 3{,}000 \\ \hline \end{array}$

A)221
B)100
C)30
D)3,000

Accepted Answer

The answer of How many Class 1's are incorrectly classified...
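
The performance measures named throughout this deck can be computed from the confusion matrix shown in Questions 32 and 37. The sketch below reads rows as actual class and columns as predicted class, as the table header indicates, and treats Class 1 as the positive class; the function name and dictionary keys are assumptions for illustration.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Compute error rates and related measures from confusion-matrix counts.

    tp: actual Class 1 predicted Class 1    fn: actual Class 1 predicted Class 0
    fp: actual Class 0 predicted Class 1    tn: actual Class 0 predicted Class 0
    """
    total = tp + fn + fp + tn
    overall_error = (fn + fp) / total
    class1_error = fn / (tp + fn)
    class0_error = fp / (fp + tn)
    return {
        "overall_error": overall_error,
        "accuracy": 1 - overall_error,       # one minus the overall error rate
        "class1_error": class1_error,
        "class0_error": class0_error,
        "sensitivity": 1 - class1_error,     # one minus the Class 1 error rate
        "specificity": 1 - class0_error,     # one minus the Class 0 error rate
    }


# Counts taken from the matrix in the question above.
metrics = confusion_metrics(tp=221, fn=100, fp=30, tn=3000)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting the two class error rates separately, rather than only the overall rate, matters when the two kinds of misclassification carry different costs.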

Question 38

_____ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.

A)Underfitting
B)Overfitting
C)Oversampling
D)Undersampling

Accepted Answer

The answer of _____ refers to the scenario in which...

Question 39

A(n) __________ matrix displays a model's correct and incorrect classification.

A)cumulative lift
B)classification confusion
C)decile-wise lift chart
D)ROC curve

Accepted Answer

The answer of A(n) __________ matrix displays a model's correct...

Question 40

The test set is the data set used to

A)build the data mining model.
B)estimate accuracy of candidate models on unseen data.
C)estimate accuracy of final model on unseen data.
D)show counts of actual versus predicted class values.

Accepted Answer

The answer of Test set is the data set used...

Deck 9: Predictive Data Mining