Deck 10: Data Mining
1
Given the following confusion matrix:
[Confusion matrix not shown; column header: Predicted Group]
What is the correct classification rate?
A)9/13 = 69%
B)10/14 = 86%
C)19/25 = 76%
D)6/19 = 32%
19/25 = 76%
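As a minimal sketch of how a correct classification rate is computed: the rate is the sum of the diagonal of the confusion matrix divided by the total number of observations. The counts below are hypothetical (the original matrix is not reproduced here) and were chosen only so that 19 of 25 observations fall on the diagonal; `correct_classification_rate` is an illustrative name.

```python
def correct_classification_rate(matrix):
    """Return (correct, total, rate) for a square confusion matrix.

    Rows are actual groups, columns are predicted groups.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


# Hypothetical counts, NOT the matrix from the original question.
confusion = [
    [9, 4],   # actual group 1: 9 predicted as group 1, 4 as group 2
    [2, 10],  # actual group 2: 2 predicted as group 1, 10 as group 2
]

correct, total, rate = correct_classification_rate(confusion)
print(f"{correct}/{total} = {rate:.0%}")  # prints 19/25 = 76%
```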
2
Exhibit 10.1
The following questions are based on the problem description, regression results, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful or not successful in their graduate studies. The officer has data on 20 current students, ten of whom are doing very well (Group 1) and ten who are not (Group 2).
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Group Centroids
Group  Quantitative  Verbal
1      683.8         654.2
2      610.7         605.7

Refer to Exhibit 10.1. What percentage of the observations is classified correctly?
A)90%
B)80%
C)85%
D)100%
85%
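The stated rate can be converted to counts with simple arithmetic. A small sketch, assuming only the 20 observations described in the exhibit and the 85% rate given above; the variable names are illustrative.

```python
n_observations = 20   # students described in Exhibit 10.1
correct_rate = 0.85   # correct classification rate stated above

n_correct = round(correct_rate * n_observations)
n_incorrect = n_observations - n_correct
error_rate = n_incorrect / n_observations

print(n_correct, n_incorrect, f"{error_rate:.0%}")
```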
3
If using the regression tool for two-group discriminant analysis, in the regression dialog box, the Input Y-Range entry corresponds to
A)the Group values.
B)the independent variable values.
C)the predictor variable values.
D)b) and c) are both correct
A
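To illustrate the roles of the Y-range and X-range outside of Excel, here is a minimal sketch of the regression approach to two-group discriminant analysis using NumPy. The eight score pairs are made up for illustration and are not the Exhibit 10.1 data. The cut-off is taken as the midpoint of the two groups' mean discriminant scores, which is one common convention and is an assumption here.

```python
import numpy as np

# Hypothetical (quantitative, verbal) scores, NOT the exhibit data.
X = np.array([
    [700, 680], [690, 650], [670, 660], [680, 640],   # group 1
    [600, 610], [620, 600], [610, 590], [590, 620],   # group 2
], dtype=float)
group = np.array([1, 1, 1, 1, 2, 2, 2, 2], dtype=float)  # plays the role of the Y-range

# Ordinary least squares with an intercept: group ~ b0 + b1*quant + b2*verbal.
design = np.column_stack([np.ones(len(X)), X])            # X-range plus intercept column
coef, *_ = np.linalg.lstsq(design, group, rcond=None)

# The fitted values serve as discriminant scores.
scores = design @ coef

# Midpoint cut-off between the mean scores of the two groups.
cutoff = (scores[group == 1].mean() + scores[group == 2].mean()) / 2

# Scores below the cut-off are assigned to group 1, the rest to group 2.
predicted = np.where(scores < cutoff, 1, 2)
accuracy = (predicted == group).mean()

print(f"cut-off = {cutoff:.3f}, accuracy = {accuracy:.0%}")
```

With equal group sizes and groups coded 1 and 2, the midpoint of the mean scores works out to 1.5, because least-squares fitted values have the same overall mean as the dependent variable.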
4
Exhibit 10.1
Refer to Exhibit 10.1. What is the verbal test score value of the group centroid for group 1?
A)683.8
B)654.2
C)610.7
D)605.7
5
Exhibit 10.1
Refer to Exhibit 10.1. How many observations are classified correctly?
A)11
B)9
C)17
D)20
6
Exhibit 10.1
Refer to Exhibit 10.1. What is the verbal test score value of the group centroid for group 2?
A)683.8
B)654.2
C)610.7
D)605.7
7
In discriminant analysis, the averages of the independent variables for a group define the
A)centroid.
B)median.
C)mode.
D)central tendency.
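For reference, the vector of per-variable averages for a group can be computed as in the sketch below. The scores are made up (not the exhibit data) and `centroid` is an illustrative helper name.

```python
from statistics import mean


def centroid(rows):
    """Return the per-variable means of a list of equal-length observations."""
    return tuple(mean(column) for column in zip(*rows))


# Hypothetical (quantitative, verbal) scores for two groups.
group_1 = [(700, 680), (690, 650), (670, 660), (680, 640)]
group_2 = [(600, 610), (620, 600), (610, 590), (590, 620)]

print(centroid(group_1))  # per-variable means for group 1
print(centroid(group_2))  # per-variable means for group 2
```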
8
The regression approach can be used in the two-group discriminant analysis problem because
A)the data are not normally distributed.
B)the R² statistic is not very meaningful.
C)the regression equation can generate a discriminant score.
D)it scales to the k-group problem easily.
9
Which of the following is not true regarding discriminant analysis?
A)The classification rule translates discriminant scores into group membership.
B)Discriminant analysis is based on discrete or categorical dependent variables.
C)The classification rule selected perfectly classifies the data used to derive the classification rule.
D)The confusion matrix summarizes classification results.
10
Exhibit 10.1
Refer to Exhibit 10.1. What number of observations is classified incorrectly?
A)3
B)9
C)17
D)20
11
Exhibit 10.1
Refer to Exhibit 10.1. What percentage of the observations is classified incorrectly?
A)90%
B)80%
C)85%
D)15%
12
The goal of discriminant analysis is
A)to develop a model to predict new dependent values.
B)to develop a rule for predicting to what group a new observation is most likely to belong.
C)to develop a rule for predicting how independent variable values predict dependent values.
D)none of these.
13
If using the regression tool for two-group discriminant analysis, in the regression dialog box, the Input X-Range entry corresponds to
A)the Group values.
B)the independent variable values.
C)the predicted variable values.
D)the fitted variable values.
14
Exhibit 10.1
Refer to Exhibit 10.1. Suppose that for a given observation, the difference between the Mahalanobis distances for group 1 and group 2 (G1-G2) is big and positive. This means that
A)The observation is likely to be classified correctly to group 2
B)The observation is likely to be classified correctly to group 1
C)The observation is likely to be classified incorrectly to group 2
D)The observation is likely to be classified incorrectly to group 1
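To make the G1-G2 difference concrete, the sketch below computes squared Mahalanobis distances from one applicant to each group centroid and assigns the applicant to the nearer group. The centroids come from Exhibit 10.1; the per-group covariance matrices and the applicant's scores are invented for illustration, since the exhibit's values are not reproduced here. The nearest-centroid rule shown is a common convention and may differ in detail from what the Analytic Solver Platform reports.

```python
import numpy as np


def mahalanobis_sq(x, centroid, cov):
    """Squared Mahalanobis distance from x to centroid under covariance cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(centroid, dtype=float)
    return float(diff @ np.linalg.solve(cov, diff))


# Group centroids (quantitative, verbal) from Exhibit 10.1.
centroid_1 = [683.8, 654.2]
centroid_2 = [610.7, 605.7]

# Hypothetical unpooled (per-group) covariance matrices.
cov_1 = np.array([[400.0, 50.0], [50.0, 300.0]])
cov_2 = np.array([[500.0, -40.0], [-40.0, 350.0]])

applicant = [620.0, 600.0]  # hypothetical applicant

g1 = mahalanobis_sq(applicant, centroid_1, cov_1)
g2 = mahalanobis_sq(applicant, centroid_2, cov_2)

# A positive G1 - G2 means the applicant is farther from group 1's centroid
# than from group 2's, so the nearest-centroid rule picks group 2.
assigned = 1 if g1 < g2 else 2
print(f"G1 = {g1:.2f}, G2 = {g2:.2f}, G1 - G2 = {g1 - g2:.2f}, group {assigned}")
```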
15
Exhibit 10.1
Refer to Exhibit 10.1. Suppose that for a given observation, the difference between the Mahalanobis distances for group 1 and group 2 (G1-G2) is big and negative. This means that
A)The observation is likely to be classified correctly to group 2
B)The observation is likely to be classified correctly to group 1
C)The observation is likely to be classified incorrectly to group 2
D)The observation is likely to be classified incorrectly to group 1
16
Exhibit 10.1
Refer to Exhibit 10.1. What is the quantitative test score value of the group centroid for group 2?
A)683.8
B)654.2
C)610.7
D)605.7
17
Which of the following goodness-of-fit measures is used for discriminant analysis problems?
A)R²
B)multiple R²
C)adjusted R²
D)none of these
18
In a two-group discriminant analysis problem using regression, why is the midpoint cut-off value used to determine group classification?
A)Because the value minimizes the absolute misclassification error.
B)Because the value minimizes the probability of misclassification error.
C)Because the value represents an equal division between the groups.
D)Because the value incorporates problem-specific knowledge.
19
Exhibit 10.1
Refer to Exhibit 10.1. What is the quantitative test score value of the group centroid for group 1?
A)683.8
B)654.2
C)610.7
D)605.7
20
Discriminant analysis (DA) differs from most other predictive statistical methods because the dependent variable is
A)continuous
B)random
C)stochastic
D)discrete
21
Exhibit 10.1
Refer to Exhibit 10.1. What is the straight-line distance between (6,4) and (2,9)?
A)3.20
B)6.40
C)9
D)41
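The straight-line (Euclidean) distance between two points follows from the Pythagorean theorem. A short sketch using only the standard library; the variable names are illustrative.

```python
import math

p = (6, 4)
q = (2, 9)

# sqrt((6 - 2)^2 + (4 - 9)^2) = sqrt(16 + 25) = sqrt(41)
distance = math.hypot(p[0] - q[0], p[1] - q[1])
print(round(distance, 2))  # 6.4
```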
22
In preparation for mining, an analyst should
A)explore the relationships between variables
B)verify completeness and accuracy of the data
C)clean the data, addressing missing values, errors, etc.
D)all of the above
23
An Excel add-in tool used for data mining is called
A)XL Miner
B)GS4
C)XML
D)Data Miner
24
Exhibit 10.2
The following questions are based on the problem description, spreadsheet, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2), or not successful (Group 3) in their graduate studies. The officer has data on 20 current students: 7 successful (Group 1), 6 marginally successful (Group 2), and 7 not successful (Group 3).
Discriminant Analysis Report
October 1, 2013 4:22:38 PM
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Training Sample Classification
Mahalanobis Distances
Refer to Exhibit 10.2. What percentage of observations is classified correctly?
A)100%
B)85.71%
C)95%
D)90%
25
Exhibit 10.2
Refer to Exhibit 10.2. Based on the analysis presented in the spreadsheet, what percentage of the observations were correctly classified?
A)80%
B)85%
C)95%
D)100%
26
Mahalanobis distance is a measure of
A)a likelihood of correctly classifying an observation to a group
B)quantifying covariance between independent variables in the model
C)quantifying covariance between dependent variables in the model
D)quantifying correlations between independent variables in the model
27
Exhibit 10.1
Refer to Exhibit 10.1. Suppose that for a given observation, the difference between the Mahalanobis distances for group 1 and group 2 (G1-G2) is small. This means that
A)The observation is likely to be classified incorrectly
B)The observation is likely to be classified correctly
C)The observation is unlikely to be classified
D)The observation should be deleted from the data set
28
Exhibit 10.2
Refer to Exhibit 10.2. What percentage of observations is classified incorrectly?
A)5%
B)15%
C)95%
D)90%
29
Steps in the data mining process include the following (in sequence)
A)identify opportunity; collect data; explore, understand, and prepare data; identify tasks and tools; partition data; build and evaluate models; deploy models
B)identify opportunity; collect data; explore, understand, and prepare data; identify tasks and tools; build and evaluate models; deploy models
C)collect data; explore, understand, and prepare data; identify tasks and tools; partition data; build and evaluate models; deploy models
D)identify opportunity; collect data; identify tasks and tools; partition data; build and evaluate models; deploy models
30
Exhibit 10.1
The following questions are based on the problem description, regression results, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful or not successful in their graduate studies. The officer has data on 20 current students, ten of whom are doing very well (Group 1) and ten who are not (Group 2).
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Group Centroids
Group Quantitative Verbal
1 683.8 654.2
2 610.7 605.7

-Refer to Exhibit 10.1. What formula is entered in cell E4 and copied to cells E5:E25 of the spreadsheet?
A)= 7.452402 + 0.00694*C4 + 0.00232*D4
B)= 7.452402 - 0.00694*C4 - 0.00232*D4
C)= 1.157926 + 0.001545*C4 + 0.01297*D4
D)= 7.452402 - 0.00694*D4 - 0.00232*C4
31
Exhibit 10.2
The following questions are based on the problem description, spreadsheet, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2), or not successful (Group 3) in their graduate studies. The officer has data on 20 current students: 7 successful (Group 1), 6 marginally successful (Group 2), and 7 not successful (Group 3).
Discriminant Analysis Report
October 1, 2013 4:22:38 PM
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Training Sample Classification
Mahalanobis Distances
-Refer to Exhibit 10.2. What is the quantitative test score value of the group centroid for group 1?
A)697.71
B)647.86
C)587.67
D)650.43
32
Exhibit 10.2
The following questions are based on the problem description, spreadsheet, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2), or not successful (Group 3) in their graduate studies. The officer has data on 20 current students: 7 successful (Group 1), 6 marginally successful (Group 2), and 7 not successful (Group 3).
Discriminant Analysis Report
October 1, 2013 4:22:38 PM
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Training Sample Classification
Mahalanobis Distances
-Refer to Exhibit 10.2. What is the verbal test score value of the group centroid for group 1?
A)697.71
B)647.86
C)587.67
D)650.43
33
Exhibit 10.2
The following questions are based on the problem description, spreadsheet, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2), or not successful (Group 3) in their graduate studies. The officer has data on 20 current students: 7 successful (Group 1), 6 marginally successful (Group 2), and 7 not successful (Group 3).
Discriminant Analysis Report
October 1, 2013 4:22:38 PM
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Training Sample Classification
Mahalanobis Distances
-Refer to Exhibit 10.2. What number of observations is classified incorrectly?
A)19
B)20
C)7
D)1
34
Exhibit 10.2
The following questions are based on the problem description, spreadsheet, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2), or not successful (Group 3) in their graduate studies. The officer has data on 20 current students: 7 successful (Group 1), 6 marginally successful (Group 2), and 7 not successful (Group 3).
Discriminant Analysis Report
October 1, 2013 4:22:38 PM
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Training Sample Classification
Mahalanobis Distances
-Refer to Exhibit 10.2. What is the verbal test score value of the group centroid for group 3?
A)697.71
B)647.86
C)587.67
D)605.17
35
Exhibit 10.1
The following questions are based on the problem description, regression results, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful or not successful in their graduate studies. The officer has data on 20 current students, ten of whom are doing very well (Group 1) and ten who are not (Group 2).
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Group Centroids
Group Quantitative Verbal
1 683.8 654.2
2 610.7 605.7

-Refer to Exhibit 10.1. The university has received applications from several new students and would like to predict which group they would fall into. What is the discriminant score for a student with a Quantitative score of 686 and a Verbal score of 601? Use five (5) significant figures in your coefficients.
A)1.29 ≤ discriminant score ≤ 1.30
B)1.69 ≤ discriminant score ≤ 1.70
C)2.69 ≤ discriminant score ≤ 2.70
D)6.05 ≤ discriminant score ≤ 6.06
36
Exhibit 10.2
The following questions are based on the problem description, spreadsheet, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2), or not successful (Group 3) in their graduate studies. The officer has data on 20 current students: 7 successful (Group 1), 6 marginally successful (Group 2), and 7 not successful (Group 3).
Discriminant Analysis Report
October 1, 2013 4:22:38 PM
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Training Sample Classification
Mahalanobis Distances
-Refer to Exhibit 10.2. What is the quantitative test score value of the group centroid for group 2?
A)697.71
B)647.86
C)587.67
D)650.43
37
Exhibit 10.2
The following questions are based on the problem description, spreadsheet, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful (Group 1), marginally successful (Group 2), or not successful (Group 3) in their graduate studies. The officer has data on 20 current students: 7 successful (Group 1), 6 marginally successful (Group 2), and 7 not successful (Group 3).
Discriminant Analysis Report
October 1, 2013 4:22:38 PM
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Training Sample Classification
Mahalanobis Distances
-Refer to Exhibit 10.2. What number of observations is classified correctly?
A)19
B)20
C)7
D)8
38
Suppose that there are 3 variables in a data set. Approximately how many data records are required using a rule of thumb discussed in the textbook?
A)30 to 45
B)20 to 30
C)45 to 60
D)50 to 100
39
Normalization of data involves
A)expressing each variable on a common, standardized scale
B)subtracting the grand mean from each observation
C)dividing each observation by total variance
D)dividing each observation by average range
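To make the idea of a common scale concrete, here is a minimal sketch that rescales two variables with z-scores (subtract the mean, divide by the sample standard deviation). The z-score is only one common way to put variables on a shared scale, and the data values and the function name `standardize` are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical data: two variables measured on very different scales.
gmat = [580.0, 610.0, 650.0, 690.0, 720.0]
gpa = [2.9, 3.1, 3.4, 3.6, 3.9]

gmat_z = standardize(gmat)
gpa_z = standardize(gpa)

# After rescaling, both variables have mean 0 and standard deviation 1.
print([round(z, 2) for z in gmat_z])
print([round(z, 2) for z in gpa_z])
```

After rescaling, a one-unit change means one standard deviation in either variable, so neither dominates a distance calculation simply because of its units.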
40
Exhibit 10.1
The following questions are based on the problem description, regression results, and the Analytic Solver Platform Discriminant Analysis report below.
A college admissions officer wants to evaluate graduate school applicants based on their GMAT scores, verbal and quantitative. Students are classified as either successful or not successful in their graduate studies. The officer has data on 20 current students, ten of whom are doing very well (Group 1) and ten who are not (Group 2).
Unpooled estimates of within-group covariance matrices are used, assuming they are different.
Group Centroids
Group Quantitative Verbal
1 683.8 654.2
2 610.7 605.7

-Refer to Exhibit 10.1. Based on the regression output, what is the discriminant score for a student with a quantitative score of 635 and a verbal score of 570?
A)1.72 ≤ discriminant score ≤ 1.73
B)2.02 ≤ discriminant score ≤ 2.03
C)3.04 ≤ discriminant score ≤ 3.05
D)6.12 ≤ discriminant score ≤ 6.14
41
One way to detect and avoid overfitting is to
A)use the validation sample to calibrate the model
B)use repeated runs of the model and average the results
C)use computer simulation
D)use rigorous statistical tools
42
Technique(s) used in the prediction step of data mining include
A)regression analysis
B)the k'th largest neighbor technique
C)neural networks
D)all of the above
43
Data mining tasks fall into the following categories
A)classification, prediction, association
B)categorization, segmentation
C)prediction, association, mining
D)observation, categorization, association
44
Suppose that a data set contains a variable EDUCATION, which has 7 discrete levels. EDUCATION can be represented by _ binary variables
A)6
B)7
C)8
D)9
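The sketch below shows one standard way to turn a categorical variable into binary indicators: pick one level as the reference and create one 0/1 variable for each remaining level, so k levels need k - 1 indicators. The level names and the function name `dummy_encode` are hypothetical; the source gives only the number of levels.

```python
def dummy_encode(value, levels):
    """Encode `value` as len(levels) - 1 indicators; levels[0] is the reference."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]


# Hypothetical names for the 7 EDUCATION levels.
education_levels = [
    "none", "primary", "secondary", "associate",
    "bachelor", "master", "doctorate",
]

for level in education_levels:
    print(f"{level:<10} {dummy_encode(level, education_levels)}")
```

The reference level is encoded as all zeros, and every other level switches on exactly one indicator.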
45
Plots useful in data mining analysis can be accessed in Excel using the add-in
A)XLMiner
B)Charts
C)Data Analysis
D)Visual Basic
46
In the _ step of data mining, a researcher attempts to estimate the discrete group to which an observation belongs
A)classification
B)prediction
C)categorization
D)association/segmentation
47
Suppose that a data set contains a variable EDUCATION, which has 7 discrete levels. EDUCATION is an example of
A)a categorical variable
B)a classification variable
C)a continuous variable
D)an exponential variable
48
Technique(s) used in the classification step of data mining include
A)discriminant analysis
B)logistic regression
C)neural networks
D)all of the above
49
A test sample is often used to perform _ of how well the model will work with new data
A)an honest assessment
B)an estimate
C)a guess
D)a belief
50
Oversampling forces a classification method to
A)discriminate between groups
B)classify records correctly
C)classify records incorrectly
D)perform a large number of iterations
51
Useful data mining techniques can be found in Excel under the _ drop-down menu
A)Data/Data Analysis
B)Regression
C)Histogram
D)Insert/Chart
52
One element in cleaning the data set in the mining process involves
A)removing unimportant variables
B)adding more variables to the data set
C)calculating the adjusted R²
D)calculating the coefficient of multiple correlation
53
In the _ step of data mining, a researcher attempts to predict the value of a continuous response variable based on the data set
A)classification
B)prediction
C)categorization
D)association/segmentation
54
Suppose that the correlation coefficient between X1 and X2 is equal to 1. This means that
A)X1 and X2 are perfectly positively correlated
B)X1 and X2 are perfectly negatively correlated
C)X1 and X2 are weakly and positively correlated
D)X1 and X2 are weakly and negatively correlated
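For reference, the Pearson correlation coefficient can be computed directly from its definition. The sketch below uses hypothetical data in which one variable is an exact increasing linear function of X1 and another is an exact decreasing linear function of X1; the function name `pearson_r` is also just illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_rising = [2.0 * x + 3.0 for x in x1]   # exact increasing linear relation
x2_falling = [10.0 - x for x in x1]       # exact decreasing linear relation

r_rising = pearson_r(x1, x2_rising)
r_falling = pearson_r(x1, x2_falling)
print(round(r_rising, 6), round(r_falling, 6))
```

Up to floating-point rounding, the first coefficient is 1 and the second is -1, the two extremes of the correlation scale.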
55
In the _ step of data mining, a researcher attempts to form logical groupings of data in the set
A)classification
B)prediction
C)categorization
D)association/segmentation
56
A correlation coefficient
A)measures the strength of a linear association between two variables
B)measures the strength of statistical association between two variables
C)measures the strength of physical association between two variables
D)measures the strength of logical association between two variables
57
In classification techniques the dependent variable is
A)discrete
B)continuous
C)bounded from above
D)bounded from below
58
Technique(s) used in the association step of data mining include
A)affinity analysis
B)binary analysis
C)integer networks
D)all of the above
59
Overfitting refers to
A)placing too much emphasis on the sample-specific noise
B)fitting the model too tightly
C)fitting the model too loosely
D)underestimating model parameters
60
Suppose that two variables are found to be significantly correlated. A researcher may
A)remove one variable from the data set
B)replace the two variables by their product
C)replace the two variables by their squared difference
D)remove both variables from the data set
61
The first step in creating a classification tree involves
A)recursively partitioning the independent variables using the outcome of previous partitions
B)creating a subset of variables
C)clustering the variables into a larger superset
D)deciding the number of partitions
62
To create the training and validation data sets for the model, use the _ option in the XLMiner tab
A)partition with oversampling
B)partition without oversampling
C)partition
D)sample
63
Before effectively applying the k nearest neighbor classification technique, the variables need to be
A)standardized
B)normalized
C)randomized
D)trimmed
64
_ is a classification technique that estimates the probability of an observation belonging to a particular group
A)logistic regression
B)binary regression
C)multivariate analysis
D)ANCOVA
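As a sketch of how a fitted logistic model turns predictor values into a group-membership probability, the function below applies the logistic function to a linear combination of the inputs. The intercept, coefficients, applicant scores, and the 0.5 cutoff are hypothetical and do not come from any exhibit in this deck.

```python
from math import exp


def logistic_probability(intercept, coefficients, x):
    """P(group 1 | x) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    z = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + exp(-z))


# Hypothetical fitted parameters for (quantitative, verbal) scores.
b0 = -20.0
b = (0.02, 0.012)

applicant = (660.0, 640.0)  # hypothetical observation
p_group_1 = logistic_probability(b0, b, applicant)

# Assumed cutoff of 0.5: classify into Group 1 when the probability is higher.
predicted_group = 1 if p_group_1 > 0.5 else 2
print(f"P(Group 1) = {p_group_1:.3f}, predicted group = {predicted_group}")
```

In practice the parameters would be estimated from training data by maximum likelihood; here they are fixed by hand only to show the probability calculation.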
65
The Fisher classification scores can be converted to
A)a linear function for each of the groups in the classification problem
B)probabilities of group membership
C)a uniform distribution
D)a half-space
66
The parameters of the logistic regression model
A)are derived through a nonlinear maximum likelihood estimation procedure
B)are negative
C)are positive
D)are negative fractions
67
In the k nearest neighbor technique, a small value of k produces classifications that are
A)very sensitive to the sample-specific characteristics of the training data
B)not sensitive to the sample-specific characteristics of the training data
C)robust
D)reliable
68
Suppose the Fisher classification scores for an observation have been converted to the following probabilities: (i) 0.6 for Group 1 and (ii) 0.4 for Group 2. The observation will be classified to
A)Group 1
B)Group 2
C)Group 1 or Group 2 with equal probabilities
D)neither Group 1 nor Group 2
69
In the k nearest neighbor technique, a large value of k produces classifications that
A)are very sensitive to the sample-specific characteristics of the training data
B)place all observations into the most frequently occurring group in the training data
C)are robust
D)are reliable
70
_ and _ must be chosen each time a partition is subdivided
A)an independent variable and splitting value
B)a dependent variable and cutoff value
C)a significance level and an upper bound
D)a significance level and a lower bound
71
Suppose that the correlation coefficient between X1 and X2 is equal to -1. This means that
A)X1 and X2 are perfectly positively correlated
B)X1 and X2 are perfectly negatively correlated
C)X1 and X2 are weakly and positively correlated
D)X1 and X2 are weakly and negatively correlated
72
The k-nearest neighbor classification technique
A)identifies the k observations in the training data that are most similar to a new observation we want to classify
B)sorts the data in an increasing order
C)sorts the data in a decreasing order
D)works very much like the ranking algorithm
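The idea of finding the most similar training observations can be sketched as follows. This is an illustration only, assuming Euclidean distance and a simple majority vote; the function `knn_classify` and the score data are hypothetical and do not come from the deck.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, new_x, k):
    """Classify new_x by majority vote among its k nearest training observations."""
    # Euclidean distance from new_x to every training observation.
    distances = [
        (math.dist(x, new_x), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest observations (the "neighbors").
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (quantitative, verbal) scores with group labels.
train_X = [(680, 650), (700, 660), (690, 640), (610, 600), (600, 610),
           (620, 590), (605, 595)]
train_y = [1, 1, 1, 2, 2, 2, 2]

small_k = knn_classify(train_X, train_y, (685, 655), k=3)
large_k = knn_classify(train_X, train_y, (685, 655), k=len(train_X))
print(small_k, large_k)
```

With k = 3 the vote uses only nearby observations. With k equal to the size of the training data, every new observation receives the most frequent group in the training data, regardless of where it lies.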
73
Logistic regression is a classification technique that
A)outperforms other techniques across a variety of data collections
B)is not reliable
C)is not robust
D)is not feasible for most data sets
74
The Fisher linear discriminant function
A)identifies a linear function for each of the groups in the classification problem
B)fits a nonlinear function for each of the groups in the classification problem
C)defines a hyperplane
D)defines a half-space
75
A _____ provides a visual summary of the improvements that a data mining project provides on a binary classification problem compared to a random guess
A)lift chart
B)Pareto chart
C)Gantt chart
D)histogram
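As a rough sketch of the numbers behind such a chart, the code below ranks observations by predicted probability and compares the positives captured so far with what random selection would be expected to capture. The function `cumulative_lift` and the scores are hypothetical; real tools may bin the data into deciles instead.

```python
def cumulative_lift(probabilities, actuals):
    """Return (cumulative gains, random baseline) per number of cases selected.

    probabilities: predicted probability of the positive class per observation.
    actuals: 1 if the observation is truly positive, else 0.
    """
    # Rank observations from most to least likely positive.
    ranked = sorted(zip(probabilities, actuals), key=lambda p: p[0], reverse=True)
    total_pos = sum(actuals)
    n = len(actuals)
    gains, baseline = [], []
    running = 0
    for i, (_, actual) in enumerate(ranked, start=1):
        running += actual
        gains.append(running)               # positives captured using the model
        baseline.append(total_pos * i / n)  # expected positives from random picks
    return gains, baseline

# Hypothetical scores and outcomes for eight observations.
probs = [0.95, 0.10, 0.80, 0.30, 0.70, 0.20, 0.60, 0.05]
truth = [1, 0, 1, 0, 0, 0, 1, 0]
gains, baseline = cumulative_lift(probs, truth)
print(gains)
print(baseline)
```

Plotting `gains` against `baseline` gives the two curves of the chart; the gap between them is the improvement over a random guess.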
76
Standardization of a variable
A)removes the scale factor from consideration
B)requires subtracting the mean and dividing the difference by the standard deviation
C)results in all variables having the mean of 0 and standard deviation of 1
D)all of the above
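A small sketch of standardization, assuming the sample standard deviation (n - 1 denominator); some tools use the population form instead. The scores are hypothetical.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    # Sample standard deviation; an assumption, not the only convention.
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

gmat = [610.0, 650.0, 680.0, 700.0, 720.0]  # hypothetical scores
z = standardize(gmat)
print(z)
```

The standardized values have mean 0 and standard deviation 1 whatever units the original variable used, which is why the scale factor drops out.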
77
The objective of classification tree algorithms is to
A)minimize the average weighted impurity of the resulting partitions
B)maximize the average weighted impurity of the resulting partitions
C)maximize the average weighted purity of the resulting partitions
D)maximize the likelihood estimator of the resulting partitions
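One way to picture this objective is an exhaustive search over variables and splitting values. The sketch below assumes the Gini index as the impurity measure and binary splits of the form "value <= threshold"; the function names and data are hypothetical, and actual classification tree software may differ in detail.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_impurity(left, right):
    """Average impurity of two partitions, weighted by partition size."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def best_split(rows, labels):
    """Search every variable and candidate value for the lowest weighted impurity."""
    best = None
    for var in range(len(rows[0])):
        for value in sorted({row[var] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[var] <= value]
            right = [y for row, y in zip(rows, labels) if row[var] > value]
            if not left or not right:
                continue  # skip splits that leave one side empty
            score = weighted_impurity(left, right)
            if best is None or score < best[0]:
                best = (score, var, value)
    return best

# Hypothetical data: columns are (quantitative, verbal); labels are groups.
rows = [(680, 650), (700, 600), (690, 640), (610, 655), (600, 610), (620, 590)]
labels = [1, 1, 1, 2, 2, 2]
score, var, value = best_split(rows, labels)
print(score, var, value)
```

Here the search picks the first column with a splitting value of 620, which separates the two groups completely and so yields a weighted impurity of 0.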
78
Two common ways of measuring impurity are _____ and _____
A)the Gini index and the entropy measure
B)the fraction of pure data and the enthalpy index
C)the fraction of contaminated data and the entropy index
D)the fraction of real data and the data purity coefficient
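For reference, the Gini index and the entropy measure can be computed from class proportions as in this sketch. It assumes entropy in bits (base-2 logarithm); the function names are illustrative.

```python
import math

def gini_index(proportions):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p ** 2 for p in proportions)

def entropy(proportions):
    """Entropy in bits: -sum(p * log2(p)), skipping empty classes."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

pure = [1.0, 0.0]   # every observation in one group
mixed = [0.5, 0.5]  # evenly split between two groups

print(gini_index(pure), entropy(pure))
print(gini_index(mixed), entropy(mixed))
```

Both measures are 0 for a pure partition and reach their two-group maximum (0.5 for Gini, 1 bit for entropy) when the groups are evenly mixed.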
79
Logistic regression in the XLMiner add-in can be used for _____ groups
A)2
B)3
C)4
D)5
80
A graphical representation of a set of rules for classifying observations into 2 or more groups is called
A)a classification tree
B)a binary tree
C)a Pareto diagram
D)a branch-and-bound tree