Question 1

Complete linkage can be used to measure the distance between clusters that are the _________________ in cluster analysis.

Accepted Answer

A) most similar
B) most different
C) farthest apart
D) closest

Question 2

The __________ the lift ratio, the ____________ the association rule.

Accepted Answer

A) higher; stronger
B) higher; weaker
C) lower; stronger
D) lower; weaker

Question 3

Using the data given, apply k-means clustering using Wait time (min) as the variable with k = 3. Be sure to Normalize input data, and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Then create one distinct data set for each of the three resulting clusters for waiting time.
a. For the observations composing the cluster with low waiting time, apply hierarchical clustering with Ward's method to form two clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster.
b. For the observations composing the cluster with medium waiting time, apply hierarchical clustering with Ward's method to form three clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster.
c. For the observations composing the cluster with high waiting time, apply hierarchical clustering with Ward's method to form two clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster.

Accepted Answer

Below is the Pivot table on the data
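The first step of this question (normalize Wait time, then run k-means with k = 3, 50 iterations, and 10 random starts) can be sketched outside XLMiner as below. This is a minimal sketch assuming that "Normalize input data" means z-score standardization and that XLMiner runs standard Lloyd's k-means keeping the best of the random starts; XLMiner's exact initialization may differ, and the function names and sample wait times are hypothetical.

```python
import random
import statistics


def normalize(values):
    """Z-score normalization (assumed): subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def kmeans_1d(values, k, iterations=50, starts=10, seed=0):
    """Lloyd's k-means on one variable; keeps the start with the lowest within-cluster SSE."""
    rng = random.Random(seed)
    best_labels, best_sse = None, float("inf")
    for _ in range(starts):
        centers = rng.sample(values, k)  # random initial centers drawn from the data
        for _ in range(iterations):
            # Assignment step: each value joins its nearest center.
            labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
            # Update step: move each center to the mean of its members (keep it if the cluster is empty).
            new_centers = []
            for j in range(k):
                members = [v for v, lab in zip(values, labels) if lab == j]
                new_centers.append(statistics.mean(members) if members else centers[j])
            if new_centers == centers:
                break  # assignments no longer change
            centers = new_centers
        sse = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels
```

Grouping the original rows by the returned label gives the three separate data sets (low, medium, and high waiting time) that parts a to c then cluster further.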

Question 4

________________ is a measure that computes the dissimilarity between a cluster AB and a cluster C by averaging the distance between A and C and the distance between B and C.

Accepted Answer

A) Ward's method
B) Jaccard's coefficient
C) McQuitty's method
D) None of these.
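The averaging rule in this question, d(AB, C) = [d(A, C) + d(B, C)] / 2, can be illustrated by updating a distance table after merging A and B. This is a minimal sketch; storing the table as a dictionary keyed by frozensets of cluster labels and naming the merged cluster by concatenating labels are illustrative assumptions.

```python
def merge_mcquitty(dist, a, b):
    """Merge clusters a and b in a symmetric distance table using McQuitty's rule.

    dist maps frozenset({x, y}) to the distance between clusters x and y.
    The merged cluster is labeled a + b (concatenated labels, a hypothetical convention).
    """
    merged = a + b
    others = {c for pair in dist for c in pair} - {a, b}
    # Keep every distance that does not involve a or b.
    new = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
    for c in others:
        # d(AB, C) = [d(A, C) + d(B, C)] / 2
        new[frozenset({merged, c})] = (dist[frozenset({a, c})] + dist[frozenset({b, c})]) / 2
    return new
```

Unlike average linkage, this rule weights the two merged clusters equally regardless of how many observations each contains.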

Question 5

In which of the following data-mining process steps is the data manipulated to make it suitable for formal modeling?

Accepted Answer

A) Data sampling 
B) Data preparation 
C) Model construction 
D) Model assessment 

Question 6

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the

Accepted Answer

A) matching coefficient.
B) Jaccard's coefficient.
C) Euclidean distance.
D) antecedent.
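For observations described only by 0/1 dummy variables, the two similarity measures listed in the options differ in how they treat 0-0 agreements. A minimal sketch (function names are illustrative):

```python
def matching_coefficient(x, y):
    """Share of dummy variables on which two observations agree (1-1 or 0-0)."""
    return sum(xi == yi for xi, yi in zip(x, y)) / len(x)


def jaccard_coefficient(x, y):
    """Share of 1-1 matches among variables where at least one observation has a 1.

    Jaccard's coefficient is undefined when both observations are all zeros;
    returning 1.0 in that case is an assumption made for this sketch.
    """
    both = sum(xi == 1 and yi == 1 for xi, yi in zip(x, y))
    either = sum(xi == 1 or yi == 1 for xi, yi in zip(x, y))
    return both / either if either else 1.0
```

For x = [1, 0, 0, 1, 0] and y = [1, 0, 1, 0, 0], the observations agree on 3 of 5 variables (matching coefficient 0.6), but share only 1 of the 3 variables where either has a 1 (Jaccard's coefficient 1/3).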

Question 7

__________________ is a measure that calculates dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.

Accepted Answer

A) Single linkage 
B) Complete linkage 
C) Average linkage 
D) Average group linkage 
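The linkage options differ only in which cross-cluster distances they summarize. A minimal sketch using Euclidean distance between observations (the function names are illustrative):

```python
import math
from itertools import product


def pairwise(c1, c2):
    """Euclidean distances between every observation in c1 and every observation in c2."""
    return [math.dist(p, q) for p, q in product(c1, c2)]


def single_linkage(c1, c2):
    return min(pairwise(c1, c2))  # the two most similar observations


def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))  # the two most dissimilar observations


def average_linkage(c1, c2):
    d = pairwise(c1, c2)
    return sum(d) / len(d)  # mean over all cross-cluster pairs
```

Average group linkage, by contrast, would measure the distance between the two cluster centroids rather than summarizing pairwise distances.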

Question 8

To identify patterns across transactions, we can use

Accepted Answer

A) association rules. 
B) complete linkage.
C) centroid linkage. 
D) k-means. 
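Association rules find patterns across transactions by measuring how often items co-occur. The sketch below scores every one-item "if a then c" rule by support, confidence, and lift; restricting to single-item antecedents and the 0.2 support cutoff are simplifying assumptions, and the baskets are hypothetical.

```python
from itertools import permutations


def one_item_rules(transactions, min_support=0.2):
    """List rules 'if a then c' between single items, with support, confidence, and lift."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for a, c in permutations(items, 2):
        s_ac = support({a, c})
        if s_ac < min_support:
            continue  # skip item pairs that rarely occur together
        confidence = s_ac / support({a})
        lift = confidence / support({c})
        rules.append((a, c, s_ac, confidence, lift))
    return rules


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
           {"milk"}, {"bread", "milk"}]
for rule in one_item_rules(baskets):
    print(rule)
```

Full algorithms such as Apriori extend this idea to multi-item antecedents while pruning itemsets that fall below the support threshold.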

Question 9

Using the data given, apply hierarchical clustering with 10 clusters using LandValue ($), BuildingValue ($), Acres, Age, and Price ($) as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Use Ward's method as the clustering method.
a. Use a PivotTable on the data in the HC_Clusters1 worksheet to compute the cluster centers for the clusters in the hierarchical clustering.
b. Identify the cluster with the largest average price. Using all the variables, how would you characterize this cluster?
c. Identify the smallest cluster.

Accepted Answer

a. Below is the PivotTable obtained on
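Ward's method, used in this question, is agglomerative: it starts with every observation as its own cluster and repeatedly merges the pair whose merger increases the total within-cluster sum of squares the least. A naive sketch, assuming the input rows are already normalized (the function names and sample points are illustrative, and XLMiner's implementation details may differ):

```python
import math
from itertools import combinations


def centroid(points):
    """Mean of each coordinate, returned as a tuple."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * math.dist(centroid(a), centroid(b)) ** 2


def ward_clustering(points, k):
    """Merge clusters with Ward's criterion until k clusters remain."""
    clusters = [[p] for p in points]  # every observation starts in its own cluster
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

This version recomputes every pairwise cost at each step, which is fine for illustration but slow on large data sets.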

Question 10

The strength of a cluster can be measured by comparing the average distance within a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters?

Accepted Answer

A) 0.5 
B) 1 
C) 1.5 
D) 2 
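The comparison described in this question can be computed for a pair of clusters. This is a minimal sketch that assumes "average distance in a cluster" means the average Euclidean distance from each observation to its own cluster centroid, pooled over both clusters; other definitions are possible.

```python
import math


def centroid(points):
    """Mean of each coordinate, returned as a tuple."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def between_within_ratio(cluster_a, cluster_b):
    """Distance between centroids divided by the average observation-to-centroid distance."""
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    # Within-cluster distance: average distance from each observation to its own centroid.
    dists = [math.dist(p, ca) for p in cluster_a] + [math.dist(p, cb) for p in cluster_b]
    within = sum(dists) / len(dists)
    # Between-cluster distance: distance between the two centroids.
    between = math.dist(ca, cb)
    return between / within
```

A larger ratio means the clusters sit far apart relative to their own spread; the result is then compared against the rule-of-thumb threshold asked about above.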

Question 11

Euclidean distance can be used to measure the distance between ________________ in cluster analysis.

Accepted Answer

A) objects 
B) clusters 
C) observations 
D) ward 
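Euclidean distance is the square root of the summed squared differences across variables. A minimal sketch written out explicitly (the function name is illustrative):

```python
import math


def euclidean(u, v):
    """Square root of the sum of squared differences between corresponding coordinates."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))
```

For example, euclidean((1, 2, 3), (4, 6, 3)) is sqrt(9 + 16 + 0) = 5.0. Because variables with large units dominate this sum, the questions in this set ask for normalized input data before clustering.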

Question 12

Using the data given, apply k-means clustering using Price ($) as the variable with k = 3. Be sure to Normalize input data, and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Then create one distinct data set for each of the three resulting clusters of price.
a. For the observations composing the cluster with low home price, apply hierarchical clustering with Ward's method to form three clusters using Acres and Age as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters1, report the characteristics of each cluster.
b. For the observations composing the cluster with medium home price, apply hierarchical clustering with Ward's method to form three clusters using Acres and Age as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters1, report the characteristics of each cluster.
c. Comment on the cluster with high home price.

Accepted Answer

Below is the Pivot table on the data in
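The PivotTable step in parts a and b averages each clustering variable by cluster label. A rough Python equivalent, assuming the HC_Clusters1 rows are available as dictionaries with a cluster-label column (the key name "Cluster", the function name, and the sample rows are hypothetical):

```python
from collections import defaultdict


def cluster_profile(rows, cluster_key, variables):
    """Count and per-variable average for each cluster, like a PivotTable of averages."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row)
    profile = {}
    for label, members in sorted(groups.items()):
        profile[label] = {"Count": len(members)}
        for var in variables:
            profile[label][var] = sum(r[var] for r in members) / len(members)
    return profile


rows = [
    {"Cluster": 1, "Acres": 0.5, "Age": 10},
    {"Cluster": 1, "Acres": 1.5, "Age": 20},
    {"Cluster": 2, "Acres": 3.0, "Age": 50},
]
print(cluster_profile(rows, "Cluster", ["Acres", "Age"]))
```

The averages are the cluster centers; comparing them across clusters is what "report the characteristics of each cluster" amounts to.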

Question 13

Platinum Gym has 10,000 gym members. Of these, 1,500 memberships include both Unlimited Fitness Training and use of the tanning salon, and 750 of those 1,500 also include Unlimited Hydromassage. If Fitness Training is considered A, use of the tanning salon B, and Hydromassage C, then the association rule for these sales is, "If A and B are purchased, then C is also purchased." The total number of memberships that include C is 3,000. Calculate the lift for this rule.

Accepted Answer

The answer of Platinum Gym has 10,000 gyms members out...
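Lift is the rule's confidence divided by the overall support of the consequent. A minimal sketch from raw counts, assuming the 750 figure counts members who have A, B, and C together (the function name and parameter names are illustrative):

```python
def lift(n_total, n_antecedent, n_both, n_consequent):
    """Lift of the rule 'if antecedent then consequent', computed from transaction counts."""
    confidence = n_both / n_antecedent  # P(consequent | antecedent)
    consequent_support = n_consequent / n_total  # P(consequent)
    return confidence / consequent_support


print(lift(10_000, 1_500, 750, 3_000))
```

Under that reading, confidence is 750 / 1,500 = 0.5, the support of C is 3,000 / 10,000 = 0.3, and the lift is 0.5 / 0.3, about 1.67.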

Question 14

Data preparation includes all of the following except which task?

Accepted Answer

A) calculating the confidence ratio for all association rules 
B) treating missing data 
C) identifying erroneous data and outliers 
D) defining the appropriate way to represent variables 
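Two of the data-preparation tasks listed here, treating missing data and identifying outliers, can be sketched for a single numeric variable. The choices below are assumptions for illustration only: missing entries are marked None and filled with the median, and values more than 2 sample standard deviations from the mean are flagged.

```python
import statistics


def prepare(values, z_threshold=2.0):
    """Impute missing entries (None) with the median and flag outliers by z-score."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    imputed = [median if v is None else v for v in values]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    # Indices of observed values lying more than z_threshold standard deviations from the mean.
    outliers = [i for i, v in enumerate(values)
                if v is not None and abs(v - mean) / sd > z_threshold]
    return imputed, outliers


imputed, outliers = prepare([10, 12, None, 11, 13, 95, 12, 11, 10, 12])
print(imputed, outliers)
```

The median is used for imputation because, unlike the mean, it is not pulled upward by the outlier being flagged.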

Question 15

k-means clustering is the process of

Accepted Answer

A) agglomerating observations into a series of nested groups based on a measure of similarity. 
B) organizing observations into distinct groups based on a measure of similarity. 
C) reducing the number of variables to consider in data-mining. 
D) estimating the value of a continuous outcome variable. 

Question 16

Which statement is true of an association rule?​

Accepted Answer

A) It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.
B) It is a data reduction technique that reduces large information into smaller homogeneous groups.
C) It uses analytic models to describe the relationship between metrics that drive business performance.
D) It seeks to classify a categorical outcome into two or more categories.


Exam 4: Descriptive Data Mining
