Question 1

The process of eliminating variables from formal analysis without losing any crucial information is called

Accepted Answer

A) dimension reduction. 
B) data sampling. 
C) data reduction. 
D) aggregation.

Question 2

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?

Accepted Answer

A) the short leg
B) the long leg
C) the hypotenuse
D) Euclidean distance is not related to right triangles.
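
The right-triangle picture can be checked numerically. The sketch below is only an illustration: the two observations are made-up values, and `euclidean_distance` is a helper name chosen here, not something from the exam. The per-variable differences play the role of the two legs, and the computed distance is the side opposite the right angle.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two observations on the same variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two hypothetical observations measured on two variables.
p = (1.0, 2.0)
q = (4.0, 6.0)

# The legs of the triangle are the differences 3 and 4 along each variable.
print(euclidean_distance(p, q))  # 5.0
```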

Question 3

The strength of the association rule is known as ____________ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence.

Accepted Answer

A) lift
B) antecedent
C) support count 
D) consequent

Question 4

In preparing categorical variables for analysis, it is usually best to

Accepted Answer

A) convert the categories to numeric representations. 
B) convert the categories to binary, dummy variables. 
C) combine as many categories as possible. 
D) let them remain categorical.

Question 5

Using the data given, apply hierarchical clustering with 5 clusters using Wait Time (min), Purchase Amount ($),
Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the
XLMiner Hierarchical Clustering procedure. Use Ward's method as the clustering method.
a. Use a PivotTable on the data in the HC_Clusters1 worksheet to compute the cluster centers for the five clusters
in the hierarchical clustering.
b. Identify the cluster with the largest average waiting time. Using all the variables, how would you characterize
this cluster?
c. Identify the smallest cluster.
d. By examining the dendrogram on the HC_Dendrogram worksheet (as well as the sequence of clustering stages
in HC_Output1), what number of clusters seems to be the most natural fit based on the distance?

Accepted Answer

a. Below is the PivotTable obtained on t

Question 6

Hierarchical clustering using ____________ results in a sequence of aggregated clusters that minimizes the loss of information between the individual observation level and the cluster level

Accepted Answer

A) McQuitty's method 
B) centroid linkage 
C) median linkage 
D) Ward's method
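
One common way to make "loss of information" concrete is the within-cluster sum of squared deviations from the centroid: merging two clusters raises that total, and the rise is the cost of the merge. The sketch below assumes that definition; the helper names (`centroid`, `sse`, `merge_cost`) and the sample clusters are hypothetical.

```python
def centroid(cluster):
    """Component-wise mean of the observations in a cluster."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def sse(cluster):
    """Sum of squared deviations of each observation from the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(obs, c)) for obs in cluster)


def merge_cost(a, b):
    """Increase in total SSE caused by merging clusters a and b."""
    return sse(a + b) - sse(a) - sse(b)


a = [(1.0, 1.0), (1.0, 3.0)]
b = [(5.0, 1.0), (5.0, 3.0)]
c = [(2.0, 2.0)]

# The merge with the smaller increase in SSE loses less information.
print(merge_cost(a, b))  # 16.0
print(merge_cost(a, c) < merge_cost(a, b))  # True
```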

Question 7

A ___________ refers to the number of times a collection of items occurs together in a transaction data set.

Accepted Answer

A) consequent 
B) validation count 
C) support count 
D) antecedent
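
Counting how often a set of items appears together is straightforward to sketch. The transactions below are invented for illustration, and `count_together` is a hypothetical helper name.

```python
def count_together(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


# Hypothetical transaction data: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"bread", "jelly"},
    {"milk"},
]

print(count_together(transactions, {"bread", "milk"}))  # 2
```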

Question 8

The endpoint of a k-means clustering algorithm occurs when

Accepted Answer

A) Euclidean distance between clusters is minimized. 
B) Euclidean distance between observations in a cluster is maximized. 
C) no further changes are observed in cluster structure and number. 
D) all of the observations are encompassed within a single large cluster with mean k.
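
A minimal sketch of the k-means loop, assuming the usual assign-then-update formulation and caller-supplied starting centroids. The function name, the `max_iter` safeguard, and the sample points are illustrative choices, not part of the exam. The loop ends when a full pass leaves every observation in the same cluster as before.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid updates until assignments stop changing."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assign each observation to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no observation changed cluster, so the algorithm stops
        assignment = new_assignment
        # Recompute each centroid as the mean of its current members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(points, [(0, 0), (10, 10)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
```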

Question 9

____________________ clustering method defines the similarity between two clusters as the similarity of the pair of observations (one from each cluster) that are the most different.

Accepted Answer

The answer of ____________________ clustering method defines the similarity between...

Question 10

Platinum Gym has 10,000 gym members, of which 1,500 memberships included Unlimited Fitness Training and use of the tanning salon, and of which 750 also included Unlimited Hydromassage. If Fitness Training is considered A, use of the tanning salon is considered B, and Hydromassage is considered C, then the association rule for these sales becomes "If A and B are purchased, then C is also purchased." Calculate the confidence level.

Accepted Answer

The answer of Platinum Gym has 10,000 gyms members out...
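
As a worked sketch of the arithmetic, confidence is the support of the antecedent and consequent together divided by the support of the antecedent alone. This assumes the 750 Hydromassage memberships are a subset of the 1,500 memberships that include both A and B, which is one reading of the question; `rule_confidence` is a hypothetical helper name.

```python
def rule_confidence(support_antecedent_and_consequent, support_antecedent):
    """Share of antecedent transactions that also contain the consequent."""
    return support_antecedent_and_consequent / support_antecedent


# Assumed reading: 750 of the 1,500 {A, B} memberships also include C.
print(rule_confidence(750, 1500))  # 0.5
```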

Question 11

Which is NOT a primary option for addressing missing data?

Accepted Answer

A) To discard observations with any missing values
B) To discard any variable with missing values
C) To fill in missing entries with estimated values
D) To generate random data to replace the missing values

Question 12

The lift ratio of an association rule with a confidence value of 0.45 and in which the consequent occurs in 6 out of 10 cases is

Accepted Answer

A) 1.40. 
B) 0.54. 
C) 1.00. 
D) 0.75.
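
The arithmetic can be sketched directly from the definition given in Question 3: lift is the rule's confidence divided by the benchmark confidence, where the benchmark is taken here to be the share of cases in which the consequent occurs. `lift_ratio` is a hypothetical helper name.

```python
def lift_ratio(confidence, benchmark_confidence):
    """Confidence of the rule relative to the benchmark confidence."""
    return confidence / benchmark_confidence


benchmark = 6 / 10  # the consequent occurs in 6 out of 10 cases
print(round(lift_ratio(0.45, benchmark), 2))  # 0.75
```

A ratio below 1 indicates that the rule identifies the consequent less often than the benchmark would.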

Question 13

Which of the following is true of Euclidean distances?

Accepted Answer

A) It is used to measure dissimilarity between categorical variable observations. 
B) It is not affected by the scale on which variables are measured. 
C) It increases with the increase in similarity between variable values. 
D) It is commonly used as a method of measuring dissimilarity between quantitative observations.
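
Option B can be tested with a small experiment. In the sketch below, the two observations are invented (age in years, income in dollars), and the only change between the two calculations is the unit used for income.

```python
import math

# Hypothetical observations: (age in years, income in dollars).
a_dollars = (30, 50_000)
b_dollars = (60, 51_000)

# The same observations with income expressed in thousands of dollars.
a_thousands = (30, 50.0)
b_thousands = (60, 51.0)

raw = math.dist(a_dollars, b_dollars)
rescaled = math.dist(a_thousands, b_thousands)

# The distance changes with the measurement scale, which is why
# variables are often standardized before clustering.
print(round(raw, 2), round(rescaled, 2))
```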

Question 14

___________________ can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation.

Accepted Answer

A) Single linkage 
B) Ward's method 
C) Average group linkage 
D) Dendrogram

Question 15

In which of the following scenarios would it be appropriate to use hierarchical clustering?

Accepted Answer

A) When the number of observations in the dataset is relatively high. 
B) When it is not necessary to know the nesting of clusters. 
C) When the number of clusters is known beforehand. 
D) When binary or ordinal data needs to be clustered.

Question 16

Single linkage is a method of calculating dissimilarity between clusters by

Accepted Answer

A) considering only the two most dissimilar observations in the two clusters. 
B) computing the average dissimilarity between every pair of observations between the two clusters. 
C) considering only the two most similar observations in the two clusters. 
D) considering the distance between the cluster centroids.
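
The options above correspond to different linkage measures, which can be compared side by side. This is a sketch using Euclidean distance and two made-up clusters; the function names are illustrative.

```python
import math
from itertools import product


def pairwise_distances(a, b):
    """Distances between every pair of observations, one from each cluster."""
    return [math.dist(p, q) for p, q in product(a, b)]


def single_linkage(a, b):
    """Distance between the two most similar (closest) observations."""
    return min(pairwise_distances(a, b))


def complete_linkage(a, b):
    """Distance between the two most different (farthest) observations."""
    return max(pairwise_distances(a, b))


def average_linkage(a, b):
    """Mean distance over all cross-cluster pairs."""
    d = pairwise_distances(a, b)
    return sum(d) / len(d)


cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (6.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))    # 2.0
print(complete_linkage(cluster_a, cluster_b))  # 6.0
print(average_linkage(cluster_a, cluster_b))   # 4.0
```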

Question 17

Jaccard's coefficient is different from the matching coefficient in that the former

Accepted Answer

A) measures overlap while the latter measures dissimilarity. 
B) does not count matching zero entries while the latter does. 
C) deals with categorical variables while the latter deals with continuous variables.
D) is affected by the scale used to measure variables while the latter is not.
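
The two coefficients can be compared on a pair of binary observations. The sketch below uses the standard definitions (matching entries over all entries, and shared 1s over entries that are not shared 0s); the sample vectors and function names are hypothetical.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    return sum(1 for a, b in zip(u, v) if a == b) / len(u)


def jaccard_coefficient(u, v):
    """Share of shared 1s among variables that are not 0 in both observations."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    considered = len(u) - both_zero
    # Returning 0.0 when every entry is a shared zero is a convention chosen here.
    return both_one / considered if considered else 0.0


u = (1, 0, 0, 0, 1)
v = (1, 0, 0, 0, 0)

print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```

The three shared zeros raise the matching coefficient but are ignored by Jaccard's coefficient.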

Question 18

In k-means clustering, k represents the

Accepted Answer

A) number of variables. 
B) number of clusters. 
C) number of observations in a cluster. 
D) mean of the cluster.

Question 19

A cluster's _____________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.

Accepted Answer

A) dimension 
B) affordability 
C) durability 
D) span

Question 20

Suppose we had a data set from a call center where customers were asked to choose among the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0-1 dummy variables to encode the categorical variable, which of the following combinations would represent the entry "customer service"?

Accepted Answer

A) 000 
B) 100 
C) 010 
D) 001
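
The encoding can be sketched as one 0-1 dummy variable per option, in the order the options are listed. This assumes a full three-digit encoding, as the answer choices suggest; `OPTIONS` and `encode_choice` are hypothetical names.

```python
# Options in the order given in the question.
OPTIONS = ["hear account information", "billing questions", "customer service"]


def encode_choice(choice, categories=OPTIONS):
    """Return one 0-1 dummy per category, with a 1 in the chosen position."""
    if choice not in categories:
        raise ValueError(f"unknown category: {choice!r}")
    return "".join("1" if c == choice else "0" for c in categories)


for option in OPTIONS:
    print(option, "->", encode_choice(option))
```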

Exam 4: Descriptive Data Mining
