Deck 17: Data Mining

1
Megan is examining the likelihood of people riding the subway. The dependent variable takes on the value of 1 if the individual rides the subway and 0 otherwise. Therefore, she could use logistic regression to examine this question.
True
2
When using data partitioning, the second subset, which usually contains the records that were not included in the training data, is called the prediction data set.
False
3
Cluster analysis tries to group observations into clusters so that observations within a cluster are different and observations in different clusters are similar.
False
4
The K in K-Means refers to the number of clusters.
5
A data mart is typically smaller than a data warehouse.
6
A neural network methodology attempts to mimic

A)the complex behavior of children.
B)the complex behavior of the human brain.
C)human emotion.
D)quantifiable random processes.
7
Segmentation is also known as clustering, and involves trying to group entities into similar clusters.
8
Which of the following statements about logistic regression is false?

A)Logistic regression estimates the probability that an individual is in a particular category.
B)Logistic regression uses a nonlinear function of the explanatory variables for classification.
C)Logistic regression is essentially regression with a binary dependent variable.
D)Logistic regression requires that the error terms are uniformly distributed.
9
Mya is investigating the factors that impact soda consumption. She examines a host of variables that help explain the amount consumed. Which type of data mining methodology is she most likely to use?

A)market basket analysis
B)prediction
C)classification analysis
D)forecasting
10
Clustering is considered a supervised data mining technique.
11
Lift is the increase in the number of purchasers over the typical number of purchasers.
12
Logistic regression and neural networks use complex nonlinear functions to capture the relationship between explanatory variables and categorical dependent variables.
13
The logarithm of the odds ratio is called the

A)logit.
B)logos.
C)lods.
D)logodra.
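
As a study aid, the relationship between a probability, its odds, and the logarithm of the odds can be sketched in a few lines of Python. The function names `odds` and `log_odds` and the example probability are illustrative assumptions, not part of the deck.

```python
import math

def odds(p):
    """Odds in favor of an event that has probability p (0 < p < 1)."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural logarithm of the odds; this is the quantity the card asks about."""
    return math.log(odds(p))

p = 0.75                 # hypothetical probability, chosen only for illustration
print(odds(p))           # odds of 3 to 1
print(log_odds(p))       # log of those odds
print(log_odds(0.5))     # even odds give a log-odds of zero
```

A probability above 0.5 gives a positive log-odds value and a probability below 0.5 gives a negative one.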
14
The testing set in data partitioning is the

A)first subset of data, which usually contains 70% of the records.
B)second subset of data, which usually contains 30% or less of the records.
C)initial dataset from which subsets are created.
D)first subset of data, which usually contains 30% of the records.
15
Data mining is used to examine known, expected patterns and relationships among variables.
16
Which methodology is used to group products that customers purchase together?

A)market basket analysis
B)prediction
C)classification analysis
D)forecasting
17
When using data partitioning, the first subset, usually with about 70% to 80% of the records, is called the training data set.
18
Which of the following is not a methodology useful for data mining?

A)Classification analysis
B)Prediction
C)Cluster analysis
D)Stock market analysis
19
Classification analysis attempts to find variables that are related to a quantitative variable.
20
It is very useful to partition a large data set into all of the following subsets, except the _____ data set.

A)training
B)data
C)explanatory
D)prediction
21
Unsupervised methods have no

A)dependent variable.
B)clustering.
C)segmentation.
D)association analysis.
22
If the regression coefficient estimate from a logistic regression is positive, the probability of the dependent variable taking on a value of 1

A)decreases.
B)approaches zero.
C)increases.
D)remains constant.
23
Clustering tries to group entities into _____ clusters, based on the value of their variables.

A)trier
B)similar
C)nontrier
D)discovery
24
The predicted value from a logistic regression will be

A)between 0 and 1.
B)between -1 and 1.
C)less than 0.
D)greater than 1.
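
To see how a logistic model turns a linear score into a predicted value, here is a minimal sketch. The intercept and slope are made-up numbers and `predict` is a hypothetical helper; neither comes from the deck or from any fitted model.

```python
import math

def sigmoid(score):
    """Logistic function: maps any real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients, assumed purely for illustration.
INTERCEPT = -1.0
SLOPE = 0.8          # positive coefficient on the explanatory variable

def predict(x):
    """Predicted probability that the dependent variable equals 1."""
    return sigmoid(INTERCEPT + SLOPE * x)

for x in [-5, 0, 5]:
    print(x, round(predict(x), 4))
```

With a positive slope, larger values of `x` produce larger predicted probabilities, and the logistic function keeps every prediction strictly between 0 and 1.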
25
Melody is a department store manager and wants to examine whether or not female shoppers are more likely than male shoppers to use a department credit card. "Female = 1" indicates the individual is a female. "Credit Card = 1" indicates the individual used a credit card to make the purchase. "Amount spent" is in dollars. Melody runs a logistic regression. If the estimate on the female variable is positive, what does this indicate about credit card usage?
26
Once a dissimilarity measure is developed, a clustering algorithm attempts to find

A)clusters of rows where rows within a cluster are dissimilar and rows in different clusters are dissimilar.
B)clusters of rows where rows within a cluster are similar and rows in different clusters are similar.
C)clusters of rows where rows within a cluster are dissimilar and rows in different clusters are similar.
D)clusters of rows where rows within a cluster are similar and rows in different clusters are dissimilar.
27
The higher the "score" for a particular member in logistic regression, the

A)higher the likelihood that member is in category 1.
B)lower the likelihood that member is in category 1.
C)higher the likelihood that member is in category 0.
D)higher the likelihood that member is not in a category.
28
Bridget has partitioned data into two subsets. The original file contains 300,000 observations. The subset she is currently working with has 60,000 observations. Which subset is she most likely to be using?

A)The training set
B)The original set
C)The testing set
D)The prediction set
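
The arithmetic behind this card can be checked with a small partitioning sketch. The `partition` helper, the 80% training fraction, and the fixed seed are assumptions made for illustration; the deck only states that the first subset usually holds about 70% to 80% of the records.

```python
import random

def partition(records, train_fraction=0.8, seed=0):
    """Shuffle the records and split them into two non-overlapping subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)      # fixed seed so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]      # larger first subset, smaller second subset

first, second = partition(range(300_000))
print(len(first), len(second))                 # 240000 60000
print(len(second) / 300_000)                   # share of records in the second subset
```

Under these assumptions, a 60,000-record subset is 20% of the 300,000-record file, which matches the size of the smaller, second subset rather than the larger first one.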
29
Suppose the odds of Team A winning are 5 to 1. Then, the odds ratio is

A)5/1.
B)1/5.
C)6/1.
D)1/6.
30
In K-Means clustering, K refers to the

A)size of the population.
B)size of the sample.
C)number of clusters.
D)size of each cluster.
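
For readers who want to see the role K plays, the following is a bare-bones sketch of the standard K-Means iteration (Lloyd's algorithm) on one-dimensional data. The data points, the starting centers, and the function name `k_means` are illustrative assumptions; real analyses would normally rely on a statistics package.

```python
def k_means(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D points; the number of starting centers is K."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]            # made-up observations
centers, clusters = k_means(points, centers=[0.0, 5.0])  # K = 2 starting centers
print(centers)
print(clusters)
```

The analyst picks K in advance by choosing how many starting centers to supply; the algorithm then decides which observations belong to each of those K groups.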