Deck 4: Predictive Analytics I: Data Mining Process, Methods, and Algorithms
1
Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica.
False
2
Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.
True
3
Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.
False
4
In data mining, classification models help in prediction.
5
If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."
6
Ratio data is a type of categorical data.
7
During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.
8
When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach.
9
K-fold cross-validation is also called sliding estimation.
10
In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.
11
Converting continuous valued numerical variables to ranges and categories is referred to as discretization.
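Card 11 refers to discretization. As a study aid, here is a minimal Python sketch of one common scheme, equal-width binning, which cuts the observed range into k intervals of the same width (equal-frequency binning is another option). The function name, the age data, and the bin labels are illustrative assumptions, not taken from the deck.

```python
def equal_width_bins(values, k):
    """Map each numeric value to a bin index 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    # The maximum value would compute to index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical example: turn exact ages into three ordered categories.
ages = [18, 22, 35, 41, 59, 63, 70]
labels = ["young", "middle", "senior"]
print([labels[b] for b in equal_width_bins(ages, 3)])
```

Equal-width bins are simple to explain but can end up very unevenly populated when the data are skewed.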
12
Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.
13
In the Dell case study, the largest issue was how to properly spend the online marketing budget.
14
Statistics and data mining both look for data sets that are as large as possible.
15
Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.
16
In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.
17
The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit.
18
The cost of data storage has plummeted recently, making data mining feasible for more firms.
19
Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.
20
In the opening case, police detectives used data mining to identify possible new areas of inquiry.
21
Clustering partitions a collection of things into segments whose members share
A)similar characteristics.
B)dissimilar characteristics.
C)similar collection methods.
D)dissimilar collection methods.
22
Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?
A)associations
B)visualization
C)classification
D)clustering
23
Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from
A)collecting data about customers and transactions.
B)developing a philosophy that is data analytics-centric.
C)analyzing the vast data amounts routinely collected.
D)asking the customers what they want.
24
Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings?
A)SEMMA
B)proprietary organizational methodologies
C)KDD Process
D)CRISP-DM
25
The data field "ethnic group" can be best described as
A)nominal data.
B)interval data.
C)ordinal data.
D)ratio data.
26
A data mining study is specific to addressing a well-defined business task, and different business tasks require
A)general organizational data.
B)general industry data.
C)general economic data.
D)different sets of data.
27
In estimating the accuracy of data mining (or other) classification models, the true positive rate is
A)the ratio of correctly classified positives divided by the total positive count.
B)the ratio of correctly classified negatives divided by the total negative count.
C)the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified positives.
D)the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified negatives.
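Cards 7 and 27 use confusion-matrix terminology. The sketch below counts true/false positives and negatives for binary labels and derives the usual rates; the true positive rate is TP / (TP + FN), i.e., correctly classified positives over all actual positives. It assumes `True` marks the positive class and that both classes are present in the actual and predicted labels (otherwise a denominator is zero); the function names and sample labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN) for parallel lists of booleans (True = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)   # predicted positive, actually negative
    tn = sum(1 for a, p in pairs if not a and not p)
    fn = sum(1 for a, p in pairs if a and not p)   # predicted negative, actually positive
    return tp, fp, tn, fn


def rates(actual, predicted):
    """Common accuracy measures; assumes no denominator is zero."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return {
        "true_positive_rate": tp / (tp + fn),  # also called sensitivity or recall
        "true_negative_rate": tn / (tn + fp),  # also called specificity
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / len(actual),
    }


# Hypothetical labels: 4 actual positives, 6 actual negatives.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
print(confusion_counts(actual, predicted))
print(rates(actual, predicted))
```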
28
In the Influence Health case study,what was the goal of the system?
A)locating clinic patients
B)understanding follow-up care
C)decreasing operational costs
D)increasing service use
29
What does the scalability of a data mining method refer to?
A)its ability to predict the outcome of a previously unknown data set accurately
B)its speed of computation and computational costs in using the model
C)its ability to construct a prediction model efficiently given a large amount of data
D)its ability to overcome noisy data to make somewhat accurate predictions
30
Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications?
A)insurance
B)retailing and logistics
C)customer relationship management
D)computer hardware and software
31
What is the main reason parallel processing is sometimes used for data mining?
A)because the hardware exists in most organizations, and it is available to use
B)because most of the algorithms used for data mining require it
C)because of the massive data amounts and search efforts involved
D)because any strategic application requires parallel processing
32
Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?
A)associations
B)visualization
C)classification
D)clustering
33
All of the following statements about data mining are true EXCEPT
A)the process aspect means that data mining should be a one-step process to results.
B)the novel aspect means that previously unknown patterns are discovered.
C)the potentially useful aspect means that results should lead to some business benefit.
D)the valid aspect means that the discovered patterns should hold true on new data.
34
Prediction problems where the variables have numeric values are most accurately defined as
A)classifications.
B)regressions.
C)associations.
D)computations.
35
Which of the following is a data mining myth?
A)Data mining is a multistep process that requires deliberate, proactive design and use.
B)Data mining requires a separate, dedicated database.
C)The current state-of-the-art is ready to go for almost any business.
D)Newer Web-based tools enable managers of all educational levels to do data mining.
36
In data mining, finding that two products commonly appear together in a shopping cart is known as
A)association rule mining.
B)cluster analysis.
C)decision trees.
D)artificial neural networks.
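Card 36 describes finding products that tend to appear together in shopping carts. Rules of the form X → Y are commonly scored by support (the share of all baskets containing both X and Y) and confidence (the share of baskets containing X that also contain Y). A minimal sketch; the baskets are made-up illustrative data and the helper names are hypothetical.

```python
def count(baskets, itemset):
    """Number of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for b in baskets if items <= set(b))


def support(baskets, itemset):
    """Fraction of all baskets that contain the whole itemset."""
    return count(baskets, itemset) / len(baskets)


def confidence(baskets, lhs, rhs):
    """Of the baskets containing lhs, the fraction that also contain rhs."""
    return count(baskets, set(lhs) | set(rhs)) / count(baskets, lhs)


# Illustrative transactions, one set of items per shopping cart.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
print(support(baskets, {"diapers", "beer"}))        # share of carts with both items
print(confidence(baskets, {"diapers"}, {"beer"}))   # rule: diapers -> beer
```

Confidence is computed from raw counts rather than a ratio of two supports, which gives the same value while avoiding an extra floating-point division.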
37
All of the following statements about data mining are true EXCEPT:
A)The term is relatively new.
B)Its techniques have their roots in traditional statistical analysis and artificial intelligence.
C)The ideas behind it are relatively new.
D)Intense, global competition makes its application more important.
38
What does the robustness of a data mining method refer to?
A)its ability to predict the outcome of a previously unknown data set accurately
B)its speed of computation and computational costs in using the model
C)its ability to construct a prediction model efficiently given a large amount of data
D)its ability to overcome noisy data to make somewhat accurate predictions
39
Third-party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by
A)asking data users to use the data ethically.
B)leaving in identifiers (e.g., name), but changing other variables.
C)removing identifiers such as names and social security numbers.
D)letting individuals in the data know their data is being accessed.
40
In the Target case study, why did Target send maternity ads to a teenager?
A)Target's analytic model confused her with an older woman with a similar name.
B)Target was sending ads to all women in a particular neighborhood.
C)Target's analytic model suggested she was pregnant based on her buying habits.
D)Target was using a special promotion that targeted all teens in her geographical area.
41
Fayyad et al. (1996) defined ________ in databases as a process of using data mining methods to find useful information and patterns in the data.
42
There has been an increase in data mining to deal with global competition and customers' more sophisticated ________ and wants.
43
In the Dell case study, engineers working closely with marketing used lean software development strategies and numerous technologies to create a highly scalable, singular ________.
44
The basic idea behind a(n) ________ is that it recursively divides a training set until each division consists entirely or primarily of examples from one class.
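Card 44 describes a learner that recursively divides the training set until each division is entirely or mostly one class. A compact Python sketch of that recursive partitioning, assuming numeric features, binary threshold splits, and Gini impurity as the split criterion (one common choice; the deck does not name one). All function names and the tiny data set are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label belongs to the same class."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build(rows, labels, min_purity=1.0):
    """Recursively divide the training set until each node is (mostly) one class."""
    majority, n_major = Counter(labels).most_common(1)[0]
    if n_major / len(labels) >= min_purity:
        return majority                      # leaf: pure enough
    split = best_split(rows, labels)
    if split is None:
        return majority                      # leaf: no split improves purity
    f, t = split
    go_left = [r[f] <= t for r in rows]
    left = build([r for r, g in zip(rows, go_left) if g],
                 [y for y, g in zip(labels, go_left) if g], min_purity)
    right = build([r for r, g in zip(rows, go_left) if not g],
                  [y for y, g in zip(labels, go_left) if not g], min_purity)
    return (f, t, left, right)               # internal node


def predict(tree, row):
    """Follow thresholds down to a leaf (class labels must not be tuples)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


rows = [(1,), (2,), (3,), (10,), (11,), (12,)]
labels = ["no", "no", "no", "yes", "yes", "yes"]
tree = build(rows, labels)
print(tree, predict(tree, (2.5,)), predict(tree, (9,)))
```

Lowering `min_purity` below 1.0 stops splitting once a node is "primarily" one class, which is one simple way to limit overfitting.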
45
Because of its successful application to retail business problems, association rule mining is commonly called ________.
46
Data are often buried deep within very large ________, which sometimes contain data from several years.
47
Data preparation, the third step in the CRISP-DM data mining process, is more commonly known as ________.
48
Patterns have been manually ________ from data by humans for centuries,but the increasing volume of data in modern times has created a need for more automatic approaches.
49
The data mining in cancer research case study explains that data mining methods are capable of extracting patterns and ________ hidden deep in large and complex medical databases.
50
Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern searching, and data dredging are all alternative names for ________.
51
The ________ is the most commonly used algorithm to discover association rules. Given a set of itemsets, the algorithm attempts to find subsets that are common to at least a minimum number of the itemsets.
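Card 51 describes a level-wise search for itemsets that appear in at least a minimum number of transactions. A minimal, unoptimized sketch in the style of the Apriori algorithm: frequent k-itemsets are joined into (k+1)-item candidates, candidates with any infrequent k-subset are pruned, and the survivors are counted against the threshold. The `min_count` value, the function name, and the baskets are illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_count):
    """Return {itemset: count} for every itemset found in >= min_count baskets."""
    baskets = [frozenset(b) for b in baskets]

    def count(itemset):
        return sum(1 for b in baskets if itemset <= b)

    # Level 1: frequent single items.
    singles = {frozenset([i]) for b in baskets for i in b}
    counts = {s: count(s) for s in singles}
    current = {s: n for s, n in counts.items() if n >= min_count}

    result, k = {}, 1
    while current:
        result.update(current)
        keys = list(current)
        # Join step: unions of two frequent k-itemsets that have exactly k+1 items.
        candidates = {a | b for a in keys for b in keys if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k))}
        # Count step: keep candidates that meet the minimum count.
        counts = {c: count(c) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_count}
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
for itemset, n in sorted(frequent_itemsets(baskets, 3).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), n)
```

The prune step is what makes the level-wise approach practical: any itemset containing an infrequent subset cannot itself be frequent, so it is never counted.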
52
In the terrorist funding case study, an observed price ________ may be related to income tax avoidance/evasion, money laundering, or terrorist financing.
53
As described in the Influence Health case study, customers are more often ________ services from a variety of healthcare service providers before selecting one.
54
While prediction is largely experience and opinion based, ________ is data and model based.
55
In ________, a method for estimating the accuracy of classification models, the complete data set is randomly split into mutually exclusive subsets of approximately equal size, and the model is tested multiple times, each time on a different left-out subset, using the others as a training set.
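Card 55 describes splitting the data into mutually exclusive, roughly equal-size subsets and testing on each left-out subset in turn while training on the rest. A minimal sketch, assuming a caller-supplied `train_and_score` callback (a hypothetical interface) that fits a model on the training rows and returns its accuracy on the test rows; `majority_baseline` is a toy stand-in for a real classifier.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k mutually exclusive folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]   # fold sizes differ by at most one


def cross_validate(rows, labels, k, train_and_score, seed=0):
    """Average the score over k runs, each testing on a different left-out fold.

    Assumes len(rows) >= k so that no fold is empty.
    """
    scores = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        scores.append(train_and_score(
            [rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx]))
    return sum(scores) / k


def majority_baseline(train_X, train_y, test_X, test_y):
    """Toy model: always predict the most common training label; return accuracy."""
    guess = Counter(train_y).most_common(1)[0][0]
    return sum(1 for y in test_y if y == guess) / len(test_y)


rows = list(range(10))
labels = ["a"] * 7 + ["b"] * 3
print(cross_validate(rows, labels, 5, majority_baseline))
```

Every row is used for testing exactly once and for training k - 1 times, so the averaged score depends less on one lucky or unlucky split.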
56
Customer ________ management extends traditional marketing by creating one-on-one relationships with customers.
57
In the Influence Health case, the company was able to evaluate over ________ million records in only two days.
58
Whereas ________ starts with a well-defined proposition and hypothesis, data mining starts with a loosely defined discovery statement.
59
One way to protect privacy and individuals' rights in data mining is by ________ of the customer records before applying data mining applications, so that the records cannot be traced to an individual.
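Cards 39 and 59 concern removing identifying fields from customer records before analysis. A minimal sketch that drops direct-identifier fields from dictionary records; the field names and records are hypothetical. Dropping direct identifiers alone may not prevent re-identification when quasi-identifiers such as ZIP code and birth date remain in the data.

```python
# Hypothetical names of fields that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone"}


def strip_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return new records with direct-identifier fields removed (inputs are not modified)."""
    return [{key: value for key, value in r.items() if key not in identifiers}
            for r in records]


customers = [
    {"name": "Ann Lee", "ssn": "000-00-0000", "age": 34, "basket_total": 52.10},
    {"name": "Raj Patel", "email": "raj@example.com", "age": 41, "basket_total": 18.75},
]
print(strip_identifiers(customers))
```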
60
________ was proposed in the mid-1990s by a European consortium of companies to serve as a nonproprietary standard methodology for data mining.
61
List and briefly describe the six steps of the CRISP-DM data mining process.
62
In the data mining in Hollywood case study,how successful were the models in predicting the success or failure of a Hollywood movie?
63
Briefly describe five techniques (or algorithms) that are used for classification modeling.
64
Describe the role of the simple split in estimating the accuracy of classification models.
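Card 64 asks about the simple split (holdout) method: the data are partitioned once into a training set used to build the model and a separate test set used only to estimate its accuracy. A minimal sketch, assuming the often-cited convention of about two-thirds for training and one-third for testing (exposed here as a default parameter, not a fixed rule); the function name and data are hypothetical.

```python
import random


def simple_split(rows, labels, test_fraction=1/3, seed=0):
    """Randomly partition the data once into a training set and a holdout test set."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


rows = list(range(9))
labels = [r % 2 for r in rows]
train_X, train_y, test_X, test_y = simple_split(rows, labels)
print(len(train_X), len(test_X))
```

Because the split happens only once, the resulting accuracy estimate depends on which rows happen to land in the test set.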
65
Describe cluster analysis and some of its applications.
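Cards 21, 32, and 65 describe clustering as partitioning objects into groups whose members share similar characteristics. One widely used clustering algorithm is k-means, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. A minimal one-dimensional sketch, assuming numeric data and a k chosen in advance; the deterministic initialization is a simplification (k-means is usually initialized randomly or with k-means++), and the function name is hypothetical.

```python
def k_means_1d(points, k, max_iter=100):
    """Cluster 1-D numbers into k groups; returns (centroids, clusters)."""
    # Simplified deterministic start: k evenly spaced values from the sorted data.
    step = max(1, len(points) // k)
    centroids = sorted(points)[::step][:k]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: assignments will no longer change
        centroids = updated
    return centroids, clusters


centroids, clusters = k_means_1d([1, 2, 3, 10, 11, 12], 2)
print(centroids, clusters)
```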
66
List four myths associated with data mining.
67
List six common data mining mistakes.
68
Based on the lessons learned from the Target case, what legal warnings would you give another retailer using data mining for marketing?
69
List five reasons for the growing popularity of data mining in the business world.