Question 1

A data mining routine has been applied to a transaction dataset and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so). The decile-wise lift chart for a transaction data model:

Accepted Answer

By selecting the first decile of the scored results, the researcher will have chosen approximately 6.5 times as many correct classifications as a random selection process would have produced. In other words, the model does much better than random selection at forecasting whether a particular transaction is fraudulent or nonfraudulent.
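As a rough illustration of how a decile-wise lift figure such as the 6.5 above is obtained, the sketch below ranks records by model score, splits them into ten equal groups, and divides each group's hit rate by the overall hit rate. The function name `decile_lift` and the toy scores and labels are hypothetical; they are not the transaction data from the question, so the resulting lift values differ from 6.5.

```python
def decile_lift(scores, labels):
    """Return the lift of each decile: decile hit rate / overall hit rate.

    scores: model scores, higher means "more likely class 1"
    labels: actual classes, 1 or 0, in the same order as scores
    """
    if len(scores) != len(labels) or len(scores) < 10:
        raise ValueError("need equally long inputs with at least 10 records")

    # Rank the actual labels from highest score to lowest score.
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    total_hits = sum(ranked)
    if total_hits == 0:
        raise ValueError("lift is undefined when there are no class 1 records")

    lifts = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        # (hits in decile / decile size) / (total hits / n), rearranged
        lifts.append(sum(chunk) * n / (len(chunk) * total_hits))
    return lifts


# Hypothetical toy data: 100 records, 10 of which are actually class 1.
scores = [i / 100 for i in range(100)]
labels = [1 if i >= 94 or i in (50, 60, 70, 80) else 0 for i in range(100)]

lifts = decile_lift(scores, labels)
print([round(x, 2) for x in lifts])
```

A first-decile lift well above 1 means the top-scored tenth of the records contains many more class 1 cases than a random tenth would; with equal-sized deciles the ten lift values average to 1.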

Question 2

Which of the following situations represents the confusion matrix for the transactions data mentioned in question 1 above? Explain your reasoning.
[Options A, B, C, and D: confusion matrices not reproduced]

Accepted Answer

Assuming that "1" is fraudulent and "0" is nonfraudulent, "C" is the correct matrix.
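The cell counts implied by question 1 can be worked out directly from the figures given there. The sketch below is only an illustration: the variable names are hypothetical, and the layout (rows = actual class, columns = predicted class, class "1" first) is an assumption, since the original option matrices are not available here.

```python
# Counts stated in question 1.
predicted_fraud = 88        # records classified as fraudulent ("1")
correct_fraud = 30          # ... of which this many really are fraudulent
predicted_nonfraud = 952    # records classified as nonfraudulent ("0")
correct_nonfraud = 920      # ... of which this many really are nonfraudulent

# Misclassified records follow by subtraction.
false_positives = predicted_fraud - correct_fraud        # actual 0, predicted 1
false_negatives = predicted_nonfraud - correct_nonfraud  # actual 1, predicted 0

# Assumed layout: rows = actual class (1, 0), columns = predicted class (1, 0).
confusion = [
    [correct_fraud, false_negatives],
    [false_positives, correct_nonfraud],
]

print("            pred 1  pred 0")
print(f"actual 1    {confusion[0][0]:6d}  {confusion[0][1]:6d}")
print(f"actual 0    {confusion[1][0]:6d}  {confusion[1][1]:6d}")
```

Whichever option is correct must contain these four counts; only their arrangement depends on the row and column convention used.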

Question 3

Calculate the classification error rate for the following confusion matrix. Comment on the pattern of misclassifications. How much better did this data mining technique do as compared to a naive model?

Accepted Answer

The model misclassifies class "1" at a rate of 2 out of 10 cases; it misclassifies class "0" at a rate of 20 out of 990.
The naïve model would randomly choose 10 of the 1,000 observations as class "1" and classify the remaining 990 observations as "0."
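The error rates follow from the counts quoted in the answer. This is a minimal sketch with hypothetical variable names; the naive comparison assumes, as the answer describes, that the naive model picks 10 of the 1,000 observations at random as class "1".

```python
# Counts taken from the accepted answer.
actual_1, misclassified_1 = 10, 2     # class "1": 2 of 10 misclassified
actual_0, misclassified_0 = 990, 20   # class "0": 20 of 990 misclassified
total = actual_1 + actual_0           # 1,000 observations

error_rate_1 = misclassified_1 / actual_1
error_rate_0 = misclassified_0 / actual_0
overall_error_rate = (misclassified_1 + misclassified_0) / total

# Class "1" cases the model identifies correctly.
model_hits = actual_1 - misclassified_1

# Assumption from the answer: the naive model labels 10 random observations
# as class "1". On average it would catch 10 * (10 / 1000) true class "1" cases.
naive_expected_hits = 10 * actual_1 / total

print(f"class 1 error rate: {error_rate_1:.1%}")
print(f"class 0 error rate: {error_rate_0:.2%}")
print(f"overall error rate: {overall_error_rate:.1%}")
print(f"class 1 cases found: model {model_hits}, naive (expected) {naive_expected_hits}")
```

The pattern worth commenting on is the asymmetry: the error rate is far higher for the rare class "1" than for class "0", even though the overall error rate is low.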

Question 4

Explain what is meant by Bayes' theorem as used in the Naive Bayes model.

Accepted Answer

The answer of Explain what is meant by Bayes' theorem...

Question 5

Explain the difference between a training data set and a validation data set. Why are these data sets used routinely with data mining techniques in the XLMiner© program and not used in the ForecastX™ program? Is there, in fact, a similar technique presented in a previous chapter that is much the same as partitioning a data set?

Accepted Answer

The answer of Explain the difference between a training data...

Question 6

For a data mining classification technique the validation data set lift charts are shown below. What confidence in the model would you express given this evidence?

Accepted Answer

The answer of For a data mining classification technique the...

Question 7

In data mining, the candidate model should be applied to a data set that was not used in the estimation process in order to assess its accuracy on unseen data. What is that unseen data set called? How is it selected?

Accepted Answer

The answer of In data mining the candidate model should...

Question 8

Explain what the "k" in the k-Nearest-Neighbor model references.

Accepted Answer

The answer of Explain what the "k" in the k-Nearest-Neighbor...

Deck 9: Data Mining