Deck 3: Modeling and Evaluation: Going From Defining Business Problems and Data Understanding to Analyzing Data and Answering Questions

Question
Fuzzy matching is a data approach used to identify similar individuals based on data known about them.
Answer: False
Question
Alibaba and its attempt to identify seller and customer fraud based on various characteristics known about them is an example of similarity matching.
Answer: True
Question
A decision tree can be used to divide data into smaller groups.
Answer: True
Question
The P in the IMPACT Cycle represents "perform test plan."
Question
Regression is a data approach used to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.
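A minimal sketch of the idea, assuming a single independent variable and ordinary least squares (one common choice of statistical model, not the only one). The machine-hours and maintenance-cost figures are hypothetical and exist only to show the arithmetic; `fit_simple_regression` and `predict` are illustrative helper names.

```python
# Simple linear regression (one predictor) fitted by ordinary least squares.
# All data below are hypothetical.

def fit_simple_regression(x, y):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Estimate the dependent variable for a new value of the independent variable."""
    return intercept + slope * x_new


# Hypothetical: machine hours (independent) vs. maintenance cost in $000s (dependent).
machine_hours = [100, 200, 300, 400, 500]
maintenance_cost = [12, 19, 31, 42, 49]

b0, b1 = fit_simple_regression(machine_hours, maintenance_cost)
print(f"intercept={b0:.2f}, slope={b1:.4f}")
print(f"estimated cost at 600 hours: {predict(b0, b1, 600):.1f}")
```

The fitted slope and intercept are the "parameters of the model"; a straight line is only one possible functional form.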
Question
Clustering is a data approach used to divide individuals into groups in a useful or meaningful way.
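As a hedged illustration of clustering, here is a bare-bones k-means sketch in plain Python. It assumes numeric attributes, a user-chosen number of groups `k`, and naive seeding from the first `k` points (real tools seed more carefully); the customer data are hypothetical. No labels are supplied: the groups emerge from the data themselves.

```python
import math


def kmeans(points, k, iterations=100):
    """Minimal k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points, until nothing changes."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]  # naive seeding: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for each point.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_labels == labels and new_centroids == centroids:
            break  # converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical customers: (orders per month, average order size in $100s).
customers = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 9), (8.5, 9.5)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```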
Question
An example of classification would be a credit card company flagging a transaction as being approved or potentially being fraudulent and denying payment.
Question
The data approach used to characterize the typical behavior of an individual, group, or population by generating summary statistics about the data is referred to as classification.
Question
Link prediction is a data approach used to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.
Question
Benford's Law is absolute, and all data must conform to it.
Question
Data reduction is a data approach used to reduce the amount of information that needs to be considered to focus on the most critical items.
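Data reduction can be as simple as filtering a population down to the items that matter for the question at hand. The sketch below assumes a hypothetical list of payments and a hypothetical approved-vendor list, and keeps only payments to vendors outside that list (optionally at or above a minimum amount). The field names and `reduce_to_exceptions` helper are illustrative, not from any particular system.

```python
# Hypothetical payment records and approved-vendor list (illustrative only).
payments = [
    {"id": 1, "vendor": "Acme Supply", "amount": 1200.00},
    {"id": 2, "vendor": "Brightline Logistics", "amount": 560.50},
    {"id": 3, "vendor": "Acme Suply", "amount": 980.00},
    {"id": 4, "vendor": "Brightline Logistics", "amount": 15000.00},
    {"id": 5, "vendor": "Quick Cash LLC", "amount": 4999.00},
]
approved_vendors = {"Acme Supply", "Brightline Logistics"}


def reduce_to_exceptions(records, approved, min_amount=0.0):
    """Keep only payments to vendors not on the approved list, at or above min_amount."""
    return [
        r for r in records
        if r["vendor"] not in approved and r["amount"] >= min_amount
    ]


# The reduced set is what an analyst would then interpret and follow up on.
for r in reduce_to_exceptions(payments, approved_vendors):
    print(r["id"], r["vendor"], r["amount"])
```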
Question
Fuzzy matching is a computer-assisted technique of finding matches that are less than 100 percent perfect by finding correspondences between portions of the text of each potential match.
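Python's standard-library `difflib.SequenceMatcher` gives a quick way to see what a less-than-perfect match looks like. This is a sketch, assuming simple lowercase/whitespace normalization and an arbitrary 0.8 cutoff; dedicated fuzzy-matching tools use other scoring methods, and the vendor names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-1 similarity score; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def fuzzy_matches(names, candidates, threshold=0.8):
    """Return (name, candidate, score) for every pair scoring at or above threshold."""
    results = []
    for name in names:
        for cand in candidates:
            score = similarity(name, cand)
            if score >= threshold:
                results.append((name, cand, round(score, 3)))
    return results


# Hypothetical: compare an approved vendor name against names on recent invoices.
print(fuzzy_matches(["Acme Supply"], ["Acme Suply", "Zenith Corp", "ACME SUPPLY"]))
```

A score just under 1.0 (for example, a one-letter misspelling) is exactly the kind of near-match an exact join would miss.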
Question
When considering a question such as "Do our customers form natural groups based on similar attributes?" you would use an unsupervised approach.
Question
XBRL is used to facilitate the exchange of financial reporting information between the company and the Securities and Exchange Commission.
Question
Data profiling typically involves unstructured data.
Question
XBRL is a global standard for exchanging financial reporting information that uses XML.
Question
A target is a manually assigned category applied to a record based on an event.
Question
Co-occurrence grouping is an example of a supervised approach.
Question
Existing data that has been manually evaluated and assigned a class is often referred to as test data.
Question
Co-occurrence grouping could be used to match vendors by geographic region.
Question
Which of the following best describes a supervised approach to the evaluation of data?

A) Data exploration that is free from oversight by a superior
B) Data exploration that is conducted with direct oversight by a superior
C) Data exploration that examines the relationships between variables that are hypothesized to exist
D) Data exploration that looks for potential patterns of interest
Question
In general, the more complex the model, the greater the chance of ________.

A) Overfitting the data
B) Underfitting the data
C) Pruning the data
D) The need to reduce the amount of data considered
Question
All of the following are examples of an unsupervised approach to evaluating data except:

A) Similarity matching
B) Clustering
C) Profiling
D) Co-occurrence grouping
Question
________ refers to data that is stored in a database or spreadsheet that is readily searchable.

A) Training data
B) Unstructured data
C) Structured data
D) Test data
Question
Data reduction typically involves all of the following steps except:

A) Identify the attribute you would like to reduce or focus on.
B) Identify the parameters of the model.
C) Filter the results.
D) Interpret the results.
Question
Which of the following best describes an independent variable?

A) Output
B) Input
C) Application
D) Operation
Question
While overfitting data could lead to an error rate of 0 (zero), it is unlikely that you would be able to ________ your results.

A) define
B) specify
C) articulate
D) generalize
Question
Which approach to data analytics attempts to predict, for each unit, the numerical value of some variable?

A) Classification
B) Regression
C) Similarity matching
D) Link prediction
Question
Which approach to data analytics attempts to characterize the typical behavior of an individual, group, or population by generating summary statistics about the data?

A) Classification
B) Regression
C) Profiling
D) Link prediction
Question
When working with a predictive model, underfitting the data is most likely caused by ________.

A) an overly complex model
B) an overly simple model
C) over pruning the data
D) a lack of data reduction
Question
Using social media to look for relationships between related parties that are not otherwise disclosed, in order to identify related-party transactions, is an example of ________.

A) Classification
B) Regression
C) Profiling
D) Link prediction
Question
Which approach to data analytics attempts to divide individuals into groups in a useful or meaningful way?

A) Clustering
B) Data reduction
C) Similarity matching
D) Co-occurrence grouping
Question
All of the following are examples of a supervised approach to evaluating data except:

A) Causal modeling
B) Data reduction
C) Link prediction
D) Regression
Question
Which of the following best describes an unsupervised approach to the evaluation of data?

A) Data exploration that is free from oversight by a superior
B) Data exploration that examines the relationships between variables that are hypothesized to exist
C) Data exploration that looks for potential patterns of interest
D) Data exploration that is conducted with direct oversight by a superior
Question
Which approach to data analytics attempts to identify similar individuals based on data known about them?

A) Classification
B) Clustering
C) Similarity matching
D) Co-occurrence grouping
Question
Which approach to data analytics attempts to assign each unit in a population to one of a small set of categories?

A) Classification
B) Regression
C) Similarity matching
D) Co-occurrence grouping
Question
Data profiling is used to assess data quality and internal controls. It typically involves all of the following steps except:

A) Filter the results.
B) Identify the objects or activity you want to profile.
C) Determine the types of profiling you want to perform.
D) Set boundaries or thresholds for the activity.
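A minimal profiling sketch, assuming the object being profiled is a hypothetical set of one employee's expense reimbursements, the type of profiling is basic summary statistics, and the boundary is an assumed rule of thumb (mean plus two population standard deviations). In practice, thresholds would come from professional judgment and company policy; `profile` is an illustrative helper name.

```python
import statistics


def profile(amounts, k=2.0):
    """Summarize amounts and flag values above mean + k * population std dev."""
    if not amounts:
        raise ValueError("amounts must not be empty")
    mean = statistics.mean(amounts)
    sd = statistics.pstdev(amounts)
    upper = mean + k * sd  # assumed boundary; adjust to policy
    return {
        "count": len(amounts),
        "mean": mean,
        "median": statistics.median(amounts),
        "min": min(amounts),
        "max": max(amounts),
        "upper_boundary": upper,
        "flagged": [a for a in amounts if a > upper],
    }


# Hypothetical reimbursement amounts for one employee.
reimbursements = [120, 135, 110, 128, 140, 125, 900]
summary = profile(reimbursements)
print(summary["median"], round(summary["upper_boundary"], 2), summary["flagged"])
```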
Question
Which approach to data analytics attempts to forecast a relationship between two data items?

A) Link prediction
B) Regression
C) Similarity matching
D) Co-occurrence grouping
Question
Regression analysis typically involves all of the following steps except:

A) Identify the variables that might predict an outcome.
B) Identify the parameters of the model.
C) Set boundaries or thresholds.
D) Determine the functional form of the relationship.
Question
Which approach to data analytics attempts to discover associations between individuals based on transactions involving them?

A) Classification
B) Regression
C) Similarity matching
D) Co-occurrence grouping
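One simple way to surface associations from transaction data is to count how often pairs of items appear together in the same transaction. The sketch below assumes each transaction is a list of item names (the office-supply orders are hypothetical) and uses only `collections.Counter` and `itertools.combinations`; `co_occurrence_counts` is an illustrative name.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_counts(transactions):
    """Count how many transactions each unordered pair of items appears in together."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


# Hypothetical office-supply orders.
orders = [
    ["printer paper", "toner", "stapler"],
    ["toner", "printer paper"],
    ["stapler", "tape"],
    ["printer paper", "toner"],
]
for pair, n in co_occurrence_counts(orders).most_common(3):
    print(pair, n)
```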
Question
Which of the following best describes a dependent variable?

A) Output
B) Input
C) Application
D) Operation
Question
Understanding and predicting warranty expense is an important determination for manufacturing firms. When using historical claims data to estimate the current period's warranty expense, the historical claims data represents which of the following?

A) Independent variable
B) Dependent variable
C) Function
D) Statistical Model
Question
Chapter 3 discussed five data analytics approaches or techniques that are most commonly used to address our accounting questions. List and define three of the five approaches. Next, describe how each of the three approaches you list could be used by credit card companies to identify fraudulent credit card activity.
Question
Understanding and predicting inventory obsolescence is an important determination for retail companies. When using competitor selling prices to estimate the inventory obsolescence reserve, the inventory obsolescence reserve represents which of the following?

A) Independent variable
B) Dependent variable
C) Function
D) Statistical Model
Question
Assume that you will be up for a promotion next month and you'd like to impress your boss with your data analytic skills. The company you work for normally books the current month's bad debt expense at the same amount as the prior month's actual accounts receivable write-offs. Using your general accounting knowledge, explain why this process is not the best method. Next, assuming that you will use a regression analysis, explain the process and describe the data/information you would request or include to perform the analysis.
Question
________ states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small.

A) Leading digits hypothesis
B) Moore's law
C) Benford's law
D) Classification
Question
________ is existing data that has been manually evaluated and assigned a class and ________ is existing data used to evaluate the model.

A) Test data; Training data
B) Training data; Test data
C) Structured data; Unstructured data
D) Unstructured data; Structured data
Question
Benford's Law (be sure to answer all 3 parts):
Part A: Briefly describe Benford's Law.
Part B: Draw a graph that exemplifies data which conforms to Benford's Law (i.e., what it should look like).
Part C: Briefly describe how auditors could utilize Benford's Law while conducting testwork.
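Benford's law gives the expected proportion of leading digit d as log10(1 + 1/d), so about 30.1% of values should start with 1 and about 4.6% with 9. The sketch below computes those expected proportions (the shape a Part B graph would show) and compares them with the observed leading digits of a list of amounts, one way a Part C comparison might begin. It is a sketch under stated assumptions: it skips zeros, uses plain proportions rather than a formal test such as chi-square or mean absolute deviation, and the sample amounts are hypothetical.

```python
import math
from collections import Counter
from decimal import Decimal


def benford_expected():
    """Expected leading-digit proportions under Benford's law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the leading significant digit of a nonzero number."""
    if x == 0:
        raise ValueError("zero has no leading significant digit")
    # Decimal strips leading zeros, so digits[0] is the first significant digit.
    return Decimal(str(abs(x))).as_tuple().digits[0]


def benford_comparison(values):
    """Map each digit 1-9 to (observed proportion, expected Benford proportion)."""
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    n = len(digits)
    expected = benford_expected()
    return {d: (counts.get(d, 0) / n, expected[d]) for d in range(1, 10)}


# Hypothetical transaction amounts; the repeated 99.00 entries stand out at digit 9.
sample = [1200.00, 1875.50, 23.40, 310.00, 1.99, 145.00, 99.00, 99.00, 99.00, 512.25]
for d, (observed, expected) in benford_comparison(sample).items():
    bar = "#" * round(expected * 50)  # text "graph" of the expected curve
    print(f"{d}: observed {observed:6.1%}  expected {expected:6.1%}  {bar}")
```

A digit whose observed share sits far above its expected share is a prompt for follow-up testwork, not proof of fraud; many legitimate populations (for example, amounts bounded by a fixed price list) are not expected to conform.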
Question
Decision trees are used to divide data into smaller groups by splitting the data at each branch into two or more groups. However, this method could lead to unintended consequences if the decision tree is not pruned. Describe the pruning process, when it can occur, and the benefits of using it.
Question
What is the difference between structured data and unstructured data? Provide an example of each.
Question
Retail stores often request customers' zip codes at the end of a sales transaction. This is an example of which data approach?

A) Clustering
B) Regression
C) Similarity matching
D) Classification
Question
________ mark the split between one class and another.

A) Decision trees
B) Identifying questions
C) Decision boundaries
D) Linear classifiers
Question
One of the key tasks of bank auditors is to consider the amount of the loan loss reserve. When developing a model to estimate the current year's loan loss reserve amount, which of the following would be least likely to be included as an independent variable?

A) Original loan approval amount
B) Customer loan history
C) Current aged loans
D) Collections success
Question
Unaware of the data analysis tools available to the internal auditors, a store employee frequently processes cash returns without a receipt for $99, which is just below the $100 amount that requires manager approval. An analysis using which of the following would likely (and quickly) identify the employee's fraudulent behavior?

A) Leading digits hypothesis
B) Moore's law
C) Benford's law
D) Clustering
Question
The short surveys regarding dining preferences requested at the bottom of the restaurant bill are an example of which data approach?

A) Clustering
B) Regression
C) Similarity matching
D) Link prediction