Question 1

Compare and contrast text mining and data mining.

Accepted Answer

Text mining is the semi-automated process of extracting patterns (useful information and knowledge) from large amounts of unstructured data sources. Data mining is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases, where the data are organized in records structured by categorical, ordinal, or continuous variables. The two have the same purpose and use the same processes; the difference is the input. Data mining starts from structured records, whereas text mining starts from a collection of unstructured data files such as Word documents and PDF files, which must first be given structure (for example, as a term-document matrix) before patterns can be extracted.

Question 2

Text mining can be used to increase cross-selling and up-selling by analyzing the unstructured data generated by call centers.

Accepted Answer

A) True
B) False
Answer: True

Question 3

Using ________ as a rich source of knowledge and a strategic weapon, Kodak not only survives but excels in its market segment defined by innovation and constant change.

Accepted Answer

A) visualization
B) deception detection
C) patent analysis
D) semantic cues
Answer: C) patent analysis

Question 4

Which of the following refers to developing useful information from the links included in the Web documents?

Accepted Answer

A) Web content mining
B) Web subject mining
C) Web structure mining
D) Web matter mining
Answer: C) Web structure mining
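Web structure mining starts from the hyperlinks themselves. The following Python sketch, which uses only the standard library, pulls the href targets out of a few HTML pages and builds a page-to-page link graph. The page URLs and HTML strings are made up for illustration, and a real crawler would also need page fetching, politeness rules, and URL normalization.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute href targets of <a> tags in one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page being parsed.
                self.links.append(urljoin(self.base_url, value))


def build_link_graph(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each page URL to the set of URLs it links to."""
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        graph[url] = set(parser.links)
    return graph


def in_degree(graph: dict[str, set[str]]) -> dict[str, int]:
    """Count incoming links per URL."""
    counts: dict[str, int] = {}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical pages used only for illustration.
pages = {
    "http://example.com/a": '<p><a href="/b">B</a> <a href="c">C</a></p>',
    "http://example.com/b": '<a href="/a">back to A</a>',
}
graph = build_link_graph(pages)
print(graph)
print(in_degree(graph))
```

Counting incoming links, as in_degree does, is the simplest structural signal; link-analysis algorithms such as HITS or PageRank refine it.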

Question 5

At a very high level, the text mining process consists of each of the following tasks except:

Accepted Answer

A) create log frequencies
B) establish the corpus
C) create the term-document matrix
D) extract the knowledge
Answer: A) create log frequencies
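At this high level the process is: establish the corpus, create the term-document matrix, and extract the knowledge. Log frequencies are not a separate top-level task but one way of normalizing the raw counts inside the second step. The Python sketch below assumes a naive regex tokenizer and the weighting 1 + log(tf) for nonzero counts; binary or inverse-document-frequency weighting are common alternatives.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, then keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(corpus: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (terms, rows), where rows[i][j] is the raw count of terms[j] in document i."""
    counts = [Counter(tokenize(doc)) for doc in corpus]
    terms = sorted(set().union(*counts))
    rows = [[c[t] for t in terms] for c in counts]
    return terms, rows


def log_frequencies(rows: list[list[int]]) -> list[list[float]]:
    # Dampen raw counts: 1 + log(tf) when tf > 0, otherwise 0.
    return [[1 + math.log(tf) if tf > 0 else 0.0 for tf in row] for row in rows]


corpus = ["The cat sat.", "The cat saw the dog."]
terms, rows = term_document_matrix(corpus)
print(terms)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(rows)   # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
print(log_frequencies(rows))
```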

Question 6

A vast majority of business data are stored in text documents that are ________.

Accepted Answer

A) mostly quantitative
B) virtually unstructured
C) semi-structured
D) highly structured
Answer: B) virtually unstructured

Question 7

In text mining, a list of "stop words" is used to ________ commonly used words.

Accepted Answer

filter out
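A minimal sketch of stop-word filtering in Python, assuming a tiny hand-picked stop list (STOP_WORDS is hypothetical; real lists are longer and language-specific) and plain whitespace tokenization.

```python
# Hypothetical, deliberately tiny stop list.
STOP_WORDS = {"a", "am", "an", "and", "is", "of", "the", "to", "was"}


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = "The customer was unhappy with the delivery".split()
print(remove_stop_words(tokens))  # ['customer', 'unhappy', 'with', 'delivery']
```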

Question 8

________ applications focus on "who and how" questions by gathering and reporting direct feedback from site visitors, by benchmarking against other sites and offline channels, and by supporting predictive modeling of future visitor behavior.

Accepted Answer

Question 9

________ is a branch of the field of linguistics and a part of natural language processing that studies the internal structure of words.

Accepted Answer

A) Morphology
B) Corpus
C) Stemming
D) Polysemes
Answer: A) Morphology

Question 10

Which of the following is not one of the three main areas of Web mining?

Accepted Answer

A) Web search mining
B) Web content mining
C) Web structure mining
D) Web usage mining
Answer: A) Web search mining
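Web usage mining works on server logs and clickstreams rather than on page content or links. As a rough Python sketch, the code below groups hypothetical (visitor, timestamp, URL) log entries into sessions using a 30-minute inactivity timeout (a common heuristic, assumed here rather than a fixed standard) and then counts direct page-to-page transitions.

```python
from collections import Counter, defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed heuristic)


def sessionize(entries):
    """entries: iterable of (visitor_id, timestamp_seconds, url).
    Returns a list of sessions, each a time-ordered list of URLs."""
    by_visitor = defaultdict(list)
    for visitor, ts, url in entries:
        by_visitor[visitor].append((ts, url))
    sessions = []
    for hits in by_visitor.values():
        hits.sort()  # order each visitor's requests by timestamp
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > SESSION_TIMEOUT:
                sessions.append(current)  # long gap: close the current session
                current = []
            current.append(url)
        sessions.append(current)
    return sessions


def transition_counts(sessions):
    """Count how often visitors move directly from one page to another."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


# Hypothetical log entries used only for illustration.
log = [
    ("u1", 0, "/home"), ("u1", 60, "/products"), ("u1", 120, "/cart"),
    ("u1", 5000, "/home"),
    ("u2", 10, "/home"), ("u2", 70, "/products"),
]
sessions = sessionize(log)
print(sessions)
print(transition_counts(sessions).most_common(1))  # [(('/home', '/products'), 2)]
```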

Question 11

Stemming is the process of reducing inflected words to their base or root form.

Accepted Answer

A) True
B) False
Answer: True
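To make the idea concrete, here is a deliberately naive suffix-stripping stemmer in Python. The suffix list and the minimum stem length are assumptions chosen for the example; it is not the Porter algorithm, which is the more common choice in practice.

```python
# Hypothetical suffix list, ordered longest-first so the longest match is stripped.
SUFFIXES = ("ational", "ization", "ness", "ing", "ed", "es", "ly", "s")


def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, but never leave fewer than min_stem letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


print([naive_stem(w) for w in ["connects", "connected", "connecting"]])  # ['connect', 'connect', 'connect']
print(naive_stem("sing"))  # sing
```

Stems need not be dictionary words; what matters is that inflected variants map to the same index term. The min_stem guard keeps a short word such as "sing" from being cut down to "s".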

Question 12

A(n) ________ is one or more Web pages that provide a collection of links to authoritative pages.

Accepted Answer

hub
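Hubs and authorities come from Kleinberg's HITS algorithm, in which a good hub links to many good authorities and a good authority is linked to by many good hubs. The Python sketch below runs that mutual update on a small hypothetical link graph for a fixed number of iterations; a fuller implementation would test for convergence instead.

```python
import math


def hits(graph, iterations=20):
    """graph maps each page to the pages it links to.
    Returns (hub, authority) score dicts, each scaled to unit length."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 0.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages that link in.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum the authority scores of the pages linked to.
        hub = {n: sum(auth[t] for t in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical link graph: "portal" collects links to several pages.
links = {"portal": ["a", "b", "c"], "blog": ["a"], "a": []}
hub, auth = hits(links)
print(max(hub, key=hub.get))    # portal
print(max(auth, key=auth.get))  # a
```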

Question 13

The two main approaches to text classification are ________ and ________.

Accepted Answer

A) knowledge engineering; machine learning
B) categorization; clustering
C) association; trend analysis
D) knowledge extraction; association
Answer: A) knowledge engineering; machine learning
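In the machine-learning approach, the classifier is induced from labeled examples instead of being written by hand. As an illustration only, the Python sketch below trains a multinomial Naive Bayes model with Laplace smoothing on a few hypothetical customer comments; Naive Bayes is one of several learners used for text classification, not the only option.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns (class_doc_counts, word_counts, vocab)."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior score."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][t] + 1) / denom)  # Laplace smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training comments.
training = [
    ("refund late delivery".split(), "complaint"),
    ("broken item refund".split(), "complaint"),
    ("great service thanks".split(), "praise"),
    ("fast delivery great".split(), "praise"),
]
model = train_nb(training)
print(predict_nb(model, "refund broken".split()))  # complaint
print(predict_nb(model, "great fast".split()))     # praise
```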

Question 14

One of the main approaches to text classification is ________ in which an expert's knowledge is encoded into the system either declaratively or in the form of procedural classification rules.

Accepted Answer

knowledge engineering
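A knowledge-engineering classifier encodes an expert's rules directly rather than learning them from data. A minimal Python sketch, assuming hypothetical categories and keyword sets supplied by a domain expert, with the first matching rule winning:

```python
# Hypothetical expert rules: (category, trigger keywords), checked in order.
RULES = [
    ("billing", {"invoice", "charge", "refund"}),
    ("shipping", {"delivery", "shipment", "tracking"}),
]


def classify_by_rules(text: str, default: str = "other") -> str:
    """Return the category of the first rule whose keywords appear in the text."""
    words = set(text.lower().split())
    for category, keywords in RULES:
        if words & keywords:  # any shared keyword fires the rule
            return category
    return default


print(classify_by_rules("Where is my tracking number"))         # shipping
print(classify_by_rules("I was charged twice, need a refund"))  # billing
print(classify_by_rules("Great product"))                       # other
```

The rules are transparent and need no training data, but every phrasing has to be anticipated by the expert (here "charged" does not match "charge"), which is one reason the machine-learning approach is popular.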

Question 15

By applying a learning algorithm to parsed text, researchers from Stanford University's NLP lab have developed methods that can automatically identify the concepts and relationships between those concepts in the text.

Accepted Answer

A) True
B) False
Answer: True

Question 16

________ is the grouping of similar documents without having a predefined set of categories.

Accepted Answer

Clustering

Question 17

Why will computers probably not be able to understand natural language the same way and with the same accuracy that humans do?

Accepted Answer

Natural human language is vague

Question 18

In ________, the problem is to group an unlabelled collection of objects, such as documents, customer comments, and Web pages, into meaningful groups without any prior knowledge.

Accepted Answer

A) search recall
B) classification
C) clustering
D) grouping
Answer: C) clustering
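Clustering works on unlabeled documents, so no categories are supplied in advance. The Python sketch below assumes the documents have already been turned into term-count vectors and groups them with a simple k-means variant that assigns each document to the centroid with the highest cosine similarity. Seeding with the first k documents is an assumption made to keep the result deterministic, not a recommended initialization.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans(vectors, k, iterations=10):
    """Assign by cosine similarity, then recompute each centroid as the mean of its members."""
    centroids = [list(v) for v in vectors[:k]]  # assumed seeding: first k vectors
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical term counts over the vocabulary [refund, broken, delivery, great, fast].
docs = [[2, 1, 0, 0, 0], [0, 0, 0, 2, 1], [1, 2, 0, 0, 0], [0, 0, 1, 1, 2]]
print(kmeans(docs, k=2))  # [0, 1, 0, 1]
```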

Question 19

________ is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases, where the data are organized in records structured by categorical, ordinal, or continuous variables.

Accepted Answer

Data mining

Question 20

Stop words, such as "a," "am," "the," and "was," are words that are filtered out before or after natural language data are processed.

Accepted Answer

A) True
B) False
Answer: True

Exam 5: Text and Web Mining
