Deck 7: Text Analytics, Text Mining, and Sentiment Analysis
1
In the financial services firm case study, text analysis of associate-customer interactions was completely automated and could detect whether the interactions met the company's standards.
True
2
Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment.
True
3
In text mining, inputs to the process include unstructured data such as Word documents, PDF files, text excerpts, e-mail and XML files.
True
4
Regional accents present challenges for natural language processing.
5
Categorization and clustering of documents during text mining differ only in the preselection of categories.
6
In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings.
7
In the patent analysis case study, text mining of thousands of patents held by the firm and its competitors helped improve competitive intelligence, but was of little use in identifying complementary products.
8
The bag-of-words model is appropriate for spam detection but not for text analytics.
9
In the chapter's opening vignette, IBM's computer named Watson outperformed human game champions on the game show Jeopardy!
10
In the Hong Kong government case study, reporting time was the main benefit of using SAS Business Analytics to generate reports.
11
In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.
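As a rough illustration of what "support" measures here, the sketch below counts the share of documents in which two concepts co-occur. The toy corpus, the concept names, and the `support` helper are all hypothetical and are not taken from the textbook.

```python
# Hypothetical toy corpus: each document is reduced to the set of concepts it mentions.
documents = [
    {"merger", "lawsuit"},
    {"merger", "earnings"},
    {"lawsuit"},
    {"merger", "lawsuit", "earnings"},
]


def support(docs, concept_a, concept_b):
    """Return the fraction of documents that contain both concepts."""
    if not docs:
        return 0.0
    both = sum(1 for d in docs if concept_a in d and concept_b in d)
    return both / len(docs)


pair_support = support(documents, "merger", "lawsuit")
print(f"support(merger, lawsuit) = {pair_support:.0%}")
```

Two of the four toy documents mention both concepts, so the support of the pair is 50%.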
12
In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way.
13
In the BBVA case study, text analytics was used to help the company defend and enhance its reputation in social media.
14
The linguistic approach to speech processes elements such as intensity, pitch, and jitter from speech recorded on audio.
15
In text mining, creating the term-document matrix includes all the terms that are included in all documents, making for huge matrices only manageable on computers.
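To make the size concern concrete, here is a minimal sketch of an unfiltered term-document matrix built with the standard library only. The two sample documents and the `term_document_matrix` helper are assumptions for illustration; real pipelines typically filter and reduce the term list before building the matrix.

```python
from collections import Counter

# Hypothetical two-document corpus.
docs = {
    "d1": "text mining finds patterns in text",
    "d2": "sentiment analysis mines opinions",
}


def term_document_matrix(corpus):
    """Map each term to its list of raw counts, one count per document."""
    counts = {name: Counter(text.lower().split()) for name, text in corpus.items()}
    vocabulary = sorted(set().union(*counts.values()))
    matrix = {term: [counts[name][term] for name in corpus] for term in vocabulary}
    return vocabulary, matrix


vocabulary, matrix = term_document_matrix(docs)
for term in vocabulary:
    print(f"{term:<10} {matrix[term]}")
```

Even in this tiny case half of the cells are zero, which hints at why the matrix becomes large and sparse on a real collection.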
16
Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining.
17
Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.
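A minimal sketch of this kind of filtering, assuming a small hand-picked stop-word list (the `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical; real stop-word lists are much longer and domain dependent):

```python
# Hypothetical, deliberately short stop-word list of articles and auxiliary verbs.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "has", "have", "will", "be"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop any stop words."""
    return [word for word in text.lower().split() if word not in stop_words]


filtered = remove_stop_words("The report was filed and the claim is pending")
print(filtered)
```

Only the content-bearing words (plus any words missing from the short list, such as "and") survive the filter.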
18
Chinese, Japanese, and Thai have features that make them more difficult candidates for natural language processing.
19
Detecting lies from text transcripts of conversations is a future goal of text mining as current systems achieve only 50% accuracy of detection.
20
During information extraction, entity recognition (the recognition of names of people and organizations) takes place after relationship extraction.
21
What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation?
A) medium- to large-sized documents
B) small- to medium-sized documents
C) large-sized documents
D) collections of documents
22
What do voice of the market (VOM) applications of sentiment analysis do?
A) They examine customer sentiment at the aggregate level.
B) They examine employee sentiment in the organization.
C) They examine the stock market for trends.
D) They examine the "market of ideas" in politics.
23
All of the following are challenges associated with natural language processing EXCEPT
A) dividing up a text into individual words in English.
B) understanding the context in which something is said.
C) distinguishing between words that have more than one meaning.
D) recognizing typographical or grammatical errors in texts.
24
In text analysis, what is a lexicon?
A) a catalog of words, their synonyms, and their meanings
B) a catalog of customers, their words, and phrases
C) a catalog of letters, words, phrases and sentences
D) a catalog of customers, products, words, and phrases
25
How is objectivity handled in sentiment analysis?
A) It is ignored because it does not appear in customer sentiment.
B) It is incorporated as a type of sentiment.
C) It is clarified with the customer who expressed it.
D) It is identified and removed as facts are not sentiment.
26
According to a study by Merrill Lynch and Gartner, what percentage of all corporate data is captured and stored in some sort of unstructured form?
A) 15%
B) 75%
C) 25%
D) 85%
27
In text mining, stemming is the process of
A) categorizing a block of text in a sentence.
B) reducing multiple words to their base or root.
C) transforming the term-by-document matrix to a manageable size.
D) creating new branches or stems of recorded paragraphs.
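For readers who want to see stemming in action, the following is a deliberately naive suffix-stripping sketch. It is not the Porter algorithm or any published stemmer; the `SUFFIXES` tuple and the `naive_stem` helper are assumptions made only for illustration.

```python
# Hypothetical suffix list, checked in order; production stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three leading characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["connect", "connected", "connecting", "connects"]]
print(stems)
```

All four inflected forms collapse to the same string, so they would be counted as one term rather than four.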
28
In sentiment analysis, which of the following is an implicit opinion?
A) The hotel we stayed in was terrible.
B) The customer service I got for my TV was laughable.
C) The cruise we went on last summer was a disaster.
D) Our new mayor is great for the city.
29
In the Whirlpool case study, the company sought to better understand information coming from which source?
A) customer transaction data
B) delivery information
C) customer e-mails
D) goods moving through the internal supply chain
30
In text mining, tokenizing is the process of
A) categorizing a block of text in a sentence.
B) reducing multiple words to their base or root.
C) transforming the term-by-document matrix to a manageable size.
D) creating new branches or stems of recorded paragraphs.
31
Identifying the target of an expressed sentiment is difficult for all the following reasons EXCEPT
A) the review may not be directly connected to the target through the topic name.
B) blogs and articles with the sentiment may be general in nature.
C) strong sentiments may be generated by a computer, not a person.
D) sometimes there are multiple targets expressed in a sentiment.
32
In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT
A) massive parallelism to enable simultaneous consideration of multiple hypotheses.
B) an underlying confidence subsystem that ranks and integrates answers.
C) a core engine that could operate seamlessly in another domain without changes.
D) integration of shallow and deep knowledge.
33
Which of these applications will derive the LEAST benefit from text mining?
A) patients' medical files
B) patent description files
C) sales transaction files
D) customer comment files
34
Inputs to speech analytics include all of the following EXCEPT
A) written transcripts of calls to service centers.
B) recorded conversations of customer call-ins.
C) live customer interactions with service representatives.
D) videos of customer focus groups.
35
What application is MOST dependent on text analysis of transcribed sales call center notes and voice conversations with customers?
A) finance
B) OLAP
C) CRM
D) ERP
36
In the research literature case study, the researchers analyzing academic papers extracted information from which source?
A) the paper abstract
B) the paper keywords
C) the main body of the paper
D) the paper references
37
Sentiment classification usually covers all the following issues EXCEPT
A) classes of sentiment (e.g., positive versus negative).
B) range of polarity (e.g., star ratings for hotels and for restaurants).
C) range in strength of opinion.
D) biometric identification of the consumer expressing the sentiment.
38
In text mining, which of the following methods is NOT used to reduce the size of a sparse matrix?
A) using a domain expert
B) normalizing word frequencies
C) using singular value decomposition
D) eliminating rarely occurring terms
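One of the techniques named in these options, dropping terms that occur in very few documents, can be sketched as follows. The toy matrix, the `min_docs` threshold, and the `drop_rare_terms` helper are hypothetical and only show the general idea.

```python
# Hypothetical term-document matrix: term -> raw counts in three documents.
tdm = {
    "claim": [2, 1, 3],
    "policy": [1, 0, 2],
    "zebra": [0, 1, 0],
}


def drop_rare_terms(matrix, min_docs=2):
    """Keep only terms that appear in at least `min_docs` documents."""
    return {
        term: row
        for term, row in matrix.items()
        if sum(1 for count in row if count > 0) >= min_docs
    }


reduced = drop_rare_terms(tdm)
print(sorted(reduced))
```

The term that occurs in a single document is removed, shrinking the matrix by one row without touching the remaining counts.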
39
What data discovery process, whereby objects are categorized into predetermined groups, is used in text mining?
A) clustering
B) association
C) classification
D) trend analysis
40
In the Blue Cross Blue Shield case study, speech analytics were used to identify "confusion" calls by customers. What was true about these calls?
A) They took less time than others as frustrated customers hung up.
B) They led customers to rely more on self-serve options.
C) They were not documented by customer service reps for speech analytics.
D) They were difficult to identify using standard phrases like "I don't get it."
41
________ is probably the most often used form of information extraction.
42
At a very high level, the text mining process can be broken down into three consecutive tasks, the first of which is to establish the ________.
43
In the text mining system developed by Ghani et al., treating products as sets of ________ rather than as atomic entities can potentially boost the effectiveness of many business applications.
44
When identifying the polarity of text, the most granular level for polarity identification is at the ________ level.
45
In automated sentiment analysis, two primary methods have been deployed to predict sentiment within audio: acoustic/phonetic and ________ modeling.
46
________ is mostly driven by sentiment analysis and is a key element of customer experience management initiatives, where the goal is to create an intimate relationship with the customer.
47
When viewed as a binary feature, ________ classification is the binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion.
48
When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.
49
In sentiment analysis, the task of differentiating between a fact and an opinion can also be characterized as calculation of ________ polarity.
50
When labeling each term in the WordNet lexical database, the group of cognitive synonyms (or synset) to which this term belongs is classified using a set of ________, each of which is capable of deciding whether the synset is Positive, or Negative, or Objective.
51
Where ________ appears in text, it comes in two flavors: explicit, where the subjective sentence directly expresses an opinion, and implicit, where the text implies an opinion.
52
________ focuses on listening to social media where anyone can post opinions that can damage or boost your reputation.
53
________ is a technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources.
54
Because the term-document matrix is often very large and rather sparse, an important optimization step is to reduce the ________ of the matrix.
55
In the Mining for Lies case study, a text-based deception-detection method used by Fuller and others in 2008 was based on a process known as ________, which relies on elements of data and text mining techniques.
56
IBM's Watson utilizes a massively parallel, text mining-focused, probabilistic evidence-based computational architecture called ________.
57
Among the significant advantages associated with the ________ approach to linguistic modeling is the method's ability to maintain a high degree of accuracy no matter what the quality of the audio source, and its incorporation of conversational context through the use of structured queries.
58
The time-demanding and laborious process of the ________ approach makes it impractical for use with live audio streams.
59
________ models operate on the premise that, when in a charged state, a speaker has a higher probability of using specific words, exclamations, or phrases in a particular order.
60
________, also called homonyms, are syntactically identical words with different meanings.
61
Natural language processing (NLP), a subfield of artificial intelligence and computational linguistics, is an important component of text mining. What is the definition of NLP?
62
When IBM Research began looking for a major research challenge to rival the scientific and popular interest of Deep Blue, the computer chess-playing champion, what was the company's goal?
63
Identify, with a brief description, each of the four steps in the sentiment analysis process.
64
Name and briefly describe four of the most popular commercial software tools used for text mining.
65
Sentiment analysis has many names. Which other names is it often known by?
66
In the security domain, one of the largest and most prominent text mining applications is the highly classified ECHELON surveillance system. What is ECHELON assumed to be capable of doing?
67
Describe the query-specific clustering method as it relates to clustering.
68
What is the definition of text analytics according to the experts in the field?
69
How would you describe information extraction in text mining?
70
Within the context of speech analytics, what does the linguistic approach focus on?