Deck 7: Big Data Concepts and Tools

Question
Big Data simplifies data governance issues, especially for global firms.
Question
Social media mentions can be used to chart and predict flu outbreaks.
Question
Satellite data can be used to evaluate the activity at retail locations as a source of alternative data.
Question
There is a clear difference between the type of information support provided by influential users versus the others on Twitter.
Question
In the Salesforce case study, streaming data is used to identify services that customers use most.
Question
Hadoop and MapReduce require each other to work.
Question
Despite their potential, many current NoSQL tools lack mature management and monitoring tools.
Question
The term "Big Data" is relative as it depends on the size of the using organization.
Question
Current total storage capacity lags behind the digital information being generated in the world.
Question
In the opening vignette, Access Telecom (AT) built a system to better visualize customers who were unhappy before they canceled their service.
Question
MapReduce can be easily understood by skilled programmers due to its procedural nature.
Question
Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.
Question
Big Data uses commodity hardware, which is expensive, specialized hardware that is custom built for a client or application.
Question
In most cases, Hadoop is used to replace data warehouses.
Question
It is important for Big Data and self-service business intelligence to go hand in hand to get maximum value from analytics.
Question
The quality and objectivity of information disseminated by influential users of Twitter is higher than that disseminated by noninfluential users.
Question
For low latency, interactive reports, a data warehouse is preferable to Hadoop.
Question
In Application Case 7.6, Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse, it was found that urban individuals have a higher number of diagnosed disease conditions.
Question
If you have many flexible programming languages running in parallel, Hadoop is preferable to a data warehouse.
Question
Big Data is being driven by the exponential growth, availability, and use of information.
Question
In the Twitter case study, how did influential users support their tweets?

A)opinion
B)objective data
C)multiple posts
D)references to other users
Question
Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?

A)volatility
B)periodicity
C)inconsistency
D)variability
Question
Traditional data warehouses have not been able to keep up with

A)the evolution of the SQL language.
B)the variety and complexity of data.
C)expert systems that run on them.
D)OLAP.
Question
How does Hadoop work?

A)It integrates Big Data into a whole so large data elements can be processed as a whole on one computer.
B)It integrates Big Data into a whole so large data elements can be processed as a whole on multiple computers.
C)It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on one computer.
D)It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
Question
In the financial services industry, Big Data can be used to improve

A)regulatory oversight.
B)decision making.
C)customer service.
D)both A & B.
Question
A newly popular unit of data in the Big Data era is the petabyte (PB), which is

A)10^9 bytes.
B)10^12 bytes.
C)10^15 bytes.
D)10^18 bytes.
Question
In a network analysis,what connects nodes?

A)edges
B)metrics
C)paths
D)visualizations
Question
In a Hadoop "stack," what is a slave node?

A)a node where bits of programs are stored
B)a node where metadata is stored and used to organize data processing
C)a node where data is stored and processed
D)a node responsible for holding all the source programs
Question
Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?

A)in-memory analytics
B)in-database analytics
C)grid computing
D)appliances
Question
Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse?

A)ANSI 2003 SQL compliance is required
B)online archives alternative to tape
C)unrestricted, ungoverned sandbox explorations
D)analysis of provisional data
Question
Which of the following sources is likely to produce Big Data the fastest?

A)order entry clerks
B)cashiers
C)RFID tags
D)online customers
Question
Companies with the largest revenues from Big Data tend to be

A)the largest computer and IT services firms.
B)small computer and IT services firms.
C)pure open source Big Data firms.
D)non-U.S. Big Data firms.
Question
What is the Hadoop Distributed File System (HDFS) designed to handle?

A)unstructured and semistructured relational data
B)unstructured and semistructured non-relational data
C)structured and semistructured relational data
D)structured and semistructured non-relational data
Question
Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called?

A)in-memory analytics
B)in-database analytics
C)grid computing
D)appliances
Question
What is Big Data's relationship to the cloud?

A)Hadoop cannot be deployed effectively in the cloud just yet.
B)Amazon and Google have working Hadoop cloud offerings.
C)IBM's homegrown Hadoop platform is the only option.
D)Only MapReduce works in the cloud; Hadoop does not.
Question
Using data to understand customers/clients and business operations to sustain and foster growth and profitability is

A)easier with the advent of BI and Big Data.
B)essentially the same now as it has always been.
C)an increasingly challenging task for today's enterprises.
D)now completely automated with no human intervention required.
Question
In a Hadoop "stack," what node periodically replicates and stores data from the Name Node should it fail?

A)backup node
B)secondary node
C)substitute node
D)slave node
Question
In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal?

A)determine if diseases are accurately diagnosed
B)determine probabilities of diseases that are comorbid
C)determine differences in rates of disease in urban and rural populations
D)determine differences in rates of disease in males vs. females
Question
In the Alternative Data for Market Analysis or Forecasts case study, satellite data was NOT used for

A)evaluating retail traffic.
B)monitoring activity at factories.
C)tracking agricultural estimates.
D)monitoring individual customer patterns.
Question
All of the following statements about MapReduce are true EXCEPT

A)MapReduce is a general-purpose execution engine.
B)MapReduce handles the complexities of network communication.
C)MapReduce handles parallel programming.
D)MapReduce runs without fault tolerance.
Question
Big Data comes from ________.
Question
________ bring together hardware and software in a physical unit that is not only fast but also scalable on an as-needed basis.
Question
Hadoop is primarily a(n) ________ file system and lacks capabilities we'd associate with a DBMS, such as indexing, random access to data, and support for SQL.
Question
In the world of Big Data, ________ aids organizations in processing and analyzing large volumes of multistructured data. Examples include indexing and search, graph analysis, etc.
Question
________ of data provides business value; pulling of data from multiple subject areas and numerous applications into one repository is the raison d'être for data warehouses.
Question
________ refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness of the data.
Question
Big Data employs ________ processing techniques and nonrelational data storage capabilities in order to process unstructured and semistructured data.
Question
The problem of forecasting economic activity or microclimates based on a variety of data beyond the usual retail data is a very recent phenomenon and has led to another buzzword - ________.
Question
As volumes of Big Data arrive from multiple sources such as sensors, machines, social media, and clickstream interactions, the first step is to ________ all the data reliably and cost effectively.
Question
Organizations are working with data that meets the three V's: variety, volume, and ________ characterizations.
Question
HBase is a nonrelational ________ that allows for low-latency, quick lookups in Hadoop.
Question
________ speeds time to insights and enables better data governance by performing data integration and analytic functions inside the database.
Question
The ________ Node in a Hadoop cluster provides client information on where in the cluster particular data is stored and if any nodes fail.
Question
In open-source databases, the most important performance enhancement to date is the cost-based ________.
Question
In-motion ________ is often overlooked today in the world of BI and Big Data.
Question
The ________ of Big Data is its potential to contain more useful patterns and interesting anomalies than "small" data.
Question
HBase, Cassandra, MongoDB, and Accumulo are examples of ________ databases.
Question
A job ________ is a node in a Hadoop cluster that initiates and coordinates MapReduce jobs, or the processing of the data.
Question
As the size and the complexity of analytical systems increase, the need for more ________ analytical systems is also increasing to obtain the best performance.
Question
In the energy industry, ________ grids are one of the most impactful applications of stream analytics.
Question
Define MapReduce.
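As a study aid for this card, the following is a minimal single-machine sketch of the map, shuffle, and reduce phases, assuming a word-count task. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and are not part of Hadoop's API; a real framework would run the map and reduce calls on many nodes and handle failures itself.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of text splits."""
    pairs = []
    for doc in documents:  # in a cluster, each split could be mapped on a different node
        pairs.extend(map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big tools", "data tools"]))
```

The point of the split is that each map call sees only its own piece of the input and each reduce call sees only one key, which is what lets the work be spread across machines.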
Question
In the opening vignette, why was the Telecom company so concerned about the loss of customers, if customer churn is common in that industry?
Question
List and describe the three main "V"s that characterize Big Data.
Question
List and describe four of the most critical success factors for Big Data analytics.
Question
What are the differences between stream analytics and perpetual analytics? When would you use one or the other?
Question
What is NoSQL as used for Big Data? Describe its major downsides.
Question
Describe data stream mining and how it is used.
Question
When considering Big Data projects and architecture, list and describe five challenges designers should be mindful of in order to make the journey to analytics competency less stressful.
Question
Why are some portions of tape backup workloads being redirected to Hadoop clusters today?
1
Big Data simplifies data governance issues, especially for global firms.
False
2
Social media mentions can be used to chart and predict flu outbreaks.
True
3
Satellite data can be used to evaluate the activity at retail locations as a source of alternative data.
True