Question 1

Which best describes what the map method accepts and emits?

Accepted Answer

A)  It accepts a single key-value pair as input and emits a single key and list of corresponding values as output. 
B)  It accepts a single key-value pair as input and can emit only one key-value pair as output.
C)  It accepts a list of key-value pairs as input and can emit only one key-value pair as output.
D)  It accepts a single key-value pair as input and can emit any number of key-value pairs as output, including zero.
Answer: D
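
To make the "any number, including zero" point concrete, here is a minimal sketch of a word-count style map function. It is plain Java rather than the Hadoop Mapper API; the class name MapEmitSketch and the return-a-list convention are assumptions made for illustration only.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a map function; not the Hadoop Mapper API.
class MapEmitSketch {
    // Takes one (offset, line) pair and returns every (word, 1) pair it would emit.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map(0L, "to be or not to be").size()); // prints 6
        System.out.println(map(19L, "   ").size());                // prints 0
    }
}
```

One input pair produces six output pairs in the first call and none in the second, which is the behavior option D describes.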

Question 2

What data does a Reducer's reduce method process?

Accepted Answer

A)  All the data in a single input file. 
B)  All data produced by a single mapper. 
C)  All data for a given key, regardless of which mapper(s) produced it. 
D)  All data for a given value, regardless of which mapper(s) produced it. 
Answer: C
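
The grouping behind this answer can be sketched in plain Java. The sketch below merges the output of two pretend mappers by key before calling a reduce function; ShuffleGroupSketch, Pair, and the summing reduce are hypothetical names chosen for illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the shuffle's group-by-key step.
class ShuffleGroupSketch {
    static final class Pair {
        final String key;
        final int value;
        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Merges the output of every mapper into key -> all values for that key.
    static Map<String, List<Integer>> groupByKey(List<List<Pair>> mapperOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Pair> oneMapper : mapperOutputs) {
            for (Pair p : oneMapper) {
                grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
            }
        }
        return grouped;
    }

    // Example reduce: sums all values that share a key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}
```

If one mapper emits ("a", 1) and another emits ("a", 3), the single reduce call for "a" sees both values, regardless of which mapper produced them.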

Question 3

Workflows expressed in Oozie can contain:

Accepted Answer

A)  Sequences of MapReduce and Pig. These sequences can be combined with other actions including forks, decision points, and path joins. 
B)  Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
C)  Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
D)  Iterative repetition of MapReduce jobs until a desired answer or state is reached.
Answer: A

Question 4

Table metadata in Hive is:

Accepted Answer

A)  Stored as metadata on the NameNode. 
B)  Stored along with the data in HDFS. 
C)  Stored in the Metastore. 
D)  Stored in ZooKeeper. 

Question 5

You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each of these characters, you will emit the character as a key and an IntWritable as the value. As this will produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?

Accepted Answer

A)  Processor and network I/O 
B)  Disk I/O and network I/O 
C)  Processor and RAM 
D)  Processor and disk I/O 

Question 6

What is the disadvantage of using multiple reducers with the default HashPartitioner and distributing your workload across your cluster?

Accepted Answer

A)  You will not be able to compress the intermediate data. 
B)  You will no longer be able to take advantage of a Combiner.
C)  By using multiple reducers with the default HashPartitioner, output files may not be in globally sorted order.
D)  There are no concerns with this approach. It is always advisable to use multiple reducers.
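
As a rough illustration of how hash partitioning interacts with sort order, the sketch below assigns String keys to reducers by hash, sorts each reducer's keys, and concatenates the per-reducer outputs in reducer order. GlobalOrderSketch is a hypothetical class, and String.hashCode() stands in for whatever hash the real key type would supply.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation: hash-partition keys, sort within each reducer, then concatenate.
class GlobalOrderSketch {
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Returns the keys in the order they appear when each reducer's
    // sorted output is read back in reducer order (part 0, part 1, ...).
    static List<String> concatenatedOutput(List<String> keys, int numReducers) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : keys) {
            parts.get(partition(key, numReducers)).add(key);
        }
        List<String> all = new ArrayList<>();
        for (List<String> part : parts) {
            Collections.sort(part); // each reducer's output is sorted on its own
            all.addAll(part);
        }
        return all;
    }
}
```

With the keys a, b, c, d and two reducers, the concatenated result is b, d, a, c: each part is sorted, but the whole is not. With one reducer the result is a, b, c, d.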

Question 7

In the reducer, the MapReduce API provides you with an iterator over Writable values. What does calling the next() method return?

Accepted Answer

A)  It returns a reference to a different Writable object each time.
B)  It returns a reference to a Writable object from an object pool. 
C)  It returns a reference to the same Writable object each time, but populated with different data. 
D)  It returns a reference to a Writable object. The API leaves unspecified whether this is a reused object or a new object. 
E)  It returns a reference to the same Writable object if the next value is the same as the previous value, or a new Writable object otherwise. 
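
Hadoop's reduce-side value iterator reuses one value object and refills it on each call, which matters if you keep references to the values. The sketch below imitates that with a hypothetical mutable IntBox in place of a real Writable; ReuseIteratorSketch and its methods are illustrative names, not Hadoop API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ReuseIteratorSketch {
    // Hypothetical mutable holder standing in for a Writable such as IntWritable.
    static final class IntBox {
        int value;
        IntBox copy() {
            IntBox b = new IntBox();
            b.value = value;
            return b;
        }
    }

    // Iterator that returns the same IntBox on every call, refilled with the next value.
    static Iterator<IntBox> reusing(final int[] data) {
        return new Iterator<IntBox>() {
            private final IntBox shared = new IntBox();
            private int i = 0;

            public boolean hasNext() {
                return i < data.length;
            }

            public IntBox next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = data[i++];
                return shared;
            }
        };
    }

    // Keeps every object the iterator returns, optionally copying first,
    // then reads the stored values back after iteration has finished.
    static List<Integer> collect(int[] data, boolean copyFirst) {
        List<IntBox> kept = new ArrayList<>();
        Iterator<IntBox> it = reusing(data);
        while (it.hasNext()) {
            IntBox b = it.next();
            kept.add(copyFirst ? b.copy() : b);
        }
        List<Integer> out = new ArrayList<>();
        for (IntBox b : kept) {
            out.add(b.value);
        }
        return out;
    }
}
```

Storing the references without copying yields 3, 3, 3 for the input 1, 2, 3, because every stored reference points at the same object. Copying each value first yields 1, 2, 3.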

Question 8

Which best describes how TextInputFormat processes input files and line breaks?

Accepted Answer

A)  Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line. 
B)  Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line. 
C)  The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines. 
D)  Input file splits may cross line breaks. A line that crosses file splits is ignored. 
E)  Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line. 
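
For readers who want to see how a line reader can cope with splits that cut through a line, here is a simplified sketch of the usual approach: a reader whose split does not start at offset 0 discards everything up to the first newline, and every reader finishes the last line it started even if that line runs past the end of its split. SplitLineSketch is a hypothetical class operating on an in-memory String with character offsets; it is not Hadoop's LineRecordReader.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical line reader for the split [start, end) of an in-memory file.
class SplitLineSketch {
    static List<String> readSplit(String data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip the (possibly partial) first line; the previous split's reader owns it.
            int nl = data.indexOf('\n', start);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        List<String> lines = new ArrayList<>();
        // Read every line that begins at or before 'end', finishing it even past 'end'.
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            lines.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return lines;
    }
}
```

For the 12-character input "aaa\nbbb\nccc\n" split at offset 5, the first reader returns aaa and bbb (bbb starts in the first split and crosses the boundary), and the second reader returns only ccc, so no line is lost or read twice.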

Question 9

In a MapReduce job with 500 map tasks, how many map task attempts will there be?

Accepted Answer

A)  It depends on the number of reducers in the job.
B)  Between 500 and 1000. 
C)  At most 500. 
D)  At least 500. 
E)  Exactly 500. 

Question 10

Analyze each scenario below and identify which best describes the behavior of the default partitioner.

Accepted Answer

A)  The default partitioner assigns key-value pairs to reducers based on an internal random number generator.
B)  The default partitioner implements a round-robin strategy, shuffling the key-value pairs to each reducer in turn. This ensures an even partition of the key space.
C)  The default partitioner computes the hash of the key. Hash values between specific ranges are associated with different buckets, and each bucket is assigned to a specific reducer.
D)  The default partitioner computes the hash of the key and divides that value modulo the number of reducers. The result determines the reducer assigned to process the key-value pair.
E)  The default partitioner computes the hash of the value and takes the mod of that value with the number of reducers. The result determines the reducer assigned to process the key-value pair.
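
Hadoop's HashPartitioner is commonly summarized by the expression (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The sketch below reproduces that arithmetic in plain Java; HashPartitionSketch is a hypothetical class, and java.lang objects stand in for Writable keys.

```java
// Hypothetical re-creation of hash-modulo partitioning; not the Hadoop class itself.
class HashPartitionSketch {
    // Masks off the sign bit so the result is never negative,
    // then takes the remainder modulo the number of reducers.
    static int partition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        System.out.println(partition("a", 2)); // "a".hashCode() is 97, so this prints 1
        System.out.println(partition(-7, 4));  // prints 1
        System.out.println(-7 % 4);            // prints -3: why the mask is needed
    }
}
```

Only the key takes part in the calculation, so every pair with the same key is sent to the same reducer.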

Question 11

In a large MapReduce job with m mappers and n reducers, how many distinct copy operations will there be in the sort/shuffle phase?

Accepted Answer

A)  m × n (i.e., m multiplied by n)
B)  n 
C)  m 
D)  m+n (i.e., m plus n) 
E)  m^n (i.e., m to the power of n)

Question 12

The Hadoop framework provides a mechanism for coping with machine issues such as faulty configuration or impending hardware failure. MapReduce detects that one or a number of machines are performing poorly and starts more copies of a map or reduce task. All the tasks run simultaneously and the tasks that finish first are used. This is called:

Accepted Answer

A)  Combine 
B)  IdentityMapper 
C)  IdentityReducer 
D)  Default Partitioner 
E)  Speculative Execution 

Question 13

For each input key-value pair, mappers can emit:

Accepted Answer

A)  As many intermediate key-value pairs as designed. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous). 
B)  As many intermediate key-value pairs as designed, but they cannot be of the same type as the input key-value pair. 
C)  One intermediate key-value pair, of a different type. 
D)  One intermediate key-value pair, but of the same type. 
E)  As many intermediate key-value pairs as designed, as long as all the keys have the same type and all the values have the same type.

Question 14

You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?

Accepted Answer

A)  SequenceFiles 
B)  Avro 
C)  JSON 
D)  HTML 
E)  XML
F)  CSV

Question 15

Given a directory of files with the following structure: line number, tab character, string.

Example:

1    abialkjfjkaoasdfjksdlkjhqweroij
2    kadfjhuwqounahagtnbvaswslmnbfgy
3    kjfteiomndscxeqalkzhtopedkfsikj

You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line: conf.setInputFormat(____.class); ?

Accepted Answer

A)  SequenceFileAsTextInputFormat 
B)  SequenceFileInputFormat 
C)  KeyValueFileInputFormat 
D)  BDBInputFormat 
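
For background, Hadoop's key-value text input format (KeyValueTextInputFormat in the Hadoop API; the option above spells it differently) treats each line as one record and, by default, splits it at the first tab into a key and a value. The sketch below shows that splitting rule in plain Java; TabSplitSketch is a hypothetical helper, not a Hadoop class.

```java
// Hypothetical helper showing the "split at the first tab" rule.
class TabSplitSketch {
    // Returns {key, value}: text before the first tab is the key, the rest is the value.
    // A line with no tab becomes the key, with an empty value.
    static String[] split(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("1\tabialkjfjkaoasdfjksdlkjhqweroij");
        System.out.println(kv[0]); // prints 1
        System.out.println(kv[1]); // prints abialkjfjkaoasdfjksdlkjhqweroij
    }
}
```

Applied to the example file, the line number becomes the key and the string becomes the value of each record.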

Question 16

Which process describes the lifecycle of a Mapper?

Accepted Answer

A)  The JobTracker calls the TaskTracker's configure() method, then its map() method and finally its close() method.
B)  The TaskTracker spawns a new Mapper to process all records in a single input split. 
C)  The TaskTracker spawns a new Mapper to process each key-value pair. 
D)  The JobTracker spawns a new Mapper to process all records in a single file. 

Question 17

You have written a Mapper which invokes the following five calls to the OutputCollector.collect method:

output.collect(new Text("Apple"), new Text("Red"));
output.collect(new Text("Banana"), new Text("Yellow"));
output.collect(new Text("Apple"), new Text("Yellow"));
output.collect(new Text("Cherry"), new Text("Red"));
output.collect(new Text("Apple"), new Text("Green"));

How many times will the Reducer's reduce method be invoked?

Accepted Answer

A)  6 
B)  3 
C)  1 
D)  0 
E)  5 
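
Since reduce is invoked once per distinct key, the count can be checked by grouping the five emitted pairs by key. The sketch below does that in plain Java, assuming these five pairs are the job's entire map output; ReduceCallCountSketch is a hypothetical class, and Strings stand in for Text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical grouping of the five collected pairs by key.
class ReduceCallCountSketch {
    static final String[][] PAIRS = {
        { "Apple", "Red" },
        { "Banana", "Yellow" },
        { "Apple", "Yellow" },
        { "Cherry", "Red" },
        { "Apple", "Green" },
    };

    // Groups values by key; each entry corresponds to one reduce call.
    // Values are kept in emission order here; a real job gives no such guarantee.
    static Map<String, List<String>> group(String[][] pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        System.out.println(group(PAIRS).size()); // prints 3
    }
}
```

The three groups are Apple (Red, Yellow, Green), Banana (Yellow), and Cherry (Red).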

Question 18

Identify the MapReduce v2 (MRv2 / YARN) daemon responsible for launching application containers and monitoring application resource usage?

Accepted Answer

A)  ResourceManager 
B)  NodeManager 
C)  ApplicationMaster 
D)  ApplicationMasterService 
E)  TaskTracker
F)  JobTracker

Question 19

To process input key-value pairs, your mapper needs to load a 512 MB data file in memory. What is the best way to accomplish this?

Accepted Answer

A)  Serialize the data file, insert it in the JobConf object, and read the data into memory in the configure method of the mapper.
B)  Place the data file in the DistributedCache and read the data into memory in the map method of the mapper. 
C)  Place the data file in the DataCache and read the data into memory in the configure method of the mapper. 
D)  Place the data file in the DistributedCache and read the data into memory in the configure method of the mapper. 
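
The practical difference between loading side data in configure and loading it in map is how often the load happens: configure runs once per task, map runs once per record. The sketch below shows the load-once pattern in plain Java. SideDataMapperSketch, its loads counter, and the Map passed to configure are all hypothetical stand-ins for a mapper reading a cached file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapper-like class: side data is loaded once, then used for every record.
class SideDataMapperSketch {
    private Map<String, String> lookup = new HashMap<>();
    int loads = 0; // counts how many times the side data was loaded

    // Called once per task, before any record is processed.
    void configure(Map<String, String> cachedFileContents) {
        lookup = new HashMap<>(cachedFileContents);
        loads++;
    }

    // Called once per input record; only reads the already loaded data.
    String map(String key) {
        return lookup.getOrDefault(key, "unknown");
    }
}
```

After one configure call and any number of map calls, loads is still 1. Doing the same load inside map would repeat it for every record.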

Question 20

What types of algorithms are difficult to express in MapReduce v1 (MRv1)?

Accepted Answer

A)  Algorithms that require applying the same mathematical function to large numbers of individual binary records. 
B)  Relational operations on large amounts of structured and semi-structured data. 
C)  Algorithms that require global, shared state.
D)  Large-scale graph algorithms that require one-step link traversal. 
E)  Text analysis algorithms on large collections of unstructured text (e.g., Web crawls).

Exam 2: Cloudera Certified Developer for Apache Hadoop (CCDH)
