Multiple Choice
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to do the following:
1. Group the individual images into a set of larger files.
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming.
Which data serialization system gives you the flexibility to do this?
A) CSV
B) XML
C) HTML
D) Avro
E) SequenceFiles
F) JSON
Correct Answer: D) Avro

Avro container files are splittable, support compression, and have bindings for Python and other languages, so the packed image data can be consumed directly by a Hadoop Streaming job; SequenceFiles, by contrast, are tied to the Java API.

Verified
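A minimal sketch of the first step, assuming the `fastavro` Python package and a hypothetical local directory `images/` of JPEGs: it packs the small files into one Avro container, which a downstream streaming job could then read (for example via Avro's AvroAsTextInputFormat for Hadoop Streaming).

```python
# Sketch: pack many small JPEGs into a single Avro container file.
# Assumes the `fastavro` package is installed; paths are hypothetical.
import glob
from fastavro import writer, parse_schema

# Each record stores the original filename plus the raw JPEG bytes.
schema = parse_schema({
    "name": "Image",
    "type": "record",
    "fields": [
        {"name": "filename", "type": "string"},
        {"name": "content", "type": "bytes"},
    ],
})

def records(paths):
    for path in paths:
        with open(path, "rb") as f:
            yield {"filename": path, "content": f.read()}

with open("images.avro", "wb") as out:
    # Deflate compression keeps the container compact while the file
    # remains splittable for MapReduce processing.
    writer(out, schema, records(glob.glob("images/*.jpg")), codec="deflate")
```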