CCD-470 - Cloudera Certified Developer for Apache Hadoop CDH4 Upgrade Exam (CCDH)


Example Questions

During the standard sort and shuffle phase of MapReduce, keys and values are passed to reducers. Which of the following is true?
Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.
Table metadata in Hive is:
Combiners increase the efficiency of a MapReduce program because:
In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?
Which process describes the lifecycle of a Mapper?
In a large MapReduce job with m mappers and r reducers, how many distinct copy operations will there be in the sort/shuffle phase?
How does the NameNode detect that a DataNode has failed?
You need to perform statistical analysis in your MapReduce job and would like to call methods in the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file. Which is the best way to make this library available to your MapReduce job at runtime?
Determine which best describes when the reduce method is first called in a MapReduce job.
You need to run the same job many times with minor variations. Rather than hardcoding all job configuration options in your driver code, you've decided to have your Driver subclass org.apache.hadoop.conf.Configured and implement the org.apache.hadoop.util.Tool interface. Identify which invocation correctly with a value of Example to Hadoop?
What happens in a MapReduce job when you set the number of reducers to zero?
Which MapReduce daemon runs on each slave node and participates in job execution?
What is the preferred way to pass a small number of configuration parameters to a mapper or reducer?
Your cluster has 10 DataNodes, each with a single 1 TB hard drive. You utilize all your disk capacity for HDFS, reserving none for MapReduce. You implement default replication settings. What is the storage capacity of your Hadoop cluster (assuming no compression)?
Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.
Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
All keys used for intermediate output from mappers must:
What is a Writable?
You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. As this will produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?
In the reducer, the MapReduce API provides you with an iterator over Writable values. What does calling the next() method return?
Workflows expressed in Oozie can contain:
You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses TextInputFormat: the mapper applies a regular expression over input values and emits key-value pairs with the key consisting of the matching text, and the value containing the filename and byte offset. Determine the difference between setting the number of reducers to one and setting the number of reducers to zero.
You have just executed a MapReduce job. Where is intermediate data written to after being emitted from the Mapper's map method?
Your cluster's HDFS block size is 64 MB. You have a directory containing 100 plain text files, each of which is 100 MB in size. The InputFormat for your job is TextInputFormat. Determine how many Mappers will run.
In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
A combiner reduces:
Given a Mapper, Reducer, and Driver class packaged into a jar, which is the correct way of submitting the job to the cluster?
What is the standard configuration of slave nodes in a Hadoop cluster?
Which statement best describes the data path of intermediate key-value pairs (i.e., output of the mappers)?
You want to populate an associative array in order to perform a map-side join. You've decided to put this information in a text file, place that file into the Distributed Cache and read it in your Mapper before any records are processed. Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array.
How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
The NameNode uses RAM for the following purpose:
In a MapReduce job, the reducer receives all values associated with the same key. Which statement best describes the ordering of these values?
Which of the following best describes the workings of TextInputFormat?
Assuming default settings, which best describes the order of data provided to a reducer's reduce method:
What is the behavior of the default partitioner?
You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when trying to access this file?
You are developing a MapReduce job for sales reporting. The mapper will process input keys representing the year (IntWritable) and input values representing product identifiers (Text). Identify what determines the data types used by the Mapper for a given job.
You are running a job that will process a single InputSplit on a cluster which has no other jobs currently running. Each node has an equal number of open Map slots. On which node will Hadoop first attempt to run the Map task?
Custom programmer-defined counters in MapReduce are:
What happens if the NameNode crashes?
You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?
The Hadoop framework provides a mechanism for coping with machine issues such as faulty configuration or impending hardware failure. MapReduce detects that one or a number of machines are performing poorly and starts more copies of a map or reduce task. All the tasks run simultaneously and the tasks that finish first are used. This is called:
You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths() command when it's given a Path object representing this directory?
To process input key-value pairs, your mapper needs to load a 512 MB data file in memory. What is the best way to accomplish this?
You need to create a GUI application to help your company's sales people add and edit customer information. Would HDFS be appropriate for this customer information file?
For each input key-value pair, mappers can emit:
Which best describes how TextInputFormat processes input files and line breaks?
Which of the following statements most accurately describes the relationship between MapReduce and Pig?
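
Several of the questions above come down to simple arithmetic, so a small worked sketch may help when checking an answer. The Python below is illustrative only and is not part of the exam material. The function names are hypothetical. It assumes the HDFS default replication factor of 3, one input split per HDFS block for splittable plain text files (ignoring the small slop allowance FileInputFormat applies to a short final split), one fetch per mapper/reducer pair during the shuffle, and the default FileInputFormat filter that skips names beginning with "_" or ".".

```python
import math


def usable_capacity_tb(nodes, tb_per_node, replication=3):
    """Raw disk capacity divided by the replication factor (assumed default: 3)."""
    return nodes * tb_per_node / replication


def map_tasks(num_files, file_mb, block_mb):
    """One map task per split, assuming one split per HDFS block of each file."""
    return num_files * math.ceil(file_mb / block_mb)


def shuffle_copies(mappers, reducers):
    """Assumes every reducer fetches one partition from every mapper."""
    return mappers * reducers


def visible_inputs(names):
    """Assumes the default hidden-file filter: skip names starting with '_' or '.'."""
    return [n for n in names if not n.startswith(("_", "."))]


if __name__ == "__main__":
    print(round(usable_capacity_tb(10, 1), 2))  # 10 nodes x 1 TB, replication 3
    print(map_tasks(100, 100, 64))              # 100 files of 100 MB, 64 MB blocks
    print(shuffle_copies(4, 3))                 # example values for m and r
    print(visible_inputs(["_first.txt", "second.txt", ".third.txt", "#data.txt"]))
```

Under these assumptions the cluster question works out to roughly 3.33 TB of usable space, the 100-file job to 200 map tasks, the shuffle to m times r copies, and the jobdata directory to two processed files. Different cluster settings (for example a custom replication factor or a non-splittable InputFormat) would change the numbers.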