
    hadoop: tracking MapReduce tasks

    soujanyabargavi

      I'm new to hadoop and this is probably a stupid question, but I've been searching for hours and cannot find an answer.

      I'm running Hadoop MapReduce with different numbers of mappers and reducers to see the difference in performance (e.g. execution time). I want to check whether the specified number of mappers/reducers was actually used, but I just can't figure out how to do that.

      Hadoop 1.2.1 is installed on a quad-core machine with hyper-threading. I ssh into the server, and Hadoop is running in pseudo-distributed mode.

      My MapReduce program was written in Python, so I'm using hadoop-streaming, and this is how I ran the MR program.

      $ hadoop jar /Users/hadoop/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar \
          -file /Users/hadoop/map.py -mapper /Users/hadoop/map.py \
          -file /Users/hadoop/reduce.py -reducer /Users/hadoop/reduce.py \
          -input file:///Users/hadoop/inputfile -output file:///Users/hadoop/outputfile
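      When I vary the task counts, I pass the Hadoop 1.x properties as generic options before the streaming options, something like this (the 4 and 2 are just example values; as far as I understand, mapred.map.tasks is only a hint to the framework, while mapred.reduce.tasks is honored exactly):

      $ hadoop jar /Users/hadoop/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar \
          -D mapred.map.tasks=4 \
          -D mapred.reduce.tasks=2 \
          -file /Users/hadoop/map.py -mapper /Users/hadoop/map.py \
          -file /Users/hadoop/reduce.py -reducer /Users/hadoop/reduce.py \
          -input file:///Users/hadoop/inputfile -output file:///Users/hadoop/outputfile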

      I want to see log information that shows how many map and reduce tasks were actually launched, or anything else that provides this kind of information.

        • Re: hadoop: tracking MapReduce tasks
          jesse_amd

          Hi soujanyabargavi,

          To get the logs for the job, you first need to find the job ID. You can get it by running 'hadoop job -list all' and searching for your particular job.
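          For example (the job ID shown in the comment is only an illustration of the Hadoop 1.x job_<timestamp>_<sequence> format):

          $ hadoop job -list all
          # find the line for your job; the first column is the job ID,
          # e.g. job_201806211530_0001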

          Once you have the job ID, you can run 'hadoop job -status <job_id>' to get the overall job status and several counters. These counters include the number of map tasks, the number of reduce tasks, the job state, percent complete, etc.
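          A minimal check might look like this (the job ID is illustrative, and the counter group name is the Hadoop 1.x one; double-check it against the counters printed by '-status' for your own job):

          $ hadoop job -status job_201806211530_0001

          # Individual counters can also be queried directly. In Hadoop 1.x the
          # launched-task counts live in the JobInProgress$Counter group:
          $ hadoop job -counter job_201806211530_0001 \
              'org.apache.hadoop.mapred.JobInProgress$Counter' TOTAL_LAUNCHED_MAPS
          $ hadoop job -counter job_201806211530_0001 \
              'org.apache.hadoop.mapred.JobInProgress$Counter' TOTAL_LAUNCHED_REDUCES

          TOTAL_LAUNCHED_MAPS and TOTAL_LAUNCHED_REDUCES should tell you exactly how many map and reduce tasks the framework actually started for the job, which you can compare against the numbers you requested.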