

soujanyabargavi
Journeyman III

Hadoop: tracking MapReduce tasks

I'm new to Hadoop, and this is probably a stupid question, but I've been searching for hours and cannot find how to do it.

I'm running Hadoop MapReduce with different numbers of mappers and reducers to see the difference in performance (e.g. execution time). I want to check whether the specified number of mappers/reducers was actually used, but I can't figure out how to do it.

Hadoop 1.2.1 is installed on a quad-core machine with hyper-threading, which I access over SSH, and MindMajix Hadoop is running in pseudo-distributed mode.

My MapReduce program is written in Python, so I'm using hadoop-streaming. This is how I ran the MR program:

$ hadoop jar /Users/hadoop/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar -file /Users/hadoop/map.py -mapper /Users/hadoop/map.py -file /Users/hadoop/reduce.py -reducer /Users/hadoop/reduce.py -input file:///Users/hadoop/inputfile -output file:///Users/hadoop/outputfile

I want to see log information that looks like this, or anything that provides this kind of information.

2 Replies
Anonymous
Not applicable

Hi soujanyabargavi,

To get the logs from the job, you first need to find its job_id. You can do this by running 'hadoop job -list all' and looking for your particular job in the output.
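If you would rather not pick the job_id out by eye, something like the following Python sketch might help. It assumes job IDs have the form job_<digits>_<digits> and that the 'hadoop' command is on your PATH; the function names are my own, not part of Hadoop.

```python
import re
import subprocess

# Assumption: job IDs look like job_<digits>_<digits>, e.g. job_201401011200_0001.
JOB_ID_PATTERN = re.compile(r"\bjob_\d+_\d+\b")


def extract_job_ids(listing):
    """Return the job IDs found in the text, in order, without duplicates."""
    job_ids = []
    for job_id in JOB_ID_PATTERN.findall(listing):
        if job_id not in job_ids:
            job_ids.append(job_id)
    return job_ids


def list_job_ids():
    """Run 'hadoop job -list all' and return the job IDs in its output."""
    output = subprocess.check_output(["hadoop", "job", "-list", "all"])
    return extract_job_ids(output.decode("utf-8", "replace"))
```

Calling list_job_ids() on the server should give you a list of IDs to choose from; extract_job_ids() can also be used on output you have saved to a file.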

Once you have the job_id, run 'hadoop job -status <job_id>' to get the overall job status and several counters. The output includes the number of map tasks, the number of reduce tasks, the job state, the percent complete, etc.
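To compare runs, it may be easier to pull the counters into a dictionary. Here is a minimal sketch, assuming the status output prints each counter on its own line as 'name=value' (check this against what your installation actually prints). The function names are hypothetical, and counter names such as 'Launched map tasks' are what I would expect to see, not something guaranteed.

```python
import subprocess


def parse_counters(status_text):
    """Collect 'name=value' lines with integer values into a dict."""
    counters = {}
    for line in status_text.splitlines():
        # Split on the last '=' so names containing '=' stay intact.
        name, sep, value = line.strip().rpartition("=")
        if sep and name.strip() and value.strip().isdigit():
            counters[name.strip()] = int(value.strip())
    return counters


def job_counters(job_id):
    """Run 'hadoop job -status <job_id>' and parse the counters it prints."""
    output = subprocess.check_output(["hadoop", "job", "-status", job_id])
    return parse_counters(output.decode("utf-8", "replace"))
```

With this, job_counters(job_id).get("Launched map tasks") would return the number of map tasks launched if a counter with that name is present, and None otherwise.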

I agree with you. Great information regarding Hadoop.

Thank you.
