Server Gurus Discussions

soujanyabargavi · ‎06-19-2018

I'm new to Hadoop and this is probably a stupid question, but I've been looking for hours and cannot find how to do it.

I'm running Hadoop MapReduce with different numbers of mappers and reducers to see the difference in performance (e.g. execution time). I want to check whether the specified number of mappers/reducers was actually used, but I just can't figure out how to do it.

Hadoop 1.2.1 is installed on a quad-core machine with hyper-threading, which I access over SSH, and Hadoop is running in pseudo-distributed mode.

My MapReduce program was written in Python, so I'm using hadoop-streaming, and this is how I ran the MR program.

$ hadoop jar /Users/hadoop/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar -file /Users/hadoop/map.py -mapper /Users/hadoop/map.py -file /Users/hadoop/reduce.py -reducer /Users/hadoop/reduce.py -input file:///Users/hadoop/inputfile -output file:///Users/hadoop/outputfile

I want to see log information that looks like this, or anything that provides this kind of information.

Anonymous · ‎06-21-2018

Hi soujanyabargavi,

To get the logs from the job you will need to figure out the job_id. This can be found by running 'hadoop job -list all' and searching for your particular job.

Once you have the job_id, you can run 'hadoop job -status <job_id>' to get the overall job status and several counters. The output includes the map and reduce completion percentages, and the counters include the number of launched map tasks and launched reduce tasks.
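If you want to check this from a script rather than by eye, here is a rough sketch in Python (since your job is already written in it). It assumes the 'hadoop job -status' output contains counter lines of the form "Launched map tasks=N" and "Launched reduce tasks=N"; the function name launched_tasks is just something I made up for illustration, and you may need to adjust the pattern if your output looks different.

```python
import re
import subprocess
import sys

# Assumed counter line format in the status output, e.g. "Launched map tasks=4".
COUNTER_RE = re.compile(r"^\s*Launched (map|reduce) tasks\s*=\s*(\d+)\s*$")


def launched_tasks(status_text):
    """Return a dict such as {'map': 4, 'reduce': 2} for the counters found.

    A key is simply absent if its counter line does not appear in the text.
    """
    counts = {}
    for line in status_text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # sys.argv[1] is the job_id reported by 'hadoop job -list all'.
    output = subprocess.check_output(["hadoop", "job", "-status", sys.argv[1]])
    print(launched_tasks(output.decode("utf-8", "replace")))
```

Save it under any name you like and pass the job_id as the only argument. Comparing the printed numbers with what you asked for should tell you whether your settings took effect.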

gulatisneha56 · ‎01-20-2019

I agree with you. Great information regarding Hadoop.

Thank you.


hadoop: tracking MapReduce tasks