I'm new to Hadoop and this is probably a stupid question, but I've been searching for hours and can't find out how to do it.
I'm running Hadoop MapReduce with different numbers of mappers and reducers to compare performance (e.g. execution time). I want to check whether the specified number of mappers/reducers was actually used, but I can't figure out how.
Hadoop 1.2.1 is installed on a quad-core machine with hyper-threading and is running in pseudo-distributed mode. I'm connecting to the server over SSH.
My MapReduce program is written in Python, so I'm using Hadoop Streaming. This is how I ran the MR program:
$ hadoop jar /Users/hadoop/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar \
    -file /Users/hadoop/map.py -mapper /Users/hadoop/map.py \
    -file /Users/hadoop/reduce.py -reducer /Users/hadoop/reduce.py \
    -input file:///Users/hadoop/inputfile \
    -output file:///Users/hadoop/outputfile
I want to see log information that shows how many map and reduce tasks were run, or anything else that provides this kind of information.
To get information about the job, you first need its job ID. You can find it by running 'hadoop job -list all' and looking for your particular job.
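If you want to script that lookup, here is a minimal Python sketch. It assumes the 'hadoop' launcher is on your PATH and that job IDs follow the Hadoop 1.x pattern job_<timestamp>_<sequence> (e.g. job_201312010930_0003). The helper names parse_job_ids and list_job_ids are hypothetical. It scans the output with a regex instead of parsing columns, so it does not depend on the exact table layout.

```python
import re
import subprocess

# Assumed Hadoop 1.x job ID shape: job_<jobtracker start time>_<sequence>.
JOB_ID_RE = re.compile(r"\bjob_\d+_\d+\b")


def parse_job_ids(list_output):
    """Return every job ID found in text printed by 'hadoop job -list all'."""
    # A regex scan skips header and state-legend lines automatically.
    return JOB_ID_RE.findall(list_output)


def list_job_ids():
    """Run 'hadoop job -list all' and return the job IDs it reports.

    Assumes the 'hadoop' command is on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    out = subprocess.run(
        ["hadoop", "job", "-list", "all"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_job_ids(out)
```

list_job_ids() returns the IDs in the order they appear in the listing; you still need to pick out yours, for example by its start time or user name.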
Once you have the job ID, running 'hadoop job -status <job_id>' shows the overall job status along with several counters. Together these report the number of map tasks, the number of reduce tasks, the job state, the percent complete, etc.
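As a rough sketch, the following Python pulls the launched map and reduce task counts out of that status output. It assumes the relevant counters appear as lines such as 'Launched map tasks=4' and 'Launched reduce tasks=1'; check this against the output on your own cluster. parse_task_counts and job_task_counts are hypothetical helper names, and 'hadoop' is assumed to be on PATH.

```python
import re
import subprocess

# Assumed counter line shape, e.g. "Launched map tasks=4".
LAUNCHED_RE = re.compile(r"Launched (map|reduce) tasks=(\d+)")


def parse_task_counts(status_output):
    """Map 'map'/'reduce' to the launched task counts in the status text."""
    counts = {}
    for kind, value in LAUNCHED_RE.findall(status_output):
        counts[kind] = int(value)
    return counts


def job_task_counts(job_id):
    """Run 'hadoop job -status <job_id>' and return the launched task counts."""
    out = subprocess.run(
        ["hadoop", "job", "-status", job_id],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_task_counts(out)
```

Comparing job_task_counts(job_id) with the numbers you asked for shows whether your settings took effect. These counters count launched attempts, so speculative or retried attempts may make them larger than the configured number.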