Translating per host works for some of the hosts. However, I could not get reports for many hosts because AMDuProfCLI hung. Here is a stack trace from one of the hung jobs, taken 62 minutes into the run. The runs that succeeded finished in about 45 minutes.
(gdb) where
#0 0x000014cbe43ded20 in __pthread_clockjoin_ex () from /lib64/libpthread.so.0
#1 0x000014cbe3e77cc7 in std::thread::join() () from /usr/lib64/libstdc++.so.6
#2 0x000014cbe5595f0e in std::deque<std::unique_ptr<std::thread, void (*)(std::thread*)>, std::allocator<std::unique_ptr<std::thread, void (*)(std::thread*)> > >::_M_destroy_data_aux(std::_Deque_iterator<std::unique_ptr<std::thread, void (*)(std::thread*)>, std::unique_ptr<std::thread, void (*)(std::thread*)>&, std::unique_ptr<std::thread, void (*)(std::thread*)>*>, std::_Deque_iterator<std::unique_ptr<std::thread, void (*)(std::thread*)>, std::unique_ptr<std::thread, void (*)(std::thread*)>&, std::unique_ptr<std::thread, void (*)(std::thread*)>*>) () from /u/daniel.kokron/play/AMDuProf/AMDuProf_Linux_x64_4.0.341/bin/libAMDProfileCommon.so
#3 0x000014cbe558ea9a in AMDTProfileManager::JoinAllCollectionThreads() () from /u/daniel.kokron/play/AMDuProf/AMDuProf_Linux_x64_4.0.341/bin/libAMDProfileCommon.so
#4 0x000014cbe558eb19 in AMDTProfileManager::WaitForCompletion() () from /u/daniel.kokron/play/AMDuProf/AMDuProf_Linux_x64_4.0.341/bin/libAMDProfileCommon.so
#5 0x0000000000414616 in HandleTranslateCommand(AMDTProcessArgs&) ()
#6 0x000000000041496b in HandleReportCommand(AMDTProcessArgs&) ()
#7 0x000000000040f949 in main ()
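Reading the trace bottom-up: WaitForCompletion() calls JoinAllCollectionThreads(), which destroys a std::deque of std::unique_ptr<std::thread, void (*)(std::thread*)>, and that destructor ends up in std::thread::join(). My guess (not confirmed against AMDuProf source) is that the custom deleter joins each thread, so if any collection thread never returns, the destructor blocks forever. The sketch below is a hypothetical reconstruction of that pattern only; the names JoinAndDelete and RunCollectionThreads are made up for illustration, and here every worker does finish.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <deque>
#include <memory>
#include <thread>

// Hypothetical illustration: same container type as frame #2 of the trace.
// This is NOT AMDuProf code; the joining deleter is an assumption.
using ThreadPtr = std::unique_ptr<std::thread, void (*)(std::thread*)>;

// Assumed deleter: join the thread, then free it. join() blocks until the
// thread function returns, so a thread that never returns blocks forever.
void JoinAndDelete(std::thread* t) {
    if (t->joinable()) {
        t->join();
    }
    delete t;
}

// Starts n worker threads, then lets the deque go out of scope.
// Returns how many workers had finished once the deque was destroyed.
int RunCollectionThreads(int n) {
    assert(n >= 0);
    std::atomic<int> finished{0};
    {
        std::deque<ThreadPtr> threads;
        for (int i = 0; i < n; ++i) {
            ThreadPtr p(new std::thread([&finished, i] {
                // Stand-in for per-host collection work; here it always ends.
                std::this_thread::sleep_for(std::chrono::milliseconds(5 * (i + 1)));
                ++finished;
            }), &JoinAndDelete);
            threads.push_back(std::move(p));
        }
        // Leaving this scope destroys the deque, which runs JoinAndDelete on
        // every element. A worker stuck in a loop would hang us right here,
        // which would look like frame #0 (__pthread_clockjoin_ex) above.
    }
    return finished.load();
}
```

Calling RunCollectionThreads(4) returns 4 because each join() waits for its worker. If one of the lambdas instead waited on an event that never arrives, the call would never return, matching the 62-minute hang.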