There are 16 "Bulldozer" cores in this box, which means you can run 32 threads concurrently. However, APP spawns 2 * numDevices threads for device management (copying data and so on), and when the CPU is used as a compute device it spawns another 2 * numCores threads. With everything in the system in use, that comes to 73 threads, including the main thread. I'm confused about why it spawns 2 threads per core instead of 1: unless one thread of each pair is blocked on a condition variable or sleeping, the two will just contend for the same resources.
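For anyone checking the arithmetic, here is a rough sketch of how the 73 figure can be reproduced. The post does not give the device count or say what numCores evaluates to, so the values below (4 devices, numCores = 32) are assumptions chosen only because they add up to 73. The function name is made up for illustration and is not part of APP.

```python
def expected_thread_count(num_devices: int, num_cores: int,
                          cpu_is_device: bool = True) -> int:
    """Estimate the thread count from the formula described in the post.

    Hypothetical helper: 2 management threads per device, plus
    2 worker threads per core when the CPU is a compute device,
    plus the main thread.
    """
    device_threads = 2 * num_devices
    cpu_threads = 2 * num_cores if cpu_is_device else 0
    return 1 + device_threads + cpu_threads


# Assumed values (not stated in the post): 4 devices, numCores = 32.
total = expected_thread_count(num_devices=4, num_cores=32)
print(total)  # 1 + 8 + 64 = 73
```

If those assumptions hold, 64 of the 73 threads are CPU workers, so dropping to 1 thread per core would cut the total to 41.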
Since you haven't gotten very many responses on this forum, try:
* posting on the technology forum: http://forums.amd.com/game/categories.cfm?catid=433&entercat=y
* contacting AMD support, either by email (http://emailcustomercare.amd.com/) or by phone (http://support.amd.com/us/contacts/Pages/global-technical-support.aspx)
Let me know if you still don't get an answer to your question after trying these two resources.