OpenCL

rick_weber
Adept II

Weird heterogeneous performance issues

My raytracer supports hybrid GPU-CPU execution.

When running with only the CPU, I can vary the number of threads using the CPU_MAX_COMPUTE_UNITS variable. Performance is scalable-ish up to 32 Interlagos (Opteron 6272) cores: at 16 cores I see a 10x speedup, and at 32 I see 19x.
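To put those two data points in perspective, here is a minimal sketch that converts the reported speedups into parallel efficiency (speedup divided by core count). The function name `parallel_efficiency` is only illustrative, and the inputs are just the two figures quoted above.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved: speedup / cores."""
    return speedup / cores


# (cores, measured speedup) pairs taken from the post above
measurements = [(16, 10.0), (32, 19.0)]

for cores, speedup in measurements:
    eff = parallel_efficiency(speedup, cores)
    print(f"{cores} cores: {speedup:.0f}x speedup, {eff:.1%} efficiency")
```

Both points land at roughly 60% efficiency, so the CPU-only scaling is sub-linear but fairly consistent between 16 and 32 cores.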

The machine I'm running this on has 32 Opteron 6272 cores and 3 Radeon 7970s. On the 7970s, performance scales virtually linearly with the number of GPUs.

When I hybridize the resources, I see modest performance improvement when using 16 cores + the 3 GPUs over just the 3 GPUs. However, the moment I increase CPU_MAX_COMPUTE_UNITS to 17 or higher, performance drops like a colorful euphemism and the overall algorithm is significantly slower than using just the 3 GPUs.

I'm trying to work out why this might be the case. It appears that APP spawns 2 threads for every CPU core, 2 more for the CPU device itself, and another 2 for every GPU. Thus, with 3 GPUs and 32 cores, I'm seeing 73 threads in total. I suspect I'm getting contention that delays the GPUs from executing. How would I go about checking this? Also, why are there 2 threads per core instead of 1? I can understand having a few extra threads shared across all the cores to do asynchronous copies and such, but 2 per core sounds excessive.
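As a starting point for checking, the thread count itself can be confirmed from outside the process. The sketch below assumes a Linux host, where `/proc/<pid>/status` contains a `Threads:` field; the helper names `parse_thread_count` and `thread_count` are made up for illustration. It only counts threads and does not by itself show contention.

```python
import os
import re
from typing import Optional


def parse_thread_count(status_text: str) -> Optional[int]:
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s*(\d+)\s*$", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def thread_count(pid: int) -> Optional[int]:
    """Return the current number of threads in a process (Linux only).

    Returns None if the status file cannot be read, for example on a
    non-Linux system or if the process has already exited.
    """
    try:
        with open(f"/proc/{pid}/status") as status_file:
            return parse_thread_count(status_file.read())
    except OSError:
        return None


if __name__ == "__main__":
    # Demonstration on this script's own process; pass the raytracer's
    # PID instead to inspect a running job.
    print(thread_count(os.getpid()))
```

Sampling this while varying CPU_MAX_COMPUTE_UNITS would show whether the thread count grows by 2 per compute unit as described.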

3 Replies
irving12
Journeyman III

Re: Weird heterogeneous performance issues

2 threads per core is not excessive. If things are looking bad, I recommend looking up the specifications of your CPU so you know how much load it can bear.

rick_weber
Adept II

Re: Weird heterogeneous performance issues

There are 16 "Bulldozer" cores in this box, which means you can run 32 threads concurrently. However, APP spawns 2 * numDevices threads for device management (copying data and such), and when the CPU is used as a compute device it spawns a further 2 * numCores threads. With everything in the system enabled, this yields 73 threads (including the main thread). I'm confused about why it spawns 2 threads per core instead of 1: unless one of each pair is blocking on a condition variable or sleeping, it's just going to contend for resources.
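The arithmetic above can be written out as a small sketch. It assumes the thread model is exactly as described in this post (2 threads per device, 2 per CPU core when the CPU is a compute device, plus the main thread) and that the CPU counts as one device; `expected_threads` is a hypothetical helper, not part of APP.

```python
def expected_threads(num_cores: int, num_gpus: int, use_cpu: bool = True) -> int:
    """Estimate the process thread count under the model described above."""
    num_devices = num_gpus + (1 if use_cpu else 0)  # CPU counted as one device
    management = 2 * num_devices                    # device-management threads
    workers = 2 * num_cores if use_cpu else 0       # CPU compute threads
    return management + workers + 1                 # +1 for the main thread


# CPU_MAX_COMPUTE_UNITS values of interest, with all 3 GPUs enabled
for cores in (16, 17, 32):
    print(f"{cores} compute units + 3 GPUs: {expected_threads(cores, 3)} threads")
```

With 32 cores and 3 GPUs this gives 73, matching the observed count. Under the same model, 16 compute units correspond to 32 CPU worker threads and 17 correspond to 34, though that alone does not establish the cause of the slowdown.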

kcarney
Staff

Re: Weird heterogeneous performance issues

Hello,

Since you haven't gotten very many responses on this forum, try:

* posting on the technology forum: http://forums.amd.com/game/categories.cfm?catid=433&entercat=y

and/or

* contacting AMD support either by email (http://emailcustomercare.amd.com/) or by phone (http://support.amd.com/us/contacts/Pages/global-technical-support.aspx)

Let me know if you still don't get an answer to your question after contacting these 2 resources.

Cheers!

Kristen
