timchist
Elite

AMD 79xx GPUs skip kernel execution for certain indices

I'm experiencing a strange problem that occurs on 7950 and 7970 cards, but does not happen on 5850 and 6870.

My application processes images in tiles. For each tile a series of OpenCL kernels is called. When the tile size becomes relatively small (say, 128x128), some parts of the output image may not be fully processed. I simplified my algorithm so that it only executes the following operations for each tile:

  1. Temp1 = 100
  2. Temp2 = 30
  3. Temp1 = Temp1 + Temp2
  4. Dst = Src + Temp1

(Temp1, Temp2, Src and Dst are all vectors of 128x128 elements.)
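For checking the buffers copied back from the GPU, the four steps above can be mirrored on the host. The following is only a minimal sketch in plain C (no OpenCL involved); the names `TILE_ELEMS`, `reference_tile` and `count_mismatches` are hypothetical, and the element type is assumed to be `float`, which the post does not state.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_ELEMS (128 * 128)  /* assumed tile size: 128x128 elements */

/* CPU reference for the four per-tile steps; each buffer holds TILE_ELEMS floats. */
static void reference_tile(const float *src, float *temp1, float *temp2, float *dst)
{
    assert(src && temp1 && temp2 && dst);
    for (size_t i = 0; i < TILE_ELEMS; ++i) temp1[i] = 100.0f;           /* 1. Temp1 = 100 */
    for (size_t i = 0; i < TILE_ELEMS; ++i) temp2[i] = 30.0f;            /* 2. Temp2 = 30 */
    for (size_t i = 0; i < TILE_ELEMS; ++i) temp1[i] += temp2[i];        /* 3. Temp1 = Temp1 + Temp2 */
    for (size_t i = 0; i < TILE_ELEMS; ++i) dst[i] = src[i] + temp1[i];  /* 4. Dst = Src + Temp1 */
}

/* Counts elements where a buffer read back from the GPU differs from the reference. */
static size_t count_mismatches(const float *gpu, const float *ref, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (gpu[i] != ref[i]) ++bad;
    return bad;
}
```

Calling `reference_tile` once per tile and then `count_mismatches` on the Dst buffer read back after clFinish gives the number of incorrect components for that tile.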

After that I call clFinish and copy Temp1, Temp2, Src and Dst to host memory for checking. For those tiles that have been calculated incorrectly, I have found out that:

  • Temp1 is equal to 130 for all vector components
  • Temp2 is equal to 30 for all vector components
  • Dst is not equal to Src + Temp1 (Src + 130) for some vector components, but is rather equal to Src + 100

The number of incorrect vector components is often (but not always) divisible by 64, so it seems that under some circumstances whole wavefronts get skipped.
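To test the "whole wavefronts get skipped" idea more directly, one could count how many aligned 64-element blocks are wrong in their entirety. This is a rough sketch in plain C; `count_skipped_wavefronts` is a hypothetical helper, and it assumes a 1D index space in which consecutive groups of 64 work-items form one wavefront.

```c
#include <assert.h>
#include <stddef.h>

#define WAVEFRONT_SIZE 64  /* work-items per wavefront, matching the observation above */

/* Counts aligned 64-element blocks in which every element of dst differs from expected. */
static size_t count_skipped_wavefronts(const float *dst, const float *expected, size_t n)
{
    assert(dst && expected);
    size_t skipped = 0;
    for (size_t base = 0; base + WAVEFRONT_SIZE <= n; base += WAVEFRONT_SIZE) {
        size_t bad = 0;
        for (size_t i = 0; i < WAVEFRONT_SIZE; ++i)
            if (dst[base + i] != expected[base + i]) ++bad;
        if (bad == WAVEFRONT_SIZE) ++skipped;  /* the whole block is wrong */
    }
    return skipped;
}
```

If the total number of bad components is close to 64 times the value returned here, the errors are mostly wavefront-sized; isolated bad elements would point to a different cause.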

Even though the problem is 100% reproducible in this simplified version of our application, it does not show up when I try to write a standalone test, even when it very accurately models the behaviour of the application. Apparently there are some other factors that trigger the problem that I'm not aware of.

I'm attaching a screenshot showing a fragment of the output from our application. The grid indicates the tile boundaries. If the output were correct, the whole image would be uniformly pink, without any stripes.

The larger the tiles become, the less likely the problem is to appear.

My best uneducated guess is that something goes wrong when kernels are scheduled to the hardware, at either the driver or the firmware level.

I tried several driver versions, specifically Catalyst 12.4, 12.8, 12.10, 13.1, 13.3 beta and 13.4. I also tried two different 7970s in two computers (one based on an AMD FX 8350, the other on an i7 3770K), as well as a 7950 in a computer based on an i7 3930K. All computers were running Windows 7 x64; we did not check Linux or Mac OS. The problem occurred in all of these configurations.

Does that ring a bell?

Accepted Solutions
timchist
Elite

Re: AMD 79xx GPUs skip kernel execution for certain indices

The problem is no longer reproducible with the Catalyst 13.6 beta driver. Apparently, something has been fixed. Thank you all for your help.

48 Replies
vmiura
Adept II

Re: AMD 79xx GPUs skip kernel execution for certain indices

Are you using any complex control flow?


I ran into 2 bugs that I could reproduce on 13.3 beta:

while(a && b) {} // loops even when b is false

do
{
    /* store some debug data */
    if(a) return;  // <- having a return inside a do-while loop caused register clobbering, and weird data was stored to my debug buffer
} while(b);

timchist
Elite

Re: AMD 79xx GPUs skip kernel execution for certain indices

In the full version we may have some non-trivial flow control operators.

However, as I wrote in the post, in the "simplified" version we are only using very simple kernels, such as Add or Memset, that only have a single if inside:

if(x < size)
{
    ...
}
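For readers unfamiliar with this pattern: the guard is needed when the global work size is rounded up to a multiple of the work-group size, so that the extra work-items do not write past the end of the buffer. Below is a plain C emulation of such a guarded Add, as a sketch only; `round_up` and `add_guarded` are hypothetical names, and the loop variable `x` stands in for `get_global_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds n up to the next multiple of group (group must be > 0). */
static size_t round_up(size_t n, size_t group)
{
    assert(group > 0);
    return (n + group - 1) / group * group;
}

/* Emulates a guarded Add kernel: each "work-item" x writes only if x < size. */
static void add_guarded(const float *a, const float *b, float *out,
                        size_t size, size_t group)
{
    size_t global = round_up(size, group);  /* padded global work size */
    for (size_t x = 0; x < global; ++x) {   /* x plays the role of get_global_id(0) */
        if (x < size) {
            out[x] = a[x] + b[x];
        }
    }
}
```

With `size = 100` and `group = 64` the emulated global size is 128, and the last 28 work-items do nothing.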

timchist
Elite

Re: AMD 79xx GPUs skip kernel execution for certain indices

So far I can think of two possible reasons why this problem may occur:

  • as the task size is small and not all compute units are utilised, the GPU may attempt to schedule the next kernel on free compute units while the previous kernel has not finished yet. This may be caused by an error in dependency analysis
  • a cache coherency problem: the second call to Add (Dst = Src + Temp1) is executed on a compute unit that previously executed the first Fill with 100 (Temp1 = 100), and for some reason the cache of this compute unit did not get updated with the subsequent value of 130 (after Temp1 = Temp1 + Temp2 was executed)
timchist
Elite

Re: AMD 79xx GPUs skip kernel execution for certain indices

I have just got confirmation that the behaviour I am seeing is caused by two kernels with a dependency between them being executed in parallel. Please see the two attached screenshots: one shows the timeline for a correct tile, the other for an incorrect tile.

As you can see, for a correct tile the GPU executes Temp1 = 100 and Temp2 = 30 in parallel. That's OK, as there are no dependencies between them. Temp1 = Temp1 + Temp2 and Dst = Src + Temp1 are executed sequentially, as the first kernel modifies Temp1, so the second one depends on the results of the first one.

For a tile that is calculated incorrectly the timeline is different: Temp1 = 100 and Temp2 = 30 are executed sequentially, but two Add calls are incorrectly executed in parallel.
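The effect of such an overlap can be reproduced on the CPU. The sketch below is a plain C model, not the actual GPU scheduling: `run_adds` and the `early` flags are hypothetical, and it simply lets the second Add read Temp1 before the first Add has updated it for selected 64-element wavefronts.

```c
#include <assert.h>
#include <stddef.h>

#define WAVEFRONT_SIZE 64

/* Runs Temp1 = Temp1 + Temp2 and Dst = Src + Temp1 one wavefront at a time.
 * Where early[w] is nonzero, the second Add reads Temp1 before the first
 * Add has updated it, modelling the two kernels overlapping. */
static void run_adds(float *temp1, const float *temp2, const float *src,
                     float *dst, size_t n, const unsigned char *early)
{
    assert(n % WAVEFRONT_SIZE == 0);
    for (size_t w = 0; w < n / WAVEFRONT_SIZE; ++w) {
        size_t base = w * WAVEFRONT_SIZE;
        if (early[w])   /* hazard: Dst = Src + Temp1 sees the stale value */
            for (size_t i = base; i < base + WAVEFRONT_SIZE; ++i)
                dst[i] = src[i] + temp1[i];
        for (size_t i = base; i < base + WAVEFRONT_SIZE; ++i)
            temp1[i] += temp2[i];            /* Temp1 = Temp1 + Temp2 */
        if (!early[w])  /* correct order: read after the update */
            for (size_t i = base; i < base + WAVEFRONT_SIZE; ++i)
                dst[i] = src[i] + temp1[i];
    }
}
```

Starting from Temp1 = 100 and Temp2 = 30, this model ends with Temp1 equal to 130 everywhere while Dst holds Src + 100 for the flagged wavefronts, which matches the symptoms described in the first post.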

Is there a workaround?

vmiura
Adept II

Re: AMD 79xx GPUs skip kernel execution for certain indices

Is it an asynchronous queue?

Technically you should call clEnqueueBarrier() or clEnqueueMarker() between kernels if you don't want them to execute in parallel, although I thought that the current drivers don't support asynchronous execution.

timchist
Elite

Re: AMD 79xx GPUs skip kernel execution for certain indices

No, this queue is synchronous. In addition, as far as I know AMD OpenCL does not support asynchronous queues.

In synchronous queues kernels MUST execute sequentially without executing clEnqueueBarrier, clEnqueueMarker or any other explicit synchronization points ("CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE  Determines whether the commands queued in the command-queue are executed in-order or out-of-order. If set, the commands in the command-queue are executed out-of-order. Otherwise, commands are executed in-order.").

I'd say it's OK to execute commands in parallel even in a synchronous queue, but only if there are definitely no dependencies between them, which is not the case here.

vmiura
Adept II

Re: AMD 79xx GPUs skip kernel execution for certain indices

Yeah, it shouldn't need extra synchronization.

I've seen some unexpectedly overlapping results in CodeXL kernel tracing, though, so I'm not sure you can trust that the kernels are actually overlapping.

Do you get the same bug if you use clFinish() to force sync between the kernel dispatches?

timchist
Elite

Re: AMD 79xx GPUs skip kernel execution for certain indices

No, inserting clFinish helps to avoid the errors, but with a performance penalty of ~30%.

himanshu_gautam
Grandmaster

Re: AMD 79xx GPUs skip kernel execution for certain indices

Can you post a small repro case so that we can take this issue up?
