timchist
Elite

AMD 79xx GPUs skip kernel execution for certain indices

I'm experiencing a strange problem that occurs on 7950 and 7970 cards but does not happen on 5850 and 6870 cards.

My application processes images in tiles. For each tile a series of OpenCL kernels is called. When the tile size becomes relatively small (say, 128x128), some parts of the output image may not be fully processed. I simplified my algorithm so that it only executes the following operations for each tile:

  1. Temp1 = 100
  2. Temp2 = 30
  3. Temp1 = Temp1 + Temp2
  4. Dst = Src + Temp1

(Temp1, Temp2, Src, Dst are all vectors of 128x128).
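For comparison, the four steps can be written as a plain CPU reference in C. This is only a sketch under stated assumptions: the element type (`float`) and the function name `reference_tile` are hypothetical, and each loop stands in for one of the OpenCL kernel launches described above.

```c
#include <assert.h>
#include <stddef.h>

/* Tile dimensions from the post: each vector holds 128x128 components. */
#define TILE_W 128
#define TILE_H 128
#define TILE_N (TILE_W * TILE_H)

/* Hypothetical CPU reference for the simplified pipeline. The element type
   (float) is an assumption; the post does not say which type the kernels use.
   Each loop corresponds to one kernel launch in the original application. */
static void reference_tile(const float *src, float *temp1, float *temp2,
                           float *dst, size_t n)
{
    assert(src && temp1 && temp2 && dst);
    for (size_t i = 0; i < n; ++i) temp1[i] = 100.0f;          /* step 1 */
    for (size_t i = 0; i < n; ++i) temp2[i] = 30.0f;           /* step 2 */
    for (size_t i = 0; i < n; ++i) temp1[i] += temp2[i];       /* step 3 */
    for (size_t i = 0; i < n; ++i) dst[i] = src[i] + temp1[i]; /* step 4 */
}
```

On the CPU the loops run strictly in order, so every Dst component ends up as Src + 130; that is the result the GPU output is checked against.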

After that I call clFinish and copy Temp1, Temp2, Src and Dst to host memory for checking. For the tiles that were calculated incorrectly, I found that:

  • Temp1 is equal to 130 for all vector components
  • Temp2 is equal to 30 for all vector components
  • Dst is not equal to Src + Temp1 (Src + 130) for some vector components, but is rather equal to Src + 100
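The check above can be sketched in C as a small classifier that separates correct components (Src + 130) from those where step 4 appears to have read Temp1 before step 3 updated it (Src + 100). The names `classify`, `tally` and the `outcome` enum are hypothetical, and the exact float comparison assumes small integer-valued test data.

```c
#include <assert.h>
#include <stddef.h>

/* Possible outcomes for one Dst component, based on the symptoms above. */
enum outcome { OK_130, STALE_100, OTHER };

/* Src + 130 is the expected result; Src + 100 means step 4 saw Temp1 as it
   was after step 1 but before step 3. Exact comparison is safe only because
   all values are assumed to be small integers. */
static enum outcome classify(float src, float dst)
{
    if (dst == src + 130.0f) return OK_130;
    if (dst == src + 100.0f) return STALE_100;
    return OTHER;
}

/* Count each outcome over a tile; counts must have room for 3 entries. */
static void tally(const float *src, const float *dst, size_t n, size_t counts[3])
{
    assert(src != NULL && dst != NULL);
    counts[OK_130] = counts[STALE_100] = counts[OTHER] = 0;
    for (size_t i = 0; i < n; ++i)
        counts[classify(src[i], dst[i])]++;
}
```

A non-zero `OTHER` count would point to a different failure than the stale read described in the post.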

The number of incorrect vector components is often (but not always) divisible by 64, so it seems that under some circumstances whole wavefronts get skipped.
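One way to test the wavefront hypothesis is to group mismatches into 64-item chunks (the wavefront size on the GCN-based 79xx cards) and count how many chunks are entirely wrong. This sketch assumes a 1D launch in which component i is written by global work-item i; `wave_summary` and `struct wave_stats` are hypothetical names. With 2D work-groups the grouping would have to follow the work-group's local ID layout instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* GCN GPUs such as the 7950/7970 execute 64 work-items per wavefront. */
#define WAVEFRONT 64

struct wave_stats {
    size_t bad_items;     /* total mismatched components */
    size_t full_waves;    /* chunks where every component is wrong */
    size_t partial_waves; /* chunks with some, but not all, components wrong */
};

/* Assumes component i is written by global work-item i, so items
   [64k, 64k + 63] share a wavefront (1D launch, assumption). */
static struct wave_stats wave_summary(const bool *bad, size_t n)
{
    struct wave_stats s = {0, 0, 0};
    assert(bad != NULL);
    for (size_t base = 0; base < n; base += WAVEFRONT) {
        size_t end = base + WAVEFRONT < n ? base + WAVEFRONT : n;
        size_t cnt = 0;
        for (size_t i = base; i < end; ++i)
            cnt += bad[i] ? 1 : 0;
        s.bad_items += cnt;
        if (cnt == end - base) s.full_waves++;
        else if (cnt > 0) s.partial_waves++;
    }
    return s;
}
```

If `partial_waves` stays at zero while `full_waves` is non-zero, that is consistent with whole wavefronts being skipped rather than individual work-items.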

Even though the problem is 100% reproducible in this simplified version of our application, it does not show up when I write a standalone test, even one that models the application's behaviour very accurately. Apparently there are other factors that trigger the problem that I'm not aware of.

I'm attaching a screenshot showing a fragment of the output from our application. The grid marks the tile boundaries. If the output were correct, the whole image would be uniformly pink, without any stripes.

The larger the tiles become, the less likely the problem is to appear.

My best uneducated guess is that something goes wrong when kernels are scheduled to the hardware, either at the driver or at the firmware level.

I tried several driver versions, specifically Catalyst 12.4, 12.8, 12.10, 13.1, 13.3 beta and 13.4. I also tried two different 7970s in two computers (one based on an AMD FX 8350, the other on an i7 3770K), and a 7950 in a computer based on an i7 3930K. All computers ran Windows 7 x64; we did not check under Linux or Mac OS. The problem occurred in all of these configurations.

Does that ring a bell?

Hi tim,

Can you confirm that events do not fix your issue? You should be able to enforce the required kernel ordering using events.

Confirmed. Returning an event from launches 1, 2 and 3 and waiting on it in launches 2, 3 and 4 respectively does not fix the issue. I also tried enqueuing barriers (clEnqueueBarrier/clEnqueueBarrierWithWaitList) with no success. Inserting clFlush or clFinish between steps 3 and 4, however, does fix the problem (with a performance penalty, though).

Can you check whether enabling profiling fixes the problem? If it does, we can be sure this is related to concurrent kernel execution.

Also, can you check whether this behaviour is seen on other OSes as well?

The problem exists both when profiling is enabled and disabled. Unfortunately, I don't have access to computers running Linux or Mac OS that have a 79xx GPU.

Oh well... then the kernels are not executing concurrently, and you are still facing problems.

Hmm.....

I see the following potential reasons for this bug:

1) Software issue with your program (well...Just to cover all cases....)

2) A Catalyst driver bug that runs kernels concurrently even when profiling is enabled.

3) A failed OS primitive that the Catalyst driver relies on (that's why I want you to check other OSes), causing concurrent execution when it should not happen.

     If not Linux, can you try Windows 8?

     The fact that it does not occur on 5xxx and 6xxx cards does not necessarily imply a hardware issue.

     7xxx cards are much faster, and that speed can expose races or timing issues in other software.

4) Bug in dependency checks in Catalyst driver

      However, if that's the case, enabling profiling should have fixed the issue.

      So I don't think the bug is lurking here... but as I said before, just to cover all cases.

5) Hardware bug  (Well....just to cover all cases)

Please post the smallest reproducible test case. Otherwise, I fear we will go nowhere in this thread...

Thanks, Himanshu. Good summary, and I understand your point. I'll try to run the test on a Linux machine and see what happens. I'll also look for a way to test on Windows 8.

timchist
Elite

The problem is no longer reproducible with the Catalyst 13.6 beta driver. Apparently, something has been fixed. Thank you all for your help.

Good to know. Thanks for informing us.

Thanks for coming back on this. Thanks for your time,

- Bruhaspati
