I'm experiencing a strange problem that occurs on 7950 and 7970 cards, but does not happen on 5850 and 6870.
My application processes images in tiles. For each tile a series of OpenCL kernels is called. When the tile size becomes relatively small (say, 128x128), some parts of the output image may not be fully processed. I simplified my algorithm so that it executes only the following operations for each tile:
- Temp1 = 100
- Temp2 = 30
- Temp1 = Temp1 + Temp2
- Dst = Src + Temp1
(Temp1, Temp2, Src and Dst are all vectors of 128x128 elements.)
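For reference, the four steps above can be sketched on the CPU as below. This is only a rough host-side model to compare GPU results against, not the actual kernels: the function name `reference_tile` is made up for illustration, and the element type `float` is an assumption, since the real buffer type is not stated here.

```c
#include <stddef.h>

/* CPU model of the simplified per-tile pipeline.
   n is the number of elements in the tile (e.g. 128 * 128).
   Element type float is an assumption; each loop stands in for one kernel. */
void reference_tile(const float *src, float *temp1, float *temp2,
                    float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)      /* Temp1 = 100 */
        temp1[i] = 100.0f;
    for (size_t i = 0; i < n; ++i)      /* Temp2 = 30 */
        temp2[i] = 30.0f;
    for (size_t i = 0; i < n; ++i)      /* Temp1 = Temp1 + Temp2 */
        temp1[i] = temp1[i] + temp2[i];
    for (size_t i = 0; i < n; ++i)      /* Dst = Src + Temp1 */
        dst[i] = src[i] + temp1[i];
}
```

With this model every element of Dst should come out as Src + 130, which is what the correctly processed tiles show.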
After that I call clFinish and copy Temp1, Temp2, Src and Dst to host memory for checking. For those tiles that have been calculated incorrectly, I have found out that:
- Temp1 is equal to 130 for all vector components
- Temp2 is equal to 30 for all vector components
- Dst is not equal to Src + Temp1 (Src + 130) for some vector components, but is rather equal to Src + 100
The number of incorrect vector components is often (but not always) divisible by 64, so it seems that under some circumstances whole wavefronts get skipped.
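The kind of check behind this observation can be sketched as follows. It counts the elements where Dst differs from the expected value and how many aligned groups of 64 consecutive elements are wrong in their entirety. The names `MismatchStats` and `analyze_tile` are hypothetical, `float` elements are an assumption, and so is the idea that 64 consecutive elements of the linear buffer correspond to one wavefront (with a 2D work-group shape the mapping may differ).

```c
#include <stddef.h>

#define WAVEFRONT_SIZE 64  /* wavefront width on these GPUs */

typedef struct {
    size_t wrong_elements;          /* elements where dst != src + expected_add */
    size_t fully_wrong_wavefronts;  /* aligned 64-element groups that are entirely wrong */
} MismatchStats;

/* Compare dst against src + expected_add, grouping elements into aligned
   blocks of WAVEFRONT_SIZE (assumed linear element-to-work-item mapping). */
MismatchStats analyze_tile(const float *src, const float *dst,
                           size_t n, float expected_add)
{
    MismatchStats stats = {0, 0};
    for (size_t base = 0; base < n; base += WAVEFRONT_SIZE) {
        size_t end = (base + WAVEFRONT_SIZE < n) ? base + WAVEFRONT_SIZE : n;
        size_t wrong = 0;
        for (size_t i = base; i < end; ++i) {
            if (dst[i] != src[i] + expected_add)
                ++wrong;
        }
        stats.wrong_elements += wrong;
        if (wrong == end - base)
            ++stats.fully_wrong_wavefronts;
    }
    return stats;
}
```

Calling it with `expected_add` set to 130 on a tile read back after clFinish gives both counts; if `wrong_elements` equals 64 times `fully_wrong_wavefronts`, the damage consists only of whole 64-element groups under the assumed mapping.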
Even though the problem is 100% reproducible in this simplified version of our application, it does not show up when I try to write a standalone test, even when it very accurately models the behaviour of the application. Apparently there are some other factors that trigger the problem that I'm not aware of.
I'm attaching a screenshot showing a fragment of the output from our application. The grid indicates the tile boundaries. If the output were correct, the whole image would be uniformly pink, without any stripes.
The larger the tiles, the less likely the problem is to appear.
My best uneducated guess is that something goes wrong when kernels are scheduled to the hardware, either at the driver level or at the firmware level.
I tried several driver versions: Catalyst 12.4, 12.8, 12.10, 13.1, 13.3 beta and 13.4. I also tried two different 7970s in two computers (one based on an AMD FX 8350, the other on an i7 3770K), and a 7950 in a computer based on an i7 3930K. All computers ran Windows 7 x64; we did not test under Linux or Mac OS. The problem occurred in all of these configurations.
Does that ring a bell?
Attachment: 7970bug2_cropped.png (53.4 KB)