cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

aaa
Journeyman III

Bug of Catalyst drivers of version 12.10 and later running with Radeon HD 7000 series cards

An intricate bug has been found in the latest Catalyst drivers when running OpenCL kernels with constant indexes accessing the __local memory.

Conditions: The bug only emerges in each group when get_local_id(0) == 0 and get_local_id(1) == 0 or 14 in a (16, 16, 1) dimension grid.

Example:

__local float s_srcPatch[10][10];

__local float s_dstPatch[20][16];

float sum;

sum =       (evenFlag * 0.0625f) * s_srcPatch[0][1 + ((tidx - 2) / 2)];

sum = sum + ( oddFlag * 0.25f  ) * s_srcPatch[0][1 + ((tidx - 1) / 2)];

sum = sum + (evenFlag * 0.375f ) * s_srcPatch[0][1 + ((tidx    ) / 2)];

sum = sum + ( oddFlag * 0.25f  ) * s_srcPatch[0][1 + ((tidx + 1) / 2)];

sum = sum + (evenFlag * 0.0625f) * s_srcPatch[0][1 + ((tidx + 2) / 2)]; // Operation 5

s_dstPatch = sum;

In the case above, the operation 5 will be ignored. The final sum is incorrect.

__local float s_srcPatch[10][10];

__local float s_dstPatch[20][16];

float sum;

sum =       (evenFlag * 0.0625f) * s_srcPatch[get_local_size(1) - 16][1 + ((tidx - 2) / 2)];

sum = sum + ( oddFlag * 0.25f  ) * s_srcPatch[get_local_size(1) - 16][1 + ((tidx - 1) / 2)];

sum = sum + (evenFlag * 0.375f ) * s_srcPatch[get_local_size(1) - 16][1 + ((tidx    ) / 2)];

sum = sum + ( oddFlag * 0.25f  ) * s_srcPatch[get_local_size(1) - 16][1 + ((tidx + 1) / 2)];

sum = sum + (evenFlag * 0.0625f) * s_srcPatch[get_local_size(1) - 16][1 + ((tidx + 2) / 2)]; // get_local_size(1) == 16 which take the same effect as above code segment

s_dstPatch = sum;

If I use an expression "get_local_size(1) - 16", everything goes fine. I can get the correct sum.

0 Likes
10 Replies
himanshu_gautam
Grandmaster

It will be easy to reproduce the issue, if you can give a minimal testcase, which i can compile and execute here. Better attach a zip file instead of spilling the code directly.

As i understand from the snippet above, index 0 is failing and index = get_local_size(1) - 16 is passing. That's interesting!!

Please furnish following information too: CPU, GPU, APP SDK version, Catalyst Driver (Have you tried with 13.1?), Operating system, Bitness of OS.

0 Likes

Himanshu

Thanks for your reply. My machine configuration is AMD 1055T or Intel i7 3960x CPU (actually no matter what model it is), Radeon HD 7970 or Radeon HD 7850 (which I used), APP SDK 2.7 or 2.8 and with Windows 7 x64 OS. Besides, catalyst driver 13.1 acts the same way. I'm quite busy these days. I'm gonna furnish the minimal sample soon as I get my hands off my duties.

0 Likes

Himanshu

I've finished a simple sample based on the APP SDK sample that can repro this bug. Attached the sample code and you can run it and debug it by putting the project under the AMD APP sample path.

0 Likes

I went through the results in "CorrectResults.txt" and found a "failed" message in the bottom. So, how did you generate the results.

What exactly do you think is wrong? Can you please highlight the entry containing wrong value?

it

There are 4 workgroups working on 2 dimensions. We can call them LU (left-up), RU (right-up), LB (left bottom), RB (right bottom)

In the result, the first row (32 elements) are made of results from both LU and RU (16 from LU and 16 from RU). Similarly for other rows.

I see a result deviation happening from the 16th row in the destination - which is understandable because it is being populated by LB and RB (starting from 16th row).

Please hightlight what is the wrong entry in your result file. It will be helpful. Thanks,

0 Likes

Sorry, the "failed" word in correct result is printed by the verify function in the code which is unexpected output, just ignore it.

I attached a new file which contains the incorrectresult.txt and a spreadsheet including both the correct and incorrect results. Within the spreadsheet, I highlighted the inconsistent results with red. You can check it. The two results are collected with the macro definition in the cl file. Please check it for reference. BYW, you can get the same results just building and running the project I provided. If you still have questions, feel free to ask.

0 Likes

Thanks! I am looking into it now.

0 Likes

I just tested on 12.10 driver (clinfo reports 1016.xx) on Cayman (6950). I got correct results.

Do you think this issue is specific to 7xxx card on Windows?

I have a 7xxx card with me on Linux box now.

I will see if your code can be ported simply to Linux....without major changes.

Otherwise, I would need a repro on Linux from you.

0 Likes

Yes. Radeon HD 6000 series cards work as expected. Only Radeon HD 7000 series cards are involved. I haven't tested it on Linux personally. The project I attached follows the rules of AMD APP SDK which is available on both Windows and Linux. There is a makefile within the project. I haven't modified it to adapt to my change. But simply change the CLFILES = Template_Kernels.cl to CLFILES = pyr_up.cl, I believe it'll work.

0 Likes

I was able to reproduce this issue on a HD 7xxx device. I have forwarded it to AMD Engg team. Thanks for your support.

0 Likes

This issue has been fixed in the latest driver.  Can you verify and confirm?

0 Likes