
broxvall
Journeyman III

Problem with too many fetches from GPU constant memory

Gives incorrect floating-point results for certain access patterns from constant memory

Hi,

I have noticed a very strange error that occurs only when running OpenCL on an AMD GPU (5870, 5870 Mobility and 6950 tested) and summing a large number of constant floating-point values in the loop below (though fewer than the reported size of the constant memory). When running on the CPU, the program produces correct floating-point values; when running on the GPU, it produces incorrect floating-point values or NaNs. The effect is consistent, but the exact values change after a reboot.

I believe this to be a bug in the OpenCL GPU implementation.

The bug only appears in APP SDK 2.2 and APP SDK 2.3, not in APP SDK 2.1, and has been verified on two desktop machines and one laptop, all running 64-bit Ubuntu (10.04 and 10.10).

I have created a complete test case, as small as possible, with C++/OpenCL sources and a Makefile for Linux. It is available here:

http://www.aass.oru.se/~mbl/testCase.tgz

Please also see the README file, which describes the different hardware and software setups I tested it on.

The OpenCL code that produces the error is as follows.

If anyone could take a look at this and possibly confirm or deny whether it is a driver bug, I would be much obliged.

thanks

/ Mathias Broxvall

__constant float f1r[7][7][8] = { .... };
...
__constant float f4r[7][7][8] = { .... };

__kernel void render(__global float *result)
{
    int2 id = (int2)(get_global_id(0), get_global_id(1));
    float4 res;
    int dz;

    res = (float4)(0.0f);
    for(dz=0;dz<3;dz++) {
        res += (float4)(f1r[dz][1][0],f2r[dz][1][0],f3r[dz][1][0],f4r[dz][1][0]);
        res += (float4)(f1r[dz][1][1],f2r[dz][1][1],f3r[dz][1][1],f4r[dz][1][1]);
        res += (float4)(f1r[dz][1][2],f2r[dz][1][2],f3r[dz][1][2],f4r[dz][1][2]);
        res += (float4)(f1r[dz][1][3],f2r[dz][1][3],f3r[dz][1][3],f4r[dz][1][3]);
    }
    vstore4(res, id.x+id.y*WIDTH, result);
}
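Because the CPU device produces correct values, a host-side reference is a convenient way to flag bad GPU output automatically. The block below is a minimal C sketch and is not part of the original test case. The names `reference_sum`, `matches_reference` and `count_mismatches` are hypothetical. The sketch assumes the four tables f1r..f4r are passed as one `[4][7][7][8]` array, and the tolerance is left to the caller. It sums the same twelve entries per table that the kernel reads (dz = 0..2, middle index 1, last index 0..3), in the same order. That sum does not depend on the work-item id, so every float4 the kernel stores should equal the same reference value.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Table dimensions, taken from the __constant declarations in the kernel. */
#define NZ 7
#define NY 7
#define NX 8

/* Hypothetical helper: sums the entries the kernel reads from each table
   (dz = 0..2, middle index 1, last index 0..3), accumulating in the same
   order as the kernel loop. t[0]..t[3] stand in for f1r..f4r. */
static void reference_sum(float t[4][NZ][NY][NX], float out[4])
{
    for (int c = 0; c < 4; c++) {
        float acc = 0.0f;
        for (int dz = 0; dz < 3; dz++)
            for (int x = 0; x < 4; x++)
                acc += t[c][dz][1][x];
        out[c] = acc;
    }
}

/* Returns 1 if all four components match within tol, 0 otherwise.
   NaN needs an explicit test: fabsf(NaN - ref) > tol is false, so a NaN
   would otherwise be counted as a match. */
static int matches_reference(const float got[4], const float ref[4], float tol)
{
    for (int c = 0; c < 4; c++) {
        if (isnan(got[c]) || fabsf(got[c] - ref[c]) > tol)
            return 0;
    }
    return 1;
}

/* Counts work-items whose stored float4 differs from the reference.
   vstore4(res, i, result) writes result[4*i .. 4*i+3], so work-item
   i = id.x + id.y*WIDTH starts at result + 4*i. */
static size_t count_mismatches(const float *result, size_t n_items,
                               const float ref[4], float tol)
{
    assert(result != NULL || n_items == 0);
    size_t bad = 0;
    for (size_t i = 0; i < n_items; i++)
        if (!matches_reference(result + 4 * i, ref, tol))
            bad++;
    return bad;
}
```

On the host, `count_mismatches` would run after `result` has been read back (for example with `clEnqueueReadBuffer`). A nonzero count on the GPU device with a zero count on the CPU device would reproduce the symptom described above.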

broxvall
Journeyman III

Hi,

I hate to reply to my own post, but can anyone suggest someone at AMD/ATI I could contact to make sure the bug will at least be looked at?

I would very much like to see this fixed in the upcoming 2.4 release. For now I have a workaround: storing the constants in a cached 2D image instead, although it seems quite a waste of bandwidth to use global memory rather than constant memory for this.
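The image-based workaround can be sketched on the host side as follows. This uses an assumed layout, not necessarily the one in the original test case. The four tables are interleaved into RGBA float texels, so one texel holds (f1r, f2r, f3r, f4r) for a given [dz][y][x]. That matches the float4 the kernel assembles from four separate constant reads. The function name `pack_rgba` and the 8 x 49 image shape are hypothetical choices.

```c
#include <stddef.h>

/* Table dimensions from the kernel's __constant declarations. */
#define NZ 7
#define NY 7
#define NX 8

/* Hypothetical image shape: one row per (dz, y) pair, one column per x. */
#define IMG_W NX
#define IMG_H (NZ * NY)

/* Hypothetical helper: interleaves four [NZ][NY][NX] tables into RGBA
   float texels. Channel c of texel (x, dz*NY + y) holds table c at
   [dz][y][x]; texels are stored row-major, 4 floats each. */
static void pack_rgba(float t1[NZ][NY][NX], float t2[NZ][NY][NX],
                      float t3[NZ][NY][NX], float t4[NZ][NY][NX],
                      float texels[IMG_H * IMG_W * 4])
{
    for (int dz = 0; dz < NZ; dz++)
        for (int y = 0; y < NY; y++)
            for (int x = 0; x < NX; x++) {
                size_t row = (size_t)dz * NY + (size_t)y;
                float *px = &texels[(row * IMG_W + (size_t)x) * 4];
                px[0] = t1[dz][y][x]; /* R <- f1r */
                px[1] = t2[dz][y][x]; /* G <- f2r */
                px[2] = t3[dz][y][x]; /* B <- f3r */
                px[3] = t4[dz][y][x]; /* A <- f4r */
            }
}
```

The packed buffer could then be uploaded with `clCreateImage2D` (OpenCL 1.1) using the image format `{CL_RGBA, CL_FLOAT}`. Under this layout, the kernel would fetch each float4 with `read_imagef(img, smp, (int2)(x, dz*7 + 1))`, where the sampler is declared with `CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST`. Each of the kernel's four-constant groups then becomes a single image fetch.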

/ M


broxvall,
Thanks for the bug report; we are looking into it.

broxvall,
Quick question: is SDL required for this test case?

broxvall,
I've fixed the issue, and it will be in the next SDK release.

Thank you, that was very fast!

I assume my answer is no longer needed, then: SDL and some of the other compiler flags in the Makefile (i.e. OpenMP and some SSE options) weren't needed to replicate the bug.

cheers


Thanks for the answer from Micahvillmow.

That is pretty helpful.
