6 Replies Latest reply on Mar 31, 2011 11:30 PM by kiddoman

    Problem with too many fetches from GPU constant memory

    broxvall
      Gives incorrect floating-point results for certain access patterns from constant memory

      Hi,

      I have noticed a very strange error that occurs only when running OpenCL on an AMD GPU (tested on a 5870, a 5870 Mobility and a 6950) and summing a large number of constant floating-point values in the loop below (the total amount of constant data is still below the reported size of constant memory). When run on the CPU, the program produces correct floating-point values; when run on the GPU, it produces incorrect values or NaNs. The effect is consistent, but the exact values change after a reboot.
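      A minimal sketch of how the CPU and GPU output buffers might be compared, so that wrong values and NaNs are flagged mechanically rather than by eye. The helper name `countMismatches` and the default tolerance of 1e-5 are hypothetical choices, not taken from the test case. NaN has to be checked explicitly, because every comparison involving NaN evaluates to false and would otherwise pass a tolerance test.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts GPU results that disagree with a CPU reference.
std::size_t countMismatches(const std::vector<float>& cpu,
                            const std::vector<float>& gpu,
                            float relTol = 1e-5f) {
    const std::size_t n = cpu.size() < gpu.size() ? cpu.size() : gpu.size();
    // Elements present in only one of the two buffers count as mismatches.
    std::size_t bad = cpu.size() + gpu.size() - 2 * n;
    for (std::size_t i = 0; i < n; ++i) {
        const float a = cpu[i];
        const float b = gpu[i];
        // Non-finite values (NaN, +/-inf) must match exactly. NaN never
        // compares equal, so a NaN on either side is always a mismatch.
        if (!std::isfinite(a) || !std::isfinite(b)) {
            if (!(a == b)) {
                ++bad;
            }
            continue;
        }
        // Relative tolerance for |values| >= 1, absolute tolerance below that.
        const float scale = std::fmax(1.0f, std::fmax(std::fabs(a), std::fabs(b)));
        if (std::fabs(a - b) > relTol * scale) {
            ++bad;
        }
    }
    return bad;
}
```

      For example, comparing {1, 2, 3} against {1, NaN, 3.5} reports two mismatches: the NaN and the value that is off by 0.5.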

      I believe this to be a bug in the OpenCL GPU implementation.

      The bug appears only with APP SDK 2.2 and APP SDK 2.3, not with APP SDK 2.1, and has been verified on two desktop machines and one laptop, all running 64-bit Ubuntu (10.04 and 10.10).

      I have created a complete test case, as small as possible, with C++/OpenCL sources and a Makefile for Linux, available here:

      http://www.aass.oru.se/~mbl/testCase.tgz

      Please also see the README file, which describes the different hardware and software setups under which I tested it.

      The OpenCL code that produces the error is as follows.

      If anyone could take a look at this and confirm or deny whether it is a bug in the drivers, I would be much obliged.

      thanks

      / Mathias Broxvall

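      // f2r and f3r are presumably declared the same way (elided in this excerpt).
      __constant float f1r[7][7][8] = { .... };
      ...
      __constant float f4r[7][7][8] = { .... };

      __kernel void render(__global float *result)
      {
          int2 id = (int2)(get_global_id(0), get_global_id(1));
          float4 res;
          int dz;

          // For dz = 0..2, add elements [dz][1][0..3] of all four tables.
          // The sum is the same for every work-item.
          res = (float4)(0.0f);
          for (dz = 0; dz < 3; dz++) {
              res += (float4)(f1r[dz][1][0], f2r[dz][1][0], f3r[dz][1][0], f4r[dz][1][0]);
              res += (float4)(f1r[dz][1][1], f2r[dz][1][1], f3r[dz][1][1], f4r[dz][1][1]);
              res += (float4)(f1r[dz][1][2], f2r[dz][1][2], f3r[dz][1][2], f4r[dz][1][2]);
              res += (float4)(f1r[dz][1][3], f2r[dz][1][3], f3r[dz][1][3], f4r[dz][1][3]);
          }

          // WIDTH is not defined in this excerpt; presumably a build option (e.g. -D WIDTH=...).
          vstore4(res, id.x + id.y * WIDTH, result);
      }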
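      For reference, the value the kernel should produce can be computed on the host. This is a sketch in C++, assuming the four tables are available on the host as `float[7][7][8]` arrays with the same contents as the `__constant` tables (their values are elided in the post); `referenceSum` and `Table` are hypothetical names. It adds the elements in the same order as the kernel's loop, so a correct GPU result should agree with it to within floating-point rounding.

```cpp
#include <array>

// Same shape as the kernel's __constant tables: [7][7][8].
using Table = float[7][7][8];

// Hypothetical CPU reference for one work-item of render(): for dz = 0..2 it
// adds elements [dz][1][0..3] of each table. Component c of the result comes
// from table c (f1r -> x, f2r -> y, f3r -> z, f4r -> w).
std::array<float, 4> referenceSum(const Table& f1r, const Table& f2r,
                                  const Table& f3r, const Table& f4r) {
    const Table* tables[4] = {&f1r, &f2r, &f3r, &f4r};
    std::array<float, 4> res = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int dz = 0; dz < 3; ++dz) {
        for (int k = 0; k < 4; ++k) {      // the four unrolled "res +=" lines
            for (int c = 0; c < 4; ++c) {  // the four float4 components
                res[c] += (*tables[c])[dz][1][k];
            }
        }
    }
    return res;
}
```

      Since `render` does not use `id` in the sum, every float4 written to `result` should equal this single reference value, which makes a wrong or NaN component easy to spot.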