3 Replies Latest reply on Mar 26, 2012 3:18 AM by teopemuk

    OpenCL simple program crash


      I am trying to use OpenCL for numerical simulation (ODE + PDE), but even the simplest kernel crashes.

      It works fine when the number of steps is low, but the following code crashes on my 5750 (sometimes the GPU driver recovers, sometimes I get a blue screen or a freeze). I also tried a 6490M, and it crashes with even fewer steps.


      __kernel void ocl_test(__global const float *a, __global const float *b, __global float *c) {
          int i = get_global_id(0);
          c[i] = 0.0;
          for (float j = 0; j < 10000000.0 * 5.0; j += 1.0) {
              c[i] += a[i] * j;
          }
      }



        • OpenCL simple program crash

          A kernel must run for less than about 5 seconds on Windows; otherwise the GPU gets reset by the watchdog.

          • Re: OpenCL simple program crash

            One should almost always use integers for loop indices; it's one place where you cannot afford any rounding errors. Floating-point addition is not necessarily exact or associative, as your code demonstrates.


            E.g. this code never ends on a CPU:


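            #include <stdio.h>

            int main(void) {
                float n = 1;
                float j;
                /* j stops advancing at 2^24, so the loop condition stays true forever */
                for (j = 0; j < 10000000.0 * 5.0; j += 1.0) {
                    if (j == n) {
                        printf("n = %f\n", n);
                        n = n * 10;
                    }
                }
            }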




            n = 1.000000

            n = 10.000000

            n = 100.000000

            n = 1000.000000

            n = 10000.000000

            n = 100000.000000

            n = 1000000.000000

            n = 10000000.000000


            ^C (I had to kill it as it fell into an infinite loop)


            Given that a single-precision mantissa is only 24 bits, j stops increasing at 2^24 = 16777216, which is confirmed by this addition to the loop:



            /* needs #include <math.h> for roundf */
            if (roundf(j+1.0) == roundf(j)) {
               printf("j = %f\n", j);
            }




            n = 10000000.000000

            j = 16777216.000000

            j = 16777216.000000

            ... forever
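
The stall at 2^24 can be reproduced in isolation, and the same loop terminates once the index is an integer. The following is only a minimal sketch in plain C, not the original poster's code: the function names `first_stuck_value` and `scaled_sum` are hypothetical, and `scaled_sum` assumes the kernel's per-element work is an accumulation of `a * j`.

```c
/* Counts up from 0 in steps of 1 and returns the first float j
   for which j + 1 rounds back to j (hypothetical helper). */
float first_stuck_value(void) {
    float j = 0.0f;
    for (;;) {
        /* storing into a volatile float forces rounding to single precision */
        volatile float next = j + 1.0f;
        if (next == j) {
            return j;
        }
        j = next;
    }
}

/* Same accumulation as the kernel's inner loop, but with an integer
   loop index, so the trip count is exact (hypothetical helper). */
float scaled_sum(float a, long steps) {
    float acc = 0.0f;
    for (long j = 0; j < steps; j++) {
        acc += a * (float)j;   /* convert the index only where it is used */
    }
    return acc;
}
```

Calling `first_stuck_value()` should return 16777216, matching the output above. Note that `scaled_sum` only fixes termination: the float accumulator still rounds once the running sum grows large.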


            But apart from the numerical fault in your algorithm, big loops are a bad idea on GPUs anyway, for nou's reason and for others. For example, the ALUs are not clocked very fast, so a very long loop takes a long time to run; try to spread the work sideways across more work-items instead. If your GUI runs on the same card, it will also freeze for as long as your kernel is running (this is why Microsoft Windows has a watchdog timer for it).
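
One way to read "spread the work sideways" is to cut the long iteration range into chunks, give each chunk to its own work-item, and combine the partial results afterwards. The sketch below only simulates that idea on the host in plain C, under the assumption that the per-element work is the `a * j` accumulation from the question; `partial_sum`, `chunked_sum`, and the `workers` parameter are hypothetical names, and a real GPU version would need a separate reduction step to add the partial results.

```c
/* Work of one hypothetical work-item: iterations [begin, end). */
float partial_sum(float a, long begin, long end) {
    float acc = 0.0f;
    for (long j = begin; j < end; j++) {   /* integer loop index */
        acc += a * (float)j;
    }
    return acc;
}

/* Splits `steps` iterations into at most `workers` chunks and adds the
   partial results. The outer loop stands in for parallel work-items. */
float chunked_sum(float a, long steps, long workers) {
    long chunk = (steps + workers - 1) / workers;   /* ceiling division */
    float total = 0.0f;
    for (long w = 0; w < workers; w++) {
        long begin = w * chunk;
        long end = (begin + chunk < steps) ? begin + chunk : steps;
        if (begin < end) {                 /* skip empty trailing chunks */
            total += partial_sum(a, begin, end);
        }
    }
    return total;
}
```

Each chunk is short, so on a GPU the individual pieces would finish quickly instead of one work-item looping for seconds. The same split also allows the work to be issued as several short kernel launches, which keeps each launch under the watchdog limit.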