rick_weber
Adept II

prefix sum helper function causes lock up on Cypress, but works fine on CPU

So, I wrote this prefix sum helper function that computes a scan of a local array whose size equals the size of the first dimension of the work-group.

unsigned int _log2(unsigned int num)
{
  switch(num)
  {
    case 1:
      return 0;
    case 2:
      return 1;
    case 4:
      return 2;
    case 8:
      return 3;
    case 16:
      return 4;
    case 32:
      return 5;
    case 64:
      return 6;
    case 128:
      return 7;
    case 256:
      return 8;
    case 512:
      return 9;
    default:
      return 0;
  }
}

void scanSum(__local unsigned int* array)
{
  if(get_local_id(0) == 0)
  {
    for(unsigned int i = 1; i < get_local_size(0); i++)
    {  
      array[i] += array[i - 1];
    }  
  }

  barrier(CLK_LOCAL_MEM_FENCE);
#if 0
  //Don't know why but this code crashes Radeon cards

  unsigned int logThreads = _log2(get_local_size(0));

  for(unsigned int i = 0; i < logThreads; i++)
  {
    unsigned int newVal = array[get_local_id(0)];

    //If threadNum >= 2^i
    if(get_local_id(0) >= (1 << i))
    {
      newVal += array[get_local_id(0) - (1 << i)];
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    array[get_local_id(0)] = newVal;
   
    barrier(CLK_LOCAL_MEM_FENCE);
  }
#endif
}

If I use the logarithmic parallel version of the scan (disabled with #if 0 above), it runs fine on the CPU but crashes on my Radeon 5870. The linear single-threaded version works fine on both CPU and GPU. Since it works on the CPU but not on the GPU, I'm curious how I can debug this code. Any thoughts?
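
One way to narrow this down is to emulate the logarithmic scan on the host in plain C and compare it against the serial scan. The sketch below is not the kernel from the post, and it is only a sketch: each "work-item" is a loop iteration, and the two barriers are modelled as two separate loops (all reads, then all writes). The names `scan_serial`, `scan_log_emulated`, `scans_agree` and `MAX_GROUP` are hypothetical, and the 512 limit is an assumption taken from the largest case in `_log2`. It checks the algorithm only and cannot reproduce a driver or compiler problem on the GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed upper bound on the work-group size (largest case in _log2). */
#define MAX_GROUP 512

/* Serial inclusive scan: what work-item 0 does in the linear version. */
void scan_serial(unsigned int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}

/* Host emulation of the logarithmic scan for n "work-items".
 * log_n plays the role of logThreads in the post. */
void scan_log_emulated(unsigned int *a, size_t n, unsigned int log_n)
{
    unsigned int new_val[MAX_GROUP];
    assert(n <= MAX_GROUP);

    for (unsigned int step = 0; step < log_n; step++) {
        size_t stride = (size_t)1 << step;

        /* Phase 1 (before the first barrier): every work-item reads. */
        for (size_t id = 0; id < n; id++) {
            new_val[id] = a[id];
            if (id >= stride)
                new_val[id] += a[id - stride];
        }

        /* Phase 2 (between the barriers): every work-item writes. */
        for (size_t id = 0; id < n; id++)
            a[id] = new_val[id];
    }
}

/* Returns 1 if both scans agree on the input a[i] = i + 1, else 0. */
int scans_agree(size_t n, unsigned int log_n)
{
    unsigned int x[MAX_GROUP], y[MAX_GROUP];
    assert(n <= MAX_GROUP);

    for (size_t i = 0; i < n; i++)
        x[i] = y[i] = (unsigned int)(i + 1);

    scan_serial(x, n);
    scan_log_emulated(y, n, log_n);

    for (size_t i = 0; i < n; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

Calling `scans_agree(64, 6)` compares the two versions for a 64-wide work-group. With a step count of 0, which is what `_log2` in the post returns for any size that is not one of its listed powers of two, the emulated scan leaves the array untouched, so that case is worth ruling out first.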

1 Reply

rick.weber, this doesn't crash with our upcoming release. Also, for log2, it would probably be better to compute it on the host and pass it in as a kernel argument: the local size is already known when the kernel is enqueued, and you wouldn't have the cost of flow control in your kernel.
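
A minimal host-side sketch of that suggestion, in plain C: compute the step count once before launching the kernel instead of running the `switch` in every work-item. The helper names `host_log2` and `is_power_of_two` are hypothetical. The assumption is that the result would then be handed to the kernel as an extra unsigned int argument (for example with `clSetKernelArg`) and that `scanSum` would take it as a parameter; neither change is shown in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Returns floor(log2(n)) for n >= 1, computed by shifting. */
unsigned int host_log2(size_t n)
{
    unsigned int r = 0;
    assert(n > 0);
    while (n >>= 1)
        r++;
    return r;
}

/* Returns 1 if n is a power of two, else 0. The scan in the post
 * iterates log2(n) times, which only covers the whole array when
 * the work-group size is a power of two. */
int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

Checking `is_power_of_two(local_size)` on the host before enqueueing would also catch the sizes for which the in-kernel `_log2` silently falls through to its default of 0.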