
qld_sean
Journeyman III

Calcs Inconsistent with Large Datasets (Using Local Memory)

Hi,

First, my apologies: this is probably a classic newbie mistake. To help me visualize what goes on between local (compute unit) memory and global memory, I thought I'd do something very simple. My local work-group size is 256, I'm using an R9 390 card, and I'm explicitly setting the global work size (array size + 1) in the host code. The interesting thing is that once my arrays get bigger than 5-10k floating-point elements, I start to get erratic behaviour. Often in this range the second run works while the first fails; when I go bigger, to 100,000 elements, the result is almost random. I truly believe it's the kernel and not the host code, because I can append one line of code that overwrites one value in the output array, and the result is always consistent with what I've written, irrespective of the input array sizes.

I know the code is rubbish and, as you can see, I was confused/desperate with the local and global locks. The code was written more to help me get my head around OpenCL and the GPU, but this issue is beyond me.

Thanks

Sean

__kernel void dot_product(__global float* a_vec, __global float* b_vec, __global float* output, __local float* partial_dot)
{
    int Offset = 256;
    int gid = get_global_id(0);
    int globalSize = get_global_size(0);
    int lid = get_local_id(0);
    int localSize = get_local_size(0);

    //local float* partial_dot;

    /* Place product of global values into local memory */
    partial_dot[lid] = a_vec[gid];
    partial_dot[lid + Offset] = b_vec[gid];
    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);

    /* Repeatedly multiply values in local memory */
    output[gid] = partial_dot[lid]*partial_dot[lid + Offset];
    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);

    /* Transfer final result to global memory, only want ONE thread to sum*/
    if (gid == 0) {
        for (int Count = 1; Count <= globalSize; Count++)
            output[0] += output[Count];
    }

    //output[0] = globalSize;
}

bsp2020
Challenger

Hi,

It seems like you are having a bit of trouble understanding GPGPU programming concepts. I strongly recommend that you check out the free online classes. I learned GPU programming by taking Coursera - Free Online Courses From Top Universities and Introduction to Parallel Programming With CUDA - Udacity (in that order). They both use CUDA, but the underlying concepts are the same as in OpenCL and will help you understand how to write OCL kernels. The Coursera class is not currently open, but you should be able to audit the past class content, even though you may not be able to do the programming assignments using their server.
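
On the kernel in the question itself, a few things look like probable causes, though this is untested on an R9 390. barrier() only synchronizes work-items within one work-group, so the work-item with gid == 0 can start summing output[] before other work-groups have written their products; that would fit results that vary from run to run once the array spans many work-groups. The final loop also reads output[globalSize], one element past the last product, and with a global size of array size + 1 the last work-item reads past the end of a_vec and b_vec (assuming the buffers hold exactly the array). A common pattern is a two-stage reduction: each work-group reduces its own products in local memory and writes one partial sum, then a second pass adds the partial sums. Below is a minimal sketch of that structure, emulated on the CPU in plain C so it runs without a GPU. The names group_partial_dot, dot_product_two_stage and WG_SIZE are illustrative, not from the post.

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 256  /* work-group size from the post; must be a power of two */

/* Stage 1: emulates ONE work-group. In a real kernel the loops over lid run
 * as parallel work-items, and each marked point would need
 * barrier(CLK_LOCAL_MEM_FENCE). */
static float group_partial_dot(const float *a, const float *b,
                               size_t n, size_t group)
{
    float scratch[WG_SIZE];  /* stands in for the __local buffer */

    for (size_t lid = 0; lid < WG_SIZE; lid++) {
        size_t gid = group * WG_SIZE + lid;
        /* Bounds check: padding work-items contribute 0 instead of
         * reading past the end of the inputs. */
        scratch[lid] = (gid < n) ? a[gid] * b[gid] : 0.0f;
    }
    /* barrier would go here */

    /* Tree reduction: halve the number of active work-items each pass. */
    for (size_t stride = WG_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; lid++)
            scratch[lid] += scratch[lid + stride];
        /* barrier would go here */
    }
    return scratch[0];
}

/* Stage 2: add the per-group partial sums only after every group is done.
 * partial must hold at least ceil(n / WG_SIZE) floats. */
float dot_product_two_stage(const float *a, const float *b,
                            size_t n, float *partial)
{
    assert(a != NULL && b != NULL && partial != NULL);

    /* Round the group count up so the last, partly filled group is covered. */
    size_t num_groups = (n + WG_SIZE - 1) / WG_SIZE;

    for (size_t g = 0; g < num_groups; g++)
        partial[g] = group_partial_dot(a, b, n, g);

    float total = 0.0f;
    for (size_t g = 0; g < num_groups; g++)
        total += partial[g];
    return total;
}
```

In an OpenCL version, scratch becomes the __local argument, each work-group writes its result to partial[get_group_id(0)], and stage 2 runs on the host or in a second, small kernel launch, because the end of a kernel launch is the point where all work-groups are known to have finished.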

Brian


Thanks Brian,

Great links. I was hoping to build up by doing some basic things first, but the GPGPU programming concepts are what I'm struggling with. It probably doesn't help that my background isn't programming but electrical engineering, which is why I want to move away from the classic CPU methods (very slow for what I'm doing).

Cheers

Sean
