Global locking in OpenCL kernel

Question asked by ginquo on Sep 20, 2013

Hello,

I'm trying to implement locking of global buffers so that a work-group can apply a blit operation to a tile that other work-groups may be blitting to at the same time.

My current implementation looks like this.

// kernel arguments:
// volatile global float4* color_buffer ... color framebuffer, organized in tiles
// volatile global float* depth_buffer .... pixel depths associated with framebuffer
// volatile global int* tile_locks ........ a lock value for each tile of the framebuffer (1 if unlocked)

// local vars:
// local float4 colors[8][8] ... color tile that is going to be blitted
// local float depths[8][8] .... depth values of the associated tile

// private vars:
// int2 l ........ local id of work item in range [0,7]x[0,7]
// int head ...... 1 if l == (0,0) otherwise 0
// int tile_id ... index of the tile that is blitted to
// int fb_id ..... index of the pixel of the tile that is written to


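// blit tile
// head work-item: spin until the lock is swapped from 1 (unlocked) to 0 (locked)
if (head) while (!atomic_xchg(&(tile_locks[tile_id]), 0));
// every work-item of the group waits here until the head has taken the lock
barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);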

if (depths[l.y][l.x] < depth_buffer[fb_id]) {
    depth_buffer[fb_id] = depths[l.y][l.x]; 
    color_buffer[fb_id] = colors[l.y][l.x];
}

// wait until every work-item has written its pixel, then the head releases the lock
barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
if (head) atomic_xchg(&(tile_locks[tile_id]), 1);

The idea is that the "main" (head) work-item acquires the tile's lock while the other work-items of the group wait for it, then the whole group applies the blit, and finally the head work-item releases the lock again. Both acquiring and releasing the lock use atomic operations.

However, this does not work: individual work-groups overwrite each other's results as if no locking took place at all. Am I making any obvious errors here? Is global locking possible in OpenCL?

I'm using a Radeon HD 7970 with the Catalyst 13.8 beta on Linux.