boxerab
Challenger

GCN: local memory barrier and work group size

I have a kernel whose work group size is half a wavefront (the wavefront being 64 work-items), running on the GCN architecture.

Can I dispense with local memory barriers for this kernel?

I realize that this may not work on future micro-architectures, but for GCN up to and including Fury, is it advisable to remove the barriers?

Thanks!

0 Likes
1 Solution
matszpk
Adept III

By default, AMD's OpenCL C compiler automatically removes all barriers when you set reqd_work_group_size to 64 or less.

Just prefix your kernel definition with __attribute__((reqd_work_group_size(32,1,1))), like this:

__kernel __attribute__((reqd_work_group_size(32,1,1))) void myKernel(....)

{ }
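
To make the suggestion concrete, here is a minimal sketch of a kernel that declares a required work group size of 32 and still keeps its local memory barriers in the source. The kernel name reduce32, its arguments, and the reduction itself are hypothetical and only serve as an illustration; whether the barriers are actually dropped depends on the compiler, as described above.

```c
// Hypothetical example: each work group of 32 work-items sums 32 floats
// through local memory and writes one result per group.
__kernel __attribute__((reqd_work_group_size(32, 1, 1)))
void reduce32(__global const float *in, __global float *out)
{
    __local float scratch[32];

    const uint lid = get_local_id(0);
    scratch[lid] = in[get_global_id(0)];

    // The barrier stays in the source so the kernel remains correct on
    // devices where a work group can span more than one wavefront.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Tree reduction: 16, 8, 4, 2, 1. The loop bounds are uniform, so
    // every work-item reaches every barrier.
    for (uint stride = 16; stride > 0; stride >>= 1) {
        if (lid < stride) {
            scratch[lid] += scratch[lid + stride];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0) {
        out[get_group_id(0)] = scratch[0];
    }
}
```

Note that with this attribute the host has to enqueue the kernel with a local work size of exactly 32; otherwise clEnqueueNDRangeKernel fails with CL_INVALID_WORK_GROUP_SIZE.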


0 Likes
6 Replies

Thanks. I am leaving the barriers in, for future reference. Nice to know the compiler will take care of removing them.

0 Likes

Here is a very interesting Stack Overflow post about this situation. It was written for CUDA, but should apply to OpenCL as well:

http://stackoverflow.com/questions/6666382/can-i-use-syncthreads-after-having-dropped-threads

0 Likes

I know this is an old thread, but can anyone (particularly from AMD) comment on whether this still applies to the Polaris architecture?

i.e. does the compiler still remove all local memory barriers when the local work group size is <= 64?

0 Likes

Yes, it is still valid even on Polaris.

Regards,

0 Likes

Cool. Thanks for confirming.

0 Likes