I have a kernel whose work-group size is half a wavefront (wavefront size is 64), running on the GCN architecture.
Can I dispense with local memory barriers for this kernel?
I realize that this may not work on future micro-architectures, but for GCN up to and including Fury,
is it advisable to remove the barriers?
Thanks!
By default, AMD's OpenCL C compiler automatically removes all barriers when you set reqd_work_group_size to 64 or less.
Just prefix your kernel definition with __attribute__((reqd_work_group_size(32,1,1))), like this:
__kernel __attribute__((reqd_work_group_size(32,1,1))) void myKernel(....)
{ }
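For illustration, here is a minimal sketch of what such a kernel might look like with the barrier left in the source. The kernel name reverseTile, the buffer data, and the local array tile are hypothetical and not from this thread; the kernel simply reverses each 32-element tile through local memory.

```c
// Hypothetical example: each work-group of 32 work-items reverses its own
// 32-element tile of `data` by staging it in local memory.
__kernel __attribute__((reqd_work_group_size(32, 1, 1)))
void reverseTile(__global float *data)
{
    __local float tile[32];

    const size_t lid = get_local_id(0);
    const size_t gid = get_global_id(0);

    // Stage this work-item's element in local memory.
    tile[lid] = data[gid];

    // Kept in the source for portability. According to the answer above,
    // AMD's compiler drops it when the required work-group size is <= 64.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Read the mirrored element written by another work-item in the group.
    data[gid] = tile[31 - lid];
}
```

With this attribute set, the host must enqueue the kernel with a local size of exactly 32 (and a global size that is a multiple of 32); otherwise clEnqueueNDRangeKernel fails with CL_INVALID_WORK_GROUP_SIZE.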
Thanks. I am leaving the barriers in, for future reference. Nice to know the compiler will take care of removing them.
Here is a very interesting Stack Overflow post about this situation. It was written for CUDA, but should apply to OpenCL as well:
http://stackoverflow.com/questions/6666382/can-i-use-syncthreads-after-having-dropped-threads
I know this is an old thread, but can anyone (particularly from AMD) comment on whether this still applies to the Polaris architecture?
I.e., does the compiler still remove all local memory barriers when the local work-group size is <= 64?
Yes, it is still valid even on Polaris.
Regards,
Cool. Thanks for confirming.