
boxerab
Challenger

wave synchronous programming


I have a kernel with a work group size of 32. Is it safe to remove all local memory barriers, since 32 is <= the size of a wavefront?

The following thread seems to imply that it is NOT good to remove local memory barriers, because work items may be merged:

How to query wavefront size from kernel?

Thanks!

1 Solution

Accepted Solutions
tzachi_cohen
Staff

Re: wave synchronous programming


No, it is not safe to remove barriers as long as you have more than one work item per work group. In any case, make your work group size a multiple of 64; otherwise you are seriously under-utilizing the GPU.
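On the host side, the "multiple of 64" advice usually means rounding the global work size up before enqueueing. A minimal sketch in plain C, assuming a one-dimensional range; `round_up_to_multiple` is a hypothetical helper name, not part of the OpenCL API:

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of `multiple` (for example a work
   group size of 64). Hypothetical helper for illustration only;
   `multiple` must be non-zero. */
size_t round_up_to_multiple(size_t n, size_t multiple)
{
    assert(multiple != 0);
    return ((n + multiple - 1) / multiple) * multiple;
}
```

For example, 1000 work items with a work group size of 64 would be enqueued with a global size of 1024, and the kernel would check `get_global_id(0) < 1000` before touching memory.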

9 Replies
boxerab
Challenger

Re: wave synchronous programming


Thanks, Tzachi. Can you address the issue of work item merging? Is this why it is not safe to remove barriers for work groups smaller than the wavefront size?

Also, my work group size needs to be 32 due to the algorithm I am using.

mrrvlad
Adept I

Re: wave synchronous programming


In what case would I need to use a barrier (instead of mem_fence) when the workgroup size is <= 64, assuming a current GCN AMD GPU?

boxerab
Challenger

Re: wave synchronous programming


Yes, good question.  Can I get by with a memory fence instead of a barrier?

realhet
Miniboss

Re: wave synchronous programming


Hi,

I just tried it out on GCN: when the workgroup size is 32, each workgroup gets a whole wavefront, so half of the wavefront is disabled by the 64-bit exec mask.

When the workgroup fits in a single wavefront, there is no need for a local memory barrier.

Make sure you aren't using more than 16 KB of local memory though, so that all 4 vector SIMDs in the compute unit can be utilized.
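The arithmetic behind these two points can be sketched in plain C. This is an illustration only: the function names are hypothetical, and the 64 KB of local memory per compute unit used in the usage note is an assumption about GCN hardware, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Number of wavefronts needed to hold one work group. */
size_t wavefronts_per_group(size_t wg_size, size_t wave_size)
{
    assert(wave_size != 0);
    return (wg_size + wave_size - 1) / wave_size;
}

/* Fraction of SIMD lanes doing useful work for one work group. */
double active_lane_fraction(size_t wg_size, size_t wave_size)
{
    size_t waves = wavefronts_per_group(wg_size, wave_size);
    if (waves == 0)
        return 0.0;
    return (double)wg_size / (double)(waves * wave_size);
}

/* Work groups that fit in one compute unit if local memory were the
   only limit. lds_per_cu is an assumed hardware figure supplied by
   the caller. */
size_t groups_per_cu_by_lds(size_t lds_per_cu, size_t lds_per_group)
{
    assert(lds_per_group != 0);
    return lds_per_cu / lds_per_group;
}
```

With a work group of 32 on a 64-wide wavefront, `active_lane_fraction(32, 64)` gives 0.5, which is the half-masked wavefront described above. With 16 KB per work group and an assumed 64 KB per compute unit, `groups_per_cu_by_lds` gives 4, one work group per SIMD.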

boxerab
Challenger

Re: wave synchronous programming


@realhet cool!  I only use about 1K of local memory.  So, you are saying that workgroup size of 32 covers a whole wavefront.

I was under the impression that wavefront size is 64 on GCN.

So, I guess that if I target GCN, I can remove all of my barriers. Can someone from AMD confirm this?

set
Adept I

Re: Re: wave synchronous programming


You always use a barrier where the algorithm needs one. But if you know that the workgroup size is fixed and fits in one wavefront, you can hint the compiler to optimize the barrier away by declaring your kernel with __attribute__((reqd_work_group_size(X, Y, Z))). The attribute takes all three dimensions, e.g. (32, 1, 1).
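A minimal sketch of what this could look like, with the OpenCL C kernel kept as a host-side C string. The kernel name `reverse_in_group`, the `WG_SIZE` define and the helper `make_build_options` are hypothetical names for illustration. Whether the barrier is actually optimized away depends on the compiler; the kernel keeps it because the algorithm needs it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* OpenCL C source kept as a host string. WG_SIZE is supplied at build
   time through the options string. Kernel name and body are
   illustrative only. */
static const char *const kernel_src =
    "__kernel __attribute__((reqd_work_group_size(WG_SIZE, 1, 1)))\n"
    "void reverse_in_group(__global int *data)\n"
    "{\n"
    "    __local int tile[WG_SIZE];\n"
    "    const size_t lid = get_local_id(0);\n"
    "    const size_t gid = get_global_id(0);\n"
    "    tile[lid] = data[gid];\n"
    "    barrier(CLK_LOCAL_MEM_FENCE); /* required by the algorithm */\n"
    "    data[gid] = tile[WG_SIZE - 1 - lid];\n"
    "}\n";

/* Write "-D WG_SIZE=<n>" into buf. Returns the number of characters
   written, or -1 if the buffer was too small. */
int make_build_options(char *buf, size_t len, unsigned wg_size)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "-D WG_SIZE=%u", wg_size);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

In a real program the source string would go to `clCreateProgramWithSource` and the options string to `clBuildProgram`. With this attribute in place, the local size passed to `clEnqueueNDRangeKernel` has to match the declared size exactly, otherwise the enqueue fails with `CL_INVALID_WORK_GROUP_SIZE`.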

jason
Adept III

Re: wave synchronous programming


Based on what I've seen and used, it is safe. You may need to use mem_fence in some places to make sure things are flushed when you need LDS consistency. I noticed this trick after seeing that Bolt (AMD's sponsored/owned C++ STL-like library) uses it in its radix sort, as do some academics AMD appears to have collaborated with (Takahiro Harada's radix sort). I think they used it in their general scans too. You have to do some extra work to make sure you fully use all compute units. I've only seen it give marginal gains in most uses, though it may have shaved a millisecond or so off a 3 ms operation on a 7970. I've not used it on GCN and wasn't sure how well this "trick" would work there.

Btw I use macros to cover the difference:

clcommons/common.h at master · nevion/clcommons · GitHub
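A rough sketch of what such a macro could look like. This is not taken from clcommons: `WAVEFRONT_SIZE`, `WG_SIZE` and `LOCAL_SYNC` are hypothetical names, and the 64-wide wavefront is an assumption about the target. The host-side stand-ins exist only so the same header compiles as plain C; on the device, `barrier`, `mem_fence` and `CLK_LOCAL_MEM_FENCE` are the OpenCL C built-ins.

```c
#define WAVEFRONT_SIZE 64   /* assumed wavefront width of the target */
#define WG_SIZE        32   /* work group size from the original question */

#ifndef __OPENCL_VERSION__
/* Host-side stand-ins so this header also compiles as plain C.
   They only count calls; real synchronization happens on the device. */
#include <assert.h>
#define CLK_LOCAL_MEM_FENCE 1
static int barrier_calls = 0;
static int fence_calls = 0;
static inline void barrier(int flags)
{
    assert(flags == CLK_LOCAL_MEM_FENCE);
    barrier_calls++;
}
static inline void mem_fence(int flags)
{
    assert(flags == CLK_LOCAL_MEM_FENCE);
    fence_calls++;
}
#endif

/* Use a plain fence when the whole work group fits in one wavefront,
   and a full barrier otherwise. */
#if WG_SIZE <= WAVEFRONT_SIZE
#define LOCAL_SYNC() mem_fence(CLK_LOCAL_MEM_FENCE)
#else
#define LOCAL_SYNC() barrier(CLK_LOCAL_MEM_FENCE)
#endif
```

Kernel code would then call `LOCAL_SYNC()` wherever it previously called `barrier(CLK_LOCAL_MEM_FENCE)`. Note that the accepted answer in this thread says removing barriers is not safe, so defining `LOCAL_SYNC()` as a barrier unconditionally is the conservative choice.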

realhet
Miniboss

Re: wave synchronous programming


Yes, it is exactly 64 on GCN. I remember that on very old Evergreen cards it was 32. On recent NVIDIA GPUs it is also 32.
