boxerab

Challenger

12-29-2017 07:31 AM

Optimal number of wave fronts for kernel

My application runs a series of 7 kernels, and most of the time is taken by the 7th kernel.

This kernel has 50% occupancy.

Card is RX 470, 4GB.

For this 7th kernel, there are two settings: the first gives me a total of 100 wavefronts, while the second gives me a total of only 30 wavefronts.

The second setting is about 3x slower than the first. VALU utilization is about the same for both.

I am guessing that the second is slower because 30 wavefronts is not enough to hide memory latency. Is there a way of calculating the optimal number of total wavefronts for a kernel, given the occupancy and the number of CUs?

Thanks.

5 Replies

sp314

Adept II

01-03-2018 09:31 PM

Re: Optimal number of wave fronts for kernel

As a follow-up question, and pardon me if I'm wrong, but doesn't 30 waves on a machine with 32 CUs (RX 470) mean that there's no memory latency hiding at all?

Say the first 30 CUs pick up one wave each (two CUs sit idle), and one SIMD unit per CU works on the wave it picked up, processing the 64-wide wave as 4 x 16-wide passes over 4 cycles. When it stalls on a memory access, what is there to switch to in order to hide the latency? (Similar logic applies if 4 SIMDs run 4 waves at once on one CU, I think.)
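To make the scenario above concrete, here is a minimal sketch that spreads a given number of waves over the CUs one at a time. The round-robin placement and the helper name `spread_waves` are assumptions for illustration only; the real hardware dispatcher may place waves differently.

```python
def spread_waves(total_waves, num_cus):
    """Assign waves to CUs round-robin (an assumed, simplified placement)
    and return the number of waves resident on each CU."""
    counts = [0] * num_cus
    for wave in range(total_waves):
        counts[wave % num_cus] += 1
    return counts


if __name__ == "__main__":
    NUM_CUS = 32  # RX 470, as stated in the thread
    for total in (30, 100):
        counts = spread_waves(total, NUM_CUS)
        print(total, "waves:",
              counts.count(0), "idle CUs,",
              max(counts), "waves at most on one CU")
```

Under this assumption, 30 waves leave 2 CUs idle with at most 1 wave per CU, so a stalled wave has nothing to swap with. With 100 waves every CU holds 3 or 4 waves.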

boxerab

Challenger

01-03-2018 09:51 PM

Re: Optimal number of wave fronts for kernel

Thanks. Yes, that makes sense. This would explain why performance is so poor with only 30 wavefronts.

Given that each CU can run at most 10 wavefronts, and occupancy is 0.37, I guess the optimal number of wavefronts is at least 32 * 10 * 0.37 = 32 * 3.7 ~= 120 wavefronts.

The situation is more complex because of the 6 other kernels that could also be running on a CU.
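The estimate in this post can be written as a small helper. This is only a sketch of the arithmetic as stated here: the function name `min_wavefronts` is hypothetical, and the limit of 10 wavefronts per CU is taken from the post as given (AMD's GCN documentation usually quotes the 10-wave limit per SIMD, with 4 SIMDs per CU, so adjust that parameter if that applies to your card).

```python
import math


def min_wavefronts(num_cus, max_waves_per_cu, occupancy):
    """Lower bound on total wavefronts needed to fill every CU up to
    the kernel's occupancy limit (occupancy is a fraction, 0.0 to 1.0)."""
    return math.ceil(num_cus * max_waves_per_cu * occupancy)


if __name__ == "__main__":
    # Figures from the thread: 32 CUs, 10 waves per CU, occupancy 0.37.
    print(min_wavefronts(32, 10, 0.37))
```

With the thread's figures this gives 119, matching the rough "~120" above. It ignores the other 6 kernels that may share a CU, so treat it as a starting point for experiments rather than a guaranteed optimum.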

bomby

Adept I

01-11-2018 06:34 AM

Re: Optimal number of wave fronts for kernel

boxerab

Challenger

01-11-2018 07:21 AM

Re: Optimal number of wave fronts for kernel

bomby

Adept I

01-11-2018 07:30 AM

Re: Optimal number of wave fronts for kernel
