03-02-2010
12:16 AM

Why 256 work_items on RV770?

03-02-2010
12:55 AM

Originally posted by: drstrip

The RV770 shows 10 "compute units" and 256 work_items per dimension, with 3 dimensions. In Brook+ you are allowed 1024 vector elements per dimension. Neither of these numbers actually relates to the number of thread processors, so where do they come from?

Each compute unit can execute one or more work-groups concurrently, depending on the resources (such as registers and local memory) each work-group uses.
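To make "depending on the resources" concrete, here is a minimal sketch of how the number of concurrent work-groups per compute unit could be estimated as the minimum over each resource limit. The per-compute-unit budgets below (`CU_LOCAL_MEM_BYTES`, `CU_MAX_WAVEFRONTS`) and the function name are hypothetical values chosen for illustration, not RV770 specifications; a real estimate would also account for register usage.

```python
# Hypothetical per-compute-unit budgets: illustrative only, NOT RV770 specs.
CU_LOCAL_MEM_BYTES = 16 * 1024   # assumed local memory per compute unit
CU_MAX_WAVEFRONTS = 16           # assumed wavefront slots per compute unit
WAVEFRONT_SIZE = 64              # 64 work-items per wavefront


def concurrent_work_groups(work_group_size: int, local_mem_per_group: int) -> int:
    """Estimate how many work-groups fit on one compute unit at the same time.

    Each resource imposes its own limit; the tightest limit wins.
    Register pressure is ignored in this sketch.
    """
    # A work-group occupies whole wavefronts (ceiling division).
    wavefronts_per_group = -(-work_group_size // WAVEFRONT_SIZE)
    by_wavefronts = CU_MAX_WAVEFRONTS // wavefronts_per_group
    if local_mem_per_group == 0:
        by_local_mem = by_wavefronts  # local memory is not a constraint
    else:
        by_local_mem = CU_LOCAL_MEM_BYTES // local_mem_per_group
    return min(by_wavefronts, by_local_mem)


print(concurrent_work_groups(256, 0))         # limited by wavefront slots
print(concurrent_work_groups(256, 8 * 1024))  # limited by local memory
```

Under these assumed budgets, a 256-item work-group needs 4 wavefront slots, so 4 groups fit; giving each group 8 KB of local memory drops that to 2.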


03-02-2010
10:39 PM

But this doesn't answer the question of why the number is 256. 256 (per dimension) is not a multiple of the number of thread processors. Likewise, Brook+ used 1024 per dimension (admittedly not quite the same notion), which is also not a multiple of the number of thread processors. So these numbers seem to be essentially arbitrary.

So, perhaps I should rephrase my question. Is the choice of 256 as the max work_items an arbitrary choice by the implementor?


03-03-2010
12:36 AM

Originally posted by: drstrip

So, perhaps I should rephrase my question. Is the choice of 256 as the max work_items an arbitrary choice by the implementor?

One work-group is executed on a single compute unit (also called a SIMD engine), which contains 16 processing elements (also called SPs). One compute unit executes 64 threads (one wavefront) over 4 cycles. A work-group is always divided into wavefronts. That's why the work-group size is a multiple of the wavefront size, not of the number of compute units.
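The wavefront arithmetic in this reply can be checked with a short sketch: 16 processing elements over 4 cycles gives a 64-item wavefront, and a work-group occupies a whole number of wavefronts. The helper names below are hypothetical, used only for illustration.

```python
PROCESSING_ELEMENTS_PER_CU = 16   # per the reply above
CYCLES_PER_WAVEFRONT = 4          # one wavefront is issued over 4 cycles
WAVEFRONT_SIZE = PROCESSING_ELEMENTS_PER_CU * CYCLES_PER_WAVEFRONT  # 64


def wavefronts_for(work_group_size: int) -> int:
    """Number of wavefronts a work-group is split into (ceiling division)."""
    return -(-work_group_size // WAVEFRONT_SIZE)


def idle_lanes(work_group_size: int) -> int:
    """Lanes left unused in the last, partially filled wavefront."""
    return wavefronts_for(work_group_size) * WAVEFRONT_SIZE - work_group_size


for size in (256, 100, 64):
    print(size, wavefronts_for(size), idle_lanes(size))
```

A 256-item work-group maps onto exactly 4 full wavefronts with no idle lanes, while a 100-item group still occupies 2 wavefronts and leaves 28 lanes idle. This is why sizes that are multiples of 64 are the natural choice, regardless of how many compute units the chip has.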


03-04-2010
02:15 AM

This helps a lot, especially since I had somehow read right past the part of the spec that says all the work_items in a work_group execute on a single compute unit. Now the power-of-two size of work_items makes sense to me, since it decouples it from the number of SIMD engines.

A question of clarification:

For the RV770, the max work_items for each dimension is 256, and the max work group size is also 256. If I understand this correctly, that means I can assign 16x16x1 or 32x8x1, etc., to a work group. The max work group size is the total number of items, not items per dimension. Right?

Anyway, thanks for clearing up most of this for me.


03-04-2010
02:43 AM

Correct: 256 work items total in a work group. The total global size can be much larger, but it must be composed of work groups of no more than 256 work items each.
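The two limits discussed here can be expressed as a simple validity check on a proposed local (work-group) size: each dimension must fit its per-dimension limit, and the product of all dimensions must fit the total limit. The constants mirror the RV770 values quoted in this thread and correspond to what OpenCL reports as `CL_DEVICE_MAX_WORK_ITEM_SIZES` and `CL_DEVICE_MAX_WORK_GROUP_SIZE`; the function name is hypothetical, and this is a sketch rather than a driver-side check.

```python
from math import prod

# Values quoted in this thread for the RV770.
MAX_WORK_GROUP_SIZE = 256             # total work-items per work-group
MAX_WORK_ITEM_SIZES = (256, 256, 256) # per-dimension limits


def is_valid_local_size(local_size) -> bool:
    """Check a work-group shape against both per-dimension and total limits."""
    if not 1 <= len(local_size) <= len(MAX_WORK_ITEM_SIZES):
        return False
    # Every dimension must be positive and within its own limit.
    if any(n < 1 or n > limit for n, limit in zip(local_size, MAX_WORK_ITEM_SIZES)):
        return False
    # The TOTAL number of work-items is what the work-group size limit applies to.
    return prod(local_size) <= MAX_WORK_GROUP_SIZE


for shape in [(16, 16, 1), (32, 8, 1), (256, 1, 1), (32, 32, 1)]:
    print(shape, is_valid_local_size(shape))
```

Shapes like 16x16x1 and 32x8x1 pass because their product is 256, whereas 32x32x1 fails: each dimension is within 256, but the total of 1024 exceeds the work-group limit.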

For example, your total global size can be 8192x8192, composed of 16x16 work groups. In this case you will have 262144 work groups (512x512), each with 256 work items (16x16).
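The arithmetic in this example can be reproduced with a small sketch that divides the global size by the local size in each dimension. It also enforces that each global dimension is an exact multiple of the local dimension, which OpenCL 1.x requires for `clEnqueueNDRangeKernel`. The function name is hypothetical.

```python
from math import prod


def work_group_grid(global_size, local_size):
    """Return the number of work-groups along each dimension."""
    grid = []
    for g, l in zip(global_size, local_size):
        # OpenCL 1.x requires each global dimension to divide evenly by the local one.
        if g % l != 0:
            raise ValueError(f"global size {g} is not a multiple of local size {l}")
        grid.append(g // l)
    return tuple(grid)


global_size = (8192, 8192)
local_size = (16, 16)
groups = work_group_grid(global_size, local_size)

print(groups)                # work-groups per dimension
print(prod(groups))          # total work-groups
print(prod(local_size))      # work-items per work-group
print(prod(global_size))     # total work-items
```

This gives (512, 512) work-groups, 262144 in total, each with 256 work-items, for 67108864 (8192x8192) work-items overall.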


03-02-2010
10:46 PM