
drstrip
Journeyman III

Why 256 work_items on RV770?

The RV770 shows 10 "compute units", and 256 work_items per dimension with 3 dimensions. In Brook+ you are allowed 1024 vector elements per dimension. Neither of these numbers actually relates to the number of thread processors, so where do they come from?

0 Likes
6 Replies
genaganna
Journeyman III

Originally posted by: drstrip The RV770 shows 10 "compute units", and 256 work_items per dimension with 3 dimensions. In Brook+ you are allowed 1024 vector elements per dimension. Neither of these numbers actually relates to the number of thread processors, so where do they come from?


Each compute unit can execute one or more work_groups concurrently, depending on the resources used by each work_group.

0 Likes

But this doesn't answer the question of why the number is 256. 256 (per dimension) is not a multiple of the number of thread processors. Likewise, Brook+ used 1024 per dimension (not quite the same notion, admittedly), which is also not a multiple of the number of thread processors. So it seems these numbers are essentially arbitrary.

So, perhaps I should rephrase my question. Is the choice of 256 as the max work_items an arbitrary choice by the implementor?

0 Likes

One work-group is executed on a single compute unit (aka SIMD engine), which contains 16 processing elements (also called SPs). One compute unit executes 64 threads (1 wavefront) over 4 cycles. A work-group is always divided into wavefronts. That's why the work-group size is a multiple of the wavefront size, not of the number of compute units.
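
To make the arithmetic concrete, here is a small Python sketch (plain arithmetic, not OpenCL host code). It uses only the figures quoted in this reply: 16 processing elements per compute unit and 4 cycles per wavefront. The function names are made up for illustration.

```python
# Figures taken from the reply above (RV770).
PROCESSING_ELEMENTS = 16   # processing elements per compute unit
CYCLES_PER_WAVEFRONT = 4   # cycles a compute unit spends on one wavefront


def wavefront_size(pes=PROCESSING_ELEMENTS, cycles=CYCLES_PER_WAVEFRONT):
    """Threads in one wavefront: one thread per processing element per cycle."""
    return pes * cycles


def wavefronts_per_group(group_size, wf_size=None):
    """Number of wavefronts needed to cover a work-group (rounded up)."""
    if wf_size is None:
        wf_size = wavefront_size()
    return -(-group_size // wf_size)  # ceiling division


if __name__ == "__main__":
    print(wavefront_size())           # 16 * 4
    print(wavefronts_per_group(256))  # a full 256-item work-group
```

A work-group that is not a multiple of the wavefront size still occupies whole wavefronts, which is why the rounding goes up.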

0 Likes

This helps a lot, especially since I had somehow read right past the part of the spec that says all the work_items in a work_group execute on a single processor. Now the power-of-two size for work_items makes sense to me, since it decouples it from the number of SIMD engines.

A question of clarification -

For the RV770, max work_items for each dimension is 256. Max work group size is also 256. If I understand this now, that means I can assign 16x16x1 or 32x8x1, etc., to a work group. The max work group size is the total number of items, not items per dimension. Right?

Anyway, thanks for clearing up most of this for me.

0 Likes

Correct, 256 work items total in a work group. The total global size can be much larger, but it must be made up of work groups no larger than 256 work items.

For example, your total global size can be 8192x8192, made up of 16x16 work groups. In this case, you will have 262144 work groups (512x512), with each work group having 256 work items (16x16).
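
The same bookkeeping as a rough Python sketch. The two limits are the RV770 values quoted in this thread; on a real device they would presumably be read with clGetDeviceInfo (CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_MAX_WORK_ITEM_SIZES) rather than hard-coded. The helper names are hypothetical, and the divisibility check assumes the OpenCL 1.x rule that the global size must be an exact multiple of the local size in each dimension.

```python
from math import prod

# RV770 limits as quoted in this thread (hard-coded here as an assumption).
MAX_WORK_GROUP_SIZE = 256              # total work items per work group
MAX_WORK_ITEM_SIZES = (256, 256, 256)  # per-dimension limit


def is_valid_local_size(local):
    """True if every dimension fits its own limit AND the product fits the total."""
    fits_each_dim = all(1 <= l <= m for l, m in zip(local, MAX_WORK_ITEM_SIZES))
    return fits_each_dim and prod(local) <= MAX_WORK_GROUP_SIZE


def work_group_grid(global_size, local_size):
    """Number of work groups along each dimension."""
    if any(g % l for g, l in zip(global_size, local_size)):
        raise ValueError("global size must be a multiple of local size")
    return tuple(g // l for g, l in zip(global_size, local_size))


if __name__ == "__main__":
    print(is_valid_local_size((16, 16, 1)))    # 256 items total
    print(is_valid_local_size((256, 256, 1)))  # each dimension fits, total does not
    grid = work_group_grid((8192, 8192), (16, 16))
    print(grid, prod(grid))
```

Note that 256x256x1 passes the per-dimension limit but fails the total, which is the distinction asked about above.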

0 Likes

256 is 4 wavefronts on the high-end/midrange graphics cards, 8 wavefronts on some mid-range cards, and 16 wavefronts on some low-end cards, so the number is not arbitrary. It will be higher in future releases, but the problem with allowing a lot of wavefronts is that resources disappear quickly. The limit will be different in future revisions and will eventually equal the max allowed for the device for very simple kernels.
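
Read backwards, those wavefront counts imply the wavefront size on each class of card. A tiny Python sketch of that division, assuming the 256 limit applies to all three classes as stated above (the function name is made up):

```python
MAX_WORK_GROUP_SIZE = 256  # limit discussed in this thread


def implied_wavefront_size(wavefronts_per_group, group_size=MAX_WORK_GROUP_SIZE):
    """Wavefront size implied by how many wavefronts fill a full work group."""
    if group_size % wavefronts_per_group:
        raise ValueError("work group does not split evenly into wavefronts")
    return group_size // wavefronts_per_group


if __name__ == "__main__":
    for count in (4, 8, 16):  # wavefront counts from the post above
        print(count, "wavefronts ->", implied_wavefront_size(count), "threads each")
```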
0 Likes