

liwoog
Adept II

Using the 64KB LDS on the 7970

The 7970 is supposed to have 64KB of LDS, but the size of the local memory returned by the OpenCL driver is 32KB.

Is there a way to use the full 64KB?

0 Likes
5 Replies
dmeiser
Elite

If I understand the hardware specifications correctly, you cannot use the full 64KB for a single work group. If several work groups execute on one compute unit, together they can use more than 32KB (e.g. 2 work groups that each use 32KB).

0 Likes

That's correct. The maximum allocation size per workgroup is still 32KB, but you can schedule multiple workgroups per CU to consume the full 64KB.
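The arithmetic behind that can be sketched in a few lines of Python. This is only a rough model: it assumes LDS is the only limit and ignores allocation granularity and any hardware cap on resident workgroups per CU. The function name and constants are made up for illustration; the 64KB and 32KB figures are the ones quoted in this thread.

```python
LDS_PER_CU = 64 * 1024         # bytes of LDS per compute unit (figure from this thread)
MAX_LDS_PER_GROUP = 32 * 1024  # bytes a single workgroup may allocate (driver-reported)

def workgroups_per_cu_by_lds(lds_per_group,
                             lds_per_cu=LDS_PER_CU,
                             max_per_group=MAX_LDS_PER_GROUP):
    """How many workgroups fit on one CU if LDS were the only constraint."""
    if lds_per_group <= 0:
        raise ValueError("lds_per_group must be positive")
    if lds_per_group > max_per_group:
        raise ValueError("a single workgroup cannot allocate this much LDS")
    # Integer division: a partially fitting workgroup cannot be resident.
    return lds_per_cu // lds_per_group

for kb in (32, 16, 8):
    print(kb, "KB per group ->", workgroups_per_cu_by_lds(kb * 1024), "groups per CU")
```

With 32KB per workgroup, two workgroups together fill the 64KB; smaller allocations allow more resident workgroups, subject to the other limits this sketch leaves out.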

0 Likes

Thank you.

I was told, though, that currently only a single kernel may run at a time (which is why clEnqueueBarrier is a no-op).

So how would one schedule multiple workgroups?


0 Likes

clEnqueueBarrier is a no-op because the queues are in-order only. That in itself says nothing about whether multiple (distinct) kernels can execute concurrently (e.g. using more than one queue).

"So how would one schedule multiple workgroups?"

Did you really mean to ask such a silly question?  😉

I thought 'scheduling multiple concurrent workgroups' was the entire reason modern GPU designs exist: hiding memory latency via many concurrent threads. In other words, all one needs to do is invoke a kernel with more workgroups (global size / local size) than there are hardware processors.
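As a minimal sketch of that point, the workgroup count of a 1-D launch is just global size divided by local size. The helper name is hypothetical, and the figure of 32 compute units for the 7970 is an assumption not stated in this thread. The divisibility check mirrors the OpenCL 1.x requirement that the global size be a multiple of the local size.

```python
COMPUTE_UNITS = 32  # assumed CU count for the 7970; not stated in the thread

def num_workgroups(global_size, local_size):
    """Number of workgroups a 1-D NDRange launch produces."""
    if local_size <= 0 or global_size <= 0:
        raise ValueError("sizes must be positive")
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of local size")
    return global_size // local_size

groups = num_workgroups(1 << 20, 256)   # about one million work-items, 256 per group
oversubscribed = groups > COMPUTE_UNITS # True means several groups per CU are available
print(groups, "workgroups for", COMPUTE_UNITS, "compute units:", oversubscribed)
```

Any launch with more workgroups than compute units gives the scheduler the chance to place several on each CU, provided per-workgroup resource use leaves room.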

I would think it's pretty obvious that doubling the LDS just makes local memory requirements less likely to be the thing that limits concurrency, the same way doubling the register count would, even if each workgroup still had the same register use limit.
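One way to picture this is to treat the number of resident workgroups per CU as the minimum over every per-CU resource. The sketch below assumes that simple model; the register figures are invented purely for illustration and are not real 7970 numbers, and the function and key names are hypothetical.

```python
def resident_groups(capacity, usage):
    """Return (count, limiting_resource) for one compute unit.

    capacity: per-CU amount of each resource
    usage:    amount of each resource one workgroup consumes
    """
    per_resource = {name: capacity[name] // usage[name]
                    for name in usage if usage[name] > 0}
    limiting = min(per_resource, key=per_resource.get)
    return per_resource[limiting], limiting

# Hypothetical kernel: 24KB of LDS and 16K registers per workgroup.
usage = {"lds_bytes": 24 * 1024, "registers": 16 * 1024}

small_lds = resident_groups({"lds_bytes": 32 * 1024, "registers": 64 * 1024}, usage)
large_lds = resident_groups({"lds_bytes": 64 * 1024, "registers": 64 * 1024}, usage)
print(small_lds, large_lds)
```

In this made-up case the per-workgroup LDS use is unchanged, yet doubling the per-CU LDS doubles the number of workgroups that can be resident at once.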

0 Likes

Agreed, that was a silly question.

Though I would much rather have had access to the full 64KB than see it split.

0 Likes