5 Replies Latest reply on Mar 30, 2012 4:29 PM by liwoog

    Using the 64KB LDS on the 7970

    liwoog

      The 7970 is supposed to have 64KB of LDS, but the size of the local memory returned by the OpenCL driver is 32KB.

       

      Is there a way to use the full 64KB?

        • Re: Using the 64KB LDS on the 7970
          dmeiser

          If I understand the hardware specifications correctly, you cannot use the full 64 KB for a single work-group. If several work-groups execute on one compute unit, though, together they can use more than 32 KB (e.g. two work-groups that each use 32 KB).

            • Re: Using the 64KB LDS on the 7970
              jeff_golds

              That's correct.  The maximum allocation size per workgroup is still 32 KB, but you can schedule multiple workgroups per CU to consume the full 64 KB.
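
              To make the arithmetic concrete, here is a minimal sketch in plain C. It only encodes the two figures quoted in this thread (64 KB of LDS per CU, 32 KB maximum per workgroup); the function name and constants are made up for illustration and are not part of any OpenCL or AMD API.

```c
#include <stddef.h>
#include <stdint.h>

/* Figures quoted in this thread for the 7970: 64 KB of LDS per compute
   unit, of which a single workgroup may allocate at most 32 KB. */
#define LDS_PER_CU_BYTES   (64u * 1024u)
#define LDS_PER_GROUP_MAX  (32u * 1024u)

/* Hypothetical helper: upper bound on the number of workgroups that can be
   resident on one CU when LDS is the only constraint.
   Returns 0 if one workgroup asks for more than the per-group maximum.
   Returns SIZE_MAX if the kernel uses no LDS (LDS imposes no limit). */
size_t lds_limited_groups_per_cu(size_t lds_bytes_per_group)
{
    if (lds_bytes_per_group > LDS_PER_GROUP_MAX)
        return 0;
    if (lds_bytes_per_group == 0)
        return SIZE_MAX;
    return LDS_PER_CU_BYTES / lds_bytes_per_group;
}
```

              With 32 KB per workgroup this gives two workgroups per CU, which is the case described above; with 16 KB it gives four. Other limits (registers, wavefront slots) may lower the real number, so treat the result as an upper bound only.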

                • Re: Using the 64KB LDS on the 7970
                  liwoog

                  Thank you.

                   

                  I was told, though, that currently only a single kernel may run at a time (which is why clEnqueueBarrier is a no-op).

                   

                  So how would one schedule multiple workgroups?

                   


                    • Re: Using the 64KB LDS on the 7970
                      notzed

                      clEnqueueBarrier is a no-op because the queues are in-order only.  That in itself says nothing about whether multiple (distinct) kernels can execute concurrently (e.g. by using more than one queue).

                       

                       

                      liwoog wrote: "So how would one schedule multiple workgroups?"

                       

                      Did you really mean to ask such a silly question?  ;-)

                       

                      I thought 'scheduling multiple concurrent workgroups' was the whole point of modern GPU design:  hiding memory latency via many concurrent threads.  I.e. all one needs to do is invoke a single kernel with more workgroups (global size / local size) than there are hardware processors (compute units).
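
                      As a rough sketch of that last point, in plain C: the number of workgroups in a launch is just global size divided by local size, and once that exceeds the number of compute units, each CU has several workgroups to work through. The helper names are made up for illustration; in real host code the CU count would come from clGetDeviceInfo with CL_DEVICE_MAX_COMPUTE_UNITS rather than being hard-coded.

```c
#include <stddef.h>

/* Number of workgroups in a 1-D launch. Assumes global_size is a multiple
   of local_size, as OpenCL 1.x requires. Returns 0 for a zero local size. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    if (local_size == 0)
        return 0;
    return global_size / local_size;
}

/* Workgroups each compute unit has to process on average, rounded up.
   Whether they are all resident at once depends on per-CU resources
   (LDS, registers), which this sketch does not model. */
size_t groups_per_cu(size_t groups, size_t compute_units)
{
    if (compute_units == 0)
        return 0;
    return (groups + compute_units - 1) / compute_units;
}
```

                      For example, a global size of 65536 with a local size of 256 is 256 workgroups; on a device with 32 compute units (the figure I'm assuming for the 7970) that is 8 workgroups per CU, all from one kernel launch.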

                       

                      I would think it's pretty obvious that doubling the LDS just makes local memory requirements less likely to be the factor that limits concurrency, the same way doubling the register count would, even if each workgroup still had the same register use limit.
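
                      A minimal sketch of that reasoning, in plain C: the number of workgroups resident on a CU is bounded by whichever resource runs out first. The function name is invented, the register figures in the example are arbitrary illustrative units rather than real 7970 numbers, and other limits (such as wavefront slots) are left out.

```c
#include <stddef.h>

/* Hypothetical model: workgroups that fit on one CU when both LDS and
   registers are shared pools. Each pool allows pool / per-group-use
   workgroups; the smaller of the two bounds wins.
   Assumes both per-group values are non-zero. */
size_t resident_groups(size_t lds_pool, size_t lds_per_group,
                       size_t reg_pool, size_t regs_per_group)
{
    size_t by_lds  = lds_pool / lds_per_group;
    size_t by_regs = reg_pool / regs_per_group;
    return by_lds < by_regs ? by_lds : by_regs;
}
```

                      With 16 KB of LDS per workgroup and a register pool that fits three workgroups, a 32 KB LDS pool allows two resident workgroups (LDS is the limit), while a 64 KB pool allows three (registers are now the limit). The per-workgroup cap never changed; only the pool did.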