leonbass
Journeyman III

Beginner - GPU Data size handling Conundrum and (bad?) return code.

Don't understand why these 2 GPUs have such different input data size capabilities.

I am trying to begin to learn OpenCL, and am also a newbie at GPU computing in general.
I adapted a convolution example program to use buffers and a GPU device instead of a CPU device. I am using a desktop with an NVIDIA GeForce GTX 275 and a laptop with an NVIDIA GeForce GT 130M. The two look close on paper except for Compute Units (30 on the 275, 4 on the GT 130M) and Device Registers Per Block (16K on the 275, 8K on the 130M). In fact, the GT 130M has 1GB of memory where the 275 has only 896MB, and the GT 130M has a slightly higher clock frequency.
However, the 275 will run a matrix of 1024 X 1024, while the 130M will run only 256 X 256 (I tried powers of 2 only). That is a 16-fold difference in data size!
What GPU properties can I look at to predict the amount of data a card can take in one shot?
When I increase the data beyond what the GPUs will handle, both fail at a clFinish() following the kernels being enqueued. The clFinish() call returns -5, CL_OUT_OF_RESOURCES, which the Khronos spec says is not a possible clFinish() return code! Any ideas on that?

Thanks in advance!

0 Likes
5 Replies
n0thing
Journeyman III

The maximum buffer size you can allocate in VRAM is device dependent. You can query it by calling clGetDeviceInfo with the flag CL_DEVICE_MAX_MEM_ALLOC_SIZE, which returns the maximum size of a memory object allocation in bytes. On my 4850 512MB this is currently 128MB (1/4 of 512MB).

The register memory you were talking about is actually the physical local memory available on each SIMD unit of your GPU. To check how much of this memory your kernel uses, call clGetKernelWorkGroupInfo with the flag CL_KERNEL_LOCAL_MEM_SIZE. If the physical memory is less than what this returns, your kernel will fail to execute.

0 Likes

Thanks, that's another aspect that I don't get. 
The 275 has a MAX_MEM_ALLOC_SIZE of 224MB, and will process a 1024 X 1024 version but not 2048 X 2048, which is only 4MB. Even with input and output buffers, that's only 8MB.
Even worse, the 130M has a MAX_MEM_ALLOC_SIZE of 256MB, even bigger, but will not even do a 512 X 512, which is only 1/4 of a megabyte.

So both processors seem to allow a lot less than the physical memory and the MAX_MEM_ALLOC_SIZE would indicate, but the 130M is WAY worse.
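To make the arithmetic above concrete, here is a minimal sketch in C that works out the buffer footprint of an N X N matrix and compares it with a reported MAX_MEM_ALLOC_SIZE. The helper names `matrix_bytes` and `fits_in_alloc_limit` are hypothetical, and the element size is an assumption: the 4MB figure corresponds to 1 byte per element, while 4-byte floats would quadruple it.

```c
#include <stdint.h>

/* One mebibyte, used to express the limits quoted in this thread. */
#define MIB (1024ULL * 1024ULL)

/* Hypothetical helper: bytes needed for one n x n matrix whose
   elements are elem_size bytes each (elem_size is an assumption). */
static uint64_t matrix_bytes(uint64_t n, uint64_t elem_size)
{
    return n * n * elem_size;
}

/* Hypothetical helper: nonzero if a single buffer of buffer_bytes
   fits within the device's reported maximum allocation size. */
static int fits_in_alloc_limit(uint64_t buffer_bytes, uint64_t max_alloc_bytes)
{
    return buffer_bytes <= max_alloc_bytes;
}
```

For example, `matrix_bytes(2048, 1)` is 4MB and `matrix_bytes(2048, 4)` is 16MB; both pass `fits_in_alloc_limit` against 224 * MIB, which suggests buffer size alone is unlikely to be the limiting factor here.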

0 Likes

Originally posted by: leonbass Thanks, that's another aspect that I don't get. The 275 has a MAX_MEM_ALLOC_SIZE of 224MB, and will process a 1024 X 1024 version but not 2048 X 2048, which is only 4MB. Even with input and output buffers, that's only 8MB. Even worse, the 130M has a MAX_MEM_ALLOC_SIZE of 256MB, even bigger, but will not even do a 512 X 512, which is only 1/4 of a megabyte.

So both processors seem to allow a lot less than the physical memory and the MAX_MEM_ALLOC_SIZE would indicate, but the 130M is WAY worse.

Leonbass,

  You will get better answers if you post in the NVIDIA OpenCL forums.

0 Likes
jackpug
Journeyman III

I think you may be hitting CL_DEVICE_MAX_WORK_ITEM_SIZES or a similar limit in your kernel execution. Otherwise your numbers don't really make sense, especially the ones about memory allocation. See:

clGetDeviceInfo

CL_DEVICE_MAX_WORK_*
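As a rough sketch of what checking against those limits could look like, the C function below tests a planned launch against values you would read from CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_MAX_WORK_ITEM_SIZES. The function name `launch_fits_limits` is hypothetical, the limits are passed in as plain numbers rather than queried, and it only covers the three rules noted in the comments (OpenCL 1.x behaviour); it is not a claim about what is actually failing on either card.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the launch respects the limits
   checked here, 0 otherwise.
   - each local size must be nonzero and <= the per-dimension limit
   - each global size must be a multiple of the local size (OpenCL 1.x)
   - the product of local sizes must be <= the max work-group size */
static int launch_fits_limits(unsigned dims,
                              const size_t *global,
                              const size_t *local,
                              size_t max_work_group_size,
                              const size_t *max_work_item_sizes)
{
    size_t group_items = 1;
    for (unsigned i = 0; i < dims; ++i) {
        if (local[i] == 0 || local[i] > max_work_item_sizes[i])
            return 0;
        if (global[i] % local[i] != 0)
            return 0;
        group_items *= local[i];
    }
    return group_items <= max_work_group_size;
}
```

With assumed limits of 512 work-items per group and {512, 512, 64} per dimension (illustrative numbers, not measured on either GPU), a 1024 X 1024 global size with a 16 X 16 local size passes, while a 32 X 32 local size fails because 1024 work-items exceed the group limit.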

0 Likes
Marix
Adept II

Actually, the 1.1 spec quite specifically lists CL_OUT_OF_RESOURCES as a possible return value. It might happen, e.g., if a kernel is the first command in a queue that uses a memory object.
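When chasing a return code like the -5 above, it can help to print the symbolic name next to the number. The sketch below is one way to do that in C. The `MY_`-prefixed constants and the `cl_error_name` function are hypothetical names; the numeric values mirror the standard OpenCL error codes, redefined here only so the snippet compiles without the OpenCL headers. Only a handful of codes relevant to this thread are covered.

```c
/* Values mirror the standard OpenCL error codes; the MY_ names are
   hypothetical stand-ins so this compiles without CL/cl.h. */
enum {
    MY_CL_SUCCESS                       = 0,
    MY_CL_MEM_OBJECT_ALLOCATION_FAILURE = -4,
    MY_CL_OUT_OF_RESOURCES              = -5,
    MY_CL_OUT_OF_HOST_MEMORY            = -6,
    MY_CL_INVALID_COMMAND_QUEUE         = -36,
    MY_CL_INVALID_WORK_GROUP_SIZE       = -54,
    MY_CL_INVALID_WORK_ITEM_SIZE        = -55
};

/* Hypothetical helper: map a subset of error codes to their names. */
static const char *cl_error_name(int code)
{
    switch (code) {
    case MY_CL_SUCCESS:                       return "CL_SUCCESS";
    case MY_CL_MEM_OBJECT_ALLOCATION_FAILURE: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case MY_CL_OUT_OF_RESOURCES:              return "CL_OUT_OF_RESOURCES";
    case MY_CL_OUT_OF_HOST_MEMORY:            return "CL_OUT_OF_HOST_MEMORY";
    case MY_CL_INVALID_COMMAND_QUEUE:         return "CL_INVALID_COMMAND_QUEUE";
    case MY_CL_INVALID_WORK_GROUP_SIZE:       return "CL_INVALID_WORK_GROUP_SIZE";
    case MY_CL_INVALID_WORK_ITEM_SIZE:        return "CL_INVALID_WORK_ITEM_SIZE";
    default:                                  return "unknown OpenCL error";
    }
}
```

In a real host program you would pass the value returned by clFinish() (or by the enqueue call before it) to such a function and log the result, which makes it easier to see whether the failure is reported at enqueue time or only when the queue is flushed.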

0 Likes