I am just beginning to learn OpenCL, and I am also a newbie at GPU computing in general.
I adapted a convolution example program to use buffers and a GPU device instead of a CPU device. I am using a desktop with an NVIDIA GeForce GTX 275 and a laptop with an NVIDIA GeForce GT 130M. Except for compute units (the 275 has 30, the GT 130M has 4) and registers per block (16K on the 275 versus 8K on the 130M), the two cards' specs look close. In fact, the GT 130M has 1 GB of memory while the 275 has only 896 MB, and the GT 130M is even slightly faster in clock frequency.
However, the 275 will run a 1024 × 1024 matrix, while the 130M will run only 256 × 256 (I tried powers of 2 only). That is a 16× difference in data size!
What GPU properties can I look at to predict the amount of data a card can take in one shot?
When I increase the data beyond what the GPUs can handle, both fail at the clFinish() call that follows enqueueing the kernels. clFinish() returns -5 (CL_OUT_OF_RESOURCES), which the Khronos spec says is not even a possible clFinish() return code! Any ideas on that?
Thanks in advance!