I have an AMD Radeon HD 6970 GPU with 2 GB of GDDR5 memory. When I try to pass an array of 200000000 (i.e. 200 million) integers to an OpenCL kernel, it gives me the wrong result.
If we calculate it theoretically:
200000000 × 4 bytes (sizeof(int)) = 800000000 bytes
800000000 / 1024 = 781250 KB
781250 / 1024 = 762.939453125 MB
That is less than the 2 GB of GDDR5 memory.
On the other hand, if I pass an array of up to 64000000 elements, it gives the correct result. Why does this happen?
Can anybody help me?
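The arithmetic above, applied to both array sizes from the question, can be checked with a few lines of C. One thing worth checking is that the 2 GB of device memory is not the limit for a single buffer: each clCreateBuffer call is capped by CL_DEVICE_MAX_MEM_ALLOC_SIZE (the spec lists CL_INVALID_BUFFER_SIZE as the error when the requested size exceeds it), and the OpenCL 1.x specification only guarantees that this cap is at least one quarter of CL_DEVICE_GLOBAL_MEM_SIZE (or 128 MB, whichever is larger). The sketch assumes that minimum, 512 MiB for a 2 GiB card, purely for illustration; the real value has to be read with clGetDeviceInfo on the actual device, and this is not offered as a confirmed diagnosis of the wrong result.

```c
#include <stdint.h>

/* Bytes needed for n elements of elem_size bytes each. */
uint64_t buffer_bytes(uint64_t n, uint64_t elem_size)
{
    return n * elem_size;
}

/* Convert bytes to MiB by dividing by 1024 twice, as in the question. */
double bytes_to_mib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}

/* Returns 1 if a single buffer of `bytes` fits under max_alloc, else 0.
 * max_alloc should be the value the device reports for
 * CL_DEVICE_MAX_MEM_ALLOC_SIZE; any value passed in a test here is an
 * assumption, not a measured property of the HD 6970. */
int fits_single_alloc(uint64_t bytes, uint64_t max_alloc)
{
    return bytes <= max_alloc;
}
```

For example, buffer_bytes(200000000, 4) gives 800000000 and bytes_to_mib of that gives 762.939453125, matching the figures in the question. Under the assumed 512 MiB cap, the 800000000-byte buffer would not fit in one allocation while the 256000000-byte buffer for 64000000 elements would, which is at least consistent with what the question describes.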
Are you facing memory allocation errors, or kernel correctness issues?
Which platform are you on? (Windows? Linux?)
32-bit or 64-bit?
I'm facing neither memory allocation errors nor kernel correctness issues. The problem is the latency of transferring memory to the GPU: I am sending approximately 8 GB of data to the GPU, and it is much slower than the CPU.
I am using Linux.
Even assuming we use the full PCIe bandwidth of 8 GB/s, it will take at least 1 second to transfer the data to the GPU (assuming pinned memory).
You cannot pin 8 GB of memory; the OS will not allow it. So the runtime will pin it chunk by chunk and start the DMA for each chunk. That adds some latency which prevents us from reaching 8 GB/s, so it will take more than a second, maybe 1.25 seconds.
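The estimate in the reply above is just data size divided by effective bandwidth. A minimal sketch, treating GB as 10^9 bytes and using a hypothetical efficiency factor to stand in for the pinning and DMA-setup overhead; 0.8 is chosen only because it reproduces the 1.25-second guess and is not a measured value.

```c
/* Estimated seconds to move `bytes` over a link whose peak bandwidth is
 * `peak_bytes_per_s`, when only a fraction `efficiency` of that peak is
 * achieved in practice. Expects 0 < efficiency <= 1. */
double transfer_seconds(double bytes, double peak_bytes_per_s, double efficiency)
{
    return bytes / (peak_bytes_per_s * efficiency);
}
```

With these inputs, transfer_seconds(8e9, 8e9, 1.0) is about 1.0 s (the ideal case) and transfer_seconds(8e9, 8e9, 0.8) is about 1.25 s; kernel execution time comes on top of either figure.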
What transfer time are you seeing?
My program takes approximately 2 to 3 seconds to transfer the data and perform the operation on it.
How can I use DMA? Can you give me a sample that transfers data in chunks?
For transferring 8 GB of data and running a kernel on it, 2-3 seconds may be reasonable. Anyway, check http://devgurus.amd.com/message/1296694#1296694
If you are still not satisfied, please post a small repro case.
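Going by the reply about pinning, the runtime performs the pinning and DMA itself; what the host program controls is how much data it hands over per call. A minimal sketch of walking a large host array in fixed-size chunks follows. process_chunk is a hypothetical stand-in: in a real OpenCL host program it would roughly be a clEnqueueWriteBuffer of that chunk into a device buffer sized for chunk_elems ints, followed by clEnqueueNDRangeKernel and a read of the result. Here it only sums the chunk, so the sketch compiles and runs without an OpenCL SDK, and it does not attempt to overlap transfers with kernel execution.

```c
#include <stddef.h>

/* Number of chunks needed to cover n elements, chunk_elems at a time. */
size_t chunk_count(size_t n, size_t chunk_elems)
{
    return (n + chunk_elems - 1) / chunk_elems;
}

/* Hypothetical stand-in for "upload this chunk and run the kernel on it".
 * It only sums the elements so the sketch runs without OpenCL. */
static long long process_chunk(const int *chunk, size_t len)
{
    long long s = 0;
    for (size_t i = 0; i < len; ++i)
        s += chunk[i];
    return s;
}

/* Walk `host` in chunks of at most chunk_elems elements, combining the
 * per-chunk results. Stores the number of chunks in *chunks_done if it is
 * non-NULL. chunk_elems must be greater than 0. */
long long process_in_chunks(const int *host, size_t n, size_t chunk_elems,
                            size_t *chunks_done)
{
    long long total = 0;
    size_t done = 0;
    if (chunk_elems == 0) {
        if (chunks_done)
            *chunks_done = 0;
        return 0;
    }
    for (size_t offset = 0; offset < n; offset += chunk_elems) {
        /* The last chunk may be shorter than chunk_elems. */
        size_t len = (n - offset < chunk_elems) ? n - offset : chunk_elems;
        total += process_chunk(host + offset, len);
        ++done;
    }
    if (chunks_done)
        *chunks_done = done;
    return total;
}
```

With a 10-element array holding 1 through 10 and chunk_elems = 4, process_in_chunks returns 55 and reports 3 chunks (4, 4 and 2 elements). For the question's 200000000 ints in 64000000-element chunks, chunk_count gives 4.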