
JanS
Journeyman III

Max memory allocation doesn't increase, even with env variables

hi,

i need to use at least 500 megabytes of memory in my kernel. got a 4650 with 1024MB here and even with

export GPU_INITIAL_HEAP_SIZE=512
export GPU_MAX_ALLOC_SIZE=512
export GPU_MAX_HEAP_SIZE=512

CLInfo reports:

Max memory allocation:             268435456

Global memory size:                 1073741824

a test case shows that using opencl with the cpu device allocates ~500mb of memory, so it should work on the GPU as well. my program just exits with ERROR: clCreateBuffer(-61), regardless of whether i set all env variables to 700 or more megabytes.

any help would be appreciated.

PS: ubuntu 10.04 32bit, catalyst 10.04, stream sdk 2.1

0 Likes
7 Replies
davibu
Journeyman III

Nou wrote in another forum that GPU_MAX_ALLOC_SIZE doesn't work anymore with the new SDK. It looks like the largest single buffer you can allocate is 256MB, no matter what setting you use.
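A note on the numbers in this thread: the reported 268435456 bytes is exactly 256 MiB, a quarter of the 1073741824-byte global memory, and error -61 is CL_INVALID_BUFFER_SIZE in the OpenCL headers. If the per-buffer limit cannot be raised, one possible workaround is to split the data across several buffers that each fit under the limit. The following is a minimal C sketch of that bookkeeping only. The helper names chunk_count and chunk_size are hypothetical, and no OpenCL calls are made; in a real program the limit would come from clGetDeviceInfo with CL_DEVICE_MAX_MEM_ALLOC_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of buffers needed to hold total_bytes when
 * no single buffer may exceed max_alloc_bytes (ceiling division). */
static uint64_t chunk_count(uint64_t total_bytes, uint64_t max_alloc_bytes)
{
    assert(max_alloc_bytes > 0);
    return (total_bytes + max_alloc_bytes - 1) / max_alloc_bytes;
}

/* Hypothetical helper: size in bytes of chunk `index` (0-based).
 * Every chunk is max_alloc_bytes long except the last, which holds
 * whatever remains. */
static uint64_t chunk_size(uint64_t total_bytes, uint64_t max_alloc_bytes,
                           uint64_t index)
{
    assert(index < chunk_count(total_bytes, max_alloc_bytes));
    uint64_t offset = index * max_alloc_bytes;
    uint64_t left = total_bytes - offset;
    return left < max_alloc_bytes ? left : max_alloc_bytes;
}
```

With a 500 MiB request (524288000 bytes) and the 268435456-byte limit reported by CLInfo, this yields two chunks: one of 268435456 bytes and one of 255852544 bytes. The kernel then has to be written to accept two buffer arguments, or be launched once per chunk.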

 

0 Likes

this is a major drawback, especially for HPC usage. i need to use as much GPU memory as possible to reduce the number of host transfers.

total data size is > 8TB, which is calculated completely on the GPU and transferred back to the host. i don't have a clue how i should do this in an efficient way.

regards

0 Likes

JanS,
One way to do this efficiently is to use an out-of-memory (out-of-core) algorithm that only calculates a few strips at a time. Ideally, what you want is to run compute and data transfers in parallel. The slides here: http://developer.amd.com/gpu_a...on%20Illustration.ppt
show the benefit of pipelining transfer + execution.

Basically you want to tweak your program so that the time it takes to transfer a strip is similar to the time it takes to compute on that strip. That way you get optimal performance, assuming efficient data transfers and kernel execution.
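To make the balancing argument concrete, here is a small C sketch of an idealized two-stage pipeline, assuming double buffering so that the transfer of one strip can overlap the computation of another. It is only a timing model with hypothetical function names (serial_time, pipelined_time); it ignores launch overhead and issues no OpenCL commands. Times are in arbitrary integer units.

```c
#include <stdint.h>

/* No overlap: every strip is transferred, then computed, one after another. */
static uint64_t serial_time(uint64_t strips, uint64_t transfer, uint64_t compute)
{
    return strips * (transfer + compute);
}

/* Idealized overlap of the two stages: the first stage of the first strip
 * and the second stage of the last strip cannot be hidden; in between,
 * the slower of the two stages sets the pace. */
static uint64_t pipelined_time(uint64_t strips, uint64_t transfer, uint64_t compute)
{
    if (strips == 0)
        return 0;
    uint64_t slower = transfer > compute ? transfer : compute;
    return transfer + (strips - 1) * slower + compute;
}
```

For 10 strips with transfer = compute = 4, the serial schedule takes 80 units and the pipelined one 44. With the same total work per strip split unevenly (transfer = 1, compute = 7), the pipelined schedule takes 71 units, which is why matching the two times matters in this model.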
0 Likes

so when i enqueue a copy to one buffer and a kernel execution on another buffer, will they happen in parallel when i call clFinish()?

0 Likes

nou,
If you use the enqueue map/unmap operations, that is the path we will be optimizing for this case. Also, make sure you use CL_MEM_READ_ONLY or CL_MEM_WRITE_ONLY on the buffers you create so that unnecessary copies are not made.
0 Likes
Raistmer
Adept II

I use 2 global buffers across a few kernels to pipeline data modification: the first kernel reads from the first buffer and writes to the second, then the second kernel reads from the second and writes to the first, and so on.
I should create both buffers as read/write, right?
What better optimization of GPU memory usage is possible in such a case?
With SDK 2.01 I saw very slow map/unmap operations, so I switched back to read/write operations.
Was this greatly improved in SDK 2.1? What is the "officially" preferred way to do data transfers from/to host memory now?
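The two-buffer scheme described above is often called ping-pong buffering. The host-side C sketch below simulates it with plain arrays, assuming a stand-in "kernel" (the hypothetical add_one_pass) that reads one buffer and writes the other. It also suggests why both buffers would need CL_MEM_READ_WRITE in this scheme: each one is the kernel's input on some passes and its output on others.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel: reads src, writes dst. */
static void add_one_pass(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] + 1.0f;
}

/* Runs `passes` passes, alternating the roles of buffers a and b.
 * Returns the buffer that holds the final result (a itself if passes == 0). */
static float *run_ping_pong(float *a, float *b, size_t n, unsigned passes)
{
    assert(a != b);
    float *src = a;
    float *dst = b;
    for (unsigned p = 0; p < passes; ++p) {
        add_one_pass(src, dst, n);
        /* Swap roles: the buffer just written becomes the next input. */
        float *tmp = src;
        src = dst;
        dst = tmp;
    }
    return src; /* after the swap, src is the most recently written buffer */
}
```

Only the role of each buffer changes between passes, so no data is copied between them and nothing has to go back to the host until the last pass is done.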
0 Likes

With the current implementation, these optimizations are not taken care of. At present, what you can do is be careful not to make redundant transfers.

0 Likes