
Raistmer
Adept II

How to lower host memory requirements

Looks like all GPU memory gets allocated on the host too

My app uses quite a lot of GPU memory.
Right now I also see very high host memory allocation (~340MB).
But many of the GPU buffers are used only by the GPU and aren't needed on the host.
Is it possible to restrict host memory usage somehow (that is, to prevent GPU memory being duplicated in host memory)?
I use SDK 2.2
HarryH
Journeyman III

cl_mem buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, &status);

no host pointer
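
For reference, here's a minimal self-contained version of that pattern (a sketch assuming a single GPU platform/device; most error checking omitted). With host_ptr == NULL and no CL_MEM_*_HOST_PTR flag, the runtime is free to keep the allocation in GPU memory only:

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_int status;
    cl_platform_id platform;
    cl_device_id device;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &status);

    /* 128 MB device buffer, no host pointer: nothing is staged from the host */
    cl_mem buffer = clCreateBuffer(context, CL_MEM_READ_WRITE,
                                   128 * 1024 * 1024, NULL, &status);
    if (status != CL_SUCCESS)
        fprintf(stderr, "clCreateBuffer failed: %d\n", status);

    clReleaseMemObject(buffer);
    clReleaseContext(context);
    return 0;
}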


Example of my buffer allocation:

gpu_fold_large_neg = clCreateBuffer(
    context,
    CL_MEM_READ_WRITE,
    sizeof(cl_float) * (262144 + (32768 >> 8) * (DATA_CHUNK_UNROLL - 1)),
    NULL,
    &err);
if (err != CL_SUCCESS)
    fprintf(stderr, "Error: clCreateBuffer (gpu_fold_large_neg): %d\n", err);

Looks much the same, but it doesn't help.

Raistmer,

On Linux this works for me. I allocate a 128 MB buffer on the GPU and the process size on the host is only about 40MB total. Are you on Linux or Windows?

Raistmer
Adept II

I'm on Windows.
And I have two app parts that use memory blocks of different sizes (both big ones), so I have to allocate/deallocate two sets of buffers in a loop.
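
A sketch of that allocate/release cycle (hypothetical sizes and names; the real app's buffers and kernels differ). Releasing one set before creating the next keeps the peak footprint down:

#include <CL/cl.h>

/* Hypothetical sizes for illustration only */
#define SIZE_A (64u * 1024u * 1024u)
#define SIZE_B (96u * 1024u * 1024u)

static void run_passes(cl_context context, int num_passes)
{
    cl_int err;
    for (int pass = 0; pass < num_passes; ++pass) {
        /* set A: buffers for the first part of the app */
        cl_mem a = clCreateBuffer(context, CL_MEM_READ_WRITE, SIZE_A, NULL, &err);
        /* ... enqueue the first part's kernels ... */
        clReleaseMemObject(a);   /* free before the next set is created */

        /* set B: differently sized buffers for the second part */
        cl_mem b = clCreateBuffer(context, CL_MEM_READ_WRITE, SIZE_B, NULL, &err);
        /* ... enqueue the second part's kernels ... */
        clReleaseMemObject(b);
    }
}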

I have not checked this on Linux, but on Windows OpenCL always allocates one temporary buffer. Even if you use CL_DEVICE_TYPE_CPU and specify CL_MEM_USE_HOST_PTR at buffer creation, it will still allocate a temp buffer on the host.

This creates problems when porting existing apps that require large buffers to OpenCL.
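
For clarity, this is the CL_MEM_USE_HOST_PTR pattern being described (a sketch; the extra temporary buffer is the behavior reported above for the Windows runtime, not something the spec requires):

#include <stdlib.h>
#include <CL/cl.h>

static void wrap_host_memory(cl_context context, size_t size)
{
    cl_int err;
    void *host_ptr = malloc(size);  /* app-owned; must outlive the buffer */

    /* Ask the runtime to use the app's own allocation as backing store */
    cl_mem buf = clCreateBuffer(context,
                                CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                size, host_ptr, &err);
    /* ... enqueue kernels using buf ... */
    clReleaseMemObject(buf);
    free(host_ptr);
}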

Raistmer
Adept II

So there's no way?
IMO it's a big flaw in the implementation. Some option to prevent such duplication should exist.
BTW, I'm now comparing app performance between an HD4870 and a GSO9600 and am very disappointed in the ATI GPU's performance. The app was written for ATI, with float4 memory accesses and operations used everywhere possible, but the GSO9600 performs better (and sometimes much better) in most cases. For now the HD4870 looks better only on one kind of workload where most of the time is spent on the CPU (so it may be down to CPU differences, not the GPUs).
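
For context, this is the kind of float4 access being referred to; an illustrative OpenCL C kernel (not the actual app's code), where each work-item moves four packed floats per load/store:

__kernel void scale4(__global const float4 *in,
                     __global float4 *out,
                     const float k)
{
    size_t i = get_global_id(0);
    out[i] = in[i] * k;
}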