
omion
Journeyman III

Device vs host memory with buffers

I just started looking into OpenCL with my brand new HD5850, but I ran into a bit of a problem:

I have a context which includes both the CPU and GPU, but any buffers I create seem to be limited to the available GPU memory. My computer has 8GB RAM. OpenCL sees 1GB usable on the CPU and 256MB usable on the GPU. Once I try to allocate more than about 256MB of buffers, I get a message on stderr:

C:\openclmainline\runtime\src\device\gpu\gpudevice.cpp:754: guarantee(!"We don't have enough video memory!")

Then the whole program crashes. (That file doesn't exist on my machine, by the way - I assume the message comes from an assertion in the OpenCL driver's source.)

Shouldn't the buffers be allocated on the host and only need to be on the GPU when it is using them? I kind of figured the driver would do all the memory swapping dynamically and automatically. Is this a function of the OpenCL spec or just of AMD's current implementation? If the latter, is there a fix planned?



0 Likes
11 Replies
genaganna
Journeyman III

Originally posted by: omion

Shouldn't the buffers be allocated on the host and only need to be on the GPU when it is using them? I kind of figured the driver would do all the memory swapping dynamically and automatically. Is this a function of the OpenCL spec or just of AMD's current implementation? If the latter, is there a fix planned?

Omion,

Could you please clarify what you are looking for?

0 Likes

There were two things wrong with the situation, as far as I can tell:

1. The memory looks like it's allocated on all devices in a context, which doesn't seem right. I would have expected host memory to be used instead, with each device caching the data only when a kernel needs it.

2. If there really is a problem on the device, the function should return an error code rather than simply crashing.

The second problem is definitely a bug. The first may just be me misunderstanding how the memory allocation works.

I suppose I want to know if issue 2 is being worked on, and if issue 1 is actually a problem or just me not knowing how things work.

0 Likes
n0thing
Journeyman III

Originally posted by: omion   Shouldn't the buffers be allocated on the host and only need to be on the GPU when it is using them?


The buffers are allocated on the device, and your buffer size is limited by CL_DEVICE_MAX_MEM_ALLOC_SIZE, which is 256MB in your case.

If your data set is larger than that, you need to divide it into multiple buffers.
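
For reference, the per-device limit can be queried with clGetDeviceInfo. A minimal sketch, assuming device is a cl_device_id you have already obtained (the helper name is just for illustration, error checking omitted):

#include <CL/cl.h>
#include <stdio.h>

void print_max_alloc(cl_device_id device)
{
    cl_ulong max_alloc = 0;
    /* CL_DEVICE_MAX_MEM_ALLOC_SIZE is reported as a cl_ulong, in bytes */
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);
    printf("CL_DEVICE_MAX_MEM_ALLOC_SIZE: %llu MB\n",
           (unsigned long long)(max_alloc / (1024 * 1024)));
}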

0 Likes

@n0thing:

That's actually what I'm doing. The problem is that it doesn't help for some reason (which is why I think there is something wrong).

For example, I should be able to allocate three 100MB buffers, but the third one always makes the program crash. I found out that I can actually make 266 buffers, each 1MB in size (1048576 bytes) before it dies. With 265 buffers neither the stderr message nor the crash occurs.

Also, attempting to allocate a buffer larger than CL_DEVICE_MAX_MEM_ALLOC_SIZE will simply result in the function failing with CL_INVALID_BUFFER_SIZE, which is much easier to deal with than a full program crash.
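
The probing I described is roughly the following loop; a sketch, assuming ctx is the context that includes the GPU. The point is that a failed allocation should show up as an error code here instead of killing the process:

#include <CL/cl.h>
#include <stdio.h>

#define CHUNK_BYTES (1024 * 1024)   /* 1MB per buffer */
#define MAX_BUFS    1024

/* Create 1MB buffers until clCreateBuffer reports an error, then clean up. */
int probe_allocations(cl_context ctx)
{
    cl_mem bufs[MAX_BUFS];
    int count;
    for (count = 0; count < MAX_BUFS; ++count) {
        cl_int err = CL_SUCCESS;
        bufs[count] = clCreateBuffer(ctx, CL_MEM_READ_WRITE, CHUNK_BYTES, NULL, &err);
        if (err != CL_SUCCESS) {
            /* e.g. CL_MEM_OBJECT_ALLOCATION_FAILURE or CL_OUT_OF_HOST_MEMORY */
            printf("allocation %d failed with error %d\n", count + 1, err);
            break;
        }
    }
    for (int i = 0; i < count; ++i)
        clReleaseMemObject(bufs[i]);
    return count;
}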

 

Some background on what I'm doing: I need to work on data sets that may be up to 1TB in size. So I split up the set into slices that are around the size of the available system memory. However, as n0thing noted, the devices can't handle that much data. So the working set is actually represented by a number of buffers, each smaller than the smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE across all devices.

So I have a data set that is 40GB in size. The program will do 20 passes with 2GB of memory at a time, with the 2GB represented by ten 200MB buffers.
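
In outline, each pass looks something like the sketch below; ctx stands in for the shared context, the helper name is illustrative, and the transfer and kernel steps are elided since they depend on the actual application:

#include <CL/cl.h>

#define BUFS_PER_PASS 10
#define BUF_BYTES     ((size_t)200 * 1024 * 1024)   /* 200MB, below the 256MB per-buffer limit */

/* Stage one 2GB slice of the data set in ten 200MB buffers, work on it, then release. */
int run_pass(cl_context ctx)
{
    cl_mem bufs[BUFS_PER_PASS];
    cl_int err = CL_SUCCESS;

    for (int i = 0; i < BUFS_PER_PASS; ++i) {
        bufs[i] = clCreateBuffer(ctx, CL_MEM_READ_WRITE, BUF_BYTES, NULL, &err);
        if (err != CL_SUCCESS) {        /* in practice this is where the crash happens instead */
            while (i-- > 0)
                clReleaseMemObject(bufs[i]);
            return err;
        }
    }

    /* ... clEnqueueWriteBuffer each slice, run the kernels, read the results back ... */

    for (int i = 0; i < BUFS_PER_PASS; ++i)
        clReleaseMemObject(bufs[i]);
    return CL_SUCCESS;
}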

I suppose I have a question to the other users: has anybody else run into this problem? It happens EVERY time I use a context that includes the GPU and the total buffer memory usage exceeds about 260MB.

0 Likes

Originally posted by: omion  I suppose I have a question to the other users: has anybody else run into this problem? It happens EVERY time I use a context that includes the GPU and the total buffer memory usage exceeds about 260MB.


 

Taking the TemplateC example from the SDK and increasing width from the default 256 to 32*1024*1024 (thus 128MB for the input + 128MB for the output) ends in the same way -- "C:\openclmainline\runtime\src\device\gpu\gpudevice.cpp:754: guarantee(!"We don't have enough video memory!")".

Obviously it's an SDK problem; it shouldn't crash at all, it should report some OUT_OF_MEMORY error, especially when we have 1GB of video RAM and are allocating only 1/4 of it in small chunks.

0 Likes

I can confirm that it crashes when I try to allocate more than 256MB of memory, but with only the CPU I can allocate 4GB without a problem. I have 4GiB of RAM, and OpenCL reports CL_DEVICE_GLOBAL_MEM_SIZE as 3GiB and CL_DEVICE_MAX_MEM_ALLOC_SIZE as 1GiB.

I think the OpenCL driver should automatically swap memory buffers to and from the device as needed.

0 Likes

Thank you for reporting this issue. This has been forwarded to the developers.

0 Likes

Omiom/Nou,
I think the key part of the spec that covers this is the return value of clCreateBuffer, section 5.2.1.
"CL_INVALID_BUFFER_SIZE if size is 0 or is greater than
CL_DEVICE_MAX_MEM_ALLOC_SIZE value specified in table 4.3 for all devices in
context."

The key wording is 'all devices'. So, the max amount of memory you can allocate for a context is the minimum reported by CL_DEVICE_MAX_MEM_ALLOC_SIZE of all devices associated with that context. In this case, the GPU is the bottleneck and limits the max allocation to 256MB.
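
In code, that limit is just the minimum of the per-device query over the context's devices; a quick sketch (helper name illustrative, error checking omitted):

#include <CL/cl.h>
#include <stdlib.h>

/* Smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE among all devices in the context;
   a single clCreateBuffer request must not exceed this value. */
cl_ulong min_max_alloc(cl_context ctx)
{
    size_t devs_bytes = 0;
    clGetContextInfo(ctx, CL_CONTEXT_DEVICES, 0, NULL, &devs_bytes);

    cl_device_id *devs = malloc(devs_bytes);
    clGetContextInfo(ctx, CL_CONTEXT_DEVICES, devs_bytes, devs, NULL);

    size_t num_devs = devs_bytes / sizeof(cl_device_id);
    cl_ulong min_alloc = (cl_ulong)-1;
    for (size_t i = 0; i < num_devs; ++i) {
        cl_ulong max_alloc = 0;
        clGetDeviceInfo(devs[i], CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                        sizeof(max_alloc), &max_alloc, NULL);
        if (max_alloc < min_alloc)
            min_alloc = max_alloc;
    }
    free(devs);
    return min_alloc;
}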
0 Likes

Originally posted by: MicahVillmow Omiom/Nou, I think the key part of the spec that covers this is the return value of clCreateBuffer, section 5.2.1. "CL_INVALID_BUFFER_SIZE if size is 0 or is greater than CL_DEVICE_MAX_MEM_ALLOC_SIZE value specified in table 4.3 for all devices in context." The key wording is 'all devices'. So, the max amount of memory you can allocate for a context is the minimum reported by CL_DEVICE_MAX_MEM_ALLOC_SIZE of all devices associated with that context. In this case, the GPU is the bottleneck and limits the max allocation to 256MB.


Well, that's not quite what's happening (which is the problem). The spec says that each buffer needs to be smaller than the min value of all CL_DEVICE_MAX_MEM_ALLOC_SIZEs, but the problem occurs if the total number of bytes in all buffers exceeds this amount.

So, if I allocate a single 200MB buffer, the program is fine. If I then allocate another 200MB buffer (which should still be fine, according to the specs since the size requested is less than CL_DEVICE_MAX_MEM_ALLOC_SIZE) the program completely crashes. No error, just a crash.

0 Likes

Yes, I agree that the crashing is a problem and this has been reported. Also, can you point me to the part of the spec that you believe states that this should be valid? I can't seem to find anything.
0 Likes

 

Originally posted by: MicahVillmow  Also, can you point me to the part of the spec that you believe states that this should be valid? I can't seem to find anything.


It was actually in the part of the spec that you quoted. It says that CL_INVALID_BUFFER_SIZE is returned if size is greater than CL_DEVICE_MAX_MEM_ALLOC_SIZE for all devices. The size here refers to the third argument of clCreateBuffer (the size of the single buffer being created), not the total size of all buffers.

I'll give a pseudo-code example:

cl_int err;

cl_mem buf1 = clCreateBuffer(ctx, 0, 200*1024*1024, NULL, &err); /* err == CL_SUCCESS */

cl_mem buf2 = clCreateBuffer(ctx, 0, 200*1024*1024, NULL, &err); /* should also succeed, but the program crashes here instead */

For both clCreateBuffers, size is less than 256MB, therefore neither allocation should return with CL_INVALID_BUFFER_SIZE.

Of course, after enough allocations the host's memory will run out, in which case CL_OUT_OF_HOST_MEMORY or CL_MEM_OBJECT_ALLOCATION_FAILURE would be returned (I'm not really sure which one, though).

 

0 Likes