Archives Discussions

Fr4nz · 03-06-2010

First of all, my specs: ATI 5770 and the ATI Stream SDK 2.01 OpenCL implementation.

My host code runs many iterations. During each iteration, the host code creates various buffer objects, and one of them in particular, let's call it outputKeyIntersections, can be very large depending on various conditions. This buffer is never written or read by the host; it is only used by the kernels to write and read temporary data. After creating all the buffers and the other OpenCL objects, two kernels are executed.

Here's the problem: when the size required for this particular buffer stays very large across iterations (say, around 200 MB for more than 2-3 iterations), at some specific iteration the program returns error code -4 when trying to create outputKeyIntersections with clCreateBuffer, which corresponds to CL_MEM_OBJECT_ALLOCATION_FAILURE.

In theory this should be impossible, because the other buffers don't occupy all the memory on the video card (summed together they occupy at most 300-400 MB, but the 5770 has 1 GB available, so there shouldn't be any problem...).

I'm starting to think that buffers aren't released properly...maybe there's something not working properly when calling clReleaseMemObject? Maybe it doesn't release all the memory occupied by the buffers?

omkaranathan · 03-06-2010

You can get the maximum memory available to OpenCL with a device query. The CLInfo sample will also tell you this. IIRC, it's 256 MB for the 5770. If you try to allocate more memory than that, you will get CL_MEM_OBJECT_ALLOCATION_FAILURE.
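
A minimal sketch of how a host program might check a planned buffer against such limits before creating it. It is plain C and takes the limits as parameters; in a real program they would presumably come from clGetDeviceInfo with CL_DEVICE_MAX_MEM_ALLOC_SIZE (per-buffer limit) and CL_DEVICE_GLOBAL_MEM_SIZE (total). The names alloc_check and check_allocation are hypothetical, and tracking live_bytes is left to the caller.

```c
#include <stdint.h>

/* Result of checking one planned buffer against the device limits. */
typedef enum {
    ALLOC_OK = 0,
    ALLOC_EXCEEDS_SINGLE_LIMIT, /* this buffer alone is too large */
    ALLOC_EXCEEDS_TOTAL_LIMIT   /* all live buffers together would be too large */
} alloc_check;

/*
 * request      : size in bytes of the buffer about to be created
 * live_bytes   : bytes held by buffers created and not yet released
 * max_single   : per-buffer limit (assumed: CL_DEVICE_MAX_MEM_ALLOC_SIZE)
 * total_budget : limit for all buffers (assumed: CL_DEVICE_GLOBAL_MEM_SIZE)
 */
alloc_check check_allocation(uint64_t request, uint64_t live_bytes,
                             uint64_t max_single, uint64_t total_budget)
{
    if (request > max_single)
        return ALLOC_EXCEEDS_SINGLE_LIMIT;
    /* Written as a subtraction so the comparison cannot overflow. */
    if (live_bytes > total_budget || request > total_budget - live_bytes)
        return ALLOC_EXCEEDS_TOTAL_LIMIT;
    return ALLOC_OK;
}
```

With a 256 MB total budget, a 200 MB request passes on an empty device but is rejected once another 100 MB is still live, so a check like this can help tell "one buffer too big" apart from "too much allocated at once".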

Fr4nz · 03-06-2010

Excuse me, but the 256 MB limit is for ONE buffer object, not for ALL buffer objects, isn't it?

I don't create buffer objects larger than 256 MB in my host code, nor do I allocate more than 1 GB per iteration for all buffer objects...

EDIT: well, it actually seems that the maximum allocatable limit is 256 MB for ALL buffer objects, as omkaranathan said, which is quite atrocious given that I have 1 GB on my video card...

nou · 03-06-2010

This limit comes from the fact that 4xxx cards can have only one UAV buffer at a time, and that buffer has a 256 MB limit. Fortunately, 5xxx cards can have up to 8 UAV buffers per kernel, so this limit should disappear for Evergreen.

But as I have said many times, buffers should not be hard-allocated on ALL cards in the system, as they are now. On Mac OS you can allocate more than GLOBAL_MEM_SIZE.

When you have multiple GPUs, you should be able to split the input data into chunks (the sum of the input data can exceed GLOBAL_MEM_SIZE) and compute the results independently on the individual GPUs.
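
The chunking idea can be sketched in plain C as follows. This only shows the size arithmetic; the function names are hypothetical, the round-robin assignment of chunks to devices is an assumption rather than anything prescribed here, and the actual buffer creation and kernel launches per chunk are left out.

```c
#include <stdint.h>

/* Number of chunks needed to cover total_bytes with chunks of at most
 * max_chunk_bytes each. Returns 0 if either argument is 0. */
uint64_t chunk_count(uint64_t total_bytes, uint64_t max_chunk_bytes)
{
    if (total_bytes == 0 || max_chunk_bytes == 0)
        return 0;
    /* Ceiling division that cannot overflow. */
    return total_bytes / max_chunk_bytes + (total_bytes % max_chunk_bytes != 0);
}

/* Size in bytes of chunk number index (0-based). The last chunk may be
 * shorter than the others; out-of-range indices yield 0. */
uint64_t chunk_bytes(uint64_t index, uint64_t total_bytes,
                     uint64_t max_chunk_bytes)
{
    uint64_t n = chunk_count(total_bytes, max_chunk_bytes);
    if (index >= n)
        return 0;
    uint64_t offset = index * max_chunk_bytes; /* always < total_bytes here */
    uint64_t remaining = total_bytes - offset;
    return remaining < max_chunk_bytes ? remaining : max_chunk_bytes;
}

/* Device that processes chunk number index when chunks are dealt out
 * round-robin (an assumed policy). Returns 0 if there are no devices. */
uint64_t chunk_device(uint64_t index, uint64_t device_count)
{
    return device_count == 0 ? 0 : index % device_count;
}
```

For example, 600 MB of input with a 256 MB per-chunk cap gives three chunks of 256 MB, 256 MB and 88 MB, which two GPUs could process alternately.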

Fr4nz · 03-06-2010

So, do you think this limit will disappear in the future, at least on 5xxx video cards? And what about Nvidia? Do they have this allocation limit across all buffer objects too?

omkaranathan · 03-06-2010

Fr4nz,

The default values for OpenCL were chosen to provide a large amount of memory to the OpenCL application in a reliable and high performance fashion with the current implementation. They are there so that OpenCL plays nice with other applications that use the graphics card. This will be improved in future releases.

Fr4nz · 03-06-2010

Okay, thank you for the answer

Raistmer · 03-07-2010

Plaform Version: OpenCL 1.0 CUDA 3.0.1
Plaform Name: NVIDIA CUDA

Global memory size: 519766016
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 16384

So perhaps one could use much more than 128 or 256 MB for the total size of allocated buffers.
The limit for a single allocation is still 128 MB, though.
(Also, look at the local memory type for the NV GPU: it's not "Global", it's true local memory.)

nou · 03-07-2010

OK, I can say that 5xxx cards report Scratchpad for local memory too.

Raistmer: could you please try to allocate more than the global memory size? On Mac OS you can normally allocate beyond that limit. For example, allocate four 128 MB buffers.
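
For scale, four 128 MB buffers land just past the 519766016 bytes of global memory Raistmer reported. A small C sketch of that arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/* Bytes by which count buffers of buffer_bytes each exceed
 * global_mem_bytes; 0 if they all fit. Assumes count * buffer_bytes
 * does not overflow, which holds for realistic GPU sizes. */
uint64_t overshoot_bytes(uint64_t count, uint64_t buffer_bytes,
                         uint64_t global_mem_bytes)
{
    uint64_t total = count * buffer_bytes;
    return total > global_mem_bytes ? total - global_mem_bytes : 0;
}
```

Four buffers total 536870912 bytes, which is 17104896 bytes (about 16 MB) over the reported size, while three buffers fit.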

Fr4nz · 03-07-2010

EDIT: Ok, it is not a bug...

nou · 03-07-2010

No, it isn't a bug. Only CLInfo reports local memory as Scratchpad; you can see it in the CLInfo source.

Strange problem when creating and releasing large buffer a lot of times...