Can I assume that this behaviour indicates that the buffer is small enough not to run into the problem from the two links above?
---Well, I think you can use the printf extension to confirm/double-check.
Hope this answer is helpful to you.
In the past, if I exceeded the buffer limits, I got an error from allocation. You should be able to see the maximum object size in the clinfo output. Why don't you try it with vload3 and see if you get the same values or not? Also, you could perhaps try it on the CPU?
vload3 gives the same situation. I also tried to access the data with the operator, with the same result. I tried to get CL_DEVICE_MAX_MEM_ALLOC_SIZE with the following code, but it returns only two binary zeros...
DumpDetailedCLError(clGetDeviceInfo(deviceId,CL_DEVICE_MAX_MEM_ALLOC_SIZE, 0,0,&size), false);
DumpDetailedCLError(clGetDeviceInfo(deviceId,CL_DEVICE_MAX_MEM_ALLOC_SIZE, 500, &deviceInfo,0), false);
One expert says about the AMD 7970:
Currently OpenCL users are limited to 25% of device memory,
I don't know where you got this from; perhaps it's a rumor, but it's certainly not correct.
(there is a 512MB limit per allocation call but you can allocate as much as you like)
I do predominantly scientific computing and often need very large and fast memory, so I mostly use the 7970. On the 7970, I often allocate a single contiguous buffer that uses just shy of 3GB, the device limit. It's very simple: all you do is allocate in chunks of 512MB or less and make sure the chunks are rounded to about 0x4000 bytes; then they will be placed contiguously.
So I don't think it is a problem with allocating, because I use only a fraction of these buffer sizes on the same hardware. It seems rather to be a problem while reading in this address space. It appears with an OpenCL buffer AND with a shared GL VBO.
So I am now trying to divide the buffer into chunks of 8 million points, but I don't know at which position exactly the reads become faulty, or why it is so much earlier than the 512MB boundary.
Thanks for your response so far.
The correct code for querying is this:
clGetDeviceInfo(deviceId, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(cl_ulong), &max_alloc_size, 0);
Thanks for correcting the code snippet. It prints "536870912 byte", which confirms that this is not the problem. It helped me determine that the memory access was the problem.
The printf also outputs the expected faulty values. But by checking with enqueueWriteBuffer and afterwards enqueueReadBuffer, I determined that the values are correctly written and read back compared to the host source.
So I thought that the problem is not faulty values inside the memory, but rather the method I used to read them. I determined that global IDs with values above 8 million, which I used to access the memory, are faulty.
Now I have shrunk my global ID work space and loop inside my kernel to do the same overall work. This solved my problem.
Does anyone know how big the maximum global ID can be?
CL_DEVICE_MAX_WORK_ITEM_SIZES prints [256;256;256], which leads to a maximum global ID number of 16777216. This is the work size I can use without faults.
The global work size takes a size_t; it should be at least as large as an unsigned 32-bit value, meaning it should be able to hold at least 4,294,967,295 as far as I understand. The work-item sizes show the maximum local size you can use in each dimension and are not related to the global size, I think?
size_t: The unsigned integer type of the result of the sizeof operator. This is a 32-bit unsigned integer if CL_DEVICE_ADDRESS_BITS defined in clGetDeviceInfo is 32 bits, and is a 64-bit unsigned integer if CL_DEVICE_ADDRESS_BITS is 64 bits.
It seems that a hardware-dependent maximum also exists. You should also be able to see it inside the CodeXL profiler.
But now I run into another strange problem. My memory access only seems to be valid when I use half of the described global work size. Are there some dependencies which must be considered when raising the global work size?
yurtsen is probably right. AFAIK there is no theoretical limit on the global size, and the maximum value representable by the size_t data type appears to be the practical limit.
IMHO, it would be helpful if you could share a small cut-down code snippet with which we can reproduce the issue.
There are many samples, like matrix multiplication, where you can specify large matrix sizes that would result in a large global work-item range.
But your case also involves CL-GL interop, so it may be interesting to see.