Sorry if this question has already been asked and answered somewhere; if so, please point me to the right place.
I have a machine with 4GB of RAM and an HD5870 with 1GB of memory. When I use clGetDeviceInfo to query CL_DEVICE_GLOBAL_MEM_SIZE, it reports only 3GB of RAM on my machine and only 256MB of video memory on the HD5870, which is much smaller than the actual values.
At first I thought the memory used by the OS had been deducted, but the values I got don't add up that way.
When I tried to allocate 140MB of memory on the HD5870, it failed with an out-of-memory error. In the end, I was able to use less than 110MB of the 1GB provided by the graphics card.
I hope someone can tell me what is actually happening; I would be grateful, as it would help me understand OpenCL better.
Thank you.
Thank you MicahVillmow for the answer.
I am actually testing on a Windows 7 64-bit machine, and I double-checked that it fully detects and uses all 4GB of memory.
Regarding graphics card memory allocation, do you mean the OS may already have used up most of the memory and left only a little for me? But OpenCL should still report 1GB of global memory, since the HD5870 has 1GB of dedicated video memory; instead it reports only 256MB.
Thank you again.
I tested on Linux and CL_DEVICE_GLOBAL_MEM_SIZE is reported as 3072MB; I have 4GB of RAM. So I tried allocating 4 x 1GB cl_mem buffers (1GB per buffer, because CL_DEVICE_MAX_MEM_ALLOC_SIZE is only 1GB). For this test I had to enable 2GB of swap, and in total it used about 4.5GB of memory. So I think that when device memory is too small, OpenCL moves buffers to RAM, and the OS moves them to swap. I tested this only on the CPU device.
So try allocating two or three 100MB buffers, and check the value of CL_DEVICE_MAX_MEM_ALLOC_SIZE.
EDIT: I have now tested on Ubuntu 9.04 x64 with a Radeon 4870 with 1GB of VRAM. CL_DEVICE_GLOBAL_MEM_SIZE is reported as 256MB, and the max allocation size is also 256MB. So I tried to create three 228MB buffers, but I got an error saying there is not enough video memory.
I hope this limit exists only in this beta and will be removed, because being able to use only 1/4 of the total memory is rather inefficient.
I'm seeing the same thing in the v1.0 release of the ATI Stream SDK 2.0. My 5870 card only shows 256MB of global memory; the CPU device shows 1GB against the x86 OpenCL DLL and 3GB against the x64 DLL. My computer has 12GB of DDR3, 2x Intel Nehalem CPUs, an nVidia GTX-260, and a Tesla C1060 (in addition to the Radeon 5870 - and there's one x16 slot left to test other GPUs).
Most interestingly, the x64 DLLs on Windows add cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics but no fp64, even for the CPU. Also, none of the OpenCL image features are supported via the Stream SDK, and fused multiply-add is also not supported. This is as of driver version CAL 1.4.515.
Originally posted by: whiteshadow
Regarding graphics card memory allocation, do you mean the OS may already have used up most of the memory and left only a little for me? But OpenCL should still report 1GB of global memory, since the HD5870 has 1GB of dedicated video memory; instead it reports only 256MB.
This 256MB is quite a significant value for CAL (OpenCL is implemented on top of CAL). It is exactly the maximum size of an allocatable image (buffer); this size comes from 8128*8128*(4*sizeof(float)). The value 8128 is the maximum image width/height for 4xxx/5xxx hardware.
The total memory limit of 256MB looks like another example of poor compiler design on ATI's side (many basic optimizations are missing, which makes the 4xxx family totally unusable for any serious computation with OpenCL; the 5xxx family also has a lot of other problems).
It looks like the design team took the easy route and used one buffer as the whole memory. It's an easy but inefficient solution to the problem of multiple inputs/outputs in a CAL kernel. I have to admit that at some point, while writing a CAL kernel, I wondered how they had solved it; now I know.
So back to your question: the system didn't use the graphics card's memory. The OpenCL compiler simply doesn't know how to use more.
8128*8128*4*sizeof(float) is 1GB, not 256MB.
Originally posted by: nou 8128*8128*4*sizeof(float) is 1GB, not 256MB.
My bad; it should be 8128*8128*sizeof(CAL_FORMAT_INT_1) (format_int_1, as memory in OpenCL is accessed using UAVs).
Originally posted by: nou 8128*8128*4*sizeof(float) is 1GB, not 256MB.
Yes, and 256MB is simply way too small.
I've tried playing with the environment variables but have had no luck. Should the environment variables be in bytes, megabytes, or percent?
Originally posted by: ryta1203 Should the env variables be in bytes, megabytes, or percent?
It is a percentage; the value should be in the range 0 to 100.
Originally posted by: genaganna Originally posted by: ryta1203 Should the env variables be in bytes, megabytes, or percent?
It is a percentage; the value should be in the range 0 to 100.
I have both set to 100.
For global memory I get all of it, 1GB.
For buffer size, I am still getting 256MB.
Originally posted by: ryta1203
I have both set to 100.
For global memory I get all of it, 1GB.
For buffer size, I am still getting 256MB.
Only GPU_MAX_HEAP_SIZE is supported officially.
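For reference, per the answer above the variable takes a whole-number percentage, not a byte count. A minimal sketch of setting it before launch (the application name in the comment is a hypothetical placeholder for your own binary):

```shell
# GPU_MAX_HEAP_SIZE is read as a percentage of device memory, in the range 0-100
export GPU_MAX_HEAP_SIZE=100
echo "GPU_MAX_HEAP_SIZE=$GPU_MAX_HEAP_SIZE"
# ./my_opencl_app   # hypothetical: launch your OpenCL binary with the variable set
```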
Ah! OK, so 256MB is the max, which explains why you can't run 4k*4k in the Twister sample.
This still does not explain why the kernel is failing for Scholes.
Not that I use it, but it seems my CLInfo is also showing 1GB for RAM, and I have 6GB here with Windows 7 x64. I even tried CPU_MAX_HEAP_SIZE=100.
Originally posted by: fpaboim Not that I use it, but it seems my CLInfo is also showing 1GB for RAM, and I have 6GB here with Windows 7 x64. I even tried CPU_MAX_HEAP_SIZE=100.
CPU_MAX_HEAP_SIZE is not supported at all.