
joern_sierwald
Journeyman III

Failure to allocate buffer bigger than 2 GB on Windows 10

I have an R9 290X with 8 GB and I'm using Windows 10 Build 10041. I have installed no extra software, SDKs, or drivers on this test computer.

The device claims to support allocations in one chunk of close to 4 GB, as printed out by clinfo:

  Address bits:                              64
  Max memory allocation:                     4244635648

However, I cannot allocate more than 2 GB. I'm trying something simple, such as:

   eig_opencl_handle* handle = eig_opencl_init();

   cl_mem mem_grid;

   size_t grid_npoints = 520 * 1024 * 1024; // 520 Mi floats * 4 bytes = 2080 MiB, just over 2 GiB

   cl_int opencl_error;

   float* volume;

   volume = (float*) calloc(grid_npoints, sizeof(float));

   OPENCL_CHECK_ALLOC(mem_grid, handle, CL_MEM_READ_WRITE, sizeof(float)*grid_npoints); // that's a clCreateBuffer

   OPENCL_CHECK(clEnqueueWriteBuffer(handle->command_queue[0], mem_grid, CL_TRUE, 0, sizeof(float)*grid_npoints, volume, 0, NULL, NULL));

I get a CL_MEM_OBJECT_ALLOCATION_FAILURE.
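For clarity, OPENCL_CHECK_ALLOC is just a thin wrapper around clCreateBuffer plus an error check. Roughly what it boils down to (a sketch only; the context member name and the error path are made up here for illustration, the real macro may differ):

   #include <stdio.h>
   #include <stdlib.h>
   #include <CL/cl.h>

   // Sketch of the macro: create the buffer, bail out on failure.
   // handle->context is a guess at my own struct layout.
   #define OPENCL_CHECK_ALLOC(mem, handle, flags, size)                \
      do {                                                             \
         cl_int err_;                                                  \
         (mem) = clCreateBuffer((handle)->context, (flags), (size),    \
                                NULL, &err_);                          \
         if (err_ != CL_SUCCESS) {                                     \
            fprintf(stderr, "clCreateBuffer failed: %d\n", (int)err_); \
            exit(1);                                                   \
         }                                                             \
      } while (0)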

Is there something I should enable to be able to allocate large memory objects? This is a 64-bit application. I can query the maximum alloc and get the same value as clinfo.
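That query is just clGetDeviceInfo with CL_DEVICE_MAX_MEM_ALLOC_SIZE; a minimal sketch (error checking omitted, and 'device' is assumed to be the Hawaii cl_device_id):

   #include <stdio.h>
   #include <CL/cl.h>

   static void print_max_alloc(cl_device_id device)
   {
      cl_ulong max_alloc = 0;
      clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                      sizeof(max_alloc), &max_alloc, NULL);
      printf("Max memory allocation: %llu\n",
             (unsigned long long)max_alloc);   // prints 4244635648 here
   }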

Some more info:

  Platform ID:                               00007FFDB202AD30
  Name:                                      Hawaii
  Vendor:                                    Advanced Micro Devices, Inc.
  Device OpenCL C version:                   OpenCL C 2.0
  Driver version:                            1756.4 (VM)
  Profile:                                   FULL_PROFILE
  Version:                                   OpenCL 2.0 AMD-APP (1756.4)

Best Regards, Jörn

dipak
Big Boss

Try setting the environment variables GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_PERCENT to 100 and check.

Re: radeon 7970 3gb card only showing 2gb in 32bit linux

For larger allocations (4 GB or more), try setting GPU_FORCE_64BIT_PTR to 1.

Cannot make OpenCL runtime expose more than 3 GB of RAM

Regards,


Setting GPU_MAX_HEAP_SIZE does lower the memory reported by the device (if set to, say, 10). I can make it fail at 1 GB if I want to.

X:\hand>set GPU_MAX_HEAP_SIZE=100

X:\hand>opencl1

(II) OpenCL: Found device: 'Hawaii'  max local size: 256 256 256  total: 256

             local memory: 32 Kb  global memory: 8192 Mb  max global alloc: 4048 Mb

X:\hand>set GPU_MAX_HEAP_SIZE=50

X:\hand>opencl1

(II) OpenCL: Found device: 'Hawaii'  max local size: 256 256 256  total: 256

             local memory: 32 Kb  global memory: 4096 Mb  max global alloc: 4048 Mb

X:\hand>set GPU_MAX_HEAP_SIZE=25

X:\hand>opencl1

(II) OpenCL: Found device: 'Hawaii'  max local size: 256 256 256  total: 256

             local memory: 32 Kb  global memory: 2048 Mb  max global alloc: 2048 Mb

X:\hand>

GPU_MAX_ALLOC_PERCENT has no effect. The reported max global alloc is always MIN(the value set via GPU_MAX_HEAP_SIZE, 4048 MiB).


GPU_FORCE_64BIT_PTR has no effect either, but on an OpenCL 2.0 / WDDM 2.0 platform where everything runs 64-bit, I don't think that can be the problem.


Still, the device reports 4244635648 bytes but fails to deliver more than 2^31.


Cheers, Jörn


Really surprising. As the "max memory allocation" size is more than what you're trying to allocate, I would not expect any allocation problem. Please make sure there is enough memory available at the time of the allocation. Another point: are you able to allocate a larger total amount of memory if you create multiple buffers of smaller size (say, < 2 GB each)?

Right now, I don't have a setup exactly like yours. I'll try to put one together and check. Meanwhile, you may try the same on some other setups, if possible.


Regards,



Regarding your question about allocating a larger amount of memory with smaller buffers:

   eig_opencl_handle* handle = eig_opencl_init();

   cl_mem mem_grid[100];

   size_t grid_npoints = 256 * 1024 * 1024; // 256 Mi floats * 4 bytes = 1 GiB per buffer

   cl_int opencl_error;

   float* volume;

   volume = (float*) calloc(grid_npoints, sizeof(float));

   for (int i = 0; i < 100; i++) {

      printf("trying %d\n", i);

      OPENCL_CHECK_ALLOC(mem_grid[i], handle, CL_MEM_READ_WRITE, sizeof(float)*grid_npoints);

      OPENCL_CHECK(clEnqueueWriteBuffer(handle->command_queue[0], mem_grid[i], CL_TRUE, 0, sizeof(float)*grid_npoints, volume, 0, NULL, NULL));

      printf("done %d\n", i);

   }

This loop allocates 1 GB chunks.

Irritatingly, this runs until I get tired of the paging. On a machine with 32 GB of RAM and 8 GB of VRAM, I pressed Ctrl-C at 42 GB, that is, 42 chunks of exactly 1 GB.

The Windows Task Manager confirms that the machine runs into paging due to lack of physical memory.

I did not expect this behaviour at all. I went to a Windows 8.1 system with a 2 GB Cayman card, and it allowed 2 chunks, then failed with CL_MEM_OBJECT_ALLOCATION_FAILURE.

Also irritating: the Hawaii card allows an allocation of exactly 2^31 bytes. As a programmer I would expect the highest value to be 2^31-1, but it is not.


joern.sierwald wrote:

Also irritating: the Hawaii card allows an allocation of exactly 2^31 bytes. As a programmer I would expect the highest value to be 2^31-1, but it is not.

Errr... that's exactly what I would expect: a maximum allocation size of 2^31 bytes gives an addressable range of 0 to (2^31)-1 inclusive, i.e. the non-negative range of a signed 32-bit int.

As to the original question: why can't the driver allocate more than 2 GB in a single buffer? I will now do some speculation:

1) It used to be the case that a single UAV on AMD's architecture could only view 2 GB of memory, regardless of address/page-table translation bitness. This meant that even if the driver could allocate a chunk of memory >2 GB, a shader would never be able to see all of it. You can, however, create multiple 2 GB buffers which the driver can then make resident for the shaders to see, up to the limit of the memory on the graphics card; see the sketch after this list. (After that, I'm guessing the driver could map a window of CPU memory across the PCIe bus to cope with the spill out of GPU memory. I digress.)

2) The excellent summary in Re: Cannot make OpenCL runtime expose more than 3 GB of RAM describes how the GPU bitness in OpenCL may change or be fixed. I suspect that it's actually the page-table bitness that is altered and not the underlying bitness of the ISA running the shaders, i.e. point (1) still holds and setting GPU_FORCE_64BIT_PTR will have no effect.

3) Older architectures/driver designs would map all buffers to a single UAV. This was done to pass OpenCL conformance. However, because of point (1), it meant the total maximum memory allocation could only be 2 GB. GPU_MAX_ALLOC_PERCENT removes this limitation but can break conformance, i.e. your code breaks. Things have moved on, so this flag isn't going to affect you anyway.
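A minimal sketch of the multiple-buffer workaround from point (1), assuming a valid cl_context; alloc_chunked and CHUNK_BYTES are illustrative names I made up, and error cleanup is trimmed:

   #include <CL/cl.h>

   #define CHUNK_BYTES ((size_t)1 << 31)   // 2 GiB per buffer, the per-UAV limit above

   // Splits one logical allocation of total_bytes into <=2 GiB cl_mem chunks.
   // Returns the number of buffers created, or -1 on failure
   // (the caller should then release any chunks already created).
   static int alloc_chunked(cl_context ctx, size_t total_bytes,
                            cl_mem chunks[], size_t max_chunks)
   {
      size_t n = (total_bytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
      if (n > max_chunks) return -1;
      for (size_t i = 0; i < n; i++) {
         size_t sz = (i + 1 == n) ? total_bytes - i * CHUNK_BYTES : CHUNK_BYTES;
         cl_int err;
         chunks[i] = clCreateBuffer(ctx, CL_MEM_READ_WRITE, sz, NULL, &err);
         if (err != CL_SUCCESS) return -1;
      }
      return (int)n;
   }

A kernel then has to take each chunk as a separate argument (or be enqueued once per chunk), since no single device pointer can span them.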

To summarize: I don't think you can have a single buffer bigger than 2 GB until AMD has 64-bit addressing within its shader cores and the driver uses it that way. Perhaps that is the case now with OpenCL 2.0 and SVM, but I've not experimented to see where that tops out allocation-wise.