My code ran well in production with GPU_MAX_ALLOC_PERCENT set as high as 100 under the 12.4 drivers, but it fails with CL_OUT_OF_RESOURCES under the 13.1 drivers (the code allocates up to 90% of memory). Lowering the setting from 100 to 80 did not help.
Only being able to use 2GB of the 3GB on the card would render it useless for my next project. I need every bit of memory I can use.
Is there a workaround?
Machine:
4x HD 7970
Catalyst 13.1 driver on CentOS 6.3
Operating System Version (name), Linux version 2.6.32-279.19.1.el6.centos.plus.x86_64 (mockbuild@c6b7.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Wed Dec 19 06:20:23 UTC 2012
Operating System Version (number), 2.6.32
Number Of Processors, 32
System Type, Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Total Physical Memory, 64392 MB
Available Physical Memory, 62184 MB
Total Virtual Memory, 33554431 MB
Available Virtual Memory, 33519322 MB
Total Page Files, 8191 MB
Available Page Files, 8191 MB
Platform ID, 1, 1, 1, 1, 1
Device Type, GPU, GPU, GPU, GPU, CPU
Device Name, Tahiti, Tahiti, Tahiti, Tahiti, Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Vendor, Advanced Micro Devices, Inc., Advanced Micro Devices, Inc., Advanced Micro Devices, Inc., Advanced Micro Devices, Inc., GenuineIntel
Command Queue Properties, Queue profiling, Queue profiling, Queue profiling, Queue profiling, Queue profiling
Is Available, Yes, Yes, Yes, Yes, Yes
Is Compiler Available, Yes, Yes, Yes, Yes, Yes
Is Little Endian, Yes, Yes, Yes, Yes, Yes
Error Correction Support, No, No, No, No, No
Execution Capabilities, Kernel Execution, Kernel Execution, Kernel Execution, Kernel Execution, Kernel Execution, Native Kernel Execution
Global Memory Cache Size, 16 KB, 16 KB, 16 KB, 16 KB, 32 KB
Memory Cache Type, Read Write, Read Write, Read Write, Read Write, Read Write
Global Memory Cache Line Size, 64 bytes, 64 bytes, 64 bytes, 64 bytes, 64 bytes
Global Memory Size, 2,048 MB, 2,048 MB, 2,048 MB, 2,048 MB, 64,393 MB
Host Unified Memory, No, No, No, No, Yes
Are Images Supported, Yes, Yes, Yes, Yes, Yes
Max Image 2D Dimensions, (256w, 256h), (256w, 256h), (256w, 256h), (256w, 256h), (1024w, 1024h)
Max Image 3D Dimensions, (256w, 256h, 256d), (256w, 256h, 256d), (256w, 256h, 256d), (256w, 256h, 256d), (1024w, 1024h, 1024d)
Local Memory Size, 32 KB, 32 KB, 32 KB, 32 KB, 32 KB
Local Memory Type, Local, Local, Local, Local, Global
Max Clock Frequency, 1050, 1050, 1050, 1050, 1200
Max Compute Units, 32, 32, 32, 32, 32
Max Constant Arguments, 8, 8, 8, 8, 8
Max Constant Buffer Size, 64 KB, 64 KB, 64 KB, 64 KB, 64 KB
Max Memory Allocation Size, 512 MB, 512 MB, 512 MB, 512 MB, 16,099 MB
Max Parameter Size, 1,024 bytes, 1,024 bytes, 1,024 bytes, 1,024 bytes, 4 KB
Read Image Arguments, 128, 128, 128, 128, 128
Max Samplers, 16, 16, 16, 16, 16
Max Workgroup Size, 256, 256, 256, 256, 1024
Max Work Item Dimensions, 3, 3, 3, 3, 3
Max Work Item Sizes, (256,256,256), (256,256,256), (256,256,256), (256,256,256), (1024,1024,1024)
Max Write Image Arguments, 8, 8, 8, 8, 8
Memory Base Address Alignment, 2048, 2048, 2048, 2048, 1024
Minimal Data Type Alignment Size, 128 bytes, 128 bytes, 128 bytes, 128 bytes, 128 bytes
OpenCL C Version, OpenCL C 1.2 , OpenCL C 1.2 , OpenCL C 1.2 , OpenCL C 1.2 , OpenCL C 1.2
Native Char Vector Width, 4, 4, 4, 4, 16
Native Short Vector Width, 2, 2, 2, 2, 8
Native Int Vector Width, 1, 1, 1, 1, 4
Native Long Vector Width, 1, 1, 1, 1, 2
Native Float Vector Width, 1, 1, 1, 1, 8
Native Double Vector Width, 1, 1, 1, 1, 4
Native Half Vector Width, 1, 1, 1, 1, 4
Preferred Char Vector Width, 4, 4, 4, 4, 16
Preferred Short Vector Width, 2, 2, 2, 2, 8
Preferred Int Vector Width, 1, 1, 1, 1, 4
Preferred Long Vector Width, 1, 1, 1, 1, 2
Preferred Float Vector Width, 1, 1, 1, 1, 8
Preferred Double Vector Width, 1, 1, 1, 1, 4
Preferred Half Vector Width, 1, 1, 1, 1, 4
Profile, FULL_PROFILE, FULL_PROFILE, FULL_PROFILE, FULL_PROFILE, FULL_PROFILE
Profiling Timer Resolution, 1, 1, 1, 1, 1
Vendor ID, OpenCL 1.2 AMD-APP (1113.2), OpenCL 1.2 AMD-APP (1113.2), OpenCL 1.2 AMD-APP (1113.2), OpenCL 1.2 AMD-APP (1113.2), OpenCL 1.2 AMD-APP (1113.2)
The output posted above looks like a modified version of clinfo output. Can you share the source? It may help others, as clinfo has an issue when some platforms are OpenCL 1.1 compliant and others are OpenCL 1.2 compliant.
I will ask the runtime team and let you know if there is a way to enable the full memory. Can you also check with the 12.10 driver (and the 13.2 beta)? Do you still get 2GB out of 3GB of memory for your Tahiti cards? Thanks for reporting it.
This was a copy/paste from CodeXL's system info.
The clinfo output is below.
Number of platforms: | 1 | |||
Platform Profile: | FULL_PROFILE | |||
Platform Version: | OpenCL 1.2 AMD-APP (1113.2) | |||
Platform Name: | AMD Accelerated Parallel Processing | |||
Platform Vendor: | Advanced Micro Devices, Inc. | |||
Platform Extensions: | cl_khr_icd cl_amd_event_callback cl_amd_offline_devices |
Platform Name: | AMD Accelerated Parallel Processing | ||||
Number of devices: | 5 | ||||
Device Type: | CL_DEVICE_TYPE_GPU | ||||
Device ID: | 4098 | ||||
Board name: | AMD Radeon HD 7900 Series | ||||
Device Topology: | PCI[ B#2, D#0, F#0 ] | ||||
Max compute units: | 32 | ||||
Max work items dimensions: | 3 | ||||
Max work items[0]: | 256 | ||||
Max work items[1]: | 256 | ||||
Max work items[2]: | 256 | ||||
Max work group size: | 256 | ||||
Preferred vector width char: | 4 | ||||
Preferred vector width short: | 2 | ||||
Preferred vector width int: | 1 | ||||
Preferred vector width long: | 1 | ||||
Preferred vector width float: | 1 | ||||
Preferred vector width double: | 1 | ||||
Native vector width char: | 4 | ||||
Native vector width short: | 2 | ||||
Native vector width int: | 1 | ||||
Native vector width long: | 1 | ||||
Native vector width float: | 1 | ||||
Native vector width double: | 1 | ||||
Max clock frequency: | 1050Mhz | ||||
Address bits: | 32 | ||||
Max memory allocation: | 536870912 | ||||
Image support: | Yes | ||||
Max number of images read arguments: | 128 | ||||
Max number of images write arguments: | 8 | ||||
Max image 2D width: | 16384 | ||||
Max image 2D height: | 16384 | ||||
Max image 3D width: | 2048 | ||||
Max image 3D height: | 2048 | ||||
Max image 3D depth: | 2048 | ||||
Max samplers within kernel: | 16 | ||||
Max size of kernel argument: | 1024 | ||||
Alignment (bits) of base address: | 2048 | ||||
Minimum alignment (bytes) for any datatype: | 128 |
Single precision floating point capability
Denorms: | No | |||||
Quiet NaNs: | Yes | |||||
Round to nearest even: | Yes | |||||
Round to zero: | Yes | |||||
Round to +ve and infinity: | Yes | |||||
IEEE754-2008 fused multiply-add: | Yes | |||||
Cache type: | Read/Write | |||||
Cache line size: | 64 | |||||
Cache size: | 16384 | |||||
Global memory size: | 2147483648 | |||||
Constant buffer size: | 65536 | |||||
Max number of constant args: | 8 | |||||
Local memory type: | Scratchpad | |||||
Local memory size: | 32768 | |||||
Kernel Preferred work group size multiple: | 64 | |||||
Error correction support: | 0 | |||||
Unified memory for Host and Device: | 0 | |||||
Profiling timer resolution: | 1 | |||||
Device endianess: | Little | |||||
Available: | Yes | |||||
Compiler available: | Yes | |||||
Execution capabilities: | ||||||
Execute OpenCL kernels: | Yes | |||||
Execute native function: | No | |||||
Queue properties: | ||||||
Out-of-Order: | No | |||||
Profiling : | Yes | |||||
Platform ID: | 0x00007ffab08f64e0 | |||||
Name: | Tahiti | |||||
Vendor: | Advanced Micro Devices, Inc. | |||||
Device OpenCL C version: | OpenCL C 1.2 | |||||
Driver version: | 1113.2 (VM) | |||||
Profile: | FULL_PROFILE | |||||
Version: | OpenCL 1.2 AMD-APP (1113.2) | |||||
Extensions: | cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_amd_c1x_atomics |
Device Type: | CL_DEVICE_TYPE_GPU | ||||
Device ID: | 4098 | ||||
Board name: | AMD Radeon HD 7900 Series | ||||
Device Topology: | PCI[ B#3, D#0, F#0 ] | ||||
Max compute units: | 32 | ||||
Max work items dimensions: | 3 | ||||
Max work items[0]: | 256 | ||||
Max work items[1]: | 256 | ||||
Max work items[2]: | 256 | ||||
Max work group size: | 256 | ||||
Preferred vector width char: | 4 | ||||
Preferred vector width short: | 2 | ||||
Preferred vector width int: | 1 | ||||
Preferred vector width long: | 1 | ||||
Preferred vector width float: | 1 | ||||
Preferred vector width double: | 1 | ||||
Native vector width char: | 4 | ||||
Native vector width short: | 2 | ||||
Native vector width int: | 1 | ||||
Native vector width long: | 1 | ||||
Native vector width float: | 1 | ||||
Native vector width double: | 1 | ||||
Max clock frequency: | 1050Mhz | ||||
Address bits: | 32 | ||||
Max memory allocation: | 536870912 | ||||
Image support: | Yes | ||||
Max number of images read arguments: | 128 | ||||
Max number of images write arguments: | 8 | ||||
Max image 2D width: | 16384 | ||||
Max image 2D height: | 16384 | ||||
Max image 3D width: | 2048 | ||||
Max image 3D height: | 2048 | ||||
Max image 3D depth: | 2048 | ||||
Max samplers within kernel: | 16 | ||||
Max size of kernel argument: | 1024 | ||||
Alignment (bits) of base address: | 2048 | ||||
Minimum alignment (bytes) for any datatype: | 128 |
Single precision floating point capability
Denorms: | No | |||||
Quiet NaNs: | Yes | |||||
Round to nearest even: | Yes | |||||
Round to zero: | Yes | |||||
Round to +ve and infinity: | Yes | |||||
IEEE754-2008 fused multiply-add: | Yes | |||||
Cache type: | Read/Write | |||||
Cache line size: | 64 | |||||
Cache size: | 16384 | |||||
Global memory size: | 2147483648 | |||||
Constant buffer size: | 65536 | |||||
Max number of constant args: | 8 | |||||
Local memory type: | Scratchpad | |||||
Local memory size: | 32768 | |||||
Kernel Preferred work group size multiple: | 64 | |||||
Error correction support: | 0 | |||||
Unified memory for Host and Device: | 0 | |||||
Profiling timer resolution: | 1 | |||||
Device endianess: | Little | |||||
Available: | Yes | |||||
Compiler available: | Yes | |||||
Execution capabilities: | ||||||
Execute OpenCL kernels: | Yes | |||||
Execute native function: | No | |||||
Queue properties: | ||||||
Out-of-Order: | No | |||||
Profiling : | Yes | |||||
Platform ID: | 0x00007ffab08f64e0 | |||||
Name: | Tahiti | |||||
Vendor: | Advanced Micro Devices, Inc. | |||||
Device OpenCL C version: | OpenCL C 1.2 | |||||
Driver version: | 1113.2 (VM) | |||||
Profile: | FULL_PROFILE | |||||
Version: | OpenCL 1.2 AMD-APP (1113.2) | |||||
Extensions: | cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_amd_c1x_atomics |
Device Type: | CL_DEVICE_TYPE_GPU | ||||
Device ID: | 4098 | ||||
Board name: | AMD Radeon HD 7900 Series | ||||
Device Topology: | PCI[ B#-125, D#0, F#0 ] | ||||
Max compute units: | 32 | ||||
Max work items dimensions: | 3 | ||||
Max work items[0]: | 256 | ||||
Max work items[1]: | 256 | ||||
Max work items[2]: | 256 | ||||
Max work group size: | 256 | ||||
Preferred vector width char: | 4 | ||||
Preferred vector width short: | 2 | ||||
Preferred vector width int: | 1 | ||||
Preferred vector width long: | 1 | ||||
Preferred vector width float: | 1 | ||||
Preferred vector width double: | 1 | ||||
Native vector width char: | 4 | ||||
Native vector width short: | 2 | ||||
Native vector width int: | 1 | ||||
Native vector width long: | 1 | ||||
Native vector width float: | 1 | ||||
Native vector width double: | 1 | ||||
Max clock frequency: | 1050Mhz | ||||
Address bits: | 32 | ||||
Max memory allocation: | 536870912 | ||||
Image support: | Yes | ||||
Max number of images read arguments: | 128 | ||||
Max number of images write arguments: | 8 | ||||
Max image 2D width: | 16384 | ||||
Max image 2D height: | 16384 | ||||
Max image 3D width: | 2048 | ||||
Max image 3D height: | 2048 | ||||
Max image 3D depth: | 2048 | ||||
Max samplers within kernel: | 16 | ||||
Max size of kernel argument: | 1024 | ||||
Alignment (bits) of base address: | 2048 | ||||
Minimum alignment (bytes) for any datatype: | 128 |
Single precision floating point capability
Denorms: | No | |||||
Quiet NaNs: | Yes | |||||
Round to nearest even: | Yes | |||||
Round to zero: | Yes | |||||
Round to +ve and infinity: | Yes | |||||
IEEE754-2008 fused multiply-add: | Yes | |||||
Cache type: | Read/Write | |||||
Cache line size: | 64 | |||||
Cache size: | 16384 | |||||
Global memory size: | 2147483648 | |||||
Constant buffer size: | 65536 | |||||
Max number of constant args: | 8 | |||||
Local memory type: | Scratchpad | |||||
Local memory size: | 32768 | |||||
Kernel Preferred work group size multiple: | 64 | |||||
Error correction support: | 0 | |||||
Unified memory for Host and Device: | 0 | |||||
Profiling timer resolution: | 1 | |||||
Device endianess: | Little | |||||
Available: | Yes | |||||
Compiler available: | Yes | |||||
Execution capabilities: | ||||||
Execute OpenCL kernels: | Yes | |||||
Execute native function: | No | |||||
Queue properties: | ||||||
Out-of-Order: | No | |||||
Profiling : | Yes | |||||
Platform ID: | 0x00007ffab08f64e0 | |||||
Name: | Tahiti | |||||
Vendor: | Advanced Micro Devices, Inc. | |||||
Device OpenCL C version: | OpenCL C 1.2 | |||||
Driver version: | 1113.2 (VM) | |||||
Profile: | FULL_PROFILE | |||||
Version: | OpenCL 1.2 AMD-APP (1113.2) | |||||
Extensions: | cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_amd_c1x_atomics |
Device Type: | CL_DEVICE_TYPE_GPU | ||||
Device ID: | 4098 | ||||
Board name: | AMD Radeon HD 7900 Series | ||||
Device Topology: | PCI[ B#-124, D#0, F#0 ] | ||||
Max compute units: | 32 | ||||
Max work items dimensions: | 3 | ||||
Max work items[0]: | 256 | ||||
Max work items[1]: | 256 | ||||
Max work items[2]: | 256 | ||||
Max work group size: | 256 | ||||
Preferred vector width char: | 4 | ||||
Preferred vector width short: | 2 | ||||
Preferred vector width int: | 1 | ||||
Preferred vector width long: | 1 | ||||
Preferred vector width float: | 1 | ||||
Preferred vector width double: | 1 | ||||
Native vector width char: | 4 | ||||
Native vector width short: | 2 | ||||
Native vector width int: | 1 | ||||
Native vector width long: | 1 | ||||
Native vector width float: | 1 | ||||
Native vector width double: | 1 | ||||
Max clock frequency: | 1050Mhz | ||||
Address bits: | 32 | ||||
Max memory allocation: | 536870912 | ||||
Image support: | Yes | ||||
Max number of images read arguments: | 128 | ||||
Max number of images write arguments: | 8 | ||||
Max image 2D width: | 16384 | ||||
Max image 2D height: | 16384 | ||||
Max image 3D width: | 2048 | ||||
Max image 3D height: | 2048 | ||||
Max image 3D depth: | 2048 | ||||
Max samplers within kernel: | 16 | ||||
Max size of kernel argument: | 1024 | ||||
Alignment (bits) of base address: | 2048 | ||||
Minimum alignment (bytes) for any datatype: | 128 |
Single precision floating point capability
Denorms: | No | |||||
Quiet NaNs: | Yes | |||||
Round to nearest even: | Yes | |||||
Round to zero: | Yes | |||||
Round to +ve and infinity: | Yes | |||||
IEEE754-2008 fused multiply-add: | Yes | |||||
Cache type: | Read/Write | |||||
Cache line size: | 64 | |||||
Cache size: | 16384 | |||||
Global memory size: | 2147483648 | |||||
Constant buffer size: | 65536 | |||||
Max number of constant args: | 8 | |||||
Local memory type: | Scratchpad | |||||
Local memory size: | 32768 | |||||
Kernel Preferred work group size multiple: | 64 | |||||
Error correction support: | 0 | |||||
Unified memory for Host and Device: | 0 | |||||
Profiling timer resolution: | 1 | |||||
Device endianess: | Little | |||||
Available: | Yes | |||||
Compiler available: | Yes | |||||
Execution capabilities: | ||||||
Execute OpenCL kernels: | Yes | |||||
Execute native function: | No | |||||
Queue properties: | ||||||
Out-of-Order: | No | |||||
Profiling : | Yes | |||||
Platform ID: | 0x00007ffab08f64e0 | |||||
Name: | Tahiti | |||||
Vendor: | Advanced Micro Devices, Inc. | |||||
Device OpenCL C version: | OpenCL C 1.2 | |||||
Driver version: | 1113.2 (VM) | |||||
Profile: | FULL_PROFILE | |||||
Version: | OpenCL 1.2 AMD-APP (1113.2) | |||||
Extensions: | cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_amd_c1x_atomics |
Device Type: | CL_DEVICE_TYPE_CPU | ||||
Device ID: | 4098 | ||||
Board name: | |||||
Max compute units: | 32 | ||||
Max work items dimensions: | 3 | ||||
Max work items[0]: | 1024 | ||||
Max work items[1]: | 1024 | ||||
Max work items[2]: | 1024 | ||||
Max work group size: | 1024 | ||||
Preferred vector width char: | 16 | ||||
Preferred vector width short: | 8 | ||||
Preferred vector width int: | 4 | ||||
Preferred vector width long: | 2 | ||||
Preferred vector width float: | 8 | ||||
Preferred vector width double: | 4 | ||||
Native vector width char: | 16 | ||||
Native vector width short: | 8 | ||||
Native vector width int: | 4 | ||||
Native vector width long: | 2 | ||||
Native vector width float: | 8 | ||||
Native vector width double: | 4 | ||||
Max clock frequency: | 2601Mhz | ||||
Address bits: | 64 | ||||
Max memory allocation: | 16880146432 | ||||
Image support: | Yes | ||||
Max number of images read arguments: | 128 | ||||
Max number of images write arguments: | 8 | ||||
Max image 2D width: | 8192 | ||||
Max image 2D height: | 8192 | ||||
Max image 3D width: | 2048 | ||||
Max image 3D height: | 2048 | ||||
Max image 3D depth: | 2048 | ||||
Max samplers within kernel: | 16 | ||||
Max size of kernel argument: | 4096 | ||||
Alignment (bits) of base address: | 1024 | ||||
Minimum alignment (bytes) for any datatype: | 128 |
Single precision floating point capability
Denorms: | Yes | |||||
Quiet NaNs: | Yes | |||||
Round to nearest even: | Yes | |||||
Round to zero: | Yes | |||||
Round to +ve and infinity: | Yes | |||||
IEEE754-2008 fused multiply-add: | Yes | |||||
Cache type: | Read/Write | |||||
Cache line size: | 64 | |||||
Cache size: | 32768 | |||||
Global memory size: | 67520585728 | |||||
Constant buffer size: | 65536 | |||||
Max number of constant args: | 8 | |||||
Local memory type: | Global | |||||
Local memory size: | 32768 | |||||
Kernel Preferred work group size multiple: | 1 | |||||
Error correction support: | 0 | |||||
Unified memory for Host and Device: | 1 | |||||
Profiling timer resolution: | 1 | |||||
Device endianess: | Little | |||||
Available: | Yes | |||||
Compiler available: | Yes | |||||
Execution capabilities: | ||||||
Execute OpenCL kernels: | Yes | |||||
Execute native function: | Yes | |||||
Queue properties: | ||||||
Out-of-Order: | No | |||||
Profiling : | Yes | |||||
Platform ID: | 0x00007ffab08f64e0 | |||||
Name: | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | |||||
Vendor: | GenuineIntel | |||||
Device OpenCL C version: | OpenCL C 1.2 | |||||
Driver version: | 1113.2 (sse2,avx) | |||||
Profile: | FULL_PROFILE | |||||
Version: | OpenCL 1.2 AMD-APP (1113.2) | |||||
Extensions: | cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt |
With 12.8 and up, I can run with GPU_MAX_ALLOC_PERCENT set up to 45. Thankfully it also ups the global mem size to 3074424832, which is the important factor.
hi liwoog,
Can you explain how you are seeing 3074424832 (2.86GB) out of the 3GB when GPU_MAX_ALLOC_PERCENT is set to 45?
Shouldn't setting it to 45 enable 45% of the GPU memory?
Simple. This query:
if (CL_SUCCESS != (err = clGetDeviceInfo(devices[0], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(cl_ulong), &maxGlobalMem, NULL)))
returns 3074424832 in maxGlobalMem.
The total memory seems to be dependent on the largest allocatable block:
setenv GPU_MAX_ALLOC_PERCENT 25
maxMemAllocSize(733741056), globalMemSize(2934964224)
setenv GPU_MAX_ALLOC_PERCENT 26
maxMemAllocSize(763090698), globalMemSize(3052362792)
setenv GPU_MAX_ALLOC_PERCENT 27
maxMemAllocSize(792440340), globalMemSize(3073376256)
Hi liwoog,
That is interesting to learn.
GPU_MAX_ALLOC_PERCENT changes the maximum buffer size. If you want more GPU memory, you could try setting GPU_MAX_HEAP_SIZE to a value close to 100 (say 95).
GPU_MAX_HEAP_SIZE on its own does not seem to impact the maximum amount of usable memory.
I also tried the GPU_MAX_HEAP_SIZE flag, and it did not work for me on Linux with the 13.1 driver. As I understand it, these features are experimental only, and AMD reserves the right to disable these flags in the future.
Anyway, I do understand your problem, and I have asked someone for a workaround. I will let you know if I hear from him.
Hi,
I'm reviving this thread.
In the thread "Large buffers", drallan described a workaround for allocating large amounts of memory. You may want to try it and check whether it works for you.
Regards,
Use the following command to list the environment variables supported by your GPU devices.
:~$ strings /usr/lib/libamdocl64.so | grep GPU
From that list, you can set the percentages accordingly. You can also check whether variables such as GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_PERCENT are available, and which variables can be tuned for your hardware. For example, if your card has 1024 MB of global memory, you might use two buffers of 512 MB each, or a single buffer with the full 1024 MB allotted. The choice is yours.