clinfo is reporting wrong details for the driver of my GPU, an AMD Radeon HD 7550M/7600M Series. I was able to fix the total Global Memory Size using GPU_MAX_HEAP_SIZE, but the max buffer size (max memory allocation size) is still stuck at the minimum allotted to OpenCL, i.e. 1/4 of the device memory. I tried setting this:
export GPU_FORCE_64BIT_PTR=1
But that didn't work. Then I tried setting this:
export GPU_MAX_ALLOC_PERCENT=100
That didn't work either. Is there any other undocumented or documented environment variable that I can change to increase the buffer memory size?
Here is the clinfo output:
raptor@raptor-535U4C:~$ clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (1729.3)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon HD 7500M/7600M Series
Device Topology: PCI[ B#1, D#0, F#0 ]
Max compute units: 6
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 500Mhz
Address bits: 32
Max memory allocation: 254803968
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 1019215872
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x00007f3c48b2d8f0
Name: Turks
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 1729.3
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1729.3)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 1100Mhz
Address bits: 64
Max memory allocation: 5339882086
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 5620928512
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x00007f3c48b2d8f0
Name: AMD A8-4555M APU with Radeon(tm) HD Graphics
Vendor: AuthenticAMD
Device OpenCL C version: OpenCL C 1.2
Driver version: 1729.3 (sse2,avx,fma4)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1729.3)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
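The "1/4th of the device memory" observation can be checked against the two GPU values in the listing above (Max memory allocation and Global memory size). This is only a small arithmetic sketch; `percent_of` is a hypothetical helper name, not part of clinfo or the AMD SDK.

```shell
# percent_of is a hypothetical helper: integer percentage of $1 relative to $2.
percent_of() {
    echo $(( $1 * 100 / $2 ))
}

# GPU values copied from the clinfo output above:
# Max memory allocation = 254803968, Global memory size = 1019215872
percent_of 254803968 1019215872   # prints 25
```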
Found it! After spending a sleepless night trying to solve the problem, I found the solution at 5 a.m.
Apparently, the following command lists the environment variables supported for the GPU by the installed driver library. In my case, the "Turks" GPU (a.k.a. Radeon HD 7550M/7600M) doesn't support GPU_MAX_ALLOC_PERCENT.
strings /usr/lib/libamdocl64.so | grep GPU
The above command displays the following list:
raptor@raptor-535U4C:~$ strings /usr/lib/libamdocl64.so | grep GPU
DEBUG_GPU_FLAGS
GPU_MAX_COMMAND_QUEUES
GPU_MAX_WORKGROUP_SIZE
GPU_MAX_WORKGROUP_SIZE_2D_X
GPU_MAX_WORKGROUP_SIZE_2D_Y
GPU_MAX_WORKGROUP_SIZE_3D_X
GPU_MAX_WORKGROUP_SIZE_3D_Y
GPU_MAX_WORKGROUP_SIZE_3D_Z
GPU_DEVICE_NAME
GPU_DEVICE_ORDINAL
GPU_INITIAL_HEAP_SIZE
GPU_MAX_HEAP_SIZE
GPU_HEAP_GROWTH_INCREMENT
GPU_STAGING_BUFFER_SIZE
GPU_DUMP_BLIT_KERNELS
GPU_BLIT_ENGINE_TYPE
GPU_FLUSH_ON_EXECUTION
GPU_USE_SYNC_OBJECTS
GPU_OPEN_VIDEO
GPU_PRE_RA_SCHED
GPU_PINNED_XFER_SIZE
GPU_PINNED_MIN_XFER_SIZE
GPU_RESOURCE_CACHE_SIZE
GPU_ASYNC_MEM_COPY
GPU_FORCE_64BIT_PTR
GPU_FORCE_OCL20_32BIT
GPU_RAW_TIMESTAMP
GPU_PARTIAL_DISPATCH
GPU_NUM_MEM_DEPENDENCY
GPU_XFER_BUFFER_SIZE
GPU_IMAGE_DMA
GPU_SINGLE_ALLOC_PERCENT
GPU_NUM_COMPUTE_RINGS
GPU_WORKLOAD_SPLIT
GPU_USE_SINGLE_SCRATCH
GPU_TARGET_INFO_ARCH
GPU_SPLIT_LIB
GPU_STAGING_WRITE_PERSISTENT
GPU_HSAIL_ENABLE
GPU_ASSUME_ALIASES
GPU_PRINT_CHILD_KERNEL
GPU_DIRECT_SRD
GPU_USE_DEVICE_QUEUE
GPU_ENABLE_LARGE_ALLOCATION
GPU_IFH_MODE
GPU_FORCE_SINGLE_FP_DENORM
GPU_ENABLE_HW_DEBUG
Virtual GPU List Ops Lock
GPU heap lock
Virtual GPU execution lock
ADL2_Display_PowerXpressActiveGPU_Get
ADL2_Display_PowerXpressActiveGPU_Set
uki_firegl_QueryGPUMapInfo
GPGPU, hw cannot provide thread-id
GPGPU, hw cannot support double-fp or memory export
GPGPU, exceed the local data storage limit
SH_MEM_ADDRESS_MODE_GPUVM64
SH_MEM_ADDRESS_MODE_GPUVM32
Specify that UAVs per pointer should be used(HD5XXX and HD6XXX series GPU's only).
Generate 64-bit ELF binary for GPU (default: 32-bit)
Enable/disable float f/c ==> f * (1.0f/c) for GPU (default : on)
Disabling (-fno-inline) GPU inlining for testing
-D__GPU__=1
From the above list, setting the variable GPU_SINGLE_ALLOC_PERCENT to the desired percentage changed the value of Max memory allocation accordingly. After running clinfo again, the new values were:
Max memory allocation: 515375104
Global memory size: 1030750208
The Global memory size is set via GPU_MAX_HEAP_SIZE=96, whereas the Max memory allocation is set through GPU_SINGLE_ALLOC_PERCENT=50, which sets the maximum buffer size to 50% of the Global memory size.
Please note that the variables vary from hardware to hardware. I am using AMDAPPSDK-3.0.0-Beta along with fglrx-15.20. Once again, please check which variables your hardware supports with:
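Putting the two settings together, a session might look roughly like the sketch below. The values 96 and 50 are the ones reported in this post for the Turks GPU with SDK 3.0-Beta; other hardware or SDK versions may ignore them. `expected_alloc` is a hypothetical helper that only does the percentage arithmetic, and the clinfo call is skipped if clinfo is not installed.

```shell
# Values from this post (Turks GPU, AMD APP SDK 3.0-Beta); they may have no
# effect on other hardware or SDK versions.
export GPU_MAX_HEAP_SIZE=96
export GPU_SINGLE_ALLOC_PERCENT=50

# expected_alloc is a hypothetical helper: $2 percent of $1 bytes.
expected_alloc() {
    echo $(( $1 * $2 / 100 ))
}

# Global memory size reported after the change was 1030750208 bytes.
expected_alloc 1030750208 "$GPU_SINGLE_ALLOC_PERCENT"   # prints 515375104

# Compare with what the runtime actually reports, if clinfo is available.
if command -v clinfo >/dev/null 2>&1; then
    clinfo | grep -E 'Max memory allocation|Global memory size' || true
fi
```

The computed value matches the Max memory allocation shown above, which is consistent with the variable being applied as a percentage of the Global memory size.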
:~$ strings /usr/lib/libamdocl64.so | grep GPU
For some devices, it may be GPU_MAX_ALLOC_PERCENT, whereas it could be something else for others. The variables shown are undocumented. Fool around at your own risk.
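Since the raw `strings` output also contains lock names, compiler messages and other noise, it may help to narrow it down to lines that look like environment variable names. The sketch below assumes the library path used in this post; `filter_gpu_vars` is a hypothetical helper, and its pattern would also drop names that do not start with `GPU_` (such as DEBUG_GPU_FLAGS).

```shell
# filter_gpu_vars is a hypothetical helper: keep only lines that consist
# entirely of an upper-case name starting with GPU_.
filter_gpu_vars() {
    grep -E '^GPU_[A-Z0-9_]+$'
}

# Library path as used in this post; adjust it for your installation.
lib=/usr/lib/libamdocl64.so

if [ -r "$lib" ] && command -v strings >/dev/null 2>&1; then
    # Show only the candidates related to allocation and heap sizes.
    strings "$lib" | filter_gpu_vars | grep -E 'ALLOC|HEAP' || true
fi
```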
UPDATE:
Please note that the variables' behavior also depends on the version of the AMD APP SDK. While GPU_SINGLE_ALLOC_PERCENT worked in AMD APP SDK 3.0-Beta, it does not work in AMD APP SDK 3.0. However, GPU_MAX_ALLOC_PERCENT=100 does work in version 3.0.