nirv_knox
Adept I

GPU_MAX_ALLOC_PERCENT not working for fglrx 15.20

clinfo has been reporting the wrong memory details for my AMD Radeon HD 7550M/7600M Series GPU. I managed to fix the total Global memory size using GPU_MAX_HEAP_SIZE. However, the maximum buffer size (Max memory allocation) is still stuck at the minimum OpenCL guarantees, i.e. 1/4 of the device's global memory. I tried setting:

export GPU_FORCE_64BIT_PTR=1

That didn't work, so I then tried:

export GPU_MAX_ALLOC_PERCENT=100

That didn't work either. Is there any other environment variable, documented or undocumented, that I can set to increase the maximum buffer allocation size?
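For context, the "minimum" here is the floor the OpenCL 1.2 specification sets for CL_DEVICE_MAX_MEM_ALLOC_SIZE (what clinfo prints as Max memory allocation): max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128 MiB) for non-custom devices. A small sketch of that arithmetic; the helper name spec_min_alloc is hypothetical and only for illustration:

```shell
#!/bin/sh
# Minimum CL_DEVICE_MAX_MEM_ALLOC_SIZE allowed by OpenCL 1.2 for
# non-custom devices: max(global_mem_size / 4, 128 MiB).
# spec_min_alloc is a hypothetical helper, not part of any AMD tool.
spec_min_alloc() {
  quarter=$(( $1 / 4 ))
  floor=$(( 128 * 1024 * 1024 ))   # 134217728 bytes
  if [ "$quarter" -gt "$floor" ]; then
    echo "$quarter"
  else
    echo "$floor"
  fi
}
```

With the Global memory size from the clinfo output below, `spec_min_alloc 1019215872` prints 254803968, which is exactly the Max memory allocation the GPU reports, so the driver is sitting at the spec minimum.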

Here is the clinfo output:

raptor@raptor-535U4C:~$ clinfo
Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.0 AMD-APP (1729.3)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

  Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
  Device Type: CL_DEVICE_TYPE_GPU
  Vendor ID: 1002h
  Board name: AMD Radeon HD 7500M/7600M Series
  Device Topology: PCI[ B#1, D#0, F#0 ]
  Max compute units: 6
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 4
  Native vector width double: 0
  Max clock frequency: 500Mhz
  Address bits: 32
  Max memory allocation: 254803968
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 1019215872
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x00007f3c48b2d8f0
  Name: Turks
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1729.3
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1729.3)
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event

  Device Type: CL_DEVICE_TYPE_CPU
  Vendor ID: 1002h
  Board name:
  Max compute units: 4
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 8
  Preferred vector width double: 4
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 8
  Native vector width double: 4
  Max clock frequency: 1100Mhz
  Address bits: 64
  Max memory allocation: 5339882086
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 64
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 4096
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 5620928512
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 32768
  Kernel Preferred work group size multiple: 1
  Error correction support: 0
  Unified memory for Host and Device: 1
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x00007f3c48b2d8f0
  Name: AMD A8-4555M APU with Radeon(tm) HD Graphics
  Vendor: AuthenticAMD
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1729.3 (sse2,avx,fma4)
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1729.3)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
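To see at a glance how much of the global memory a single buffer may take on each device, here is a minimal sketch that reads clinfo-style text on stdin and prints the Max memory allocation / Global memory size ratio per device. The helper name alloc_ratio is hypothetical, and it assumes the field labels and ordering of the output above (Max memory allocation listed before Global memory size); other clinfo builds may label them differently.

```shell
#!/bin/sh
# Prints "<device type> <percent>" for each device in clinfo-style output.
# alloc_ratio is a hypothetical helper; field names match the output above.
alloc_ratio() {
  # Split on a colon plus any following spaces/tabs, so $2 is the value.
  awk -F':[ \t]*' '
    /Device Type:/           { type = $2 }
    /Max memory allocation:/ { alloc = $2 }
    /Global memory size:/    {
      # In this output, Global memory size comes after Max memory allocation,
      # so both numbers are known by the time this line is read.
      printf "%s %.1f%%\n", type, 100 * alloc / $2
    }
  '
}
```

Piping the output above through it (`clinfo | alloc_ratio`) gives `CL_DEVICE_TYPE_GPU 25.0%` for Turks and `CL_DEVICE_TYPE_CPU 95.0%` for the APU's CPU device, which makes the GPU's 1/4 cap obvious.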

nirv_knox
Adept I

Found it! After spending a sleepless night trying to solve the problem, I found the solution at 5 a.m.

The following command lists the environment variables that the installed OpenCL runtime library knows about. In my case, GPU_MAX_ALLOC_PERCENT is not in that list, so my setup with the "Turks" GPU (a.k.a. Radeon HD 7550M/7600M) doesn't support it.

strings /usr/lib/libamdocl64.so | grep GPU

The above command displays the following list:

raptor@raptor-535U4C:~$ strings /usr/lib/libamdocl64.so | grep GPU
DEBUG_GPU_FLAGS
GPU_MAX_COMMAND_QUEUES
GPU_MAX_WORKGROUP_SIZE
GPU_MAX_WORKGROUP_SIZE_2D_X
GPU_MAX_WORKGROUP_SIZE_2D_Y
GPU_MAX_WORKGROUP_SIZE_3D_X
GPU_MAX_WORKGROUP_SIZE_3D_Y
GPU_MAX_WORKGROUP_SIZE_3D_Z
GPU_DEVICE_NAME
GPU_DEVICE_ORDINAL
GPU_INITIAL_HEAP_SIZE
GPU_MAX_HEAP_SIZE
GPU_HEAP_GROWTH_INCREMENT
GPU_STAGING_BUFFER_SIZE
GPU_DUMP_BLIT_KERNELS
GPU_BLIT_ENGINE_TYPE
GPU_FLUSH_ON_EXECUTION
GPU_USE_SYNC_OBJECTS
GPU_OPEN_VIDEO
GPU_PRE_RA_SCHED
GPU_PINNED_XFER_SIZE
GPU_PINNED_MIN_XFER_SIZE
GPU_RESOURCE_CACHE_SIZE
GPU_ASYNC_MEM_COPY
GPU_FORCE_64BIT_PTR
GPU_FORCE_OCL20_32BIT
GPU_RAW_TIMESTAMP
GPU_PARTIAL_DISPATCH
GPU_NUM_MEM_DEPENDENCY
GPU_XFER_BUFFER_SIZE
GPU_IMAGE_DMA
GPU_SINGLE_ALLOC_PERCENT
GPU_NUM_COMPUTE_RINGS
GPU_WORKLOAD_SPLIT
GPU_USE_SINGLE_SCRATCH
GPU_TARGET_INFO_ARCH
GPU_SPLIT_LIB
GPU_STAGING_WRITE_PERSISTENT
GPU_HSAIL_ENABLE
GPU_ASSUME_ALIASES
GPU_PRINT_CHILD_KERNEL
GPU_DIRECT_SRD
GPU_USE_DEVICE_QUEUE
GPU_ENABLE_LARGE_ALLOCATION
GPU_IFH_MODE
GPU_FORCE_SINGLE_FP_DENORM
GPU_ENABLE_HW_DEBUG
Virtual GPU List Ops Lock
GPU heap lock
Virtual GPU execution lock
ADL2_Display_PowerXpressActiveGPU_Get
ADL2_Display_PowerXpressActiveGPU_Set
uki_firegl_QueryGPUMapInfo
GPGPU, hw cannot provide thread-id
GPGPU, hw cannot support double-fp or memory export
GPGPU, exceed the local data storage limit
SH_MEM_ADDRESS_MODE_GPUVM64
SH_MEM_ADDRESS_MODE_GPUVM32
Specify that UAVs per pointer should be used(HD5XXX and HD6XXX series GPU's only).
Generate 64-bit ELF binary for GPU (default: 32-bit)
Enable/disable float f/c ==> f * (1.0f/c) for GPU (default : on)
Disabling (-fno-inline) GPU inlining for testing
-D__GPU__=1
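A plain `grep GPU` also returns unrelated strings such as "GPU heap lock" or "-D__GPU__=1". A minimal sketch of a tighter filter that keeps only lines shaped like variable names (GPU_ followed by upper-case letters, digits and underscores); the helper name list_gpu_vars is hypothetical. Note that it also drops DEBUG_GPU_FLAGS because of the ^GPU_ anchor, and a name showing up here is only a candidate: it doesn't prove the runtime honors it.

```shell
#!/bin/sh
# Keeps only lines that look like GPU_* environment-variable names,
# de-duplicated and sorted. list_gpu_vars is a hypothetical helper.
list_gpu_vars() {
  grep -E '^GPU_[A-Z0-9_]+$' | sort -u
}
```

Usage, with the library path from this thread (it may differ on other installs): `strings /usr/lib/libamdocl64.so | list_gpu_vars`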

From that list, I tried GPU_SINGLE_ALLOC_PERCENT: setting it to the desired percentage changed Max memory allocation accordingly. After rerunning clinfo, the values are now:

Max memory allocation: 515375104
Global memory size: 1030750208

Here, Global memory size is set via GPU_MAX_HEAP_SIZE=96, while Max memory allocation is set via GPU_SINGLE_ALLOC_PERCENT=50, which caps a single buffer at 50% of the Global memory size.
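A minimal sketch of how to apply both settings and sanity-check the result. The variables only need to be in the environment of the process that loads the OpenCL runtime (clinfo here); setting them as a command prefix scopes them to that one command. The helper names with_alloc_vars and expected_alloc are hypothetical, and the values 96 and 50 are simply the ones used in this post.

```shell
#!/bin/sh
# Runs the given command with the two variables from this post set only
# for that command. with_alloc_vars is a hypothetical helper.
with_alloc_vars() {
  GPU_MAX_HEAP_SIZE=96 GPU_SINGLE_ALLOC_PERCENT=50 "$@"
}

# Expected Max memory allocation for a global size and a percentage.
# expected_alloc is a hypothetical helper (integer arithmetic, bytes).
expected_alloc() {
  echo $(( $1 * $2 / 100 ))
}
```

For example, `with_alloc_vars clinfo | grep -E 'Max memory allocation|Global memory size'` shows the two values, and `expected_alloc 1030750208 50` prints 515375104, matching the new Max memory allocation above.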

Please note that the supported variables vary from setup to setup. I am using AMDAPPSDK-3.0.0-Beta with fglrx 15.20. Again, check which variables your setup supports with:

:~$ strings /usr/lib/libamdocl64.so | grep GPU

For some setups the right variable may be GPU_MAX_ALLOC_PERCENT; for others it may be something else. These variables are undocumented. Fool around at your own risk.

UPDATE:

Please note that which variables work also depends on the AMD APP SDK version. GPU_SINGLE_ALLOC_PERCENT worked with AMD APP SDK 3.0 Beta but does not work with AMD APP SDK 3.0; with version 3.0, GPU_MAX_ALLOC_PERCENT=100 works instead.
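Since the right name depends on the SDK version, here is a minimal sketch that checks which of the two names the library mentions, preferring GPU_MAX_ALLOC_PERCENT (worked with 3.0) and falling back to GPU_SINGLE_ALLOC_PERCENT (worked with 3.0 Beta). The helper name pick_alloc_var is hypothetical, and a name being present in the library does not guarantee the runtime honors it, so always confirm with clinfo afterwards.

```shell
#!/bin/sh
# Reads `strings` output on stdin and prints the allocation-percentage
# variable to use, or returns 1 if neither name is present.
# pick_alloc_var is a hypothetical helper.
pick_alloc_var() {
  names="$(grep -E '^GPU_(MAX|SINGLE)_ALLOC_PERCENT$' || true)"
  if printf '%s\n' "$names" | grep -qx 'GPU_MAX_ALLOC_PERCENT'; then
    echo GPU_MAX_ALLOC_PERCENT
  elif printf '%s\n' "$names" | grep -qx 'GPU_SINGLE_ALLOC_PERCENT'; then
    echo GPU_SINGLE_ALLOC_PERCENT
  else
    return 1
  fi
}
```

Possible usage: `var="$(strings /usr/lib/libamdocl64.so | pick_alloc_var)" && export "$var=100"`, then rerun clinfo to confirm Max memory allocation actually changed.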
