1 Reply Latest reply on Sep 10, 2015 10:24 AM by nirv_knox

    GPU_MAX_ALLOC_PERCENT not working for fglrx 15.20

    nirv_knox

      clinfo has been reporting incorrect details for my GPU, an AMD Radeon HD 7550M/7600M Series. I was able to fix the total global memory size using GPU_MAX_HEAP_SIZE. However, the max buffer size (max memory allocation) is still stuck at the OpenCL minimum, i.e. 1/4 of the device memory. I tried setting this:

      export GPU_FORCE_64BIT_PTR=1

      But that didn't work. Then I tried setting this:

      export GPU_MAX_ALLOC_PERCENT=100

      That didn't work either. Is there any other undocumented or documented environment variable that I can change to increase the buffer memory size?

      Here is the clinfo output:

      raptor@raptor-535U4C:~$ clinfo

      Number of platforms: 1

        Platform Profile: FULL_PROFILE

        Platform Version: OpenCL 2.0 AMD-APP (1729.3)

        Platform Name: AMD Accelerated Parallel Processing

        Platform Vendor: Advanced Micro Devices, Inc.

        Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

       

       

       

       

        Platform Name: AMD Accelerated Parallel Processing

      Number of devices: 2

        Device Type: CL_DEVICE_TYPE_GPU

        Vendor ID: 1002h

        Board name: AMD Radeon HD 7500M/7600M Series

        Device Topology: PCI[ B#1, D#0, F#0 ]

        Max compute units: 6

        Max work items dimensions: 3

          Max work items[0]: 256

          Max work items[1]: 256

          Max work items[2]: 256

        Max work group size: 256

        Preferred vector width char: 16

        Preferred vector width short: 8

        Preferred vector width int: 4

        Preferred vector width long: 2

        Preferred vector width float: 4

        Preferred vector width double: 0

        Native vector width char: 16

        Native vector width short: 8

        Native vector width int: 4

        Native vector width long: 2

        Native vector width float: 4

        Native vector width double: 0

        Max clock frequency: 500Mhz

        Address bits: 32

        Max memory allocation: 254803968

        Image support: Yes

        Max number of images read arguments: 128

        Max number of images write arguments: 8

        Max image 2D width: 16384

        Max image 2D height: 16384

        Max image 3D width: 2048

        Max image 3D height: 2048

        Max image 3D depth: 2048

        Max samplers within kernel: 16

        Max size of kernel argument: 1024

        Alignment (bits) of base address: 2048

        Minimum alignment (bytes) for any datatype: 128

        Single precision floating point capability

          Denorms: No

          Quiet NaNs: Yes

          Round to nearest even: Yes

          Round to zero: Yes

          Round to +ve and infinity: Yes

          IEEE754-2008 fused multiply-add: Yes

        Cache type: None

        Cache line size: 0

        Cache size: 0

        Global memory size: 1019215872

        Constant buffer size: 65536

        Max number of constant args: 8

        Local memory type: Scratchpad

        Local memory size: 32768

        Kernel Preferred work group size multiple: 64

        Error correction support: 0

        Unified memory for Host and Device: 0

        Profiling timer resolution: 1

        Device endianess: Little

        Available: Yes

        Compiler available: Yes

        Execution capabilities: 

          Execute OpenCL kernels: Yes

          Execute native function: No

        Queue properties: 

          Out-of-Order: No

          Profiling : Yes

        Platform ID: 0x00007f3c48b2d8f0

        Name: Turks

        Vendor: Advanced Micro Devices, Inc.

        Device OpenCL C version: OpenCL C 1.2

        Driver version: 1729.3

        Profile: FULL_PROFILE

        Version: OpenCL 1.2 AMD-APP (1729.3)

        Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event

       

       

       

       

       

       

        Device Type: CL_DEVICE_TYPE_CPU

        Vendor ID: 1002h

        Board name: 

        Max compute units: 4

        Max work items dimensions: 3

          Max work items[0]: 1024

          Max work items[1]: 1024

          Max work items[2]: 1024

        Max work group size: 1024

        Preferred vector width char: 16

        Preferred vector width short: 8

        Preferred vector width int: 4

        Preferred vector width long: 2

        Preferred vector width float: 8

        Preferred vector width double: 4

        Native vector width char: 16

        Native vector width short: 8

        Native vector width int: 4

        Native vector width long: 2

        Native vector width float: 8

        Native vector width double: 4

        Max clock frequency: 1100Mhz

        Address bits: 64

        Max memory allocation: 5339882086

        Image support: Yes

        Max number of images read arguments: 128

        Max number of images write arguments: 64

        Max image 2D width: 8192

        Max image 2D height: 8192

        Max image 3D width: 2048

        Max image 3D height: 2048

        Max image 3D depth: 2048

        Max samplers within kernel: 16

        Max size of kernel argument: 4096

        Alignment (bits) of base address: 1024

        Minimum alignment (bytes) for any datatype: 128

        Single precision floating point capability

          Denorms: Yes

          Quiet NaNs: Yes

          Round to nearest even: Yes

          Round to zero: Yes

          Round to +ve and infinity: Yes

          IEEE754-2008 fused multiply-add: Yes

        Cache type: Read/Write

        Cache line size: 64

        Cache size: 16384

        Global memory size: 5620928512

        Constant buffer size: 65536

        Max number of constant args: 8

        Local memory type: Global

        Local memory size: 32768

        Kernel Preferred work group size multiple: 1

        Error correction support: 0

        Unified memory for Host and Device: 1

        Profiling timer resolution: 1

        Device endianess: Little

        Available: Yes

        Compiler available: Yes

        Execution capabilities: 

          Execute OpenCL kernels: Yes

          Execute native function: Yes

        Queue properties: 

          Out-of-Order: No

          Profiling : Yes

        Platform ID: 0x00007f3c48b2d8f0

        Name: AMD A8-4555M APU with Radeon(tm) HD Graphics

        Vendor: AuthenticAMD

        Device OpenCL C version: OpenCL C 1.2

        Driver version: 1729.3 (sse2,avx,fma4)

        Profile: FULL_PROFILE

        Version: OpenCL 1.2 AMD-APP (1729.3)

        Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
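
To make the "1/4th of the Device memory" observation easy to check, here is a minimal sketch that reads clinfo output and prints, for each device, the max allocation as a percentage of the global memory size. The function name `alloc_ratio` is made up for illustration, and the sketch assumes the two field labels appear exactly as in the output above, with "Max memory allocation" listed before "Global memory size" for each device.

```shell
#!/bin/sh
# Hypothetical helper: read clinfo output on stdin and print, per device,
# "Max memory allocation" as a percentage of "Global memory size".
alloc_ratio() {
    awk '
        /Max memory allocation:/            { alloc = $NF }
        /Global memory size:/ && $NF > 0    { printf "%.0f%%\n", alloc * 100 / $NF }
    '
}

# Run against the real tool only if it is installed.
if command -v clinfo >/dev/null 2>&1; then
    clinfo | alloc_ratio
fi
```

For the output pasted above this should print 25% for the GPU (254803968 of 1019215872 bytes) and 95% for the CPU device.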

        • Re: GPU_MAX_ALLOC_PERCENT not working for fglrx 15.20
          nirv_knox

          Found it! After a sleepless night trying to solve the problem, I found the solution at 5 a.m.

          Apparently, the following command lists the environment variables that the installed OpenCL runtime recognizes. In my case, with the "Turks" GPU, a.k.a. Radeon HD 7550M/7600M, GPU_MAX_ALLOC_PERCENT is not among them.

          strings /usr/lib/libamdocl64.so | grep GPU

          The above command displays the following list:

          raptor@raptor-535U4C:~$ strings /usr/lib/libamdocl64.so | grep GPU

          DEBUG_GPU_FLAGS

          GPU_MAX_COMMAND_QUEUES

          GPU_MAX_WORKGROUP_SIZE

          GPU_MAX_WORKGROUP_SIZE_2D_X

          GPU_MAX_WORKGROUP_SIZE_2D_Y

          GPU_MAX_WORKGROUP_SIZE_3D_X

          GPU_MAX_WORKGROUP_SIZE_3D_Y

          GPU_MAX_WORKGROUP_SIZE_3D_Z

          GPU_DEVICE_NAME

          GPU_DEVICE_ORDINAL

          GPU_INITIAL_HEAP_SIZE

          GPU_MAX_HEAP_SIZE

          GPU_HEAP_GROWTH_INCREMENT

          GPU_STAGING_BUFFER_SIZE

          GPU_DUMP_BLIT_KERNELS

          GPU_BLIT_ENGINE_TYPE

          GPU_FLUSH_ON_EXECUTION

          GPU_USE_SYNC_OBJECTS

          GPU_OPEN_VIDEO

          GPU_PRE_RA_SCHED

          GPU_PINNED_XFER_SIZE

          GPU_PINNED_MIN_XFER_SIZE

          GPU_RESOURCE_CACHE_SIZE

          GPU_ASYNC_MEM_COPY

          GPU_FORCE_64BIT_PTR

          GPU_FORCE_OCL20_32BIT

          GPU_RAW_TIMESTAMP

          GPU_PARTIAL_DISPATCH

          GPU_NUM_MEM_DEPENDENCY

          GPU_XFER_BUFFER_SIZE

          GPU_IMAGE_DMA

          GPU_SINGLE_ALLOC_PERCENT

          GPU_NUM_COMPUTE_RINGS

          GPU_WORKLOAD_SPLIT

          GPU_USE_SINGLE_SCRATCH

          GPU_TARGET_INFO_ARCH

          GPU_SPLIT_LIB

          GPU_STAGING_WRITE_PERSISTENT

          GPU_HSAIL_ENABLE

          GPU_ASSUME_ALIASES

          GPU_PRINT_CHILD_KERNEL

          GPU_DIRECT_SRD

          GPU_USE_DEVICE_QUEUE

          GPU_ENABLE_LARGE_ALLOCATION

          GPU_IFH_MODE

          GPU_FORCE_SINGLE_FP_DENORM

          GPU_ENABLE_HW_DEBUG

          Virtual GPU List Ops Lock

          GPU heap lock

          Virtual GPU execution lock

          ADL2_Display_PowerXpressActiveGPU_Get

          ADL2_Display_PowerXpressActiveGPU_Set

          uki_firegl_QueryGPUMapInfo

          GPGPU, hw cannot provide thread-id

          GPGPU, hw cannot support double-fp or memory export

          GPGPU, exceed the local data storage limit

          SH_MEM_ADDRESS_MODE_GPUVM64

          SH_MEM_ADDRESS_MODE_GPUVM32

          Specify that UAVs per pointer should be used(HD5XXX and HD6XXX series GPU's only).

          Generate 64-bit ELF binary for GPU (default: 32-bit)

          Enable/disable float f/c ==> f * (1.0f/c) for GPU (default : on)

          Disabling (-fno-inline) GPU inlining for testing

          -D__GPU__=1

          From the list above, setting GPU_SINGLE_ALLOC_PERCENT to the desired percentage changed the value of Max memory allocation accordingly. After running clinfo again, the new values were:

          Max memory allocation: 515375104
          Global memory size: 1030750208

          The Global memory size is set via GPU_MAX_HEAP_SIZE=96, whereas the Max memory allocation is set via GPU_SINGLE_ALLOC_PERCENT=50, which puts the buffer size at 50% of the Global memory size.
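
As a rough check of the arithmetic above, the sketch below computes the allocation limit one would expect for a given global memory size and percentage. The function name `expected_max_alloc` is made up for illustration; it only does the multiplication and does not query the driver.

```shell
#!/bin/sh
# Hypothetical helper: expected "Max memory allocation" in bytes for a given
# "Global memory size" (bytes) and allocation percentage.
expected_max_alloc() {
    global_bytes=$1
    percent=$2
    echo $(( global_bytes * percent / 100 ))
}

# Values reported by clinfo in this thread:
expected_max_alloc 1030750208 50   # with GPU_SINGLE_ALLOC_PERCENT=50
expected_max_alloc 1019215872 25   # the default 1/4 limit from the first post
```

The two calls print 515375104 and 254803968, matching the clinfo values before and after the change. To apply the variables to a single command rather than the whole shell session, they can be prefixed to it, e.g. `GPU_MAX_HEAP_SIZE=96 GPU_SINGLE_ALLOC_PERCENT=50 clinfo`.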

           

          Please note that the variables vary from hardware to hardware. I am using AMDAPPSDK-3.0.0-Beta along with fglrx-15.20. Once again, please check which variables your setup supports with:

          strings /usr/lib/libamdocl64.so | grep GPU

          For some devices, it may be GPU_MAX_ALLOC_PERCENT, whereas it could be something else for others. The variables shown are undocumented. Fool around at your own risk.
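
Since the plain grep also returns lock names, error messages and compiler flags, a stricter filter can help. This is a minimal sketch, not part of the original fix: the function name `filter_gpu_vars` is made up, the library path is the one used above and may differ on your system, and the pattern assumes the variables of interest start with `GPU_` (so `DEBUG_GPU_FLAGS` is dropped).

```shell
#!/bin/sh
# Keep only lines that look like environment variable names: "GPU_" followed
# by upper-case letters, digits, or underscores, and nothing else on the line.
filter_gpu_vars() {
    grep -E '^GPU_[A-Z0-9_]+$'
}

# Assumed library path, taken from the command above; adjust for your system.
LIB=/usr/lib/libamdocl64.so
if [ -r "$LIB" ] && command -v strings >/dev/null 2>&1; then
    strings "$LIB" | filter_gpu_vars | sort -u
else
    echo "cannot read $LIB or 'strings' is missing" >&2
fi
```

A name showing up in this list only means the string exists in the library; whether setting it has any effect still has to be confirmed with clinfo.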

           

          UPDATE:

           

          Please note that the behavior of these variables also depends on the AMD APP SDK version. While GPU_SINGLE_ALLOC_PERCENT worked in AMD APP SDK 3.0-Beta, it does not work in AMD APP SDK 3.0. However, GPU_MAX_ALLOC_PERCENT=100 does work in version 3.0.
