
nirv_knox · ‎09-06-2015

clinfo has been reporting incorrect details for my GPU, an AMD Radeon HD 7500M/7600M Series. I was able to fix the total Global memory size using GPU_MAX_HEAP_SIZE, but the max buffer size (Max memory allocation) is still stuck at the minimum that OpenCL allows, i.e. 1/4 of the device memory. I tried setting this:

export GPU_FORCE_64BIT_PTR=1

But that didn't work. Then I tried setting this:

export GPU_MAX_ALLOC_PERCENT=100

That didn't work either. Is there any other undocumented or documented environment variable that I can change to increase the buffer memory size?

Here is the clinfo output:

raptor@raptor-535U4C:~$ clinfo

Number of platforms: 1

Platform Profile: FULL_PROFILE

Platform Version: OpenCL 2.0 AMD-APP (1729.3)

Platform Name: AMD Accelerated Parallel Processing

Platform Vendor: Advanced Micro Devices, Inc.

Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Platform Name: AMD Accelerated Parallel Processing

Number of devices: 2

Device Type: CL_DEVICE_TYPE_GPU

Vendor ID: 1002h

Board name: AMD Radeon HD 7500M/7600M Series

Device Topology: PCI[ B#1, D#0, F#0 ]

Max compute units: 6

Max work items dimensions: 3

Max work items[0]: 256

Max work items[1]: 256

Max work items[2]: 256

Max work group size: 256

Preferred vector width char: 16

Preferred vector width short: 8

Preferred vector width int: 4

Preferred vector width long: 2

Preferred vector width float: 4

Preferred vector width double: 0

Native vector width char: 16

Native vector width short: 8

Native vector width int: 4

Native vector width long: 2

Native vector width float: 4

Native vector width double: 0

Max clock frequency: 500Mhz

Address bits: 32

Max memory allocation: 254803968

Image support: Yes

Max number of images read arguments: 128

Max number of images write arguments: 8

Max image 2D width: 16384

Max image 2D height: 16384

Max image 3D width: 2048

Max image 3D height: 2048

Max image 3D depth: 2048

Max samplers within kernel: 16

Max size of kernel argument: 1024

Alignment (bits) of base address: 2048

Minimum alignment (bytes) for any datatype: 128

Single precision floating point capability

Denorms: No

Quiet NaNs: Yes

Round to nearest even: Yes

Round to zero: Yes

Round to +ve and infinity: Yes

IEEE754-2008 fused multiply-add: Yes

Cache type: None

Cache line size: 0

Cache size: 0

Global memory size: 1019215872

Constant buffer size: 65536

Max number of constant args: 8

Local memory type: Scratchpad

Local memory size: 32768

Kernel Preferred work group size multiple: 64

Error correction support: 0

Unified memory for Host and Device: 0

Profiling timer resolution: 1

Device endianess: Little

Available: Yes

Compiler available: Yes

Execution capabilities:

Execute OpenCL kernels: Yes

Execute native function: No

Queue properties:

Out-of-Order: No

Profiling : Yes

Platform ID: 0x00007f3c48b2d8f0

Name: Turks

Vendor: Advanced Micro Devices, Inc.

Device OpenCL C version: OpenCL C 1.2

Driver version: 1729.3

Profile: FULL_PROFILE

Version: OpenCL 1.2 AMD-APP (1729.3)

Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event

Device Type: CL_DEVICE_TYPE_CPU

Vendor ID: 1002h

Board name:

Max compute units: 4

Max work items dimensions: 3

Max work items[0]: 1024

Max work items[1]: 1024

Max work items[2]: 1024

Max work group size: 1024

Preferred vector width char: 16

Preferred vector width short: 8

Preferred vector width int: 4

Preferred vector width long: 2

Preferred vector width float: 8

Preferred vector width double: 4

Native vector width char: 16

Native vector width short: 8

Native vector width int: 4

Native vector width long: 2

Native vector width float: 8

Native vector width double: 4

Max clock frequency: 1100Mhz

Address bits: 64

Max memory allocation: 5339882086

Image support: Yes

Max number of images read arguments: 128

Max number of images write arguments: 64

Max image 2D width: 8192

Max image 2D height: 8192

Max image 3D width: 2048

Max image 3D height: 2048

Max image 3D depth: 2048

Max samplers within kernel: 16

Max size of kernel argument: 4096

Alignment (bits) of base address: 1024

Minimum alignment (bytes) for any datatype: 128

Single precision floating point capability

Denorms: Yes

Quiet NaNs: Yes

Round to nearest even: Yes

Round to zero: Yes

Round to +ve and infinity: Yes

IEEE754-2008 fused multiply-add: Yes

Cache type: Read/Write

Cache line size: 64

Cache size: 16384

Global memory size: 5620928512

Constant buffer size: 65536

Max number of constant args: 8

Local memory type: Global

Local memory size: 32768

Kernel Preferred work group size multiple: 1

Error correction support: 0

Unified memory for Host and Device: 1

Profiling timer resolution: 1

Device endianess: Little

Available: Yes

Compiler available: Yes

Execution capabilities:

Execute OpenCL kernels: Yes

Execute native function: Yes

Queue properties:

Out-of-Order: No

Profiling : Yes

Platform ID: 0x00007f3c48b2d8f0

Name: AMD A8-4555M APU with Radeon(tm) HD Graphics

Vendor: AuthenticAMD

Device OpenCL C version: OpenCL C 1.2

Driver version: 1729.3 (sse2,avx,fma4)

Profile: FULL_PROFILE

Version: OpenCL 1.2 AMD-APP (1729.3)

Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
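
As a sanity check on the listing above, the GPU's Max memory allocation (254803968) is exactly one quarter of its Global memory size (1019215872). A minimal sketch for pulling that ratio out of clinfo output, assuming the field labels and their order match this listing; `alloc_ratio` is a hypothetical helper name, not part of clinfo:

```shell
# Print Max memory allocation as a percentage of Global memory size for each
# device found in clinfo-style text on stdin. Assumes every device prints
# "Max memory allocation:" before "Global memory size:", as in the listing.
alloc_ratio() {
    awk -F':[ \t]*' '
        /Max memory allocation:/ { alloc = $2 }
        /Global memory size:/    { printf "%.0f%%\n", alloc * 100 / $2 }
    '
}

# Usage: clinfo | alloc_ratio
```

One line is printed per device, so on this machine the first line would describe the GPU and the second the CPU.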

nirv_knox · ‎09-06-2015

Found it! After a sleepless night trying to solve the problem, I found the solution at 5 a.m.

Apparently, the following command lists the environment variables that the installed OpenCL runtime recognizes for the GPU. In my case, the "Turks" GPU (a.k.a. Radeon HD 7500M/7600M) doesn't support GPU_MAX_ALLOC_PERCENT.

strings /usr/lib/libamdocl64.so | grep GPU

The above command displays the following list:

raptor@raptor-535U4C:~$ strings /usr/lib/libamdocl64.so | grep GPU

DEBUG_GPU_FLAGS

GPU_MAX_COMMAND_QUEUES

GPU_MAX_WORKGROUP_SIZE

GPU_MAX_WORKGROUP_SIZE_2D_X

GPU_MAX_WORKGROUP_SIZE_2D_Y

GPU_MAX_WORKGROUP_SIZE_3D_X

GPU_MAX_WORKGROUP_SIZE_3D_Y

GPU_MAX_WORKGROUP_SIZE_3D_Z

GPU_DEVICE_NAME

GPU_DEVICE_ORDINAL

GPU_INITIAL_HEAP_SIZE

GPU_MAX_HEAP_SIZE

GPU_HEAP_GROWTH_INCREMENT

GPU_STAGING_BUFFER_SIZE

GPU_DUMP_BLIT_KERNELS

GPU_BLIT_ENGINE_TYPE

GPU_FLUSH_ON_EXECUTION

GPU_USE_SYNC_OBJECTS

GPU_OPEN_VIDEO

GPU_PRE_RA_SCHED

GPU_PINNED_XFER_SIZE

GPU_PINNED_MIN_XFER_SIZE

GPU_RESOURCE_CACHE_SIZE

GPU_ASYNC_MEM_COPY

GPU_FORCE_64BIT_PTR

GPU_FORCE_OCL20_32BIT

GPU_RAW_TIMESTAMP

GPU_PARTIAL_DISPATCH

GPU_NUM_MEM_DEPENDENCY

GPU_XFER_BUFFER_SIZE

GPU_IMAGE_DMA

GPU_SINGLE_ALLOC_PERCENT

GPU_NUM_COMPUTE_RINGS

GPU_WORKLOAD_SPLIT

GPU_USE_SINGLE_SCRATCH

GPU_TARGET_INFO_ARCH

GPU_SPLIT_LIB

GPU_STAGING_WRITE_PERSISTENT

GPU_HSAIL_ENABLE

GPU_ASSUME_ALIASES

GPU_PRINT_CHILD_KERNEL

GPU_DIRECT_SRD

GPU_USE_DEVICE_QUEUE

GPU_ENABLE_LARGE_ALLOCATION

GPU_IFH_MODE

GPU_FORCE_SINGLE_FP_DENORM

GPU_ENABLE_HW_DEBUG

Virtual GPU List Ops Lock

GPU heap lock

Virtual GPU execution lock

ADL2_Display_PowerXpressActiveGPU_Get

ADL2_Display_PowerXpressActiveGPU_Set

uki_firegl_QueryGPUMapInfo

GPGPU, hw cannot provide thread-id

GPGPU, hw cannot support double-fp or memory export

GPGPU, exceed the local data storage limit

SH_MEM_ADDRESS_MODE_GPUVM64

SH_MEM_ADDRESS_MODE_GPUVM32

Specify that UAVs per pointer should be used(HD5XXX and HD6XXX series GPU's only).

Generate 64-bit ELF binary for GPU (default: 32-bit)

Enable/disable float f/c ==> f * (1.0f/c) for GPU (default : on)

Disabling (-fno-inline) GPU inlining for testing

-D__GPU__=1
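
The plain `grep GPU` above also matches unrelated strings such as lock names, ADL symbols and compiler help text. A tighter filter, offered only as a sketch, keeps lines that consist solely of an upper-case `GPU_` identifier. The prefix pattern is an assumption based on the list above, and `gpu_env_vars` is a hypothetical helper name:

```shell
# Keep only lines that are entirely an upper-case GPU_* identifier.
# Note: this also drops names without the GPU_ prefix, e.g. DEBUG_GPU_FLAGS.
gpu_env_vars() {
    grep -E '^GPU_[A-Z0-9_]+$'
}

# Usage: strings /usr/lib/libamdocl64.so | gpu_env_vars
```

A name appearing in this list only shows that the string exists in the library; whether setting it has any effect still has to be confirmed with clinfo.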

From the above, setting the variable GPU_SINGLE_ALLOC_PERCENT to the desired percentage changed the value of Max memory allocation accordingly. After rerunning clinfo, the new values were:

Max memory allocation: 515375104
Global memory size: 1030750208

The Global memory size is set via GPU_MAX_HEAP_SIZE=96, whereas the Max memory allocation is set through GPU_SINGLE_ALLOC_PERCENT=50, which sets the max buffer size to 50% of the Global memory size.
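
The relationship between the two reported values can be checked with integer arithmetic. This is a sketch that assumes the limit is simply the given percentage of the reported Global memory size, which matches the numbers above; `expected_max_alloc` is a hypothetical helper name:

```shell
# Expected single-allocation limit in bytes, given the reported Global memory
# size (bytes) and an allocation percentage.
expected_max_alloc() {
    echo $(( $1 * $2 / 100 ))
}

expected_max_alloc 1030750208 50   # prints 515375104

# To apply the settings for a single run (requires clinfo and the AMD runtime):
#   GPU_MAX_HEAP_SIZE=96 GPU_SINGLE_ALLOC_PERCENT=50 clinfo
```

The same formula with 25 percent reproduces the original limit: 1019215872 * 25 / 100 = 254803968.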

Please note that the variables vary from hardware to hardware. I am using AMDAPPSDK-3.0.0-Beta along with fglrx-15.20. Once again, please check which variables your hardware supports with:

strings /usr/lib/libamdocl64.so | grep GPU

For some devices, it may be GPU_MAX_ALLOC_PERCENT, whereas it could be something else for others. The variables shown are undocumented. Fool around at your own risk.

UPDATE:

Please note that the variables' behavior also depends on the version of the AMD APP SDK. While GPU_SINGLE_ALLOC_PERCENT worked in AMD APP SDK 3.0-Beta, it does not work in AMD APP SDK 3.0. However, GPU_MAX_ALLOC_PERCENT=100 works in version 3.0.
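
Since the working variable name differs between SDK versions, one option is to pick whichever name the installed library actually contains. The following is only a heuristic sketch: `pick_alloc_var` is a hypothetical helper, the preference order is an assumption, and a name being present in the library does not guarantee that it takes effect, so the result should still be verified with clinfo.

```shell
# Read candidate variable names on stdin (one per line) and print the first
# allocation-percent variable found, preferring GPU_MAX_ALLOC_PERCENT.
# Returns non-zero if neither name is present.
pick_alloc_var() {
    names=$(cat)
    for v in GPU_MAX_ALLOC_PERCENT GPU_SINGLE_ALLOC_PERCENT; do
        if printf '%s\n' "$names" | grep -qx "$v"; then
            echo "$v"
            return 0
        fi
    done
    return 1
}

# Usage:
#   var=$(strings /usr/lib/libamdocl64.so | pick_alloc_var) && export "$var=100"
```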

