liwoog
Adept II

GPU_MAX_ALLOC_PERCENT and 13.1 drivers failure

While my code was running well in production using GPU_MAX_ALLOC_PERCENT at up to 100% with the 12.4 drivers, it fails (CL_OUT_OF_RESOURCES) with the 13.1 drivers (I allocate up to 90% of memory from the code). I tried changing 100% to 80% to no avail.

Only being able to use 2GB of the 3GB on the card would render it useless for my next project. I need every bit of memory I can use.

Is there a workaround?

Machine:

4x HD 7970

Catalyst 13.1 driver on CentOS 6.3


Operating System Version (name), Linux version 2.6.32-279.19.1.el6.centos.plus.x86_64 (mockbuild@c6b7.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Wed Dec 19 06:20:23 UTC 2012

Operating System Version (number), 2.6.32

Number Of Processors, 32

System Type, Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz

Total Physical Memory, 64392 MB

Available Physical Memory, 62184 MB

Total Virtual Memory, 33554431 MB

Available Virtual Memory, 33519322 MB

Total Page Files, 8191 MB

Available Page Files, 8191 MB

Platform ID, 1, 1, 1, 1, 1

Device Type, GPU, GPU, GPU, GPU, CPU

Device Name, Tahiti, Tahiti, Tahiti, Tahiti, Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz

Vendor, Advanced Micro Devices, Inc., Advanced Micro Devices, Inc., Advanced Micro Devices, Inc., Advanced Micro Devices, Inc., GenuineIntel

Command Queue Properties, Queue profiling, Queue profiling, Queue profiling, Queue profiling, Queue profiling

Is Available, Yes, Yes, Yes, Yes, Yes

Is Compiler Available, Yes, Yes, Yes, Yes, Yes

Is Little Endian, Yes, Yes, Yes, Yes, Yes

Error Correction Support, No, No, No, No, No

Execution Capabilities, Kernel Execution, Kernel Execution, Kernel Execution, Kernel Execution, Kernel Execution, Native Kernel Execution

Global Memory Cache Size, 16 KB, 16 KB, 16 KB, 16 KB, 32 KB

Memory Cache Type, Read Write, Read Write, Read Write, Read Write, Read Write

Global Memory Cache Line Size, 64 bytes, 64 bytes, 64 bytes, 64 bytes, 64 bytes

Global Memory Size, 2,048 MB, 2,048 MB, 2,048 MB, 2,048 MB, 64,393 MB

Host Unified Memory, No, No, No, No, Yes

Are Images Supported, Yes, Yes, Yes, Yes, Yes

Max Image 2D Dimensions, (256w, 256h), (256w, 256h), (256w, 256h), (256w, 256h), (1024w, 1024h)

Max Image 3D Dimensions, (256w, 256h, 256d), (256w, 256h, 256d), (256w, 256h, 256d), (256w, 256h, 256d), (1024w, 1024h, 1024d)

Local Memory Size, 32 KB, 32 KB, 32 KB, 32 KB, 32 KB

Local Memory Type, Local, Local, Local, Local, Global

Max Clock Frequency, 1050, 1050, 1050, 1050, 1200

Max Compute Units, 32, 32, 32, 32, 32

Max Constant Arguments, 8, 8, 8, 8, 8

Max Constant Buffer Size, 64 KB, 64 KB, 64 KB, 64 KB, 64 KB

Max Memory Allocation Size, 512 MB, 512 MB, 512 MB, 512 MB, 16,099 MB

Max Parameter Size, 1,024 bytes, 1,024 bytes, 1,024 bytes, 1,024 bytes, 4 KB

Read Image Arguments, 128, 128, 128, 128, 128

Max Samplers, 16, 16, 16, 16, 16

Max Workgroup Size, 256, 256, 256, 256, 1024

Max Work Item Dimensions, 3, 3, 3, 3, 3

Max Work Item Sizes, (256,256,256), (256,256,256), (256,256,256), (256,256,256), (1024,1024,1024)

Max Write Image Arguments, 8, 8, 8, 8, 8

Memory Base Address Alignment, 2048, 2048, 2048, 2048, 1024

Minimal Data Type Alignment Size, 128 bytes, 128 bytes, 128 bytes, 128 bytes, 128 bytes

OpenCL C Version, OpenCL C 1.2 , OpenCL C 1.2 , OpenCL C 1.2 , OpenCL C 1.2 , OpenCL C 1.2

Native Char Vector Width, 4, 4, 4, 4, 16

Native Short Vector Width, 2, 2, 2, 2, 8

Native Int Vector Width, 1, 1, 1, 1, 4

Native Long Vector Width, 1, 1, 1, 1, 2

Native Float Vector Width, 1, 1, 1, 1, 8

Native Double Vector Width, 1, 1, 1, 1, 4

Native Half Vector Width, 1, 1, 1, 1, 4

Preferred Char Vector Width, 4, 4, 4, 4, 16

Preferred Short Vector Width, 2, 2, 2, 2, 8

Preferred Int Vector Width, 1, 1, 1, 1, 4

Preferred Long Vector Width, 1, 1, 1, 1, 2

Preferred Float Vector Width, 1, 1, 1, 1, 8

Preferred Double Vector Width, 1, 1, 1, 1, 4

Preferred Half Vector Width, 1, 1, 1, 1, 4

Profile, FULL_PROFILE, FULL_PROFILE, FULL_PROFILE, FULL_PROFILE, FULL_PROFILE

Profiling Timer Resolution, 1, 1, 1, 1, 1

Vendor ID, OpenCL 1.2 AMD-APP (1113.2), OpenCL 1.2 AMD-APP (1113.2), OpenCL 1.2 AMD-APP (1113.2), OpenCL 1.2 AMD-APP (1113.2), OpenCL 1.2 AMD-APP (1113.2)

0 Likes
10 Replies
himanshu_gautam
Grandmaster

The output posted above looks like a modified version of clinfo output. Can you share the source? It may help others, as clinfo has an issue when some platforms are OpenCL 1.1 compliant and others are OpenCL 1.2 compliant.

I will ask the runtime team and let you know if there is a way to enable the full memory. Can you also check with the 12.10 driver (and the 13.2 beta)? Do you still get only 2GB out of the 3GB of memory on your Tahiti cards? Thanks for reporting it.

0 Likes

This was copied and pasted from CodeXL's system information.

Here is the clinfo output:

Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.2 AMD-APP (1113.2)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

  Platform Name: AMD Accelerated Parallel Processing
Number of devices: 5
  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Board name: AMD Radeon HD 7900 Series
  Device Topology: PCI[ B#2, D#0, F#0 ]
  Max compute units: 32
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 4
  Preferred vector width short: 2
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Native vector width char: 4
  Native vector width short: 2
  Native vector width int: 1
  Native vector width long: 1
  Native vector width float: 1
  Native vector width double: 1
  Max clock frequency: 1050Mhz
  Address bits: 32
  Max memory allocation: 536870912
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 2147483648
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x00007ffab08f64e0
  Name: Tahiti
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1113.2 (VM)
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1113.2)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_amd_c1x_atomics

  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Board name: AMD Radeon HD 7900 Series
  Device Topology: PCI[ B#3, D#0, F#0 ]
  Max compute units: 32
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 4
  Preferred vector width short: 2
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Native vector width char: 4
  Native vector width short: 2
  Native vector width int: 1
  Native vector width long: 1
  Native vector width float: 1
  Native vector width double: 1
  Max clock frequency: 1050Mhz
  Address bits: 32
  Max memory allocation: 536870912
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 2147483648
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x00007ffab08f64e0
  Name: Tahiti
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1113.2 (VM)
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1113.2)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_amd_c1x_atomics

  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Board name: AMD Radeon HD 7900 Series
  Device Topology: PCI[ B#-125, D#0, F#0 ]
  Max compute units: 32
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 4
  Preferred vector width short: 2
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Native vector width char: 4
  Native vector width short: 2
  Native vector width int: 1
  Native vector width long: 1
  Native vector width float: 1
  Native vector width double: 1
  Max clock frequency: 1050Mhz
  Address bits: 32
  Max memory allocation: 536870912
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 2147483648
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x00007ffab08f64e0
  Name: Tahiti
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1113.2 (VM)
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1113.2)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_amd_c1x_atomics

  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Board name: AMD Radeon HD 7900 Series
  Device Topology: PCI[ B#-124, D#0, F#0 ]
  Max compute units: 32
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 4
  Preferred vector width short: 2
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Native vector width char: 4
  Native vector width short: 2
  Native vector width int: 1
  Native vector width long: 1
  Native vector width float: 1
  Native vector width double: 1
  Max clock frequency: 1050Mhz
  Address bits: 32
  Max memory allocation: 536870912
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 2147483648
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x00007ffab08f64e0
  Name: Tahiti
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1113.2 (VM)
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1113.2)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_amd_c1x_atomics

  Device Type: CL_DEVICE_TYPE_CPU
  Device ID: 4098
  Board name:
  Max compute units: 32
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 8
  Preferred vector width double: 4
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 8
  Native vector width double: 4
  Max clock frequency: 2601Mhz
  Address bits: 64
  Max memory allocation: 16880146432
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 4096
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 32768
  Global memory size: 67520585728
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 32768
  Kernel Preferred work group size multiple: 1
  Error correction support: 0
  Unified memory for Host and Device: 1
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x00007ffab08f64e0
  Name: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  Vendor: GenuineIntel
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1113.2 (sse2,avx)
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1113.2)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
0 Likes

With 12.8 and up, I can run with GPU_MAX_ALLOC_PERCENT set as high as 45. Thankfully, that also raises the reported global memory size to 3074424832, which is the important factor.

0 Likes

hi liwoog,

Can you explain how you are seeing 3074424832 bytes (2.86GB) out of the 3GB when GPU_MAX_ALLOC_PERCENT is set to 45?

Shouldn't setting it to 45 enable 45% of the GPU memory?

0 Likes

Simple. The query

          if (CL_SUCCESS != (err = clGetDeviceInfo(devices[0], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(cl_ulong), &maxGlobalMem, NULL))) { /* handle the error */ }

stores 3074424832 in maxGlobalMem (a cl_ulong).

The reported total memory seems to depend on the largest allocatable block:

setenv GPU_MAX_ALLOC_PERCENT 25

maxMemAllocSize(733741056), globalMemSize(2934964224)


setenv GPU_MAX_ALLOC_PERCENT 26

maxMemAllocSize(763090698), globalMemSize(3052362792)

setenv GPU_MAX_ALLOC_PERCENT 27

maxMemAllocSize(792440340), globalMemSize(3073376256)
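
The three data points above fit a simple pattern, sketched below in C. This is only an observation drawn from these numbers, not documented driver behavior: maxMemAllocSize matches the requested percentage of 2934964224 bytes (the globalMemSize reported at 25), and globalMemSize equals four times maxMemAllocSize at 25 and 26 but falls short of that at 27, as if it were capped near 3GB. The function name and the base constant are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2934964224 is the globalMemSize reported at 25 percent above.
   It is used here as a hypothetical base for the percentage calculation. */
#define OBSERVED_BASE_BYTES UINT64_C(2934964224)

/* Hypothetical helper: the maximum allocation size predicted for a given
   GPU_MAX_ALLOC_PERCENT value, if the runtime takes that percentage of the
   base and truncates to whole bytes. */
uint64_t predicted_max_alloc(unsigned percent)
{
    assert(percent <= 100);
    return OBSERVED_BASE_BYTES * percent / 100;
}

/* Hypothetical helper: four times the predicted maximum allocation, which
   matches the reported globalMemSize at 25 and 26 but overshoots it at 27. */
uint64_t uncapped_global_mem(unsigned percent)
{
    return 4 * predicted_max_alloc(percent);
}
```

For 25, 26 and 27 the first function reproduces the three maxMemAllocSize values exactly. The second one gives 3169761360 for 27, which is larger than the 3073376256 actually reported, so some upper limit appears to apply there.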

0 Likes

Hi liwoog,

That is interesting to learn.


GPU_MAX_ALLOC_PERCENT changes the maximum buffer size. If you want more GPU memory, you could try setting GPU_MAX_HEAP_SIZE to a value close to 100 (say 95).
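
Since both settings are environment variables, one quick sanity check is to confirm from the host program that they are actually visible in its environment (for example, that they were set in the shell that launches the job). Below is a minimal sketch in C, assuming the values are integer percentages from 0 to 100 as used in this thread. `read_percent_env` is a hypothetical helper, not part of any AMD or OpenCL API, and it says nothing about whether a given driver honors the variable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: read an environment variable that is expected to hold
   an integer percentage. Returns the value (0..100), or -1 if the variable
   is unset, empty, not a plain integer, or out of range. */
int read_percent_env(const char *name)
{
    assert(name != NULL);

    const char *text = getenv(name);
    if (text == NULL || *text == '\0') {
        return -1;
    }

    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);

    /* Reject conversion errors, trailing garbage, and out-of-range values. */
    if (errno != 0 || *end != '\0' || value < 0 || value > 100) {
        return -1;
    }
    return (int)value;
}
```

Calling `read_percent_env("GPU_MAX_ALLOC_PERCENT")` and `read_percent_env("GPU_MAX_HEAP_SIZE")` at startup and logging the results makes it easy to rule out a variable that never reached the process.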

0 Likes

GPU_MAX_HEAP_SIZE on its own does not seem to impact the maximum amount of usable memory.

0 Likes

I also tried the GPU_MAX_HEAP_SIZE flag, and it did not work for me on Linux with the 13.1 driver. As I understand it, these features are experimental only, and AMD reserves the right to disable these flags in the future.

Anyway, I do understand your problem, and I have asked someone for a workaround. I will let you know if I hear from him.

0 Likes
dipak
Big Boss

Hi,

I'm reviving this thread.

In the thread "Large buffers", drallan described a workaround for allocating large amounts of memory. You may want to try it and check whether it works for you.

Regards,

0 Likes
nirv_knox
Adept I

Use the following command to list the environment variables supported by your GPU devices.

:~$ strings /usr/lib/libamdocl64.so | grep GPU

From the list of variables, you can set the percentages accordingly. You can also check whether variables such as GPU_MAX_HEAP_SIZE and GPU_MAX_ALLOC_PERCENT are available, and which variables can be tweaked to make better use of your hardware. For example, if your card has a global memory size of 1024 MB, you might use two separate buffers of 512 MB each, or a single buffer with the full 1024 MB allotted. The choice is yours.
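
The buffer-splitting idea above can be sketched in C as follows. This is a minimal illustration, assuming you have already queried the device's maximum allocation size (CL_DEVICE_MAX_MEM_ALLOC_SIZE). The function name `plan_buffers` and its parameters are hypothetical, and the code only computes sizes without calling OpenCL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split total_bytes into buffers no larger than
   max_alloc_bytes. Writes up to `capacity` buffer sizes into `sizes` and
   returns the number of buffers needed, which may exceed `capacity`. */
size_t plan_buffers(uint64_t total_bytes, uint64_t max_alloc_bytes,
                    uint64_t *sizes, size_t capacity)
{
    assert(max_alloc_bytes > 0);

    size_t count = 0;
    while (total_bytes > 0) {
        /* Each buffer takes as much as the per-buffer limit allows. */
        uint64_t chunk = total_bytes < max_alloc_bytes ? total_bytes
                                                       : max_alloc_bytes;
        if (count < capacity) {
            sizes[count] = chunk;
        }
        total_bytes -= chunk;
        count++;
    }
    return count;
}
```

With the numbers from the example, a 1024 MB total and a 512 MB limit yield two 512 MB buffers, while a 1024 MB limit yields one. Staying under the per-buffer limit does not guarantee that the total fits on the device, which is the problem reported at the top of this thread.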

0 Likes