timchist
Elite

Cannot make OpenCL runtime expose more than 3 GB of RAM

On GPUs with more than 3 GB of memory on board, it does not seem possible to make the OpenCL runtime expose more than 3 GB of RAM.

Changing the GPU_MAX_HEAP_SIZE variable as described here only allows decreasing the amount of exposed RAM, not increasing it.

For example, on an R9 290 with 4 GB of memory, setting GPU_MAX_HEAP_SIZE to 100 makes 3 GB available. Changing the variable to 50 causes only 2 GB to be exposed, so the variable is respected, but there seems to be an absolute maximum of 3 GB.

An R9 290X with 8 GB of RAM also allows only up to 3 GB to be used, in both 32-bit and 64-bit applications.

Why is this happening? Are there any plans to expose more memory in future driver versions?

11 Replies
titanius
Adept II

Try setting GPU_FORCE_64BIT_PTR=1 (see "Re: Can AMD opencl support 6GB of device memory?").

With multiple 4 GB GPUs, the newer driver seems to allow 4 GB for the first card and 3.2 GB for the rest of the GPUs. With the environment variable set, all cards should expose their full memory.

nou
Exemplar

Try setting GPU_FORCE_64BIT_PTR=1 so that the runtime begins generating 64-bit kernels. Then you should be able to get more than 3 GB of RAM.

timchist
Elite

Thanks, titanius and nou -- I wish I could mark both answers as correct. Changing this variable indeed causes all memory to become exposed. However, it's still not clear what exactly the variable does.

For example, if the kernels are compiled offline (-fno-bin-source -fno-bin-llvmir -fno-bin-amdil -fbin-exe), do you need to set this variable at compile time or at run time?

What happens if the variable was set at compile time and is not set at run time (or the other way around)?

What happens if the host application is 32-bit (so sizeof(size_t) on the CPU is 4)? What amount of memory will be reported for, say, an 8 GB GPU? 4 GB?

0 Likes

Those are all good questions.

AMD GPUs, much like CPUs, have a virtual-to-physical address translation system and can operate in either 32-bit or 64-bit mode. While a 64-bit address space enables access to a larger memory bank, it may also degrade performance, since pointer access requires twice as many clocks as in 32-bit mode.

In OCL 1.2:

  1.) By default, GPU 32-bit mode is enabled for all process types unless the environment variable above is specified.

  2.) The process bitness and the GPU bitness are completely decoupled. Hence, 32-bit processes can run in GPU 64-bit mode and vice versa.

  3.) When the runtime exports binaries, it also exports all intermediate representations of the code. If the options or environment have changed when the binary is loaded, the compiler library silently recompiles from the first convergent point.

In OCL 2.0:

  1.) Because of SVM, CPU bitness and GPU bitness are tightly coupled. GPU 64-bit mode is enabled for 64-bit processes and GPU 32-bit mode is enabled for 32-bit processes.

An application can discover the mode of operation by calling clGetDeviceInfo with the CL_DEVICE_ADDRESS_BITS parameter.
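To make the CL_DEVICE_ADDRESS_BITS check concrete, here is a minimal sketch in Python. It assumes the third-party pyopencl bindings are installed (their `address_bits` and `global_mem_size` device attributes correspond to CL_DEVICE_ADDRESS_BITS and CL_DEVICE_GLOBAL_MEM_SIZE), and it assumes the runtime reads GPU_FORCE_64BIT_PTR when it initializes, which is why the variable is set before anything OpenCL-related is imported. The helper names `addressable_bytes`, `summarize`, and `query_devices` are made up for this illustration.

```python
import os

# Assumption: the AMD runtime picks this variable up at initialization,
# so it is set before any OpenCL library is loaded into the process.
os.environ.setdefault("GPU_FORCE_64BIT_PTR", "1")


def addressable_bytes(address_bits):
    """Size of the address space for a given CL_DEVICE_ADDRESS_BITS value."""
    return 1 << address_bits


def summarize(name, address_bits, global_mem_bytes):
    """Format one report line for a device."""
    gib = global_mem_bytes / float(1 << 30)
    return "{}: {}-bit pointers, {:.2f} GiB global memory".format(
        name, address_bits, gib)


def query_devices():
    """Return (name, address_bits, global_mem_size) for every OpenCL device."""
    # Imported lazily so the helpers above work without pyopencl installed.
    import pyopencl as cl
    return [(d.name, d.address_bits, d.global_mem_size)
            for p in cl.get_platforms()
            for d in p.get_devices()]
```

Running `for row in query_devices(): print(summarize(*row))` once with the variable set and once without should show whether the reported address width and exposed memory change together on a given driver. Note that `addressable_bytes(32)` is exactly 4 GiB, so the 3 GB ceiling reported in this thread is a limit of the runtime's 32-bit mode and not of 32-bit addressing as such.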

Thanks for jumping in, Tzachi.

What you describe was my initial assumption; however, the tests I have conducted contradict it. Specifically, if I use offline compilation for a kernel with GPU_FORCE_64BIT_PTR = 0, this kernel (and, in fact, our whole application) then works correctly both when the variable is set to 0 and when it is set to 1 at run time. The opposite seems to be true as well: if GPU_FORCE_64BIT_PTR is set to 1 during compilation, the application works correctly with the variable set to either 0 or 1 at run time. I don't see any performance difference either.

For reference: I was using Catalyst 14.9 and a Radeon 7970 with 3 GB of RAM in these tests.

0 Likes

Hi timchist,

I was inaccurate in my answer; see my corrected answer above.

0 Likes

Thank you for updating your answer, Tzachi.

When the runtime exports binaries, it also exports all intermediate representations of the code. If the options or environment have changed when the binary is loaded, the compiler library silently recompiles from the first convergent point.

We are compiling our kernels offline with the following parameters: -fno-bin-source -fno-bin-llvmir -fno-bin-amdil -fbin-exe. As far as I understand, in this case no intermediate representation is saved (neither LLVM IR nor AMD IL), only the binary. The source is not included either. Nevertheless, as I described in the previous post, the application works correctly when the value of GPU_FORCE_64BIT_PTR at compile time differs from the one at run time. Suggestions?

0 Likes

Try looking inside the generated binaries with some viewer. Quite possibly you will find an IL representation there along with the machine code. Those options are hints to the runtime; it can ignore them and put everything it thinks is needed inside a "binary-only" file.

0 Likes

Thanks, Raimster. I looked at the file, but it seems that it indeed contains only binaries and no source or intermediate code.

Below is the partial output of readelf for the binaries compiled with GPU_FORCE_64BIT_PTR = 0 and 1.

GPU_FORCE_64BIT_PTR = 0:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           <unknown>: 0x3fd
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          466260 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         7
  Section header string table index: 1

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .shstrtab         STRTAB          00000000 000034 000032 00   S  0   0  1
  [ 2] .strtab           STRTAB          00000000 000066 0011ed 00   S  0   0  1
  [ 3] .symtab           SYMTAB          00000000 001258 000bb0 10      2   0  8
  [ 4] .rodata           PROGBITS        00000000 001e08 00b223 00   A  0   0  1
  [ 5] .text             PROGBITS        00000000 00d02b 064ce8 00  AX  0   0  1
  [ 6] .comment          PROGBITS        00000000 071d13 00003e 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

There are no program headers in this file.

There are no relocations in this file.

There are no unwind sections in this file.

Symbol table '.symtab' contains 187 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000   408 OBJECT  LOCAL  DEFAULT    4 __OpenCL_Aj8kV1c_metadata
     2: 00000000  6326 FUNC    LOCAL  DEFAULT    5 __OpenCL_Aj8kV1c_kernel
     3: 00000198    32 OBJECT  LOCAL  DEFAULT    4 __OpenCL_Aj8kV1c_header
     4: 000001b8   405 OBJECT  LOCAL  DEFAULT    4 __OpenCL_F8szl1f_metadata
     5: 000018b6  6314 FUNC    LOCAL  DEFAULT    5 __OpenCL_F8szl1f_kernel
     6: 0000034d    32 OBJECT  LOCAL  DEFAULT    4 __OpenCL_F8szl1f_header
...

No version information found in this file.



GPU_FORCE_64BIT_PTR = 1:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           <unknown>: 0x3fd
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          476384 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         7
  Section header string table index: 1

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .shstrtab         STRTAB           0000000000000000  00000040
       0000000000000032  0000000000000000   S       0     0     1
  [ 2] .strtab           STRTAB           0000000000000000  00000072
       00000000000011ed  0000000000000000   S       0     0     1
  [ 3] .symtab           SYMTAB           0000000000000000  00001260
       0000000000001188  0000000000000018           2     0     8
  [ 4] .rodata           PROGBITS         0000000000000000  000023e8
       000000000000b641  0000000000000000   A       0     0     1
  [ 5] .text             PROGBITS         0000000000000000  0000da29
       0000000000066a78  0000000000000000  AX       0     0     1
  [ 6] .comment          PROGBITS         0000000000000000  000744a1
       000000000000003e  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

There are no program headers in this file.

There are no relocations in this file.

There are no unwind sections in this file.

Symbol table '.symtab' contains 187 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000   425 OBJECT  LOCAL  DEFAULT    4 __OpenCL_Aj8kV1c_metadata
     2: 0000000000000000  6342 FUNC    LOCAL  DEFAULT    5 __OpenCL_Aj8kV1c_kernel
     3: 00000000000001a9    32 OBJECT  LOCAL  DEFAULT    4 __OpenCL_Aj8kV1c_header
     4: 00000000000001c9   422 OBJECT  LOCAL  DEFAULT    4 __OpenCL_F8szl1f_metadata
     5: 00000000000018c6  6330 FUNC    LOCAL  DEFAULT    5 __OpenCL_F8szl1f_kernel
     6: 000000000000036f    32 OBJECT  LOCAL  DEFAULT    4 __OpenCL_F8szl1f_header
...

No version information found in this file.
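The one header field that clearly differs between the two dumps is the ELF class (the fifth byte of the Magic line: 01 for ELF32, 02 for ELF64). As a rough sketch, an application could read that byte to find out which pointer mode an offline-compiled binary was built for before loading it. The function names `elf_class` and `elf_class_of_file` are made up for this illustration, and the sketch says nothing about what the runtime itself does when the class and the current GPU_FORCE_64BIT_PTR setting disagree.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # index of the class byte within the ELF identification bytes
CLASS_NAMES = {1: "ELF32", 2: "ELF64"}


def elf_class(header):
    """Return 'ELF32' or 'ELF64' given the first bytes of an ELF image."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    name = CLASS_NAMES.get(header[EI_CLASS])
    if name is None:
        raise ValueError("unknown ELF class byte: %d" % header[EI_CLASS])
    return name


def elf_class_of_file(path):
    """Read only the 16 identification bytes of a kernel binary on disk."""
    with open(path, "rb") as f:
        return elf_class(f.read(16))


# The two Magic lines from the readelf output above.
magic_ptr0 = bytes.fromhex("7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00")
magic_ptr1 = bytes.fromhex("7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00")

print(elf_class(magic_ptr0))  # ELF32
print(elf_class(magic_ptr1))  # ELF64
```

Comparing this result with the device's CL_DEVICE_ADDRESS_BITS value at load time would at least make a compile-time/run-time mismatch visible instead of silent.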


0 Likes

Hi Tzachi,

Do you have any updates regarding my last post?

Thanks,
timchist

0 Likes
timchist
Elite

Does anyone have any information on how GPU_FORCE_64BIT_PTR and offline compilation are related, and what exactly this variable does?