
Radeon Pro SSG OpenGL API fails to read files

Question asked by friessfn on Sep 26, 2018
Latest reply on Oct 16, 2018 by xhuang



We are using the Radeon Pro SSG[0] to read files from the SSG into large OpenGL buffers (e.g. SSBOs of 1 GiB, up to the maximum size of 2147483647 bytes (~2047.99 MiB) per SSBO).

We encounter several problems when using the SSG functionality: (1) reading the end of files on the SSG fails, (2) uninvolved GL buffers seem to interfere with reading files from the SSG, and (3) it is unclear whether operations on SSG files are asynchronous for large buffers.

You can find a Visual Studio Solution with a working minimal example and system information below.

The example code works on two example files with sizes of 1024 and 1008 bytes, which are attached for convenience, although their content does not matter.

The code expects the SSG to be the "G:\" drive on Windows, but this can be overridden via a command-line argument (e.g. executing '.\miniSSG.exe H:\' in PowerShell instead of '.\miniSSG.exe' to load files from H:\).


** (1) Reading the end of files
We encounter bugs with the OpenGL SSG Extensions when reading the end of a file via glReadFileAMD.
The SSG User Manual[1] requires reads to the end of a file to be aligned to a given block size.

Quote from the Manual: "If the file size is not a multiple of the block size, read the end of the file by aligning the read size with the next block multiple beyond the file size."

However, we cannot get glReadFileAMD to read the end of a file: the GL driver reports GL_INVALID_VALUE for every combination of function parameters that reads the end of the file, unless the file size itself is a multiple of the block size.
Inflating our input files to a multiple of the SSG block size is not a practical solution.
Is this a bug in the driver when reading the end of files from the SSG?

In Code:

GLuint dstBuffer = makeBuffer();
GLFileHandleAMD fileHandle = openFile(); // fileSize is not a multiple of the block size
glReadFileAMD(dstBuffer, fileHandle, 0/*bufferOffset*/, 0/*fileOffset*/, fileSize/*read size*/, nullptr/*GLsync*/); // => GL_INVALID_VALUE
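
For reference, the rounding rule quoted from the manual can be sketched as below. This is only an illustration of the arithmetic: the block size of 512 bytes and the names kAssumedBlockSize and alignUpToBlock are assumptions made for this example, not values or functions taken from the SSG SDK.

```cpp
#include <cstdint>

// Hypothetical block size, for illustration only. The real value must be
// taken from the SSG documentation or driver.
constexpr std::uint64_t kAssumedBlockSize = 512;

// Rounds size up to the next multiple of blockSize (blockSize must be > 0).
constexpr std::uint64_t alignUpToBlock(std::uint64_t size, std::uint64_t blockSize) {
    return ((size + blockSize - 1) / blockSize) * blockSize;
}

// The two example files from this post, under the assumed block size.
static_assert(alignUpToBlock(1008, kAssumedBlockSize) == 1024,
              "a 1008-byte file rounds up to a 1024-byte read");
static_assert(alignUpToBlock(1024, kAssumedBlockSize) == 1024,
              "a 1024-byte file is already aligned");
```

Under that assumption, the 1008-byte example file would need a read size of 1024 bytes and a destination buffer of at least that size. Whether the driver actually accepts such a read is exactly what question (1) is about.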

** (2) Uninvolved GL buffers interfere with glReadFileAMD
The User Manual suggests creating GL buffers "using glNamedBufferStorage with the GL_MAP_READ_BIT | GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT flag set" for best performance and to enable asynchronous file reads.
A bug arises when a spare OpenGL buffer object is created this way but never used with glReadFileAMD.
When such a buffer of a certain (small or large) size exists, it seems to interfere with subsequent glReadFileAMD operations, which fail with GL_INVALID_OPERATION although they operate on a different buffer object.

In Code:


GLuint unusedBuffer = makeBuffer(flags); // buffer size is 513 bytes
GLuint dstBuffer = makeBuffer(flags); // buffer size is 1024 bytes
GLFileHandleAMD fileHandle = openFile();
glReadFileAMD(dstBuffer, fileHandle, 0/*bufferOffset*/, 0/*fileOffset*/, acceptableReadSize/*read size*/, nullptr/*GLsync*/); // => GL_INVALID_OPERATION


When buffers are created WITHOUT the GL_MAP_PERSISTENT_BIT, no GL_INVALID_OPERATION errors occur. When 'unusedBuffer' is created with a size of 512 bytes, no errors occur either.

** (3) Async SSG file read on large GL buffers
The User Manual states that file "read/write operations work in asynchronous mode" on GL buffers when using the bit flags given above, such that "the buffer is created in local visible video memory".
Since local visible video memory is only a few hundred MB, our SSBO allocations of 1-2 GiB will not fit into that, especially when we allocate several such buffers.

Quote (Manual pages 14+15):
"Access to local visible memory enables the highest performance, but this memory is only 256 MB and the system reserves most of it.
The application can only allocate about 100 MB; attempts to allocate more than the unallocated local visible memory will fail.
… Only when the buffer is created in local visible video memory (using glNamedBufferStorage with the GL_MAP_READ_BIT | GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT flag set)
will read/write operations work in asynchronous mode. Otherwise, the driver will ignore the sync object. "


So it is clear that large buffers will not give the best performance, but the important question is: do we still get asynchronous file read/write operations for buffers of size 1-2 GiB?
The wording in the User Manual suggests that only (small) buffers in local visible memory get asynchronous mode.
However, when we allocate large buffers with the bit flag combination for local visible memory, buffer creation does not fail. Do we get async mode for such buffers?
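
To make the size mismatch concrete, here is a small sketch of the arithmetic. It assumes the manual's "MB" figures mean MiB, and the names (kLocalVisibleUsable, fitsInBudget, and so on) are made up for this example. It only compares sizes and says nothing about where the driver actually places a buffer.

```cpp
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;

// Figures quoted from the manual, interpreted here as MiB (an assumption).
constexpr std::uint64_t kLocalVisibleTotal  = 256 * kMiB;  // "only 256 MB"
constexpr std::uint64_t kLocalVisibleUsable = 100 * kMiB;  // "about 100 MB"

// Buffer sizes from this post.
constexpr std::uint64_t kOneGiB       = 1024 * kMiB;
constexpr std::uint64_t kMaxSsboBytes = 2147483647ull;     // maximum SSBO size

// True if a buffer of bufferBytes would fit into budgetBytes.
constexpr bool fitsInBudget(std::uint64_t bufferBytes, std::uint64_t budgetBytes) {
    return bufferBytes <= budgetBytes;
}

// Neither buffer size fits, not even into the full 256 MiB window.
static_assert(!fitsInBudget(kOneGiB, kLocalVisibleUsable),
              "1 GiB exceeds the usable local visible memory");
static_assert(!fitsInBudget(kMaxSsboBytes, kLocalVisibleTotal),
              "the maximum SSBO size exceeds all local visible memory");
```

Since a 1 GiB buffer cannot fit into local visible memory at all, a successful allocation with these flags suggests that the driver placed the buffer somewhere else, which is why the async behaviour is unclear to us.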



Florian Frieß


Visualization Research Center (VISUS)

University of Stuttgart


[1] SSG User Manual, pages 12+13, 14+15

=== System Information

-> Overview
Radeon Pro and AMD FirePro Software Version - 18.Q3
Radeon Pro and AMD FirePro Software Edition - Radeon Pro Software Enterprise Edition
Graphics Chipset - Radeon Pro SSG
High Bandwidth Cache Size - 16368 MB
High Bandwidth Cache Type - HBM2
System Clock Rate - 1500 MHz
Windows Version - Windows 10 (64 bit)
System Memory - 32 GB
CPU Type - AMD Ryzen 7 1800X Eight-Core Processor

-> Software
Radeon Pro and AMD FirePro Settings Version - 2018.0814.1443.24654
Driver Package Version -
Provider - Advanced Micro Devices, Inc.
2D Driver Version -
Direct3D® Version -
OpenGL® Version - 24.20.11000.14565
OpenCL™ Version - 24.20.12024.3003
AMD Mantle Version - Not Available
AMD Mantle API Version - Not Available
AMD Audio Driver Version -
Vulkan™ Driver Version - 2.0.33
Vulkan™ API Version - 1.1.73

-> Hardware
Graphics Card Manufacturer - Designed and built by AMD
Graphics Chipset - Radeon Pro SSG
Device ID - 6862
Vendor ID - 1002
Subsystem ID - 0B1E
Subsystem Vendor ID - 1002
Revision ID - 00
Bus Type - PCI Express 3.0
Current Bus Setting - PCI Express 3.0 x16
BIOS Version -
BIOS Part Number - 113-D0690103-102
BIOS Date - 2017/09/21 16:12
High Bandwidth Cache Size - 16368 MB
High Bandwidth Cache Type - HBM2
High Bandwidth Cache Clock - 945 MHz
System Clock Rate - 1500 MHz
High Bandwidth Cache Bandwidth - 483 GByte/s
Memory Bit Rate - 1.89 Gbps
2D Driver File Path - /REGISTRY/MACHINE/SYSTEM/ControlSet001/Control/Class/{4d36e968-e325-11ce-bfc1-08002be10318}/0001