OpenGL & Vulkan

mede
Journeyman III

Horrible OpenGL performance on new macbook (Nvidia to Radeon)

I recently got a new MacBook Pro to replace my 4-year-old model. I was looking forward to getting more performance for our virtual reality volume rendering project. Sadly, almost all of our shaders run at around half the speed they reached on the older Nvidia card in the 2014 MacBook.

Going by the specs alone, the Radeon card should be much faster: Radeon Pro 560 with 1024 shader cores at 907 MHz versus GeForce GT 750M with 384 at 967 MHz.
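As a rough sanity check of that expectation, the quoted figures can be turned into a theoretical peak throughput. The sketch below assumes the numbers are shader core counts and core clocks, and that every core retires one fused multiply-add (2 floating-point operations) per cycle. The helper name `peak_gflops` is made up for illustration, and real shader performance depends on much more than this figure (memory bandwidth, texture units, driver quality).

```python
def peak_gflops(cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical single-precision peak in GFLOPS.

    Assumption: each core executes one FMA (2 FLOPs) per clock cycle.
    """
    return cores * clock_mhz * flops_per_cycle / 1000.0


radeon_pro_560 = peak_gflops(1024, 907)   # figures quoted in the post
geforce_gt_750m = peak_gflops(384, 967)   # figures quoted in the post
ratio = radeon_pro_560 / geforce_gt_750m

print(f"Radeon Pro 560:  {radeon_pro_560:.0f} GFLOPS")
print(f"GeForce GT 750M: {geforce_gt_750m:.0f} GFLOPS")
print(f"Ratio:           {ratio:.1f}x")
```

On paper the Radeon comes out roughly 2.5 times ahead, which is why the measurements below are so surprising.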

I tried to track down some of the largest performance gaps:

1. Volume Rotation

This simple fragment shader, which rotates a 3D texture, takes 30 ms on the Nvidia card and 1500 ms on the Radeon, which is 50 times longer!

The shader is used together with a geometry shader and the OpenGL call glDrawArraysInstanced to rotate all slices in a single draw call.

Fragment

uniform sampler3D uCube;
uniform mat4 uTransform;
in vec3     gTexCoord;
out vec4    FragColor0;
void main()
{
        FragColor0 = texture(uCube, (uTransform*vec4(gTexCoord, 1.0)).xyz).rgba;
}

Geometry

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;
flat in int vInstanceID[3];
in vec2 vTexCoord[3];
out vec3 gTexCoord;
uniform int uInstanceScale;
void main(void)
{
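    for (int i = 0; i < 3; ++i) {
        gl_Position = gl_in[i].gl_Position;
        // Route the triangle to the slice (layer) that matches its instance.
        gl_Layer = vInstanceID[i];
        // Sample at the center of the slice along the third texture axis.
        gTexCoord = vec3(vTexCoord[i], (float(vInstanceID[i]) + 0.5) / float(uInstanceScale));
        EmitVertex();
    }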
}

Vertex

in vec3 Position;
in vec2 TexCoord0;
flat out int vInstanceID;
out vec2 vTexCoord;
void main()
{
        vTexCoord = TexCoord0;
        vInstanceID = gl_InstanceID;
        gl_Position = vec4(Position, 1.0);
}
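The third texture coordinate built in the geometry shader, (instance + 0.5) / uInstanceScale, puts each instance at the center of its slice. The Python sketch below illustrates that mapping under the assumption that uInstanceScale holds the number of slices in the volume, which the post does not state explicitly. `slice_center` is a hypothetical helper name.

```python
def slice_center(instance_id: int, instance_scale: int) -> float:
    """Normalized depth coordinate at the center of slice `instance_id`.

    Assumption: `instance_scale` is the total number of slices.
    """
    return (instance_id + 0.5) / instance_scale


# Example with a 4-slice volume: every slice is sampled at its midpoint.
centers = [slice_center(i, 4) for i in range(4)]
print(centers)  # [0.125, 0.375, 0.625, 0.875]
```

Without the 0.5 offset the lookup would land on the boundary between two slices, and linear filtering would blend neighbouring slices.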

2. 3D Texture Arrays

Another large drawback for our project is that arrays of 3D textures (a sampler3D array indexed at run time) are not available on ATI GPUs.

The direct lookup is far faster than the switch workaround we need for ATI.

uniform sampler3D uTransmittance[NumSamplers];
vec4 getTransmittance(int index, vec3 pos) {
#ifdef VENDOR_ATI
    switch(index) {
        case 0: return texture(uTransmittance[0], pos);
        case 1: return texture(uTransmittance[1], pos);
        case 2: return texture(uTransmittance[2], pos);
        case 3: return texture(uTransmittance[3], pos);
        case 4: return texture(uTransmittance[4], pos);
        case 5: return texture(uTransmittance[5], pos);
        case 6: return texture(uTransmittance[6], pos);
        case 7: return texture(uTransmittance[7], pos);
    }
#else
    return texture(uTransmittance[index], pos);
#endif
}

3. Dynamic Branching

The "normal" lighting shaders used for our 3D scene (a simple room) also run much slower, at about half the speed. I found a large difference in a switch statement we use to calculate diffuse portal lighting.

On the Nvidia hardware it makes no difference, speed-wise, whether I use the switch or just replace it with, e.g., the source block of case 1. On the Radeon the performance drop from using the switch is very large (4 times slower)! Is some optimisation needed for the Radeon?

void ltcClipQuadToHorizon(inout vec3 L[5], out int n) {
    int config = 0;
    if (L[0].z > 0.0) config += 1;
    if (L[1].z > 0.0) config += 2;
    if (L[2].z > 0.0) config += 4;
    if (L[3].z > 0.0) config += 8;
    switch (config) {
    case 0:
        n = 0;
        break;
    case 1:
        n = 3;
        L[1] = -L[1].z * L[0] + L[0].z * L[1];
        L[2] = -L[3].z * L[0] + L[0].z * L[3];
        L[3] = L[0];
        break;
    case 2:
        n = 3;
        L[0] = -L[0].z * L[1] + L[1].z * L[0];
        L[2] = -L[2].z * L[1] + L[1].z * L[2];
        L[3] = L[0];
        break;
    case 3:
        n = 4;
        L[2] = -L[2].z * L[1] + L[1].z * L[2];
        L[3] = -L[3].z * L[0] + L[0].z * L[3];
        L[4] = L[0];
        break;

    // cases up to 15 follow ...
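The switch selector is a 4-bit mask with one bit per quad corner that lies above the horizon (z > 0), so it can take 16 values, 0 through 15. The Python sketch below mirrors that mask and case 1 from the listing, to make the logic easier to check on the CPU. The names `horizon_config` and `clip_case_1` are hypothetical, and the cases not shown in the post are deliberately not reproduced.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def horizon_config(L: Sequence[Vec3]) -> int:
    """Bit i is set when corner i of the quad is above the horizon (z > 0)."""
    config = 0
    for bit, corner in enumerate(L[:4]):
        if corner[2] > 0.0:
            config += 1 << bit
    return config


def _scale(s: float, v: Vec3) -> Vec3:
    return (s * v[0], s * v[1], s * v[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def clip_case_1(L: List[Vec3]) -> int:
    """Case 1 of the listing: only corner 0 is above the horizon.

    Corners 1 and 2 are replaced by points on the z = 0 plane, so the
    quad degenerates to a triangle. Returns n, the vertex count.
    """
    L[1] = _add(_scale(-L[1][2], L[0]), _scale(L[0][2], L[1]))
    L[2] = _add(_scale(-L[3][2], L[0]), _scale(L[0][2], L[3]))
    L[3] = L[0]
    return 3


# Example quad with only corner 0 above the horizon (fifth slot is scratch space).
quad: List[Vec3] = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0), (1.0, 1.0, -1.0),
                    (0.0, 1.0, -1.0), (0.0, 0.0, 0.0)]
config = horizon_config(quad)
n = clip_case_1(quad) if config == 1 else 0
print(config, n, quad[1], quad[2])
```

Because the selector depends on per-fragment data, neighbouring fragments can take different cases, which is the kind of divergent branching the question is about.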

Conclusion

There were already early concerns about Apple placing an AMD/ATI GPU into their pro models. As there is no DirectX on macOS, the GPU will only be used for OpenGL. Having now analysed the performance of this Radeon model, I am very disappointed. It is not only special shaders, perhaps optimised for the GeForce, that are lagging. All shaders, including simple ones that do standard texture rotation or gradient calculation, are slower ;(

Are we doing something completely wrong, or is AMD just that bad with OpenGL?

0 Likes
2 Replies

Hi mede, I think the OpenGL driver for Mac OS X is developed by Apple. Apple has also stopped its OpenGL support at version 4.1, so you can use the Metal API, or drop Mac OS X support and switch to Windows/BSD/Unix with Vulkan API support!

You can ask about it on www.develpper.apple.com, but I don't think Apple will say anything.

0 Likes

I think this is more of an AMD issue than an Apple one. Before, I used a 2013 MacBook with an Nvidia card, which worked very well.

The 3D texture array feature is clearly an Nvidia extension, so it is clear that it does not work with the Radeon. But most of the other shaders are very basic OpenGL version 3.2. I put both MacBooks side by side (exact same system), and it is very obvious that OpenGL is slower on the new one with the Radeon.

0 Likes