jonathan81
Journeyman III

GPU ShaderAnalyzer question

Hello,

I have downloaded the new version of GSA, 1.43.

I looked at some examples and I don't understand the results.

First example:

kernel void sum(float fN, float4 a<>, float4 b<>, out float4 c<>)
{
    float f = 0;
    for (f = 0; f < 2044; ++f)
    {
        c = a + b;
    }
}

Radeon HD 3870,3,2.00,2.80,2.27,2.00,0.50,2.40,0.42,2.80,0.36,TEX,TEX,TEX,8.00,6.67,5.71,0

Second example:

kernel void sum(float fN, float4 a<>, float4 b<>, out float4 c<>)
{
    float f = 0;
    for (f = 0; f < 2045; ++f)
    {
        c = a + b;
    }
}


Radeon HD 3870,4,2.00,63.75,13.91,13.91,6.95,13.91,5.80,13.91,4.97,ALU,ALU,ALU,1.15,1.15,1.15,0

Why, when I add one iteration, does the disassembly change, and why do the average and the register count increase too?

Thanks in advance

J

3 Replies
eduardoschardong
Journeyman III

Apparently the optimizer evaluated the first 2043 iterations and found that they don't affect the final result, so it optimized them away. In the second case the optimizer was unable to go beyond 2044 iterations, so it emitted code for the entire loop.
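
To make the point concrete, here is a minimal sketch in Python (not Brook+, and not what the compiler actually does internally). The function names `sum_with_loop` and `sum_single` are made up for illustration. It only shows why the loop can be dropped: the value stored in c never depends on the loop counter, so every iteration writes the same thing.

```python
# Toy stand-in for the posted kernel. Assumption: plain Python floats
# replace the Brook+ stream elements a<>, b<> and c<>.

def sum_with_loop(a, b, iterations):
    """Mimics the posted kernel: repeat c = a + b `iterations` times."""
    c = None
    f = 0.0
    while f < iterations:
        c = a + b  # loop-invariant: same value on every pass
        f += 1.0
    return c


def sum_single(a, b):
    """What the loop reduces to once the redundant passes are removed
    (valid whenever the loop runs at least once)."""
    return a + b


result_loop = sum_with_loop(1.5, 2.25, 2044)
result_single = sum_single(1.5, 2.25)
print(result_loop == result_single)  # prints True
```

Both versions give the same output for any iteration count of 1 or more, which is why an optimizer that can see through the loop is free to keep only a single add.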
0 Likes

Why do we have this limit?

I tested this kernel in a new project with float:

kernel void sum(float a<>, float b<>, out float c<>)
{
    float f = 0;
    for (f = 0; f < 4095; ++f)
    {
        c = a + b;
    }
}

Time 0.123029

 

With this kernel:

kernel void sum(float a<>, float b<>, out float c<>)
{
    float f = 0;
    for (f = 0; f < 4096; ++f)
    {
        c = a + b;
    }
}

Time 0.244093

With 4094 iterations, I get the same time as with 4095 iterations.

In GSA, when I test these two kernels, I see the same phenomenon: the register count goes from 3 to 6. Why?

0 Likes

Originally posted by: jonathan81
Why do we have this limit?

I believe they did it to keep compilation time reasonable. If there were no such limit, buggy code or a latency test could take forever to compile.

And from your data, it evaluates somewhere between 8192 and 16384 instructions, which is a lot for any constant loop count in real-life programs.


Originally posted by: jonathan81
In GSA, when I test these two kernels, I see the same phenomenon: the register count goes from 3 to 6. Why?

Same here. In the first case the compiler replaced your code with something like:

kernel void sum(float a<>, float b<>, out float c<>)
{
    c = a + b;
}

It only needed 3 GPRs for that. In the second case it compiled the entire loop and needed more registers.