Hello,

I have downloaded the new version of GSA 1.43.

I looked at some examples and I don't understand the results.

First example

kernel void sum(float fN, float4 a<>, float4 b<>, out float4 c<>)

{

float f = 0;

for (f = 0 ; f < **2044**; ++f)

{

c = a + b;

}

}

Radeon HD 3870,**3,2.00,2.80,2.27,2.00,0.50,2.40,0.42,2.80,0.36**,TEX,TEX,TEX,8.00,6.67,5.71,0

Second Example :

kernel void sum(float fN, float4 a<>, float4 b<>, out float4 c<>)

{

float f = 0;

for (f = 0 ; f < **2045**; ++f)

{

c = a + b;

}

}

Radeon HD 3870,**4,2.00,63.75,13.91,13.91,6.95,13.91,5.80,13.91,4.97**,ALU,ALU,ALU,1.15,1.15,1.15,0

Why, when I add one iteration, does the disassembly change, and why do the average and the register count increase too?

Thanks in advance

J

Why do we have this limit?

I tested this kernel in a new project with float:

kernel void sum(float a<>, float b<>, out float c<>)

{

float f = 0;

for (f = 0 ; f < 4095; ++f){

c = a + b;

}

}

Time 0.123029

With this kernel

kernel void sum(float a<>, float b<>, out float c<>)

{

float f = 0;

for (f = 0 ; f < 4096; ++f){

c = a + b;

}

}

Time 0.244093

With 4094 iterations, I get the same time as with 4095 iterations.

In GSA, when I test those two kernels I see the same phenomenon, going from 3 registers to 6 registers?

I believe they did it to keep compilation time reasonable; if there were no such limit, buggy code or a latency test would take forever to compile.

And from your data it evaluates somewhere between 8192 and 16384 instructions, which is a lot for any constant in real-life programs.

Same here: in the first case the compiler replaced your code with something like:

[code]

kernel void sum(float a<>, float b<>, out float c<>)

{

c = a + b;

}

[/code]

It only needed 3 GPRs for that; in the second case it compiled the entire loop and needed more registers.
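If it helps to see why that replacement is legal, here is a small sketch in plain C (not Brook+, and not actual compiler output). The loop body never uses f, so for any trip count of at least one the result is the same as a single c = a + b. The names sum_loop and sum_once are made up for this illustration, and the trip count is passed as a parameter only so both cases can be tried.

```c
#include <stdio.h>

/* Plain-C stand-in for the kernel body (hypothetical helper, not Brook+).
   The loop recomputes the same value on every pass, so it is loop-invariant. */
static float sum_loop(float a, float b, int n)
{
    float c = 0.0f;  /* returned unchanged only when n == 0 */
    float f;

    for (f = 0; f < n; ++f) {
        c = a + b;   /* does not depend on f */
    }
    return c;
}

/* What the loop collapses to once the compiler proves the body is invariant. */
static float sum_once(float a, float b)
{
    return a + b;
}
```

Calling sum_loop(1.5f, 2.25f, 4095) and sum_loop(1.5f, 2.25f, 4096) both give the same value as sum_once(1.5f, 2.25f). So the jump in registers and time is not about the result; it looks like the compiler only does this simplification while the loop is under its internal limit.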