<topic 1>
subroutine kernel error
-----------
kernel void helper(float a, float b, out float c)
{
    c = a + b;
}
kernel void sum(float a<>, float b<>, out float c<>)
{
    helper(a, b, c);
}
----------------------------
Error message: Not support kernel local\parameter data structure type:non-reduce output parameter must be stream type.
How do I use a subroutine?
<topic 2>
I tested a sha1 function as a subroutine.
When I compile the IL to ISA, the sha1 subroutine is compiled as an inline function.
------------------------------------------
mov r301, r1011
mov r302, r1012
call 38 ; sha1 function
mov r1013, r303
mov r1014, r304
....
mov r301, r1011
mov r302, r1012
call 38 ; sha1 function
mov r1013, r303
mov r1014, r304..
....repeat
--------------------------------------------
inlined sha1 code asm (Not calling)
inlined sha1 code asm (Not calling)
--------------------------------------------
The IL code uses call for sha1, but the asm is assembled with the code inlined.
I think the larger code is slower than the shorter code would be.
I want the sha1 subroutine not to be inlined.
Can I control this? Is there a compile option for it?
Or is there another trick?
Originally posted by: jch
I want the sha1 subroutine not to be inlined. Can I control this? Is there a compile option for it, or is there another trick?
There is no way to control this. In GPGPU programming, subroutines are always inlined because the hardware has a limited stack or no stack at all.
I have already seen call/ret in asm:
-----------------------------------
....
06 CALL CALL_COUNT(1) ADDR(100)
....
100 EXP_DONE..
END_OF_PROGRAM
101 ALU: ADDR(1000) CNT(100)
....
80 RET
-----------------------------------
So call/ret does exist in the asm.
I miss a noinline keyword.
Is there an SCOption_KEEP_CALLS option in the IL compiler?
How can I use it?
Or are subroutines (calls) kept as of 9.9?
In Kernel Analyzer 1.6.721 with the option set to CAL 9.9, the subroutine is still inlined.
http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=135833
I solved it myself.
I patched about 10 bytes of aticaldd.dll.
Now I get call/ret asm.
I'll publish it soon.