OpenCL

empty_knapsack
Adept II

Controlling calclCompile() behavior

u32SCOptions

Looking at the ISA output, I found the following:

u32SCOptions[0] = 0x01A00000 SCOption_IGNORE_SAMPLE_L_BUG SCOption_FLOAT_DO_NOT_DIST SCOption_FLOAT_DO_NOT_REASSOC
u32SCOptions[1] = 0x00202000 SCOption_R600_ERROR_ON_DOUBLE_MEMEXP SCOption_SET_VPM_FOR_SCATTER
u32SCOptions[2] = 0x00000040 SCOption_R800_UAV_NONUAV_SYNC_WORKAROUND_BUG216513_1
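
To see how the three words above break down into individual flags, here is a minimal Python sketch that lists the bits set in each word. The helper name `set_bits` is purely illustrative. The number of set bits in each word matches the number of option names printed next to it, but the thread does not say which bit belongs to which name, so the sketch makes no attempt to map them.

```python
def set_bits(mask):
    """Return the positions of the bits set in a 32-bit mask, lowest first."""
    return [bit for bit in range(32) if mask & (1 << bit)]

# Values exactly as dumped in the ISA output above.
u32SCOptions = [0x01A00000, 0x00202000, 0x00000040]

for index, word in enumerate(u32SCOptions):
    bits = set_bits(word)
    # Show each set bit as its own single-bit mask, e.g. bit 6 -> 0x00000040.
    flags = ", ".join(f"0x{1 << b:08X}" for b in bits)
    print(f"u32SCOptions[{index}] = 0x{word:08X}: bits {bits} ({flags})")
```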

Looking further at aticaldd.dll itself, I found many more SCOptions available. The question is: is it possible to turn these options on for a calclCompile() invocation? If so, how? If not, why not?

I'm personally very interested in SCOption_KEEP_CALLS, hoping it will prevent calls from being inlined (as was the case long ago with 9.9). Any ideas?

15 Replies
MicahVillmow
Staff

empty_knapsack,
there is currently no interface to provide options to the shader compiler. It is something that we are working on and I am hoping to add it to an upcoming release.
jch
Journeyman III

Are subroutine calls really kept with 9.9?

In Kernel Analyzer 1.6.721 with the CAL 9.9 option selected, the subroutine is inlined.

empty_knapsack
Adept II

The ATI compiler is simply unpredictable. For my kernels, when compiling for PS mode with Catalyst 9.9, it _sometimes_ doesn't inline functions. The same kernel compiled for CS mode (only the vObjIndex calculation changed from PS) got everything inlined, and performance dropped several times over.

Since 9.9 it is no longer possible to forbid inlining, whether you use PS or CS mode.

The best possible solution is to write your own assembler. Obviously it will take far too much time, but it is the only way to get the code you want; it doesn't look like ATI will change anything in the coming years.

jch
Journeyman III

Thanks a lot, empty_knapsack!

I'll try to reverse-engineer aticalcl.dll.

If I fail to inject SCOption_KEEP_CALL, I'll try writing the assembly myself.

empty_knapsack
Adept II

I doubt it's possible to hack aticaldd.dll to make it produce non-inlined code. It looks like the SCOptions are selected at the compile stage, so the required code is simply absent from the DLL.

Anyway, I was being somewhat ironic about writing my own assembler: it would take a tremendous amount of time to make one, and without being paid for it, it looks absolutely pointless to me.

It is much easier to switch to NVIDIA GPUs. Yes, they have lower peak performance, but it takes minutes to get your code working instead of hours or days for ATI kernels.

jch
Journeyman III

It is possible. I have already solved it.

One function in aticaldd.dll sets the SCOptions.

SCOption_KEEP_CALL is bit 0x00000400 of u32SCOptions[1].

I patched about 10 bytes of aticaldd.dll:

--------------------------

or dword ptr [esi+4], 400h

jmp short 0x....

-----------------------------
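
The effect of that `or` instruction on the option words can be sketched in Python. This is only an illustration under stated assumptions: that `esi` points at the start of the u32SCOptions array (so `[esi+4]` is word 1), and that the words hold the default values quoted in the first post when the patch runs. The names `KEEP_CALL_BIT` and `force_keep_call` are hypothetical and do not come from the DLL.

```python
KEEP_CALL_BIT = 0x00000400  # value reported in the post above

def force_keep_call(options):
    """Return a copy of the option words with the KEEP_CALL bit set in word 1."""
    patched = list(options)
    # Assumed equivalent of: or dword ptr [esi+4], 400h
    # (offset 4 bytes into an array of 32-bit words is index 1).
    patched[1] |= KEEP_CALL_BIT
    return patched

# Assumption: the defaults seen in the ISA dump from the first post.
default_options = [0x01A00000, 0x00202000, 0x00000040]

patched = force_keep_call(default_options)
print([f"0x{word:08X}" for word in patched])
```

Because the patch uses a bitwise OR, the flags that were already set in word 1 stay set; only the one extra bit is added.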

The generated assembly now contains call/ret instructions.

I'll publish the DLL file soon.

 

P.S. Thanks, empty_knapsack!

Your idea (SCOption_KEEP_CALL) was very helpful to me!

empty_knapsack
Adept II

Posting a hacked DLL is probably not a good idea from a legal point of view.

empty_knapsack
Adept II

Well, I ran some quick tests with the 10.8 DLLs and forced SCOption_KEEP_CALL. The kernel size is indeed small and call/ret instructions are present. However, compilation takes just milliseconds for my kernels, while it takes minutes with 9.9. So either the ATI compiler became that good a year later (though this functionality still isn't available without hacks), or the generated code differs from 9.9, and of course for the worse, because compilation time was prioritized over code efficiency.

Tests and benchmarks are needed to say for sure.

empty_knapsack
Adept II

I ran more tests and benchmarks. For my specific kernels, the results with SCOption_KEEP_CALL forced on look quite good. Now I'm somewhat disappointed that I didn't take the final step (i.e. actually hack the DLL) after discovering that SCOption_KEEP_CALL exists and can be tweaked. I was probably too distracted by the "upcoming release" refrain and decided that such functionality was totally disabled within the DLL... I was wrong.

Anyway, the results from 10.8 with SCOption_KEEP_CALL enabled look far better than 10.8 without it, while 9.9 still produces the fastest code. However, 9.9 takes much longer to compile (but uses fewer GPRs). Here is a small table of results (two kernels, one with bitalign for 5XXX, one without):

Catalyst version   Execution time on 5770   Execution time on 4770   GPRs used/Codelen on 5770   GPRs used/Codelen on 4770
9.9                5.9s                     10.7s                    48/17600                    48/21824
10.8/patched       6.2s                     11.2s                    55/17472                    54/21696
10.8               25.7s                    24.9s                    113/217408                  79/312512

So unpatched 10.8 on the 5770 runs about 4x slower than it could. I'm really curious why the compiler switches aren't yet exposed at the CAL/OpenCL level, especially now that we know they have been available at least since 10.2 and compile time has become reasonable enough to compile even at run time.
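
The slowdown figures can be checked directly from the execution times in the table. The following Python sketch just divides one time by another; the dictionary layout and the `slowdown` helper are illustrative choices, and the numbers are copied from the table as posted.

```python
# Execution times in seconds, copied from the results table above.
times = {
    "9.9":          {"5770": 5.9,  "4770": 10.7},
    "10.8/patched": {"5770": 6.2,  "4770": 11.2},
    "10.8":         {"5770": 25.7, "4770": 24.9},
}

def slowdown(version, baseline, gpu):
    """How many times longer `version` runs than `baseline` on the given GPU."""
    return times[version][gpu] / times[baseline][gpu]

for gpu in ("5770", "4770"):
    vs_patched = slowdown("10.8", "10.8/patched", gpu)
    vs_99 = slowdown("10.8", "9.9", gpu)
    print(f"{gpu}: unpatched 10.8 is {vs_patched:.1f}x slower than patched 10.8 "
          f"and {vs_99:.1f}x slower than 9.9")
```

On the 5770 the ratio against the patched DLL comes out a little above 4, which matches the "4x" figure; on the 4770 the gap is closer to 2x.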

 
