I've just realized that my kernels compiled with Catalyst 9.3 (or 9.4?) no longer work with Catalyst 9.9.
By "compiled" I mean the sequence calclCompile, calclLink, calclImageWrite, and then calclImageRead later when the kernel is needed.
As always with ATI software, the error description is very informative: "[Symbol "" used as OUTPUT does not have a memory association.]". It happens in the calCtxRunProgram call. Since it worked OK with all Catalyst versions prior to 9.9 (and of course I have no idea what the "" symbol refers to), my first question is: what changed in the newest version of Catalyst?
The second question is more complex. Is there any way to control calclCompile behavior, especially function inlining? I mean literally any way, including but not limited to running the program under a disassembler, putting breakpoints in aticaldd.dll and changing registers on the fly to forbid the compiler from inlining anything! It takes hours (yeah, hours on a 3 GHz system) to compile a kernel with several calls, and I have a feeling that most of this time is spent on unsuccessful attempts to inline things. And the funniest thing is that (for shorter kernels) when the compiler finally generates inlined code, it runs slower than other, bigger kernels with calls/rets, because the code size reaches 350K and it looks like RV770 simply cannot handle such huge kernels.
Did anyone survive reading the above? No? Sigh...
Are calls/rets THAT expensive on RV7XX hardware (so that the compiler tries to avoid them at all costs)? Judging by the time estimates from SKA, there is a 20-30% penalty for code with calls/rets. Is that true?
Hi,
For the first question, did you use an indexed temp array, i.e. x[]?
Nope, no indexed arrays, only regular I/O. The kernel starts with:
il_ps_2_0
dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__
dcl_output_generic o0
dcl_output_generic o1
dcl_cb cb0[4]
dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(1)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(2)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(3)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(4)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
sample_resource(0)_sampler(0) r0, vWinCoord0.xyxx
sample_resource(1)_sampler(0) r1, vWinCoord0.xyxx
sample_resource(2)_sampler(0) r2, vWinCoord0.xyxx
sample_resource(3)_sampler(0) r3, vWinCoord0.xyxx
sample_resource(4)_sampler(0) r4, vWinCoord0.xyxx
So nothing special here (I think). Actually, recompiling the kernels with the CAL runtime from Catalyst 9.9 makes them valid, but now I'll need to test them with previous Catalysts. And that's not counting the fact that I'll need several hours to finish the compilation process.
Also, with 9.9 my program sometimes crashes at aticaldd.dll + 0x9a53; no idea at the moment what causes this.
I totally dislike 9.9 now...
With 9.9 the compiler started inlining some kernels, making them bigger than 300K == roughly 3.5x slower.
With 9.9 some changes have been made to the binary image format (no documentation, of course, of what exactly changed) == calModuleLoad() can now crash when loading binaries compiled with previous versions. And if it doesn't crash, we get that weird "[Symbol "" used as OUTPUT does not have a memory association.]" error.
Simply awesome.
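One way to at least avoid loading stale images is to put the CAL runtime version into the cache file name, so that binaries written under an older Catalyst are never handed to calModuleLoad() under a newer one. A minimal sketch, assuming the three version numbers come from calGetVersion(); the helper name imageCacheName and the naming scheme are made up for illustration and are not part of CAL:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a CAL function): builds a cache file name that
// embeds the target id and the CAL runtime version the image was built with.
// major/minor/imp are assumed to be the values reported by calGetVersion().
std::string imageCacheName(const std::string& kernel, unsigned target,
                           unsigned major, unsigned minor, unsigned imp)
{
    assert(!kernel.empty());  // a kernel name is required to form the key
    return kernel + "_t" + std::to_string(target) +
           "_cal" + std::to_string(major) + "." + std::to_string(minor) +
           "." + std::to_string(imp) + ".bin";
}
```

If the file for the current runtime version does not exist, the kernel gets recompiled instead of loaded. This does not fix the compile time, it only trades a crash for a (long) recompile.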
Done. The IL & C++ source and the compiled .exe have been sent.
While fighting with the CAL compiler I found that calclAssembleObject works (there was a thread here saying that calclAssembleObject is totally broken). However, it works only for targets < 8; for 8 & 9 (i.e. RV870 & 830) it returns an error, though the error text is [No error]. Will calclAssembleObject work at all for the HD5800 series? Or are you planning to support only IL? I'm starting to think it would be easier to write in ISA from the beginning rather than write in IL and fight with the CAL compiler afterwards. However, if there won't be an assembler for HD5800, I doubt I'll invest that much time in ISA programming.
I see, there's no point at all in going the ISA way.
Well, it looks like I can only wait for the next Catalyst releases and hope the CAL compiler gets better.
Just noticed that with Catalyst 9.12 (CAL runtime 1.4.515) the CAL compiler now aggressively unrolls everything. There were already performance problems when the binary image exceeded 300K, and now the compiler isn't limited by size at all: it produces 700K images in no time.
Obviously such kernels cannot fit in device "code" memory, and performance dropped 3x-4x.
I guess this change was made so that OpenCL code (which can be very complex) compiles in reasonable time. However, such a performance drop cannot be called reasonable in any way. And judging by the OpenCL forum, there are still problems with compilation times.
If you notice a strange performance drop with 9.12, check the binary code size; maybe you're facing this issue. Or some other one, there are plenty of them around to face.
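For anyone who saves images to disk (e.g. the buffer produced by calclImageWrite), the size check can be automated. A rough sketch: the 300K threshold is just the figure observed in this thread, not a documented hardware limit, and the function names are my own:

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Threshold taken from the observations above (slowdown past ~300K).
// This is an assumption, not a documented limit of the hardware.
constexpr std::int64_t kSuspectImageBytes = 300 * 1024;

// Returns the size of the file in bytes, or -1 if it cannot be opened.
std::int64_t imageFileSize(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at end
    if (!in) return -1;
    return static_cast<std::int64_t>(in.tellg());
}

// True if the image is larger than the given limit.
bool imageLooksOversized(std::int64_t sizeBytes,
                         std::int64_t limit = kSuspectImageBytes)
{
    assert(limit > 0);
    return sizeBytes > limit;
}
```

A 700K image like the ones 9.12 produces would be flagged, while one that stays under 300K would not.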