Any way to control calclCompile behaviour?

And what changed with Catalyst 9.9?

I've just realized that my kernels compiled with Catalyst 9.3 (or 9.4?) no longer work with Catalyst 9.9.

By "compiled" I mean the sequence calclCompile, calclLink, calclImageWrite, and then calclImageRead later, when the kernel is needed.

As always with ATI software, the error description is very informative: "[Symbol "" used as OUTPUT does not have a memory association.]". It happens in the calCtxRunProgram call. Since it worked OK with all Catalyst versions prior to 9.9 (and of course I have no idea what the "" symbol refers to), my first question is: what changed in the newest version of Catalyst?

 

 

The second question is more complex. Is there any way to control calclCompile behavior, especially function inlining? I mean literally any way, including but not limited to running the program under a disassembler, putting breakpoints in aticaldd.dll and changing registers on the fly to forbid the compiler from inlining anything! It takes hours (yeah, hours on a 3 GHz system) to compile a kernel with several calls, and I have a feeling that most of this time is spent on unsuccessful attempts to inline things. And the funniest thing is that (for shorter kernels) when the compiler finally generates inlined code, it runs slower than other, bigger kernels with calls/rets, because the code size reaches 350K and it looks like RV770 simply cannot handle such huge kernels.

 

Did anyone survive reading the above? No? Sigh...


empty_knapsack,
Compilation time is a known issue and should be improved in our next release. There is also no way to stop the compiler from inlining a function, as it makes this decision based on the underlying platform and the resources that the code is compiled for.

Are calls/rets THAT expensive on RV7XX hardware (so that the compiler tries to avoid them at all costs)? Looking at the time estimates from SKA, there seems to be a 20-30% penalty for code with calls/rets. Is that true?

the729
Journeyman III

Hi,

For the first question, did you use an indexed temp array, i.e. x[]?


Nope, no indexed arrays, only regular I/O. The kernel starts with:

il_ps_2_0
dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__
dcl_output_generic o0
dcl_output_generic o1
dcl_cb cb0[4]
dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(1)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(2)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(3)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(4)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)

sample_resource(0)_sampler(0) r0, vWinCoord0.xyxx
sample_resource(1)_sampler(0) r1, vWinCoord0.xyxx
sample_resource(2)_sampler(0) r2, vWinCoord0.xyxx
sample_resource(3)_sampler(0) r3, vWinCoord0.xyxx
sample_resource(4)_sampler(0) r4, vWinCoord0.xyxx

So nothing special here (I think). Actually, recompiling the kernels with the CAL runtime from Catalyst 9.9 makes them valid, but now I'll need to test them with previous Catalyst versions. And that's not counting the several hours I'll need to finish the compilation.


Also, with 9.9 my program sometimes crashes at aticaldd.dll + 0x9a53; no idea at the moment what causes this.

 

I totally dislike 9.9 now...


With 9.9 the compiler starts to inline some kernels, making them bigger than 300K == performance dropped by 350%.

With 9.9 some changes were made to the binary image format (with no documentation, of course, of what exactly changed) == calModuleLoad() can now crash when loading binaries compiled with previous versions. And if it doesn't crash, we get that weird "[Symbol "" used as OUTPUT does not have a memory association.]" error.

Simply awesome.
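
Since images saved with calclImageWrite under one Catalyst release may not load under another, one defensive option is to tag each cached image with the runtime version that produced it and recompile on any mismatch. Below is a minimal sketch of that idea in plain C. The header layout, the magic value, and the names KernelCacheHeader, cache_save and cache_load are hypothetical and not part of CAL; the version triple is assumed to come from whatever version query the runtime offers, and the image bytes are assumed to be the buffer filled by calclImageWrite.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored in front of a cached kernel image. */
typedef struct {
    uint32_t magic;      /* arbitrary tag identifying our cache files */
    uint32_t rt_major;   /* runtime version the image was built with */
    uint32_t rt_minor;
    uint32_t rt_build;
    uint32_t target;     /* compile target id used for the build */
    uint32_t image_size; /* number of image bytes following the header */
} KernelCacheHeader;

#define KERNEL_CACHE_MAGIC 0x4B43414Cu /* arbitrary value, an assumption */

/* Write header + image bytes. Returns 0 on success, -1 on I/O error. */
int cache_save(const char *path, const KernelCacheHeader *hdr,
               const void *image)
{
    assert(path && hdr && image);
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(hdr, sizeof *hdr, 1, f) == 1 &&
             fwrite(image, 1, hdr->image_size, f) == hdr->image_size;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Load the image only if it was built by the same runtime version for the
   same target. Returns a malloc'ed buffer (caller frees), or NULL, which
   the caller should treat as "recompile from IL". */
void *cache_load(const char *path, const KernelCacheHeader *want,
                 uint32_t *size_out)
{
    assert(path && want && size_out);
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    KernelCacheHeader got;
    void *buf = NULL;
    if (fread(&got, sizeof got, 1, f) == 1 &&
        got.magic == KERNEL_CACHE_MAGIC &&
        got.rt_major == want->rt_major &&
        got.rt_minor == want->rt_minor &&
        got.rt_build == want->rt_build &&
        got.target == want->target &&
        got.image_size > 0) {
        buf = malloc(got.image_size);
        if (buf && fread(buf, 1, got.image_size, f) != got.image_size) {
            free(buf); /* truncated file: treat as a cache miss */
            buf = NULL;
        }
        if (buf) *size_out = got.image_size;
    }
    fclose(f);
    return buf;
}
```

The buffer returned by cache_load would then go to calclImageRead as before; a NULL result simply sends the program down the slow calclCompile/calclLink path again. This does not make old binaries work, it only avoids feeding them to a runtime that may crash on them.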


empty_knapsack,
Is it possible for you to send us the IL so that we can see what is causing this issue? You can send it to streamdeveloper@amd.com and ask them to forward it to me.

Also, is it possible to send the binary?

Thanks,

Done. IL & C++ source and the compiled .exe sent.

 

While fighting with the CAL compiler I've found that calclAssembleObject works (there was a thread here saying that calclAssembleObject is totally broken). However, it works only for targets < 8; for 8 & 9 (i.e. RV870 & 830) it returns an error, though the error text is [No error]. Will calclAssembleObject work at all for the HD5800 series? Or are you planning to support only IL? I'm starting to think that it would be easier to write in ISA from the beginning rather than to write IL and then fight with the CAL compiler. However, if there won't be an assembler for HD5800, I doubt I'll invest that much time in ISA programming.


The assembler is not supported on HD5XXX. Its support on HD4XXX is spotty at best, and we only really supported it for the HD2XXX and HD3XXX series of cards.

I see, no point at all in going the ISA way.

 

Well, it looks like I can only wait for the next Catalyst releases, hoping the CAL compiler will get better.


Just noticed that with Catalyst 9.12 (CAL runtime 1.4.515) the CAL compiler now aggressively unrolls everything. There were already performance problems when the binary image exceeded 300K, and now the compiler is not limited by size at all: it produces 700K images in no time.

Obviously such kernels cannot fit in the device "code" memory, and performance dropped 3x-4x.

I guess this change was made to compile OpenCL code (which can be very complex) in reasonable time. However, such a performance drop cannot be called reasonable in any way. And judging by the OpenCL forum, there are still problems with compilation times.

 

If you notice a strange performance drop with 9.12, check the binary code size; maybe you're facing this issue. Or some other one, there are plenty of them around to face.
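
A quick way to run that check on images already saved to disk is to compare the file size against a threshold. The sketch below is plain C with hypothetical names (file_size_bytes, image_looks_oversized); the 300K figure is only the empirical number observed in this thread on RV770, not a documented hardware limit, so adjust it for your own kernels.

```c
#include <assert.h>
#include <stdio.h>

/* Empirical threshold from this thread (about 300K); an assumption,
   not a documented limit. */
#define SUSPECT_IMAGE_BYTES (300L * 1024L)

/* Returns the size of the file in bytes, or -1 if it cannot be read. */
long file_size_bytes(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    return size;
}

/* Returns 1 if the saved image is larger than the threshold,
   0 if it is not, and -1 if the file could not be read. */
int image_looks_oversized(const char *path)
{
    long size = file_size_bytes(path);
    if (size < 0) return -1;
    return size > SUSPECT_IMAGE_BYTES ? 1 : 0;
}
```

If you keep the image in memory instead of on disk, the size reported when writing the image can be compared against the same threshold directly.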

 
