12 Replies Latest reply on Jan 5, 2010 5:59 PM by empty_knapsack

    Any way to control calclCompile behaviour?

    empty_knapsack
      And what changed with Catalyst 9.9?

      I just realized that kernels I compiled with Catalyst 9.3 (or 9.4?) no longer work with Catalyst 9.9.

      By "compiled" I mean the sequence calclCompile, calclLink, calclImageWrite, and then calclImageRead later when the kernel is needed.

      As always with ATI software, the error description is very informative: "[Symbol "" used as OUTPUT does not have a memory association.]". It happens on the calCtxRunProgram call. Since this worked fine with every Catalyst version prior to 9.9 (and of course I have no idea what the "" symbol refers to), the first question is: what changed in the newest version of Catalyst?

      The second question is more complex. Is there any way to control calclCompile behavior, especially function inlining? I mean literally any way, including but not limited to running the program under a disassembler, putting breakpoints in aticaldd.dll, and changing registers on the fly to forbid the compiler from inlining anything! It takes hours (yes, hours on a 3 GHz system) to compile a kernel with several calls, and I have a feeling that most of this time is spent on unsuccessful attempts to inline things. The funniest part is that, for shorter kernels, when the compiler finally generates inlined code it runs slower than other, bigger kernels with calls/rets, because the code size reaches 350K and it looks like RV770 simply cannot handle such huge kernels.

      Did anyone survive reading the above? No? Sigh...

        • Any way to control calclCompile behaviour?
          MicahVillmow
          empty_knapsack,
          Compilation performance is a known issue and should improve with our next release. There is also no way to stop the compiler from inlining a function, as it makes this decision based on the underlying platform and the resources that the code is compiled for.
          • Any way to control calclCompile behaviour?
            the729

            Hi,

            For the first question, did you use an indexed temp array, i.e. x[]?

              • Any way to control calclCompile behaviour?
                empty_knapsack

                Nope, no indexed arrays, only regular I/O. The kernel starts with:

                il_ps_2_0
                dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__
                dcl_output_generic o0
                dcl_output_generic o1
                dcl_cb cb0[4]
                dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
                dcl_resource_id(1)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
                dcl_resource_id(2)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
                dcl_resource_id(3)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
                dcl_resource_id(4)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)

                sample_resource(0)_sampler(0) r0, vWinCoord0.xyxx
                sample_resource(1)_sampler(0) r1, vWinCoord0.xyxx
                sample_resource(2)_sampler(0) r2, vWinCoord0.xyxx
                sample_resource(3)_sampler(0) r3, vWinCoord0.xyxx
                sample_resource(4)_sampler(0) r4, vWinCoord0.xyxx

                So nothing special here (I think). Actually, recompiling the kernels with the CAL runtime from Catalyst 9.9 makes them valid, but now I'll need to test them with previous Catalyst versions. And that's not counting the fact that I'll need several hours to finish the compilation.
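Since images written by one Catalyst release may stop working under another, one way to avoid loading a stale image is to tie each saved image to the runtime version it was built with. The following is only a minimal sketch of that idea, not something the CAL SDK provides: `RuntimeVersion`, `cachedImageName`, and `canReuseImage` are hypothetical names, the file naming scheme is an assumption, and the version numbers are plain inputs (in a real program they would have to be queried from the installed CAL runtime).

```cpp
#include <cstdint>
#include <string>

// Hypothetical record of the CAL runtime a saved image was built with.
// The three numbers are plain inputs here; a real program would query
// them from the installed runtime.
struct RuntimeVersion {
    std::uint32_t major;
    std::uint32_t minor;
    std::uint32_t build;
};

inline bool operator==(const RuntimeVersion& a, const RuntimeVersion& b) {
    return a.major == b.major && a.minor == b.minor && a.build == b.build;
}

// Build a cache file name that embeds the target id and runtime version,
// so images from different runtimes never overwrite each other.
// The naming scheme is an assumption made for this sketch.
std::string cachedImageName(const std::string& kernel,
                            const RuntimeVersion& v,
                            unsigned target) {
    return kernel + "_t" + std::to_string(target) +
           "_cal" + std::to_string(v.major) + "." +
           std::to_string(v.minor) + "." +
           std::to_string(v.build) + ".bin";
}

// Conservative policy: reuse a saved image only when it was built with
// exactly the runtime that is installed now; otherwise recompile.
bool canReuseImage(const RuntimeVersion& builtWith,
                   const RuntimeVersion& current) {
    return builtWith == current;
}
```

With this policy, an image saved under an older runtime would be rejected and recompiled under a newer one (for example CAL runtime 1.4.515) instead of failing later at calCtxRunProgram. The cost is a full recompile after every driver update, which is painful given the compile times discussed here.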

              • Any way to control calclCompile behaviour?
                MicahVillmow
                empty_knapsack,
                Is it possible for you to send us the IL so that we can see what is causing this issue? You can send it to streamdeveloper@amd.com and ask them to forward it to me.
                • Any way to control calclCompile behaviour?
                  MicahVillmow
                  Also is it possible to send the binary?

                  Thanks,
                    • Any way to control calclCompile behaviour?
                      empty_knapsack

                      Done. IL & C++ source, compiled .exe sent.

                      While fighting with the CAL compiler I found that calclAssembleObject works (there was a thread here saying that calclAssembleObject is totally broken). However, it works only for targets < 8; for 8 & 9 (i.e. RV870 & 830) it returns an error, though the error text is [No error]. Will calclAssembleObject work at all for the HD5800 series? Or are you planning to support only IL? I'm starting to think it would be easier to write in ISA from the beginning rather than write IL and fight with the CAL compiler afterwards. However, if there won't be an assembler for HD5800, I doubt I'll invest that much time in ISA programming.

                    • Any way to control calclCompile behaviour?
                      MicahVillmow
                      The assembler is not supported on HD5XXX. Its support on HD4XXX is spotty at best, and we only really supported it for the HD2XXX and HD3XXX series of cards.
                        • Any way to control calclCompile behaviour?
                          empty_knapsack

                          I see, so there's no point at all in going the ISA way.

                          Well, it looks like I can only wait for the next Catalyst releases and hope the CAL compiler gets better.

                            • Any way to control calclCompile behaviour?
                              empty_knapsack

                              Just noticed that with Catalyst 9.12 (CAL runtime 1.4.515) the CAL compiler now aggressively unrolls everything. There were already performance problems when the binary image exceeded 300K, and now the compiler is not limited by size at all: it produces 700K images in no time.

                              Obviously such kernels cannot fit in the device "code" memory, and performance dropped 3x-4x.

                              I guess this change was made so that OpenCL code (which can be very complex) compiles in reasonable time. However, such a performance drop cannot be called reasonable in any way. And judging by the OpenCL forum, there are still problems with compilation times.

                              If you notice a strange performance drop with 9.12, check the binary code size; maybe you're facing this issue. Or some other one, there are plenty of them around to face.
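For anyone who wants to automate that check, here is a rough sketch that flags a saved kernel image whose size passes the range where slowdowns showed up in this thread. Everything in it is an assumption for illustration: `kSuspectImageBytes`, `imageSizeIsSuspect`, and `fileSizeBytes` are hypothetical names, and the 300K threshold is just the figure observed above on RV770, not a documented hardware limit.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Threshold taken from the observations in this thread (slowdowns once the
// binary image passed roughly 300K). It is not a documented limit, so
// adjust it for your own hardware.
constexpr std::size_t kSuspectImageBytes = 300 * 1024;

// True when an image is larger than the size at which slowdowns were seen.
bool imageSizeIsSuspect(std::size_t imageBytes,
                        std::size_t limit = kSuspectImageBytes) {
    return imageBytes > limit;
}

// Size in bytes of a saved image file, or -1 if the file cannot be opened.
long long fileSizeBytes(const std::string& path) {
    // Open at the end of the file so tellg() reports the total size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return -1;
    }
    return static_cast<long long>(in.tellg());
}
```

A typical use would be to call `fileSizeBytes` on the file produced by calclImageWrite and log a warning when `imageSizeIsSuspect` returns true, so a driver update that bloats the kernel is caught before anyone starts hunting for a performance regression elsewhere.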