15 Replies Latest reply on Mar 11, 2011 3:38 PM by MicahVillmow

    Controlling calclCompile() behavior

    empty_knapsack
      u32SCOptions

      Looking at ISA output I've found the following:

      u32SCOptions[0] = 0x01A00000 SCOption_IGNORE_SAMPLE_L_BUG SCOption_FLOAT_DO_NOT_DIST SCOption_FLOAT_DO_NOT_REASSOC
      u32SCOptions[1] = 0x00202000 SCOption_R600_ERROR_ON_DOUBLE_MEMEXP SCOption_SET_VPM_FOR_SCATTER
      u32SCOptions[2] = 0x00000040 SCOption_R800_UAV_NONUAV_SYNC_WORKAROUND_BUG216513_1
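
For readers following along, the three words above can be split into their individual bit masks. This is only a minimal sketch: the helper name `set_bits` is made up for illustration, and the post does not say which SCOption name belongs to which bit, so the code lists the masks without assigning names to them.

```python
def set_bits(word):
    """Return the positions of the bits set in a 32-bit word, lowest first."""
    return [bit for bit in range(32) if word & (1 << bit)]


# The three u32SCOptions words quoted in the post.
sc_options = [0x01A00000, 0x00202000, 0x00000040]

for index, word in enumerate(sc_options):
    # Turn each set bit back into a single-bit mask for display.
    masks = ", ".join(f"0x{1 << bit:08X}" for bit in set_bits(word))
    print(f"u32SCOptions[{index}] = 0x{word:08X} -> {masks}")
```

The number of set bits in each word (three, two, one) matches the number of option names listed next to it.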

      Looking further at aticaldd.dll itself, I've found many more SCOptions available. The question is: is it possible to turn on these options for a calclCompile() invocation? If it's possible, how? If not, why?

      I'm personally very interested in SCOption_KEEP_CALLS, hoping it'll prevent calls from being inlined (like it was a long time ago in 9.9). Any ideas?

        • Controlling calclCompile() behavior
          MicahVillmow
          empty_knapsack,
          There is currently no interface to provide options to the shader compiler. It is something that we are working on, and I am hoping to add it to an upcoming release.
          • Controlling calclCompile() behavior
            jch

            Are subroutines (calls) kept in 9.9?

            In Kernel Analyzer 1.6.721 with the CAL 9.9 option, the subroutine is inlined.

              • Controlling calclCompile() behavior
                empty_knapsack

                The ATI compiler is simply unpredictable. For my kernels, when compiling for PS mode it _sometimes_ doesn't inline functions with Catalyst 9.9. The same kernel compiled for CS mode (only the vObjIndex calculation changed from PS) got everything inlined (and performance dropped severalfold).

                Since 9.9 it's no longer possible to forbid inlining, whether you're using PS or CS mode.

                 

                The best possible solution is to write your own assembler. Obviously it'll take way too much time, but it's the only way to get the code you want; it doesn't look like ATI will change anything in the next few years.

                 

                  • Controlling calclCompile() behavior
                    jch

                    Thanks a lot, empty_knapsack!!

                    I'll try to reverse-engineer aticalcl.dll.

                    If I fail to inject SCOption_KEEP_CALL, then I'll try assembly....

                      • Controlling calclCompile() behavior
                        empty_knapsack

                        I doubt it's possible to hack aticaldd.dll to make it produce non-inlined code. It looks like the SCOptions are selected when the DLL itself is compiled, so the required code is simply absent from the DLL.

                         

                        Anyway, I was being kinda ironic about writing my own assembler: it'll take a tremendous amount of time to make one, and without being paid for it, it looks like an absolutely pointless thing to me.

                        It's much easier to switch to NVIDIA GPUs. Yes, they have lower peak performance, but it takes minutes to get your code working instead of hours or days for ATI kernels.

                          • Controlling calclCompile() behavior
                            jch

                            It's possible. I've already solved it.

                            One function in aticaldd.dll sets the SCOptions.

                            SCOption_KEEP_CALL is SCOption[1] = 0x00000400.

                            I patched about 10 bytes of aticaldd.dll.

                            --------------------------

                            or dword ptr [esi+4], 400h

                            jmp short 0x....

                            -----------------------------

                            I got asm with call/ret.

                            I'll publish the DLL file soon.

                             

                            P.S. Thanks, empty_knapsack!!

                            Your idea (SCOptions_KEEP_CALL) was helpful to me!!
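
The effect of the patch quoted in this post can be modelled in a few lines. This is a sketch under stated assumptions: that `esi` points at the start of the u32SCOptions array (so `[esi+4]` is the second 32-bit word), and that the words from the first post are the defaults being modified. The name `force_keep_call` is hypothetical.

```python
KEEP_CALL_MASK = 0x00000400  # value quoted in the post for SCOption[1]


def force_keep_call(sc_options):
    """Return a copy of the option words with the KEEP_CALL bit set in word 1."""
    patched = list(sc_options)
    # Mirrors `or dword ptr [esi+4], 400h`: a byte offset of 4 is the
    # second 32-bit word, assuming esi points at u32SCOptions[0].
    patched[1] |= KEEP_CALL_MASK
    return patched


# Words from the first post, assumed here to be the unpatched defaults.
defaults = [0x01A00000, 0x00202000, 0x00000040]
patched = force_keep_call(defaults)
print(" ".join(f"0x{word:08X}" for word in patched))
```

Because the instruction is an OR, the bits already set in that word are left untouched; only bit 10 is added.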

                              • Controlling calclCompile() behavior
                                empty_knapsack

                                Posting a hacked .dll probably isn't a good idea at all from a legal point of view.

                                  • Controlling calclCompile() behavior
                                    empty_knapsack

                                    Well, I've run quick tests with the 10.8 DLLs and forced SCOption_KEEP_CALL. The kernel size is indeed small and call/rets are present; however, compilation takes just milliseconds for my kernels, while it takes minutes with 9.9. So either the ATI compiler became that good a year later (but this functionality still isn't available without hacks), or the generated code differs from 9.9, ofc for the worse, because compilation time is prioritized over the efficiency of the generated code.

                                     

                                    Tests and benchmarks are needed to say for sure.

                                      • Controlling calclCompile() behavior
                                        empty_knapsack

                                        Made more tests and benchmarks. For my specific kernels, the results with SCOption_KEEP_CALL forced on look quite good. Now I'm kinda disappointed that I didn't take the final step (i.e. actually hack the DLL) after discovering that SCOption_KEEP_CALL exists and can be tweaked. Probably I was too distracted by the "upcoming release" refrain and decided that such functionality was totally disabled within the DLL... I was wrong.

                                        Anyway, results from 10.8 with SCOption_KEEP_CALL enabled look way better than 10.8 without it, while 9.9 still produces the fastest code. However, 9.9 takes much longer to compile (while using fewer GPRs). Small table with results (two kernels, one with bitalign for 5XXX, one without):

                                        Catalyst version   Exec time on 5770   Exec time on 4770   GPRs/code length on 5770   GPRs/code length on 4770
                                        9.9                5.9s                10.7s               48/17600                   48/21824
                                        10.8/patched       6.2s                11.2s               55/17472                   54/21696
                                        10.8               25.7s               24.9s               113/217408                 79/312512

                                        So unpatched 10.8 on the 5770 runs about 4x slower than it could. I'm really curious why the compiler switches aren't exposed yet at the CAL/OpenCL level, especially now that we know they've been available at least since 10.2 and compile time has become reasonable enough to do even at run time.
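
The ratios behind that "about 4x" can be recomputed straight from the table. A small sketch follows; the dictionary keys and the `ratio` helper are names chosen here for illustration, and the numbers are copied from the table as posted.

```python
# Benchmark figures from the table: execution time in seconds and
# code length, per Catalyst version and GPU.
results = {
    "9.9":          {"t5770": 5.9,  "t4770": 10.7, "len5770": 17600,  "len4770": 21824},
    "10.8/patched": {"t5770": 6.2,  "t4770": 11.2, "len5770": 17472,  "len4770": 21696},
    "10.8":         {"t5770": 25.7, "t4770": 24.9, "len5770": 217408, "len4770": 312512},
}


def ratio(version, baseline, key):
    """How many times larger `version` is than `baseline` for one column."""
    return results[version][key] / results[baseline][key]


for key in ("t5770", "t4770", "len5770", "len4770"):
    print(f"{key}: unpatched 10.8 is {ratio('10.8', '10.8/patched', key):.1f}x the patched value")
```

On these figures, the slowdown on the 4770 is smaller than on the 5770, while the growth in code length is larger than the slowdown on both cards.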

                                         

                          • Controlling calclCompile() behavior
                            MicahVillmow
                            Other items are higher priority at this time, so it is not likely to happen.
                            • Controlling calclCompile() behavior
                              MicahVillmow
                              hazeman,
                              We are working on that, but the design of IL makes allowing function calls extremely difficult to get correct.
                              For example the following kernel:
                              float4 image_read_and_compute(image2d_t type) {
                                /* read image, do something and return result */
                              }

                              kernel void image_compute(image2d_t type1, image2d_t type2, global float4 *val) {
                                val[get_global_id(0)] = image_read_and_compute(type1) * image_read_and_compute(type2);
                              }


                              Since the resource ID/sampler ID of the image is in the instruction, generating code correctly for image_read_and_compute is difficult.

                              We are moving closer to a solution, but it isn't easy to get correct.
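
A toy model may help show why the resource ID being part of the instruction gets in the way of a shared function body. This is only an illustrative sketch, not the compiler's actual behaviour: the instruction text is IL-like, the register names and the resource/sampler numbers are made up, and `emit_sample` and `inline_call_sites` are hypothetical helpers.

```python
def emit_sample(resource_id, sampler_id):
    """Format one IL-like sample instruction; the IDs are baked into the text."""
    return f"sample_resource({resource_id})_sampler({sampler_id}) r1, r0.xy"


def inline_call_sites(bindings):
    """Emit one specialized copy of the helper body per call site."""
    return [emit_sample(resource, sampler) for resource, sampler in bindings]


# Hypothetical bindings: type1 -> resource 0, type2 -> resource 1, sampler 0 for both.
copies = inline_call_sites([(0, 0), (1, 0)])
for text in copies:
    print(text)

# A single non-inlined body could carry only one of these instruction
# texts, so it could not serve both call sites as written.
print("distinct bodies needed:", len(set(copies)))
```

With inlining, each call site simply gets its own copy with the right ID. A real call/ret version would need some other way to select the resource, which is the part described above as hard to get correct.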