11 Replies Latest reply on Mar 11, 2009 7:20 PM by ryta1203

    CAL and ISA question


      I don't get any output when using ISA, but I do when using IL. For the IL I use calclCompile and for the ISA I use calclAssembleObject; otherwise, all the code is the same. The ISA was generated using SKA.



      const char ILkernel[] =
      "dcl_input_position_interp(linear) v0.x\n"
      "dcl_output_generic o0\n"
      "sample_resource(0)_sampler(0) r0, v0.x\n"
      "mul o0, r0, r0\n"

      ILkernel[] =
      "00 TEX: ADDR(48) CNT(1) VALID_PIX"
      " 0 SAMPLE R0.x___, R0.xyxx, t0, s0 UNNORM(XYZW) "
      "01 ALU: ADDR(32) CNT(2) "
      " 1 x: MUL_e R0.x, R0.x, R0.x "
      " y: MOV R0.y, 0.0f"
      "02 EXP_DONE: PIX0, R0.xyyy"



      There has to be something obvious I am missing here! Also, the SKA-generated IL doesn't work either, which seems to be a problem, since AMD says that you can use the SKA to generate your kernels from Brook+ and use the generated IL in CAL.








        • CAL and ISA question

          The ISA is incomplete.

          You are missing the header and footer information that is required.


          ShaderType = 3
          TargetChip = w
          ;SC Dep components
          NumClauseTemps = 4

          ; ----------------- CS Data ------------------------
          ; Input Semantic Mappings
          ;    No input mappings

          GprPoolSize = 0
          CodeLen                 = 0;Bytes
          PGM_END_CF              = 0; words(64 bit)
          PGM_END_ALU             = 0; words(64 bit)
          PGM_END_FETCH           = 0; words(64 bit)
          MaxScratchRegsNeeded    = 0
          ; texResourceUsage[0]     = 0x00000000
          ; texResourceUsage[1]     = 0x00000000
          ; texResourceUsage[2]     = 0x00000000
          ; texResourceUsage[3]     = 0x00000000
          ; fetch4ResourceUsage[0]  = 0x00000000
          ; fetch4ResourceUsage[1]  = 0x00000000
          ; fetch4ResourceUsage[2]  = 0x00000000
          ; fetch4ResourceUsage[3]  = 0x00000000
          ; texSamplerUsage         = 0x00000000
          ; constBufUsage           = 0x00000000
          ResourcesAffectAlphaOutput[0]  = 0x00000000
          ResourcesAffectAlphaOutput[1]  = 0x00000000
          ResourcesAffectAlphaOutput[2]  = 0x00000000
          ResourcesAffectAlphaOutput[3]  = 0x00000000

          ;SQ_PGM_RESOURCES        = 0x30000001
          SQ_PGM_RESOURCES:NUM_GPRS     = 1
          SQ_PGM_RESOURCES:STACK_SIZE           = 0

          CsSetupMode = Fast
          NumThreadPerGroup = 64
          NumWavefrontPerSIMD = 16
          IsMaxNumWavePerSIMD = Yes
          ; SetBufferForNumGroup = false
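
          Since a bare disassembly is rejected without this surrounding information, a quick sanity check before handing the text to the assembler may save a round trip. The following is only a sketch: `has_line_starting_with` and `isa_has_header_fields` are hypothetical helpers (not part of CAL), and the three keys are simply taken from the listing above as an illustration; the authoritative list of required fields is whatever the assembly format document specifies.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if some line of `src` begins with `key`
 * (leading spaces and tabs on the line are skipped). */
static bool has_line_starting_with(const char *src, const char *key)
{
    size_t key_len = strlen(key);
    const char *line = src;

    while (*line != '\0') {
        const char *p = line;
        while (*p == ' ' || *p == '\t')
            ++p;
        if (strncmp(p, key, key_len) == 0)
            return true;

        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;              /* last line, no trailing newline */
        line = nl + 1;
    }
    return false;
}

/* Hypothetical pre-flight check: the key list is illustrative only,
 * copied from the header shown in this thread, and is NOT exhaustive. */
static bool isa_has_header_fields(const char *src)
{
    static const char *const keys[] = {
        "ShaderType", "TargetChip", "NumClauseTemps"
    };

    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; ++i)
        if (!has_line_starting_with(src, keys[i]))
            return false;
    return true;
}
```

          A kernel string that contains only the `00 TEX: ...` through `02 EXP_DONE: ...` clauses would fail this check, which is a hint that the header and footer still need to be added.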

            • CAL and ISA question



              The IL generated by the SKA doesn't work either. The IL I was using earlier was written by hand, but the SKA-generated version (which is much longer than necessary) doesn't compile.

              Also, is what you posted covered in the docs? I'd like to use CAL but I just don't see a good reason to use the IL.

              Does what I am missing go in the kernel? I'm assuming not, since the SKA didn't generate it. Also, some samples of CAL using the ISA would be great for a future release and a real time saver.

                • CAL and ISA question

                  Sections 7 & 8 of R600_Assembly_Language_Format.pdf specify the header options that must precede/follow ISA code. SDK 1.3 did not have the 7XX ISA information added in it but that is being worked on for 1.4.

                  Also, I'd strongly discourage using the ISA, as it is card specific. That is, ISA written for the HD4850/4870 is not guaranteed to run on any other board and can quite easily hang your machine. If you must write in ISA for performance reasons, I'd recommend optimizing in the higher-level languages as much as possible before even attempting to use the ISA. The Shader Compiler has built-in knowledge about each specific graphics card and its differences and optimizes based on that information. For example, compile the same non-trivial IL shader for an RV710 and an RV770 and you should get two drastically different ISAs.

                    • CAL and ISA question

                      1. Thanks. I saw that in the ASM Lang Format doc.

                      2. There's a reason I want to code in ISA and it has nothing to do with specific program performance at the moment.

                      3. Unless there is a way to compile IL without optimizations or alterations (which I wouldn't even know about since I'd have to understand the CAL compiler totally to know this) then ISA is my only option.

                      4. I understand the ISA is card specific, that doesn't bother me.

                      5. AMD simply hasn't provided enough information for optimizations at the higher level languages to be of much use. It's almost more trial and error than anything and that's VERY time consuming.

                      6. Do you have any idea why the IL header generated by the SKA would not compile? I am also having that problem.

                        • CAL and ISA question

                          There currently is no way to disable optimizations and we are working on getting more information out to the public so that programmers can optimize their programs better.


                          As for SKA, it seems that it does not generate all of the header information required. I'd email them and ask them about it.

                            • CAL and ISA question


                                Thanks. You mean for the IL, correct? I just want to make sure that calclAssembleObject does no optimizations.

                                Thanks, I posted the request over on their forum, maybe they will read it.

                                • CAL and ISA question

                                  Also, the documented optimizations for Brook+ (higher-level languages) work well, but they only take you so far, and that still seems to fall short of the expected results. The optimizations also don't work in all scenarios, only in some; for example, there are scenarios where using float is better than float4 and vice versa.

                                    • CAL and ISA question

                                      We have made a lot of performance improvements recently, and writing a float4 and writing 4 floats should give equal performance on the hardware. Any remaining difference is more of a software issue, as the hardware should give equivalent results for the two cases.

                                        • CAL and ISA question

                                          FYI, I found why the copy and paste method from the SKA didn't work:

                                          The SKA doesn't add the "\n" that is apparently needed at the end of each instruction line. I have requested that they add this.
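
                                          To illustrate the pitfall: adjacent C string literals are concatenated with nothing between them, so pasted lines without a trailing "\n" reach the compiler or assembler as one long line. The sketch below shows the fused result and a small helper that joins lines with newlines. `join_kernel_lines` is a hypothetical helper written for this note; it is not part of CAL or the SKA.

```c
#include <stdlib.h>
#include <string.h>

/* Pitfall: no "\n" between the literals, so this is ONE line of text. */
static const char fused[] =
    "sample_resource(0)_sampler(0) r0, v0.x"
    "mul o0, r0, r0";

/* Hypothetical helper (not part of CAL): joins `count` lines into one
 * heap-allocated string, appending '\n' to every line that lacks one.
 * Returns NULL on allocation failure; the caller frees the result. */
static char *join_kernel_lines(const char *const lines[], size_t count)
{
    size_t total = 1;                       /* terminating NUL */
    for (size_t i = 0; i < count; ++i)
        total += strlen(lines[i]) + 1;      /* +1 for a possible '\n' */

    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    size_t pos = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t len = strlen(lines[i]);
        memcpy(out + pos, lines[i], len);
        pos += len;
        if (len == 0 || lines[i][len - 1] != '\n')
            out[pos++] = '\n';              /* add the missing newline */
    }
    out[pos] = '\0';
    return out;
}
```

                                          The returned string can then be passed wherever the kernel source is expected; lines that already end in "\n" are left unchanged, so the helper is safe to apply to a mix of hand-written and pasted lines.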

                                            • CAL and ISA question

                                              So, using amuasm and amudisasm I got some ISA, but I am getting an error when calling calclAssembleObject(&obj, CAL_PROGRAM_TYPE_PS, ILkernel, info.target). The error is: "Parse errors in converting assembly program"

                                              Here is my kernel string:











                                              ILkernel[] =
                                              "ShaderType = 1\n"
                                              "TargetChip = w\n"
                                              ";SC Dep components\n"
                                              "NumClauseTemps = 4\n"
                                              "; -------- Disassembly --------------------\n"
                                              "00 TEX: ADDR(48) CNT(1) VALID_PIX\n "
                                              " 0 SAMPLE R0.x___, R0.xyxx, t0, s0 UNNORM(XYZW)\n"
                                              "01 ALU: ADDR(32) CNT(2) \n"
                                              " 1 x: MUL_e R0.x, R0.x, R0.x\n"
                                              " y: MOV R0.y, 0.0f\n"
                                              "02 EXP_DONE: PIX0, R0.xyyy\n"
                                              "; ----------------- PS Data ------------------------\n"
                                              "; Input Semantic Mappings\n"
                                              "IN R0 = position0 V0.xxxx DefaultVal={0,0,0,0}\n"
                                              "NumTexStages = 0\n"
                                              "TexCubeMaskBits = 0x00000000\n"
                                              "GprPoolSize = 0\n"
                                              "CodeLen = 400;Bytes\n"
                                              "PGM_END_CF = 0; words(64 bit)\n"
                                              "PGM_END_ALU = 0; words(64 bit)\n"
                                              "PGM_END_FETCH = 0; words(64 bit)\n"
                                              "MaxScratchRegsNeeded = 0\n"
                                              "; texResourceUsage[0] = 0x00000000\n"
                                              "; texResourceUsage[1] = 0x00000000\n"
                                              "; texResourceUsage[2] = 0x00000000\n"
                                              "; texResourceUsage[3] = 0x00000000\n"
                                              "; fetch4ResourceUsage[0] = 0x00000000\n"
                                              "; fetch4ResourceUsage[1] = 0x00000000\n"
                                              "; fetch4ResourceUsage[2] = 0x00000000\n"
                                              "; fetch4ResourceUsage[3] = 0x00000000\n"
                                              "; texSamplerUsage = 0x00000000\n"
                                              "; constBufUsage = 0x00000000\n"
                                              "ResourcesAffectAlphaOutput[0] = 0x00000000\n"
                                              "ResourcesAffectAlphaOutput[1] = 0x00000000\n"
                                              "ResourcesAffectAlphaOutput[2] = 0x00000000\n"
                                              "ResourcesAffectAlphaOutput[3] = 0x00000000\n"
                                              ";SQ_PGM_RESOURCES = 0x72000000\n"
                                              "SQ_PGM_RESOURCES:NUM_GPRS = 0\n"
                                              "SQ_PGM_RESOURCES:STACK_SIZE = 0\n"
                                              "SQ_PRM_RESOURCES:FETCH_CACHE_LINES = 2\n"
                                              "SQ_PRM_RESOURCES:PRIME_CACHE_ENABLE = 1\n"
                                              "; SPI_PS_IN_CONTROL_0 = 0x00000000\n"
                                              "SPI0:NUM_INTERP = 0\n"
                                              "SPI0:POSITION_ENA = 0\n"
                                              "SPI0:POSITION_CENTROID = 0\n"
                                              "SPI0:POSITION_ADDR = 0\n"
                                              "SPI0:PARAM_GEN = 0\n"
                                              "SPI0:PARAM_GEN_ADDR = 0\n"
                                              "SPI0:BARYC_SAMPLE_CNTL = 0\n"
                                              "SPI0:PERSP_GRADIENT_ENA = 0\n"
                                              "SPI0:LINEAR_GRADIENT_ENA = 0\n"
                                              "SPI0:POSITION_SAMPLE = 0\n"
                                              "SPI0:BARYC_SAMPLE_ENA = 0\n"
                                              "; SPI_PS_IN_CONTROL_1 = 0x00000000\n"
                                              "SPI1:GEN_INDEX_PIX = 0\n"
                                              "SPI1:FIXED_PT_POSITION_ENA = 0\n"
                                              "SPI1:FIXED_PT_POSITION_ADDR = 0\n"
                                              "SPI1:FRONT_FACE_ENA = 0\n"
                                              "SPI1:FRONT_FACE_ADDR = 0\n"
                                              "SPI1:FRONT_FACE_CHAN = 0\n"
                                              "SPI1:FOG_ADDR = 0\n"
                                              "SPI1:GEN_INDEX_PIX_ADDR = 0\n"
                                              "; SPI_INPUT_Z\n"
                                              "SPI:PROVIDE_Z_TO_SPI = 0\n"
                                              "; CB_SHADER_MASK = 0x00000000\n"
                                              "CB_SHADER_CONTROL:bitmap = 00000000\n"
                                              "; DB_SHADER_CONTROL = 0x00000200\n"
                                              "DB:Z_EXPORT_ENABLE = 0\n"
                                              "DB:STENCIL_REF_EXPORT_ENABLE = 0\n"
                                              "DB:MASK_EXPORT_ENABLE = 0\n"
                                              "DB:ALPHA_TO_MASK_DISABLE = 0\n"
                                              "DB:Z_ORDER = 0\n"
                                              "DB:KILL_ENABLE = 0\n"
                                              "; SQ_PGM_EXPORTS_PS\n"
                                              "SQ_PGM_EXPORTS_PS:PS_EXPORT_MODE = 0x00000000 ; (0 color)\n"
                                              "; bHasFogMerge = 0x00000000\n"




                                        • CAL and ISA question

                                          As far as I understand, calclAssembleObject does no optimizations. It translates the assembly directly to binary code, which then executes on the machine.
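
                                          On the "Parse errors in converting assembly program" message reported earlier in the thread: as quoted, it names no line, so one low-tech aid is to print the kernel string exactly as the assembler will receive it, with line numbers and a visible delimiter, and look for fused lines or stray leading whitespace. This is only a debugging sketch; `dump_numbered` is a hypothetical helper, not a CAL function, and it makes no claim about what the assembler actually rejects.

```c
#include <stdio.h>
#include <string.h>

/* Prints `src` to `out` one line at a time, each prefixed with its
 * 1-based line number and a '|' so leading spaces are easy to spot.
 * Returns the number of lines printed. */
static int dump_numbered(FILE *out, const char *src)
{
    int line_no = 0;

    while (*src != '\0') {
        size_t len = strcspn(src, "\n");    /* length up to next newline */
        fprintf(out, "%3d|%.*s\n", ++line_no, (int)len, src);
        src += len;
        if (*src == '\n')
            ++src;                          /* step over the newline */
    }
    return line_no;
}
```

                                          Calling `dump_numbered(stderr, ILkernel)` just before the assemble call shows the source line by line; a string literal ending in `\n "` rather than `\n"`, for example, shows up as an extra space at the start of the following line.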