3 Replies Latest reply on Dec 11, 2008 5:58 PM by jfkong

    exportspeed sample program from CAL failed verification

    jfkong

      hi,

      I am using Linux x86-64, CAL 1.2.1beta-1, and the latest driver on a single 4870.

      /amdcal/samples/runtime/exportspeed -r 1 -w 256 -h 256 -e passed.

      However, exportspeed -r 1 -w 16 -h 16 -e failed.

      Could someone please verify this?

      I suspect something is wrong with the global buffer, because I have run into another problem as well.

      The simple kernel below just copies one element from an input buffer (the global buffer) to an output buffer. However, I fail to get correct results when using the options "-w 4 -h 16 -e". Say the third element (a float4) of the input is 1.0, 0.0, 1.0, 1.0. The result I get may be 0.0, 0.0, 0.0, 0.0 or other strange numbers such as 0.089. Sometimes I also get the correct result, but the result changes every time I run the program.

      The input buffer is 2D float4, and the output is 2D float4 with dimensions w/4 by h. The execution domain is (0, 0, w/4, h). However, "-w 16 -h 16 -e" always gives me the correct result.

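      const CALchar ILKernel[] =
      "il_ps_2_0\n"                  /* pixel-shader IL program, version 2.0 */
      "dcl_cb cb0[1]\n"              /* constant buffer cb0 with one entry (not used below) */
      "dcl_output_generic o0\n"      /* output o0, written to the output buffer */
      "dcl_literal l0, 0x00000002, 0x40490FDB, 0x40490FDB, 0x40490FDB\n" /* l0.x = 2 is the element index; y/z/w unused */
      "mov     r0, g[l0.x]\n"        /* read global buffer element 2 (the third float4) */
      "mov o0, r0\n"                 /* copy it to the output */
      "end\n";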

      Another example is memimport_matmul.

      If you try -r 1 -w 16 -h 16 -e, verification passes.

      However, if you try -r 1 -w 16 -h 8 -e, verification fails.

          MicahVillmow
          JFKong, please see the CAL release notes. They state that the global buffer has a minimum size of 64x64 and that its width must be a multiple of 64.
            jfkong

            Thanks a lot for the info. However, you might want to remove this constraint: for debugging and testing, we sometimes want to start with just a few elements. It took me quite a while to figure out this weird behavior.

            Also, I still can't find the exact statement about a 64x64 minimum size in the CAL release notes. I think this constraint should be documented precisely in the IL document, alongside the global buffer description.

            In amd-cal-readme-win

            "Local Memory Global Buffers
            ---------------------------
            Local memory global buffers must be a multiple of 64 elements wide. If they are not a multiple of 64 wide, the allocation will fail, and the code
            returns an error."

            In amd-cal-readme-linux32

            "NOTE ON LOCAL MEMORY GLOBAL BUFFERS:   
               
                Local memory global buffers with sizes < 64 produces incorrect
                results when the memory is directly mapped using calResMap. If
                the data is copied via calMemCopy to a remote buffer before
                calling calResMap on the remote buffer, the data is correct.
                This does not affect directly writing to remote memory."

            In amd-cal-readme-linux64

            "NOTE ON LOCAL MEMORY GLOBAL BUFFERS:  
               
                Local memory global buffers with sizes < 64 produces incorrect
                results when the memory is directly mapped using calResMap. If
                the data is copied via calMemCopy to a remote buffer before
                calling calResMap on the remote buffer, the data is correct.
                This does not affect directly writing to remote memory."

                jfkong

                Hi, Micah Villmow,

                I just installed SDK 1.3 and the new 8.12 driver. According to the documentation (page 3-36):

                "Note: Global (Linear) buffers are always padded to a 64-element boundary; however, the memexport instruction is not constrained by this,
                and the program can write into the pad area. During mapping, when
                copying from local to remote storage, data written to the pad area
                is not copied (it is lost).

                The hardware output paths are different when a buffer is attached
                as an export buffer rather than an output buffer.

                Ensure that the global buffer has a width that is a multiple of 64
                elements.

                When entering a width that is not multiple of 64 and using the global
                buffer, calResAllocLocal2D returns a warning. Users also can
                query the error message for this warning"

                OK, then I tried exportspeed again:

                ./exportspeed -r 1 -w 16 -h 1 -e: error message (something about the width needing to be a multiple of 64).

                ./exportspeed -r 1 -w 64 -h 1 -e: passed

                So far so good. However,

                if I run two different programs in a row:

                ../inputspeed/inputspeed (another sample program in CAL)

                ./exportspeed -r 1 -w 64 -h 1 -e: FAILED !!!

                This is no good, and I am not sure whether it is a bug or whether 64x64 is strictly required (probably not 64x64, since 64 2 worked).

                Thanks