3 Replies Latest reply on Sep 22, 2011 7:24 PM by corry

    Few questions about opencl & cal

    hazeman

      Hi, I've got a few questions about OpenCL & CAL for the ATI team. (And if possible, please don't give PR-type answers, for example: "you can't compare OpenCL to Brook because OpenCL doesn't use texture units" or "OpenCL is only for the 5xxx family" when the compiler lists only a 4xxx target.)

      1. Do you plan to release the new extensions used by OpenCL (and when)? Also, if possible, could you write a short description of what they do (some are obvious)?

      Here is the list:

      calExtGetProc: extid=8007 name=calConfig
      calExtGetProc: extid=8005 name=calCtxCreatePrivateCounter
      calExtGetProc: extid=8005 name=calCtxConfigPrivateCounter
      calExtGetProc: extid=8005 name=calCtxGetPrivateCounter
      calExtGetProc: extid=8008 name=calResAllocView
      calExtGetProc: extid=8008 name=calResQueryInfo
      calExtGetProc: extid=8008 name=calResMemCopy
      calExtGetProc: extid=8009 name=calCtxWaitForEvents ( is it blocking ? )
      calExtGetProc: extid=800B name=calMemCopyRaw

      2. Are the devs going to implement LDS optimization for the 4xxx family? Specifically, I'm thinking about detecting whether a kernel's writes to LDS match the pattern "LDS[const1*p + const2]=value" (where const2<const1). This would allow the use of native LDS. If the memory access doesn't follow this pattern, global memory would be used (as it's done now).

      Probably most kernels will use this access pattern anyway, and it would give a huge speed advantage (and some kernels could probably be converted by programmers if they knew about this optimization).
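
      To make the pattern in point 2 concrete, here is a minimal Python sketch. It is not ATI compiler code: the function names are hypothetical, and it checks a recorded list of writes at run time, whereas a compiler would have to prove the same property from the index expression. It assumes p is the thread index and const1 is the per-thread slot size (called stride here).

```python
def owns_slot(index, thread_id, stride):
    """True if index == stride*thread_id + offset with 0 <= offset < stride."""
    return stride * thread_id <= index < stride * (thread_id + 1)


def all_writes_owner_local(writes, stride):
    """writes is an iterable of (thread_id, index) pairs for one kernel run."""
    return all(owns_slot(index, tid, stride) for tid, index in writes)


# Four threads, each writing two values into its own 4-element slot.
owner_writes = [(p, 4 * p + c) for p in range(4) for c in (0, 3)]

# Thread 0 also writes index 5, which lies in thread 1's slot.
stray_writes = owner_writes + [(0, 5)]
```

      Under these assumptions, owner_writes satisfies the pattern and could stay in native LDS, while stray_writes would need the global-memory fallback.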

      3. Are the devs planning to implement the use of texture units for memory access (4xxx family)? Again, the problem is for the compiler to detect whether memory reads follow the pattern "value=some_const_pointer_parameter[width*y + x]", where x<width and width is some value that can be computed by the kernel (or is a constant or a parameter). As the pointer is const, the memory cannot be written through it, and overlap with other parameters could be detected at run time (in which case the current compiler code would be used).

      This optimization is quite important for writing efficient kernels ( like matrix mul ).
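
      The read pattern in point 3 is ordinary row-major indexing, which is why it maps onto 2D texture coordinates. A small Python sketch of that mapping follows; the function names are hypothetical and nothing here is CAL or OpenCL API.

```python
def to_linear(x, y, width):
    """Row-major index width*y + x, valid only when 0 <= x < width."""
    if not 0 <= x < width:
        raise ValueError("x must satisfy 0 <= x < width")
    return width * y + x


def to_texel(linear_index, width):
    """Recover the (x, y) coordinate that produced a row-major index."""
    y, x = divmod(linear_index, width)
    return x, y
```

      The condition x<width is what makes the mapping reversible: without it, two different (x, y) pairs could produce the same linear index.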

      4. CAL & 4xxx question. Accessing memory through the g[] variable (global buffer) generates code with the UNCACHED flag. Is it possible to change it to CACHED?

      Example from some code:

      07 TEX: ADDR(64) CNT(1)
                    9  RD_SCATTER R0, DWORD_PTR[0+R0.x], ELEM_SIZE(3) UNCACHED

      The 7xx ISA documentation suggests that a CACHED flag should be available.

      And one more thing: if you can't answer some or all of these questions, please say so.

      Also, I can add that for me OpenCL is unusable without points 2 & 3 (I'm forced to use CAL or switch to another brand, which could be less hassle).

      Hazeman

       

       

        • Few questions about opencl & cal
          michael.chu

          Hi hazeman,

          We're primarily focusing on exposing new functionality and extensions through OpenCL. Therefore, for now, any new CAL extensions you see used by OpenCL will not immediately be documented and released for general use. We will evaluate, on a case-by-case basis, whether some of these OpenCL-requested features in CAL need to be exposed for general use, but it is not automatic.

          Performance optimization efforts are going primarily towards the Evergreen family and beyond so the 4xxx won't get all of the optimizations that the Evergreen family will. Some of the optimizations are simply easier or will yield more results on the Evergreen architecture since that architecture was designed with OpenCL 1.0 in mind.

            • Few questions about opencl & cal
              rahulgarg

              Hi Michael.

              I have two feature requests related to CAL IL:

              a) Can we get full local memory support, so that you can read from and write to arbitrary addresses in local memory on RV8xx? This exists for OpenCL already.

              b) Can we get support for 32-bit scatters to the global buffer? Currently in the CAL SDK, all writes are 128-bit aligned which is a major pain point.

              These two would be great for me and my compiler project.
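
              As a rough illustration of request b), the Python sketch below models a buffer whose elements are 128 bits wide (four 32-bit values each) and shows the extra work needed to store a single 32-bit value when only whole-element writes are available. The buffer model and function names are assumptions made for illustration, not CAL IL.

```python
DWORDS_PER_ELEMENT = 4  # one 128-bit element holds four 32-bit values


def slot_and_lane(dword_index):
    """Map a 32-bit element index to (128-bit element index, lane 0..3)."""
    return divmod(dword_index, DWORDS_PER_ELEMENT)


def store_dword_via_128bit_write(buffer, dword_index, value):
    """Store one 32-bit value using only whole-element reads and writes."""
    slot, lane = slot_and_lane(dword_index)
    element = list(buffer[slot])  # read the full 128-bit element
    element[lane] = value         # patch the one lane we care about
    buffer[slot] = element        # write the full element back


# Two 128-bit elements, modeled as lists of four integers.
buffer = [[0, 0, 0, 0], [0, 0, 0, 0]]
store_dword_via_128bit_write(buffer, 6, 42)
```

              This read-modify-write is only safe if no other thread updates a different lane of the same element at the same time, which is part of what makes the missing 32-bit scatter awkward to work around.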

                • Few questions about opencl & cal
                  corry

                  I really do hate reviving old topics, but....

                  Hazeman, I'm curious as to where you got that list?  Was it included in previous releases of the SDK? 

                  I came across this whole calExt thing when I saw that the CAL 2.0 programming guide says to call calCtxGetErrorString if calCtxRunProgram fails with CAL_RESULT_ERROR. I wanted to try it, since calGetErrorString returns "Operational Error". Thanks CAL, I could have figured that much out on my own! ;-)  I'm curious whether this was a calExt, especially considering I see some other calCtx functions in what you listed.

                  My guess is either getting those names was something like using pedump or similar, something really difficult, or something incredibly simple and obvious no one ever posted on it.  Either way, with CAL "deprecated" I'm left trying to find the answers everyone else found years ago.  Welcome to 2009! :)

                  Thanks

                  Corry