9 Replies Latest reply on May 15, 2012 3:22 AM by quazee

    Enabling cl_amd_media_ops2 extension on Tahiti




      I am having trouble enabling the new cl_amd_media_ops2 extension with the latest SDK and drivers.

      I tried the Windows OpenCL 1.2 beta drivers, 12.4 official drivers, and "leaked" 12.5 - none of them worked.

      The hardware is Sapphire Radeon HD 7970, single card.


      #pragma OPENCL EXTENSION cl_amd_media_ops2 : enable

      causes a compiler error "line 1: error: can't enable all OpenCL extensions or unrecognized OpenCL extension"


      The reason I need cl_amd_media_ops2 to be enabled is the amd_sadw intrinsic (packed sum of 16-bit absolute differences) which appeared in the new SDK.

      (See http://www.khronos.org/registry/cl/extensions/amd/cl_amd_media_ops2.txt and

      http://developer.amd.com/sdks/amdappsdk/assets/AMD_APP_SDK_Release_Notes_Developer.pdf )


      Here is the output of CL_DEVICE_EXTENSIONS variable:

      cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing


      Note that the cl_amd_media_ops2 is missing.


      CL_DEVICE_NAME = Tahiti

      CL_DRIVER_VERSION = CAL 1.4.1720 (VM)  <---- this is the "openCL 1.2 beta"

      The amdocl.dll definitely contains "cl_amd_media_ops2" string, but does not report it in CL_DEVICE_EXTENSIONS for some reason.


      There are definitely some drivers which do support the feature - but I am unable to find them anywhere.

      See http://clbenchmark.com/sys-info.jsp?id=133582320595781685  - on that report cl_amd_media_ops2 is clearly present, and the driver version is different.


      Am I using the wrong drivers, or do I need to enable some registry setting/environment variable before I can use the feature?


      Thank you.

        • Re: Enabling cl_amd_media_ops2 extension on Tahiti

          pass in -Dcl_amd_media_ops2 during compile time to enable it.

          1 of 1 people found this helpful
            • Re: Enabling cl_amd_media_ops2 extension on Tahiti

              Thank you Micah,


              It works now, but doesn't produce expected ISA code (that is, v_sad_u16 instruction) - it looks like it is emulated in the compiler.

              Here is the kernel:


              #pragma OPENCL EXTENSION cl_amd_media_ops2 : enable

              __kernel void test(__global uint* pIn, __global uint* pOut)


              int z = get_global_id(0);

              pOut[z] = amd_sadw(pIn[z*2], pIn[z*2+1], 0);



              And it produces the following software-emulated sequence, thus, even slower than non-SADW version (boilerplate code removed):


              tbuffer_load_format_xy  v[1:2], v1, s[8:11], 0 offen format:[BUF_DATA_FORMAT_32_32,BUF_NUM_FORMAT_FLOAT] // 00000044: EBD91000 80020101

                s_waitcnt     vmcnt(0)                                    // 0000004C: BF8C1F70

                v_lshrrev_b32  v3, 16, v2                                 // 00000050: 2C060490

                v_and_b32     v2, 0x0000ffff, v2                          // 00000054: 360404FF 0000FFFF

                v_lshrrev_b32  v4, 16, v1                                 // 0000005C: 2C080290

                v_and_b32     v1, 0x0000ffff, v1                          // 00000060: 360202FF 0000FFFF

                v_max_u32     v5, v2, v1                                  // 00000068: 280A0302

                v_max_u32     v6, v3, v4                                  // 0000006C: 280C0903

                v_min_u32     v1, v2, v1                                  // 00000070: 26020302

                v_min_u32     v2, v3, v4                                  // 00000074: 26040903

                v_sub_i32     v1, vcc, v5, v1                             // 00000078: 4C020305

                v_sub_i32     v2, vcc, v6, v2                             // 0000007C: 4C040506

                v_add_i32     v1, vcc, v1, v2                             // 00000080: 4A020501

                tbuffer_store_format_x  v1, v0, s[16:19], 0 offen format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_FLOAT] // 00000084: EBA41000 80040100


              It looks like newer drivers are needed - since the drivers don't report proper support for this extension

              (but these ones do: http://clbenchmark.com/sys-info.jsp?id=133582320595781685  - I want to know where to get them )

            • Re: Enabling cl_amd_media_ops2 extension on Tahiti

              Looks like I found a temporary workaround for now - something like this at the beginning of the kernel:

              #ifdef SADW_KLUDGE

                 #define amd_sadw amd_sad

                 #pragma OPENCL EXTENSION cl_amd_media_ops : enable



              Then I compile the kernel with "-fbin-exe -fno-bin-amdil -fno-bin-llvmir -fno-bin-source -DSADW_KLUDGE" options (extra options just to minimize the chance of corrupting the binary image when doing brute-force replace).


              Then, I just replace "01 00 b4 d2" with "01 00 b8 d2" in the image before running it.

              The opcode for amd_sadw is D2B80001 (v_sad_u16).
              It's a long opcode (arguments go in the next 4 bytes), so it's easy to replace by brute force.


              I tested the amd_sadw functionality for correctness on a bunch of random input - looks like it's doing exactly what I want.

              Maybe someday I will do proper ELF parsing / instruction boundary detection to make it bullet proof - currently, the workaround will break if there is a similar constant in the code, etc.


              Looking forward to the fixed Catalyst version - this stuff gives a whole new meaning to "kernel hacking" .

                • Re: Enabling cl_amd_media_ops2 extension on Tahiti

                  Great stuff.

                  Then, I just replace "01 00 b4 d2" with "01 00 b8 d2" in the image before running it.

                  The opcode for amd_sadw is D2B80001 (v_sad_u16).
                  It's a long opcode (arguments go in the next 4 bytes), so it's easy to replace by brute force.


                  The actual opcode is 0xD2B80000 & 0xFFFE0000. The lower byte with 0x01 is the destination register and can change.

                  The second byte has bits used mostly for float arguments so looking for 0xD2B80000&0xFFFFFF00 should be ok for integer instructions.