11 Replies Latest reply on Apr 8, 2013 2:51 PM by Bdot

    OCL compile error

    Bdot

      Hi, I'm running Win7/64 with an HD5770 and Catalyst 13.1. When my program compiles its kernels for the GPU, I get the output below (it runs fine on the CPU):

       

      Select device - OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 1.2 AMD-APP (1084.4)

      Get device info - Device 1/1: Juniper (Advanced Micro Devices, Inc.),

      device version: OpenCL 1.2 AMD-APP (1084.4), driver version: 1084.4 (VM)

      Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing

      Global memory:1073741824, Global memory cache: 0, local memory: 32768, workgroup size: 256, Work dimensions: 3[256, 256, 256, 0, 0] , Max clock speed:960, compute units:10

      Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -g -DMORE_CLASSES -DCL_GPU_SIEVE").LLVM ERROR: Cannot select: 0x8660700: i8 = setcc 0x8655250, 0x77c6140, 0x8659990 [ID=58] dbg:barrett.cl:5169:39

        0x8655250: i32 = AMDILISD::ADD 0x77c6140, 0x77b5530 [ID=55] dbg:barrett.cl:5169:39

          0x77c6140: i32,ch = llvm.AMDIL.mulhi.u32 0x5e14070, 0x864b1b0, 0x8660500, 0x77c4220 [ORD=221913] [ID=48]

            0x864b1b0: i32 = TargetConstant<2674> [ORD=221906] [ID=17]

            0x8660500: i32,ch = llvm.AMDIL.mad24.u32 0x5e14070, 0x865e6e0, 0x865e1e0, 0x8656460, 0x8664440 [ORD=221904] [ID=44]

              0x865e6e0: i32 = TargetConstant<2623> [ORD=221904] [ID=13]

              0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]

              0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]

                0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]

                0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]

                  0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]

                0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]

                  0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]

                    0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]

                  0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

              0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]

                0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]

                  0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]

                    0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]

                  0x8661110: i32 = undef [ORD=221902] [ID=12]

                0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

            0x77c4220: i32,ch = CopyFromReg 0x5e14070, 0x8664240 [ORD=221896] [ID=30] dbg:barrett.cl:5156:51

              0x8664240: i32 = Register %vreg1773 [ORD=221896] [ID=6]

          0x77b5530: i32 = and 0x8658a80, 0x77c4220 [ORD=221916] [ID=53] dbg:barrett.cl:5169:39

            0x8658a80: i32 = setcc 0x77c5a30, 0x77c5430, 0x62debd0 [ID=51] dbg:barrett.cl:5162:128

              0x77c5a30: i32 = AMDILISD::ADD 0x77bfad0, 0x8661e10 [ORD=221907] [ID=46] dbg:barrett.cl:5162:128

                0x77bfad0: i32,ch = llvm.AMDIL.mulhi.u32 0x5e14070, 0x864b1b0, 0x865e1e0, 0x8656460 [ORD=221906] [ID=43]

                  0x864b1b0: i32 = TargetConstant<2674> [ORD=221906] [ID=17]

                  0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]

                  0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]

                    0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]

                    0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]

                      0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]

                    0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]

                      0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]

                        0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]

                      0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

                0x8661e10: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x77c4a20 [ORD=221905] [ID=40]

                  0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]

                    0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]

                      0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]

                    0x8661110: i32 = undef [ORD=221902] [ID=12]

                  0x77c4a20: i32 = TargetConstant<2> [ORD=221905] [ID=27]

              0x77c5430: i32 = setcc 0x8660500, 0x8664440, 0x8659990 [ID=47] dbg:barrett.cl:5162:128

                0x8660500: i32,ch = llvm.AMDIL.mad24.u32 0x5e14070, 0x865e6e0, 0x865e1e0, 0x8656460, 0x8664440 [ORD=221904] [ID=44]

                  0x865e6e0: i32 = TargetConstant<2623> [ORD=221904] [ID=13]

                  0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]

                  0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]

                    0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]

                    0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]

                      0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]

                    0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]

                      0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]

                        0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]

                      0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

                  0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]

                    0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]

                      0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]

                        0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]

                      0x8661110: i32 = undef [ORD=221902] [ID=12]

                    0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

                0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]

                  0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]

                    0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]

                      0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]

                    0x8661110: i32 = undef [ORD=221902] [ID=12]

                  0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

            0x77c4220: i32,ch = CopyFromReg 0x5e14070, 0x8664240 [ORD=221896] [ID=30] dbg:barrett.cl:5156:51

              0x8664240: i32 = Register %vreg1773 [ORD=221896] [ID=6]

        0x77c6140: i32,ch = llvm.AMDIL.mulhi.u32 0x5e14070, 0x864b1b0, 0x8660500, 0x77c4220 [ORD=221913] [ID=48]

          0x864b1b0: i32 = TargetConstant<2674> [ORD=221906] [ID=17]

          0x8660500: i32,ch = llvm.AMDIL.mad24.u32 0x5e14070, 0x865e6e0, 0x865e1e0, 0x8656460, 0x8664440 [ORD=221904] [ID=44]

            0x865e6e0: i32 = TargetConstant<2623> [ORD=221904] [ID=13]

            0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]

            0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]

              0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]

              0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]

                0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]

              0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]

                0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]

                  0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]

                0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

            0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]

              0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]

                0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]

                  0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]

                0x8661110: i32 = undef [ORD=221902] [ID=12]

              0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

          0x77c4220: i32,ch = CopyFromReg 0x5e14070, 0x8664240 [ORD=221896] [ID=30] dbg:barrett.cl:5156:51

            0x8664240: i32 = Register %vreg1773 [ORD=221896] [ID=6]

      What does that mean, and how do I avoid it?

       

      Hmm, when I just tried it again with the packed-up zip, it failed on the CPU as well (run 'mfakto -d c' to let it choose the CPU), but with this error:

      Select device - (CPU) - OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 1.2 AMD-APP (1084.4)

      Get device info - Device 1/1: AMD Phenom(tm) II X4 955 Processor (AuthenticAMD),

      device version: OpenCL 1.2 AMD-APP (1084.4), driver version: 1084.4 (sse2)

      Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing

      Global memory:4293038080, Global memory cache: 65536, local memory: 32768, workgroup size: 1024, Work dimensions: 3[1024, 1024, 1024, 0, 0] , Max clock speed:3208, compute units:4

      Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -g -DMORE_CLASSES -DCL_GPU_SIEVE").

              BUILD OUTPUT

      ".\barrett.cl", line 665: warning: statement is unreachable

          nn.d0  = n.d0 * qi;

          ^

       

      ".\barrett.cl", line 987: warning: statement is unreachable

          nn.d0  = n.d0 * qi;

          ^

       

      ".\barrett.cl", line 4975: warning: variable "exp96" was declared but never

                referenced

          __private int96_t  exp96, my_k_base, f_base;

                             ^

       

      ".\barrett.cl", line 4976: warning: variable "f" was declared but never

                referenced

          __private int96_v  a, u, f;

                                   ^

       

      "C:\Users\Bertram\AppData\Local\Temp\OCL6DE4.tmp.cl", line 536: warning:

                variable "as" was declared but never referenced

          int90_v a, as, b, r, m;

                     ^

       

      "C:\Users\Bertram\AppData\Local\Temp\OCL6DE4.tmp.cl", line 536: warning:

                variable "m" was declared but never referenced

          int90_v a, as, b, r, m;

                               ^

       

      Internal Error:  ld failed

       

              END OF BUILD OUTPUT

      Error -11: clBuildProgram

      init_CL(3, -1) failed


      I'm sure it worked on the CPU, but I may have changed one of the kernels a bit ... Just "ld failed" is not a lot to work with ...

      Anyway, I'm attaching the zip.

        • Re: OCL compile error
          himanshu.gautam

          Hi Bdot,

          I will look into the testcase, and let you know my findings.

            • Re: OCL compile error
              Bdot

              Thanks for looking at it, Himanshu.

               

              The code is still under development, so it may well be a programming error that I just can't find.

               

              If you want to compile the CL code from another program (or KernelAnalyzer), just specify mfakto_Kernels.cl to be compiled with the options "-I. -DVECTOR_SIZE=2 -g -DMORE_CLASSES -DCL_GPU_SIEVE". (-g is not needed, I usually have -O3 instead, but that does not change things.)

               

              I have now also tried the older drivers 12.6, 12.8 and 12.10. They fail as well, but report "as failed" instead of the long diagnostics and the crash. On Linux with Catalyst 13.1 the output is exactly the same as on Windows with 13.1.

              I also tried the 13.3 beta, but it failed the same way, and it also ran my older code (the same source, but without the -DCL_GPU_SIEVE define) about 5% slower.

               

               


              • Re: OCL compile error
                Bdot

                Hello Himanshu,

                 

                do you have any suggestions for what I could change? Or a driver version I could test where this might be fixed? This is really holding me up ...

                 

                Is anything missing from my side?

                 

                Thanks for your help,

                Bdot

                  • Re: OCL compile error
                    himanshu.gautam

                    Bdot, I will check next week. I am on a short vacation, so please bear with the delay. If possible, I will do this next week itself. Thanks for your patience.

                    • Re: OCL compile error
                      Bdot

                      Hi Himanshu,

                       

                      thanks a lot for your testing. I had never seen it get that far. That leaves two possible explanations: either my attempt to downgrade to 12.10 (from 13.1) failed, meaning 12.10 works and 13.1 breaks, or the compiler for Cayman is smarter. I tried on HD5770 and HD7850 and both failed the same way, so I assume it is more likely a problem with Catalyst 13.1.

                       

                      And you are right, the selftest failure represents the incomplete state of my programming, not the compiler problem.

                       

                      I'll see whether I can switch back to 12.10 again ... Are there instructions for how to do this? I just tried "deinstall all AMD software" (or similar; it was the option in the Catalyst installer that sounded most complete to me), and then installed the older drivers. I started with 12.6 and upgraded via 12.8 to 12.10, all failing. What else is needed for a downgrade?

                       

                      Thanks,

                      Bdot

                        • Re: OCL compile error
                          himanshu.gautam

                          Hi BDot,

                           

                          I am sorry you have to go through this. I will test 13.1 today.

                          If you have 12.10, you can check the driver version against my output. (I think you print the version number -- which is very useful)

                          That will tell you if we both are running the same runtimes.

                           

                          Also, apart from the regular uninstall, AMD ships a separate uninstall utility that deletes leftovers the main uninstaller misses. This usually fixes the wrong-runtime issue and prepares the system for installing a new driver.

                          http://sites.amd.com/us/game/downloads/Pages/catalyst-uninstall-utility.aspx

                          • Re: OCL compile error
                            himanshu.gautam

                            Hi Bdot,

                            I was able to reproduce the LLVM errors on HD 7xxx devices under the 13.3 and 13.1 drivers. Please find the logs attached.

                            I am forwarding this to the relevant team.

                            Anyway, I tried to debug the kernels a little and was able to run mfakto.exe after disabling a single line of kernel code in the barrett.cl file, line 5160:

                            // Compute base k value

                              my_k_base.d0 = mad24(NUM_CLASSES, mul24(bits_to_process, get_group_id(0)), k_base.d0);

                              my_k_base.d1 = k_base.d1 + mul_hi(NUM_CLASSES, mul24(bits_to_process, get_group_id(0))) + (k_base.d0 > my_k_base.d0)? 1u : 0u;          /* k is limited to 2^64 -1 so there is no need for k.d2 */

                             

                            I would request you to try to reduce the size of the testcase by disabling as much of your code as possible. A small testcase is very helpful in pinpointing the real issue.
