
Bdot

OCL compile error

Hi, I'm running Win7/64 with an HD5770 and Catalyst 13.1. When my program compiles its kernels for the GPU, I get the following output (the same program runs fine on the CPU):

Select device - OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 1.2 AMD-APP (1084.4)

Get device info - Device 1/1: Juniper (Advanced Micro Devices, Inc.),

device version: OpenCL 1.2 AMD-APP (1084.4), driver version: 1084.4 (VM)

Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing

Global memory:1073741824, Global memory cache: 0, local memory: 32768, workgroup size: 256, Work dimensions: 3[256, 256, 256, 0, 0] , Max clock speed:960, compute units:10

Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -g -DMORE_CLASSES -DCL_GPU_SIEVE").

LLVM ERROR: Cannot select: 0x8660700: i8 = setcc 0x8655250, 0x77c6140, 0x8659990 [ID=58] dbg:barrett.cl:5169:39

  0x8655250: i32 = AMDILISD::ADD 0x77c6140, 0x77b5530 [ID=55] dbg:barrett.cl:5169:39

    0x77c6140: i32,ch = llvm.AMDIL.mulhi.u32 0x5e14070, 0x864b1b0, 0x8660500, 0x77c4220 [ORD=221913] [ID=48]

      0x864b1b0: i32 = TargetConstant<2674> [ORD=221906] [ID=17]

      0x8660500: i32,ch = llvm.AMDIL.mad24.u32 0x5e14070, 0x865e6e0, 0x865e1e0, 0x8656460, 0x8664440 [ORD=221904] [ID=44]

        0x865e6e0: i32 = TargetConstant<2623> [ORD=221904] [ID=13]

        0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]

        0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]

          0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]

          0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]

            0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]

          0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]

            0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]

              0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]

            0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

        0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]

          0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]

            0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]

              0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]

            0x8661110: i32 = undef [ORD=221902] [ID=12]

          0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

      0x77c4220: i32,ch = CopyFromReg 0x5e14070, 0x8664240 [ORD=221896] [ID=30] dbg:barrett.cl:5156:51

        0x8664240: i32 = Register %vreg1773 [ORD=221896] [ID=6]

    0x77b5530: i32 = and 0x8658a80, 0x77c4220 [ORD=221916] [ID=53] dbg:barrett.cl:5169:39

      0x8658a80: i32 = setcc 0x77c5a30, 0x77c5430, 0x62debd0 [ID=51] dbg:barrett.cl:5162:128

        0x77c5a30: i32 = AMDILISD::ADD 0x77bfad0, 0x8661e10 [ORD=221907] [ID=46] dbg:barrett.cl:5162:128

          0x77bfad0: i32,ch = llvm.AMDIL.mulhi.u32 0x5e14070, 0x864b1b0, 0x865e1e0, 0x8656460 [ORD=221906] [ID=43]

            0x864b1b0: i32 = TargetConstant<2674> [ORD=221906] [ID=17]

            0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]

            0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]

              0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]

              0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]

                0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]

              0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]

                0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]

                  0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]

                0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

          0x8661e10: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x77c4a20 [ORD=221905] [ID=40]

            0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]

              0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]

                0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]

              0x8661110: i32 = undef [ORD=221902] [ID=12]

            0x77c4a20: i32 = TargetConstant<2> [ORD=221905] [ID=27]

        0x77c5430: i32 = setcc 0x8660500, 0x8664440, 0x8659990 [ID=47] dbg:barrett.cl:5162:128

          0x8660500: i32,ch = llvm.AMDIL.mad24.u32 0x5e14070, 0x865e6e0, 0x865e1e0, 0x8656460, 0x8664440 [ORD=221904] [ID=44]

            0x865e6e0: i32 = TargetConstant<2623> [ORD=221904] [ID=13]

            0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]

            0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]

              0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]

              0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]

                0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]

              0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]

                0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]

                  0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]

                0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

            0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]

              0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]

                0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]

                  0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]

                0x8661110: i32 = undef [ORD=221902] [ID=12]

              0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

          0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]

            0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]

              0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]

                0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]

              0x8661110: i32 = undef [ORD=221902] [ID=12]

            0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

      0x77c4220: i32,ch = CopyFromReg 0x5e14070, 0x8664240 [ORD=221896] [ID=30] dbg:barrett.cl:5156:51

        0x8664240: i32 = Register %vreg1773 [ORD=221896] [ID=6]

  0x77c6140: i32,ch = llvm.AMDIL.mulhi.u32 0x5e14070, 0x864b1b0, 0x8660500, 0x77c4220 [ORD=221913] [ID=48]

    0x864b1b0: i32 = TargetConstant<2674> [ORD=221906] [ID=17]

    0x8660500: i32,ch = llvm.AMDIL.mad24.u32 0x5e14070, 0x865e6e0, 0x865e1e0, 0x8656460, 0x8664440 [ORD=221904] [ID=44]

      0x865e6e0: i32 = TargetConstant<2623> [ORD=221904] [ID=13]

      0x865e1e0: i32 = Constant<4620> [ORD=221904] [ID=14]

      0x8656460: i32,ch = llvm.AMDIL.mul24.u32 0x5e14070, 0x77be2c0, 0x77b4c20, 0x8665850 [ORD=221900] [ID=39]

        0x77be2c0: i32 = TargetConstant<2666> [ORD=221900] [ID=9]

        0x77b4c20: i32,ch = CopyFromReg 0x5e14070, 0x77c2d00 [ORD=221900] [ID=32]

          0x77c2d00: i32 = Register %vreg1776 [ORD=221900] [ID=10]

        0x8665850: i32 = AMDILISD::VEXTRACT 0x77c07e0, 0x865d4d0 [ORD=221899] [ID=36]

          0x77c07e0: v4i32,ch = llvm.AMDIL.get.group.id 0x5e14070, 0x865d9d0 [ORD=221898] [ID=31]

            0x865d9d0: i32 = TargetConstant<2564> [ORD=221898] [ID=8]

          0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

      0x8664440: i32 = AMDILISD::VEXTRACT 0x8660e00, 0x865d4d0 [ORD=221903] [ID=41]

        0x8660e00: v2i32,ch = load 0x5e14070, 0x864ece0, 0x8661110<LD8[%arrayidx_v4397]> [ORD=221902] [ID=37]

          0x864ece0: i32,ch = CopyFromReg 0x5e14070, 0x865f1f0 [ORD=221901] [ID=33]

            0x865f1f0: i32 = Register %vreg1774 [ORD=221901] [ID=11]

          0x8661110: i32 = undef [ORD=221902] [ID=12]

        0x865d4d0: i32 = TargetConstant<1> [ORD=221903] [ID=26]

    0x77c4220: i32,ch = CopyFromReg 0x5e14070, 0x8664240 [ORD=221896] [ID=30] dbg:barrett.cl:5156:51

      0x8664240: i32 = Register %vreg1773 [ORD=221896] [ID=6]

What does that mean, and how do I avoid it?

Hmm, when I just tried it again with the packed-up zip, it fails on the CPU as well (run 'mfakto -d c' to let it choose the CPU), but with this error:

Select device - (CPU) - OpenCL Platform 1/1: Advanced Micro Devices, Inc., Version: OpenCL 1.2 AMD-APP (1084.4)

Get device info - Device 1/1: AMD Phenom(tm) II X4 955 Processor (AuthenticAMD),

device version: OpenCL 1.2 AMD-APP (1084.4), driver version: 1084.4 (sse2)

Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing

Global memory:4293038080, Global memory cache: 65536, local memory: 32768, workgroup size: 1024, Work dimensions: 3[1024, 1024, 1024, 0, 0] , Max clock speed:3208, compute units:4

Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -g -DMORE_CLASSES -DCL_GPU_SIEVE").

        BUILD OUTPUT

".\barrett.cl", line 665: warning: statement is unreachable

    nn.d0  = n.d0 * qi;

    ^

".\barrett.cl", line 987: warning: statement is unreachable

    nn.d0  = n.d0 * qi;

    ^

".\barrett.cl", line 4975: warning: variable "exp96" was declared but never

          referenced

    __private int96_t  exp96, my_k_base, f_base;

                       ^

".\barrett.cl", line 4976: warning: variable "f" was declared but never

          referenced

    __private int96_v  a, u, f;

                             ^

"C:\Users\Bertram\AppData\Local\Temp\OCL6DE4.tmp.cl", line 536: warning:

          variable "as" was declared but never referenced

    int90_v a, as, b, r, m;

               ^

"C:\Users\Bertram\AppData\Local\Temp\OCL6DE4.tmp.cl", line 536: warning:

          variable "m" was declared but never referenced

    int90_v a, as, b, r, m;

                         ^

Internal Error:  ld failed

        END OF BUILD OUTPUT

Error -11: clBuildProgram

init_CL(3, -1) failed


I'm sure it used to work on the CPU, but I may have changed one of the kernels a bit. A bare "ld failed" is not a lot to work with.

Anyway, I'm attaching the zip.

himanshu_gautam

Hi Bdot,

I will look into the testcase, and let you know my findings.


Thanks for looking at it, Himanshu.

The code is still under development, so it may well be a programming error that I just can't find.

If you want to compile the CL code from another program (or KernelAnalyzer), just compile mfakto_Kernels.cl with the options "-I. -DVECTOR_SIZE=2 -g -DMORE_CLASSES -DCL_GPU_SIEVE". (-g is not needed; I usually use -O3 instead, but that does not change anything.)

I have now also tried the older drivers 12.6, 12.8 and 12.10. They fail as well, but report "as failed" instead of printing the long diagnostics and crashing. On Linux with Catalyst 13.1 the output is exactly the same as on Windows with 13.1.

I also tried the 13.3 beta, but it failed the same way, and on top of that it ran my older code about 5% slower (the same source, but without the -DCL_GPU_SIEVE define).



I tried Catalyst 12.10 on Win64 with Cayman (AMD 6950).

The run ended with "selftest FAILED!"

Is this what you mean by "failure"?

I am not sure if this failure is compiler related.

I will attach the full output in another 10 minutes.

I will also test on 13.1 subsequently.


Hello Himanshu,

do you have any suggestion for what I could change? Or a driver version I could test in which this might be fixed? This issue is really holding me up.

Is anything missing from my side?

Thanks for your help,

Bdot


Bdot, I will check next week; I am on a short vacation. Please bear with the delay. If possible, I will do it early next week. Thanks for your patience.


FYI, I will test this today. Thanks for your patience.


Hi Himanshu,

thanks a lot for testing. I had never seen it get that far. That leaves two possible explanations: either my attempt to downgrade from 13.1 to 12.10 failed, in which case 12.10 works and only 13.1 breaks, or the compiler for Cayman is smarter. I tried on an HD5770 and an HD7850 and both failed the same way, so I assume it is rather a problem of Catalyst 13.1.

And you are right, the selftest failure represents the incomplete state of my programming, not the compiler problem.

I'll try to switch back to 12.10 again. Are there instructions on how to do this? I chose "uninstall all AMD software" (or something similar; it was the option in the Catalyst installer that sounded most complete to me) and then installed the older drivers. I started with 12.6 and upgraded via 12.8 to 12.10, and all of them failed. What else is needed for a downgrade?

Thanks,

Bdot


Hi BDot,

I am sorry you have to go through this. I will test 13.1 today.

If you have 12.10 installed, you can compare your driver version against the one in my output (I think your program prints the version number, which is very useful). That will tell you whether we are both running the same runtime.

Also, apart from the regular uninstaller, AMD ships a separate uninstall utility that deletes leftovers the main uninstaller does not remove. This usually fixes wrong-runtime issues and prepares the system for installing a new driver.

http://sites.amd.com/us/game/downloads/Pages/catalyst-uninstall-utility.aspx

Hi Bdot,

I was able to reproduce the LLVM errors on HD 7xxx devices under the 13.3 and 13.1 drivers. Please find the logs attached.

I am forwarding this to the relevant team.

Anyway, I tried to debug the kernels a little and was able to run mfakto.exe after disabling a single line of kernel code in barrett.cl (line 5160):

// Compute base k value

  my_k_base.d0 = mad24(NUM_CLASSES, mul24(bits_to_process, get_group_id(0)), k_base.d0);

  my_k_base.d1 = k_base.d1 + mul_hi(NUM_CLASSES, mul24(bits_to_process, get_group_id(0))) + (k_base.d0 > my_k_base.d0)? 1u : 0u;          /* k is limited to 2^64 -1 so there is no need for k.d2 */
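Independently of the compiler bug, the quoted d1 line may not compute what its comment suggests. In OpenCL C, as in C99, the conditional operator `?:` has lower precedence than `+`, so `k_base.d1 + mul_hi(...) + (k_base.d0 > my_k_base.d0)? 1u : 0u` parses as `(k_base.d1 + mul_hi(...) + (k_base.d0 > my_k_base.d0)) ? 1u : 0u`, which leaves d1 at either 0 or 1. The plain C sketch below contrasts the line as quoted with a parenthesized version that adds only the carry out of the low word, and cross-checks the latter against native 64-bit arithmetic. Assumptions: NUM_CLASSES is taken to be 4620 (the constant feeding the mad24/mulhi nodes in the LLVM dump); `k64`, `mul_hi_u32`, `add_as_written`, `add_with_carry` and `check_against_u64` are hypothetical helpers; and mad24/mul24 are modeled as ordinary 32-bit multiplies, which matches OpenCL only while their operands fit in 24 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: 4620 appears as the constant operand of mad24/mulhi in the
   LLVM dump; presumably NUM_CLASSES as built with -DMORE_CLASSES. */
#define NUM_CLASSES 4620u

/* Hypothetical stand-in for the kernel's k: two 32-bit words (d2 omitted,
   since the comment says k is limited to 2^64 - 1). */
typedef struct { uint32_t d0, d1; } k64;

/* High 32 bits of a 32x32-bit product, as OpenCL's mul_hi() returns for uint. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* The d1 expression as quoted: '?:' binds looser than '+', so the whole sum
   becomes the condition and d1 collapses to 0u or 1u. */
static k64 add_as_written(k64 base, uint32_t x) {
    k64 r;
    r.d0 = NUM_CLASSES * x + base.d0;  /* low word, wraps mod 2^32 (models mad24) */
    r.d1 = base.d1 + mul_hi_u32(NUM_CLASSES, x) + (base.d0 > r.d0) ? 1u : 0u;
    return r;
}

/* Parenthesized so '?:' only turns the low-word carry into 0u or 1u. */
static k64 add_with_carry(k64 base, uint32_t x) {
    k64 r;
    r.d0 = NUM_CLASSES * x + base.d0;
    /* The low word wrapped (carry out) exactly when the new d0 is below the old one. */
    r.d1 = base.d1 + mul_hi_u32(NUM_CLASSES, x) + ((base.d0 > r.d0) ? 1u : 0u);
    return r;
}

/* Cross-check the two-word result against native 64-bit arithmetic. */
static void check_against_u64(k64 base, uint32_t x) {
    uint64_t want = (((uint64_t)base.d1 << 32) | base.d0) + (uint64_t)NUM_CLASSES * x;
    k64 got = add_with_carry(base, x);
    assert(got.d0 == (uint32_t)want);
    assert(got.d1 == (uint32_t)(want >> 32));
}
```

For example, with a hypothetical base of d1 = 7, d0 = 0xFFFFF000 and x = 2^24 - 1, the parenthesized version yields d1 = 26 and d0 = 201317876, matching the 64-bit sum, while the version as quoted yields d1 = 1. Some compilers (clang, for instance) warn about this pattern under -Wparentheses.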

I would request you to try to reduce the size of the testcase by disabling as much of your code as possible. A small testcase is very helpful in pinpointing the real issue.

I looked into this problem. It is a known bug, and the fix will be part of the 13.4 driver.

Regards,

Jan


Thank you very much, that is really good to know! My workaround, which avoids the critical code, was rather expensive: I found that vectorizing the statements and pulling one-time initializations into the loop avoided the crash, but experiments with similar changes show that this costs at least 5% in performance.
