peastman
Journeyman III

Error using OpenCL 2.0

I have an OpenCL program that was written to work with OpenCL 1.2.  I'm now trying to add some OpenCL 2.0 features to it.  But if I add "-cl-std=CL2.0" to the compilation options, then the program fails with the following error:

Symbol not found: % LLVM ERROR: Caught exception from libHSAIL

I'm not yet actually using any OpenCL 2.0 features.  The only change I made was adding that one option to clBuildProgram.  Without that option it works correctly.  Any idea what's going on?  This is with APP SDK 3.0 and an R9 Fury on Ubuntu 12.04.

pinform
Staff

Hi Peter,

Welcome!

I have white-listed you, so you should be able to directly post in the relevant forum. As this post seems to be relevant to the OpenCL forum, I am moving it there -- the experts there should help.

--Prasad

dipak
Big Boss

Hi Peter,

If the build flag "-cl-std=CL2.0" is used, the program is built as OpenCL 2.0 whether or not any OpenCL 2.0 feature is used. To build the kernel as OpenCL 2.0, you need an OpenCL 2.0 compatible driver. Did you install such a driver? If not, please download the latest driver from the Download Drivers page and then try again. If you still observe the error, please share the kernel code and your setup details.

Regards,

0 Likes

Thanks!  My driver is pretty recent.  The installer is amd-driver-installer-15.20.1046-x86.x86_64.run, with modification date July 7.  Also, if I query CL_DEVICE_OPENCL_C_VERSION, it reports OpenCL 2.0.  But I'll try upgrading to the most recent driver and see if that helps.

0 Likes

After installing the latest driver (AMD-Catalyst-15.9-Linux-installer-15.201.1151-x86.x86_64.run), I still get the same error.

I also ran the full test suite for my application, and found I get a variety of different failures in different tests.  Some fail with this error.  Others appear to run, but then produce incorrect results.  If I remove the compiler option, all of them pass.

0 Likes

In that case, could you please provide the reproducible test-cases that generate the above errors?

Regards,

0 Likes

The source code for the project (OpenMM) is at https://github.com/pandegroup/openmm.  Instructions for compiling it are at http://docs.openmm.org/6.3.0/userguide/library.html#compiling-openmm-from-source-code.  But first you'll need to modify it to specify OpenCL 2.0.  You can do that by modifying this line to add " -cl-std=CL2.0" to defaultOptimizationOptions:

https://github.com/pandegroup/openmm/blob/master/platforms/opencl/src/OpenCLContext.cpp#L180

Once you compile it, you can run the individual test cases from the command line.  The following tests fail with the error mentioned above:

TestOpenCLAndersenThermostat

TestOpenCLCheckpoints

TestOpenCLCustomIntegrator

TestOpenCLEwald

In addition, the following tests fail due to calculations producing incorrect results:

TestOpenCLCustomNonbondedForce

TestOpenCLGBSAOBCForce

TestOpenCLNonbondedForce

0 Likes

By the way, you can turn off a lot of CMake options when compiling it:

OPENMM_BUILD_AMOEBA_CUDA_LIB

OPENMM_BUILD_CPU_LIB

OPENMM_BUILD_CUDA_COMPILER_PLUGIN

OPENMM_BUILD_CUDA_LIB

OPENMM_BUILD_C_AND_FORTRAN_WRAPPERS

OPENMM_BUILD_DRUDE_CUDA_LIB

OPENMM_BUILD_EXAMPLES

OPENMM_BUILD_PME_PLUGIN

OPENMM_BUILD_PYTHON_WRAPPERS

OPENMM_BUILD_RPMD_CUDA_LIB

OPENMM_BUILD_STATIC_LIB

OPENMM_GENERATE_API_DOCS

That will both make it compile a lot faster, and also eliminate most of the dependencies (CUDA, Python, Doxygen, Fortran, SWIG, FFTW).

0 Likes

Thanks Peter. We'll check and get back to you.

0 Likes

My apologies for this delayed reply.

I'm able to reproduce the error. However, as the project is large and I'm not familiar with it, I'm having difficulty locating the part of the code that generates the error. For debugging purposes, a small test case would be appreciated. Could you please help me with this?

Regards,

0 Likes

Thanks.  I'll see if I can simplify it down to smaller test cases for each of the errors.

0 Likes

The following kernel triggers the LLVM error.  Just pass this to clBuildProgram(), with the options specified in the first line.  Most of the defines and typedefs are boilerplate that gets appended to the start of all kernels, not anything specific to this.

// Compilation Options: -cl-fast-relaxed-math -cl-std=CL2.0
#define APPLY_PERIODIC_TO_DELTA(delta) delta.xyz -= floor(delta.xyz*invPeriodicBoxSize.xyz+0.5f)*periodicBoxSize.xyz;
#define APPLY_PERIODIC_TO_POS(pos) pos.xyz -= floor(pos.xyz*invPeriodicBoxSize.xyz)*periodicBoxSize.xyz;
#define APPLY_PERIODIC_TO_POS_WITH_CENTER(pos, center) {pos.x -= floor((pos.x-center.x)*invPeriodicBoxSize.x+0.5f)*periodicBoxSize.x; \
pos.y -= floor((pos.y-center.y)*invPeriodicBoxSize.y+0.5f)*periodicBoxSize.y; \
pos.z -= floor((pos.z-center.z)*invPeriodicBoxSize.z+0.5f)*periodicBoxSize.z;}
#define EXP native_exp
#define LOG native_log
#define RECIP native_recip
#define RSQRT native_rsqrt
#define SQRT native_sqrt
#define SUPPORTS_64_BIT_ATOMICS
#define SUPPORTS_DOUBLE_PRECISION
#define SYNC_WARPS
#define WORK_GROUP_SIZE 64
#define convert_mixed4 convert_float4
#define convert_real4 convert_float4
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
typedef float real;
typedef float2 real2;
typedef float3 real3;
typedef float4 real4;
typedef float mixed;
typedef float2 mixed2;
typedef float3 mixed3;
typedef float4 mixed4;
#define NUM_ATOMS 8

/**
 * Apply the Andersen thermostat to adjust particle velocities.
 */
__kernel void applyAndersenThermostat(float collisionFrequency, float kT, __global mixed4* velm, __global const mixed2* restrict stepSize, __global const float4* restrict random,
        unsigned int randomIndex, __global const int* restrict atomGroups) {
    float collisionProbability = (float) (1.0f-exp(-collisionFrequency*stepSize[0].y));
    float randomRange = (float) erf(collisionProbability/sqrt(2.0f));
    for (int index = get_global_id(0); index < NUM_ATOMS; index += get_global_size(0)) {
        mixed4 velocity = velm[index];
        float4 selectRand = random[randomIndex+atomGroups[index]];
        float4 velRand = random[randomIndex+index];
        real scale = (selectRand.w > -randomRange && selectRand.w < randomRange ? 0 : 1);
        real add = (1-scale)*sqrt(kT*velocity.w);
        velocity.x = scale*velocity.x + add*velRand.x;
        velocity.y = scale*velocity.y + add*velRand.y;
        velocity.z = scale*velocity.z + add*velRand.z;
        velm[index] = velocity;
    }
}

0 Likes

Thanks for sharing the code. It helped a lot.

It seems that the error is caused by the built-in function "erf()". The same error can also be reproduced using this simple kernel:

__kernel void test_kernel( __global float *a )
{
    int gid = get_global_id(0);
    a[gid] = erf(a[gid]);
}

I'll report this to compiler team. Thanks once again.

Regards,

0 Likes

Glad you were able to find it that quickly!

I've been trying to narrow down one of the cases that produces incorrect results.  I haven't managed to create a simple test case, but I've at least managed to isolate what kernel the error is happening in (which, unfortunately, is a very large and complicated kernel).  The test case I'm looking at is one of the ones in TestOpenCLNonbondedForce.  The specific test case that's failing is testLargeSystem(), called here:

openmm/TestNonbondedForce.h at master · pandegroup/openmm · GitHub

So you can comment out all the calls to testX() functions before that.

Here is the kernel that's failing:

openmm/findInteractingBlocks.cl at master · pandegroup/openmm · GitHub

(Note that file contains two different versions of that kernel, one used on AMD GPUs and one used on all others.)  Here is the point where that kernel gets invoked:

openmm/OpenCLNonbondedUtilities.cpp at master · pandegroup/openmm · GitHub

To clearly see that something's going wrong, add these lines immediately after the call to context.executeKernel():

unsigned int count;
interactionCount->download(&count);
std::cout<<count<<std::endl;

That downloads and prints out the value of interactionCount that was just computed by the kernel.  When the test is working correctly, it prints out 65.  When I tell it to use OpenCL 2.0, though, it prints out completely ridiculous values, often in the tens of millions.

0 Likes

Hi Peter,

Thanks for your efforts.

When the test is working correctly, it prints out 65.  When I tell it to  use OpenCL 2.0, though, it prints out completely ridiculous values,  often in the tens of millions.

I made the changes and ran the kernel for both CL 1.2 and 2.0. Please find the outputs attached herewith.

I'm a little confused by the result from CL 1.2. I was expecting the "count" value to be 65, but got something else. I'm not sure whether it has anything to do with the device, because I ran it on a Hawaii card (R9 290X). Sorry, I can't check it on a Fury card as I don't have one. Did you check the kernel on any other devices?

Regards,

0 Likes

I'm not sure why the number is different for you, but the test passes, so I think it's OK.  Possibly it's because of the different GPU.  Or maybe the random number generator used to initialize the test case is behaving differently on your computer.  The important thing is the difference between the two cases: 61 and the test passes with CL 1.2, 133169152 and the test fails with CL 2.0.

0 Likes

Okay. In that case, this kernel can be used to report the issue. However, I can't use the whole project to report the issue to the compiler/driver team. I need a simpler test case: at least, only the relevant sections that clearly manifest the issue. Is it possible to isolate the kernel and the corresponding host code? If so, it would be very helpful to us.

Regards,

0 Likes

Sorry for the delay.  There's a lot of code involved in setting up this kernel, so instead of reproducing all that, I just dumped the contents of all the input arrays to a binary file.  That allows this test to be more self-contained.  I also simplified the kernel code a little bit, but only a little.  test.cpp contains two versions of the call to program.build().  Depending on which one I comment out, the output value is either 65 or something huge.

https://www.dropbox.com/s/etm8o4vgrr3gvep/amd_cl2_test.zip?dl=0

0 Likes

Thanks Peter for providing the reduced test-case. It's really helpful to us. I'll check and get back to you.

Regards,


0 Likes

Hi Peter,

Just to inform you that I'm able to reproduce the error using that test-case. I'll file a bug report against it.

Thanks once again for providing the test-case.

Regards

0 Likes

Hi Peter,

It seems that both the kernel code causing the LLVM error and the latest test case producing the wrong value work correctly with the latest Crimson 15.11 driver. Could you please check and share your observations?

Regards,

0 Likes

I just downloaded the most recent driver (amd-driver-installer-15.30.1025-x86.x86_64.run).  Now when I run any OpenCL program (with or without OpenCL 2.0 specified), I get a segfault.  Here's the relevant part of the stack trace from gdb:

#0  __strlen_sse2_pminub ()
    at ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:39
#1  0x00007fffef2d4df1 in ?? ()
   from /opt/AMDAPPSDK-3.0/lib/x86_64/libamdocl12cl64.so
#2  0x00007fffef650c70 in ?? ()
   from /opt/AMDAPPSDK-3.0/lib/x86_64/libamdocl12cl64.so
#3  0x00007fffef6513ce in ?? ()
   from /opt/AMDAPPSDK-3.0/lib/x86_64/libamdocl12cl64.so
#4  0x00007fffef65820e in ?? ()
   from /opt/AMDAPPSDK-3.0/lib/x86_64/libamdocl12cl64.so
#5  0x00007fffef65a97e in ?? ()
   from /opt/AMDAPPSDK-3.0/lib/x86_64/libamdocl12cl64.so
#6  0x00007ffff3513589 in aclCompile () from /usr/lib/libamdocl64.so
#7  0x00007ffff2bd4198 in ?? () from /usr/lib/libamdocl64.so
#8  0x00007ffff2ba2116 in ?? () from /usr/lib/libamdocl64.so
#9  0x00007ffff2bb18ec in ?? () from /usr/lib/libamdocl64.so
#10 0x00007ffff2b929d0 in clBuildProgram () from /usr/lib/libamdocl64.so
#11 0x00007ffff7a31d12 in cl::Program::build (this=<optimized out>,
    devices=..., options=0x7ffff06b7420 "--march=gpu-64 -D__AMDIL_64__ -D__",
    notifyFptr=0x22, data=0x20)
    at /home/peastman/workspace/openmm/platforms/opencl/src/cl.hpp:4842
#12 0x00007ffff7a2d1cf in OpenMM::OpenCLContext::createProgram (
    this=<optimized out>, source=..., defines=...,
    optimizationFlags=<optimized out>)
    at /home/peastman/workspace/openmm/platforms/opencl/src/OpenCLContext.cpp:590

So not exactly a step in the right direction.   I tried installing the driver and rebooting a second time, just in case that made a difference, but it didn't.

0 Likes

Yes, I ran into a similar problem. It happened only when the APP SDK was also installed. After some experiments, I found the following workarounds:

1. Set LD_LIBRARY_PATH so that libamdocl12cl64.so is loaded from /usr/lib instead of /opt/AMDAPPSDK-3.0/lib/x86_64/

OR

2. Remove or rename libamdocl12cl64.so in /opt/AMDAPPSDK-3.0/lib/x86_64/ so that the library is loaded from /usr/lib

OR

3. Run clinfo or the program with root privileges, e.g. "sudo clinfo"

I hope one of these workarounds works for you.

Regards,

0 Likes

Thanks!  Removing libamdocl12cl64.so fixed it.  I can now run the tests, and confirm that both of the problems reported above are now fixed.   I reran the full test suite, and I still have a couple of tests that fail with CL 2.0.  I'll investigate and report back when I have more information about what's happening with those.

0 Likes

Hi Peter,

Thanks for the confirmation.

As the issues reported here are now resolved, could this thread be closed?

If you find any new issues, I would suggest creating a new thread for them. I think that would be easier to track in the future. What do you think?

Regards,

0 Likes