1 Reply Latest reply on Jan 18, 2016 1:50 AM by vchuravy

    Setting up AMD APP SDK 3.0 for automated testing on travis-ci.org (and segfault/stalling in clWaitForEvents)

    vchuravy

      Dear All,

       

      I am one of the developers of the Julia bindings for OpenCL (JuliaGPU/OpenCL.jl · GitHub ) and we use travis-ci for automated testing. I am currently in the process of switching from the old Travis infrastructure to their new Docker-based infrastructure, in which I can no longer use sudo to install software. Previously we installed the fglrx packages for Ubuntu (which also provide a CPU implementation of OpenCL) and everything worked well. Since we can no longer install fglrx and would like to use an up-to-date OpenCL implementation, we switched over to the AMD APP SDK.
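
      For reference, here is a minimal sketch of how an already-extracted SDK can be registered for the current user without sudo. It assumes the SDK sits under $HOME/AMDAPPSDK with the 64-bit libraries in lib/x86_64 (the layout visible in the backtraces in this post), and that the ICD loader shipped with the SDK honors the OPENCL_VENDOR_PATH environment variable. The function name setup_amdappsdk and the ICD file name amdocl64.icd are illustrative choices, not something the SDK mandates.

```shell
#!/usr/bin/env bash
# Sketch: register an already-extracted AMD APP SDK for the current user only.
# Assumption: the SDK lives under $HOME/AMDAPPSDK with the 64-bit libraries in
# lib/x86_64, matching the paths in the backtraces in this post.

setup_amdappsdk() {
    # Optional first argument overrides the install location (hypothetical default).
    export AMDAPPSDKROOT="${1:-$HOME/AMDAPPSDK}"

    # Assumption: the ICD loader reads *.icd files from this directory instead
    # of /etc/OpenCL/vendors, so no root access is needed.
    export OPENCL_VENDOR_PATH="$AMDAPPSDKROOT/etc/OpenCL/vendors"
    mkdir -p "$OPENCL_VENDOR_PATH"

    # Each .icd file holds the name of one vendor library to load.
    echo "libamdocl64.so" > "$OPENCL_VENDOR_PATH/amdocl64.icd"

    # Make libOpenCL.so and libamdocl64.so resolvable at run time,
    # keeping any existing search path after the SDK directory.
    export LD_LIBRARY_PATH="$AMDAPPSDKROOT/lib/x86_64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

      The function has to be sourced into the shell that later runs the tests (not executed in a subshell), otherwise the exported variables never reach the test process.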

       

      Firstly, it is quite tricky to download the SDK from the website automatically. I ended up writing a bash script that parses the AMD website to download the SDK (Script to download the AMD APP SDK · GitHub ), and it would be great if that could be simplified. Automated testing brings great benefits for development, but those are partially lost if a change to the AMD website breaks the script.

       

      Secondly, since switching to the APP SDK we are seeing stalls and segfaults in our tests (which work fine on OS X on Travis, and locally on NVIDIA). A successful Travis run looks like this: Travis CI - Test and Deploy Your Code with Confidence, this is one with a segfault: Travis CI - Test and Deploy Your Code with Confidence, and this is one that stalls: Travis CI - Test and Deploy Your Code with Confidence

       

      Signal (11): Segmentation fault
      while loading /home/travis/.julia/v0.5/OpenCL/test/test_kernel.jl, in expression starting on line 1
      unknown function (ip: 0x7f7ce2a06659)
      clWaitForEvents at /home/travis/AMDAPPSDK/lib/x86_64/libamdocl64.so (unknown line)
      [inline] at /home/travis/.julia/v0.5/OpenCL/src/api.jl:15
      wait at /home/travis/.julia/v0.5/OpenCL/src/event.jl:145
      jl_apply_generic at /home/travis/julia/bin/../lib/julia/libjulia.so (unknown line)
      anonymous at /home/travis/.julia/v0.5/OpenCL/src/event.jl:29
      unknown function (ip: 0x7f7f04d163c3)
      unknown function (ip: 0x7f7f04d1777a)
      

       

      and locally:

      signal (11): Segmentation fault
      clWaitForEvents at /home/wallnuss/AMDAPPSDK/lib/x86_64/libOpenCL.so (unknown line)
      wait at /home/wallnuss/.julia/v0.4/OpenCL/src/macros.jl:4
      jl_apply_generic at /usr/bin/../lib/julia/libjulia.so (unknown line)
      anonymous at /home/wallnuss/.julia/v0.4/OpenCL/test/test_event.jl:56
      context at /home/wallnuss/.julia/v0.4/FactCheck/src/FactCheck.jl:474
      jl_apply_generic at /usr/bin/../lib/julia/libjulia.so (unknown line)
      anonymous at /home/wallnuss/.julia/v0.4/OpenCL/test/test_event.jl:24
      facts at /home/wallnuss/.julia/v0.4/FactCheck/src/FactCheck.jl:448
      jl_apply_generic at /usr/bin/../lib/julia/libjulia.so (unknown line)
      

       

      On our side the segfault always comes from OpenCL.jl/event.jl at e95be336e4731058c54a48eba94577a05483ff41 · JuliaGPU/OpenCL.jl · GitHub, but it is triggered either by GC or by direct calls to wait(CLEvent). The PR implementing the switch can be found at Make use of the new travis docker infrastructure by vchuravy · Pull Request #104 · JuliaGPU/OpenCL.jl · GitHub

       

      Since this code worked previously, I don't quite know how best to debug this issue, and any help would be appreciated.

       

      As a final note, Travis is running on an Intel Xeon E5-2860 and I am debugging locally against an Intel Core i7-3667U; both show the same issue.