
Setting up AMD APP SDK 3.0 for automated testing on travis-ci.org (and segfault/stalling in clWaitForEvents)

Question asked by vchuravy on Jan 15, 2016
Latest reply on Jan 17, 2016 by vchuravy

Dear All,

 

I am one of the developers of the Julia bindings for OpenCL (JuliaGPU/OpenCL.jl · GitHub), and we use travis-ci for automated testing. I am currently in the process of switching from the old travis infrastructure to their new docker-based infrastructure, in which I can no longer use sudo to install software. Previously we installed the fglrx packages for Ubuntu (which also provide a CPU implementation of OpenCL) and everything worked well. Since we can no longer install fglrx and would like to use an up-to-date OpenCL implementation, we switched over to the AMD APP SDK.

 

Firstly, it is quite tricky to download the SDK automatically from the website. I ended up writing a bash script that parses the AMD website to find and download the SDK (Script to download the AMD APP SDK · GitHub), and it would be great if that could be simplified. Automated testing brings great benefits for development, but those are partially lost if a change to the AMD website breaks the script.
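
To illustrate why this kind of scraping is fragile, here is a minimal sketch of the link-extraction step. It is not the linked script: the helper name extract_sdk_link, the sample HTML and the file-name pattern are all assumptions made up for illustration, and the real AMD page layout and download URL are not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the linked script): read HTML on stdin and
# print the first href whose target ends in the given file-name pattern.
extract_sdk_link() {
    local pattern="$1"
    # grep -o keeps only the matching href="..." attributes;
    # sed strips the href=" prefix and the closing quote.
    grep -o "href=\"[^\"]*${pattern}\"" | sed -e 's/^href="//' -e 's/"$//' | head -n 1
}

# Made-up page fragment standing in for the real download page.
sample_html='<a href="/downloads/other.zip">Other</a>
<a href="/downloads/AMD-APP-SDK-linux-x64.tar.bz2">Linux 64-bit</a>'

link=$(printf '%s\n' "$sample_html" | extract_sdk_link 'linux-x64\.tar\.bz2')
echo "$link"
```

In a CI script the HTML would presumably come from curl rather than a variable, and an empty result should abort the build early, so that a page change shows up as a clear download failure rather than as a confusing test error later on.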

 

Secondly, since switching to the APP SDK we are seeing stalls and segfaults in our tests (which work fine on OSX travis, and locally on nvidia). A successful travis run looks like this: Travis CI - Test and Deploy Your Code with Confidence, this is one with a segfault: Travis CI - Test and Deploy Your Code with Confidence, and this is one that stalls: Travis CI - Test and Deploy Your Code with Confidence

 

Signal (11): Segmentation fault
while loading /home/travis/.julia/v0.5/OpenCL/test/test_kernel.jl, in expression starting on line 1
unknown function (ip: 0x7f7ce2a06659)
clWaitForEvents at /home/travis/AMDAPPSDK/lib/x86_64/libamdocl64.so (unknown line)
[inline] at /home/travis/.julia/v0.5/OpenCL/src/api.jl:15
wait at /home/travis/.julia/v0.5/OpenCL/src/event.jl:145
jl_apply_generic at /home/travis/julia/bin/../lib/julia/libjulia.so (unknown line)
anonymous at /home/travis/.julia/v0.5/OpenCL/src/event.jl:29
unknown function (ip: 0x7f7f04d163c3)
unknown function (ip: 0x7f7f04d1777a)

 

and locally

signal (11): Segmentation fault
clWaitForEvents at /home/wallnuss/AMDAPPSDK/lib/x86_64/libOpenCL.so (unknown line)
wait at /home/wallnuss/.julia/v0.4/OpenCL/src/macros.jl:4
jl_apply_generic at /usr/bin/../lib/julia/libjulia.so (unknown line)
anonymous at /home/wallnuss/.julia/v0.4/OpenCL/test/test_event.jl:56
context at /home/wallnuss/.julia/v0.4/FactCheck/src/FactCheck.jl:474
jl_apply_generic at /usr/bin/../lib/julia/libjulia.so (unknown line)
anonymous at /home/wallnuss/.julia/v0.4/OpenCL/test/test_event.jl:24
facts at /home/wallnuss/.julia/v0.4/FactCheck/src/FactCheck.jl:448
jl_apply_generic at /usr/bin/../lib/julia/libjulia.so (unknown line)

 

On our side the segfault always comes from OpenCL.jl/event.jl at e95be336e4731058c54a48eba94577a05483ff41 · JuliaGPU/OpenCL.jl · GitHub, but it is triggered either by GC or by direct calls to wait(CLEvent). The PR implementing the switch can be found at Make use of the new travis docker infrastructure by vchuravy · Pull Request #104 · JuliaGPU/OpenCL.jl · GitHub

 

Since this code worked previously, I don't quite know how best to debug this issue, and any help would be appreciated.

 

As a final note, travis is running on an Intel Xeon E5-2860 and I am debugging locally against an Intel Core i7-3667U; both show the same issue.
