clinfo on GX-424CC, works first time then segfaults

Question asked by opello on Nov 16, 2015
Latest reply on Nov 25, 2015 by nou

Hello.  I'm trying to add OpenCL GPU support to an AMD GX-424CC based embedded system similar to the DB-FT3b-LC.

My target environment is x86_64 Linux 3.18.20 built with gcc 4.9.2 and glibc 2.19.

My initial testing environment is x86_64 Ubuntu 12.04.5 (precise) running Linux 3.13.0-46-generic with fglrx installed from *14.501-0ubuntu1_amd64.deb files generated using amd-driver-installer-14.501.1003-x86.x86_64.run.

The libraries in my target environment are currently those from the fglrx-core deb that I installed in Ubuntu.  When I used the files extracted directly from amd-driver-installer-14.501.1003-x86.x86_64.run (using --extract), clinfo only ever detected the CPU.  I think that is because I missed copying files that end up in the usr/X11R6 directory after extraction (libatiadlxx.so, libatiuki.so.1).  I noticed one other difference between the generated Ubuntu debs and the files extracted from the .run package:  the ones from the debs are stripped.  I don't think this should affect functionality, but I haven't ruled it out yet.

I should also say that I'm using 14.12 rather than 15.9 because I ran into more issues with 15.9, and my test environment was built when 14.12 was the latest release.  I plan to investigate moving to a newer release later, or sooner if that turns out to be necessary to resolve these issues.

Each time I run clinfo from the target environment I get a soft lockup from clinfo for about 23 seconds with more than a few calls from the fglrx module in the stack trace.  I can share some example stack traces if desired.

When I run clinfo in my target environment it works only the very first time after a reboot, despite the following error:

<6>[fglrx] No ADL handler for Escape code 0x00110020

On subsequent runs the only output I get is "Segmentation fault", and the kernel log shows:

clinfo[105]: segfault at f73 ip 00007fe32145085b sp 00007ffff2ccdc50 error 4 in libamdocl64.so[7fe320dd0000+3840000]

The IP and SP vary, but the library file name is consistently libamdocl64.so and the number after the + (the size of the mapping, in hex) is consistently 3840000.
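Since the IP changes from run to run, it may be easier to compare crashes by the offset of the faulting instruction inside libamdocl64.so. The sketch below is only an illustration: the function name `decode_segfault` and the returned dictionary keys are made up for this example, and it assumes the usual x86 kernel log format shown above (all numbers in hex, with `[base+size]` describing the mapping).

```python
import re

# Matches the x86 kernel "segfault at ..." log line quoted above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: pull the useful numbers out of one log line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)
    return {
        "library": m.group("lib"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        "mapping_size": int(m.group("size"), 16),
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),  # 0 means the page was not mapped
        "write": bool(err & 2),         # 0 means the access was a read
        "user_mode": bool(err & 4),     # 1 means the fault came from user space
    }


if __name__ == "__main__":
    line = ("clinfo[105]: segfault at f73 ip 00007fe32145085b "
            "sp 00007ffff2ccdc50 error 4 in "
            "libamdocl64.so[7fe320dd0000+3840000]")
    info = decode_segfault(line)
    print(info["library"], hex(info["offset"]), info)
```

For the line quoted above this gives an offset of 0x68085b, and error code 4 decodes to a user-mode read of an unmapped page. Together with the very low fault address (f73), that would be consistent with a NULL pointer plus a field offset being dereferenced inside the library, though that is a guess. If the offset is the same on every crash, something like `addr2line -e libamdocl64.so 0x68085b` against an unstripped copy of the library might narrow it down, assuming the mapping starts at file offset 0.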

When clinfo does work, the only variation I see between runs is the Platform ID.  It's worth mentioning that I can unload and reload the fglrx module, and clinfo will again work the first time after the module is loaded.  Reloading the module also changes the soft lockup behavior slightly:  there is a delay of similar length (e.g. `time clinfo` reports 35s real time for the first run, and 17s real time after reloading the driver) and similar kernel log messages, but no soft lockup after reloading the driver.

When running clinfo in my testing environment it behaves similarly with respect to soft lockups, stack traces, and run time reported by `time` but the ADL handler message and the segfault never happen.  The Platform ID also changes with each run of clinfo.

I plan to work up a minimal test case for my application, because I think it fails in one of clGetPlatformIDs, clGetPlatformInfo, or clGetDeviceIDs, and it fails regardless of whether it is the first OpenCL application run or not.  So far, though, my first-level benchmark has been whether clinfo behaves consistently.  My goal with this post is to determine the most minimal environment that is reasonable for running my OpenCL application.
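A rough sketch of what such a test case could look like, written with Python's ctypes so it needs no OpenCL headers. This is not the application's code: the function name `probe` is made up, the loader name `libOpenCL.so.1` is an assumption about the system, and the two constants are copied from CL/cl.h. It prints a line before each call and flushes, so that if the process segfaults the last line shows which call was in flight.

```python
import ctypes

CL_SUCCESS = 0
CL_PLATFORM_NAME = 0x0902        # value from CL/cl.h
CL_DEVICE_TYPE_ALL = 0xFFFFFFFF  # value from CL/cl.h


def probe(libname="libOpenCL.so.1"):
    """Call the three entry points in order; return [(step, result), ...]."""
    log = []

    def record(step, result):
        log.append((step, result))
        print(step, "->", result, flush=True)

    try:
        cl = ctypes.CDLL(libname)
    except OSError as exc:
        record("load " + libname, str(exc))
        return log

    # Declare the C signatures so ctypes passes 64-bit values correctly.
    u32, vp = ctypes.c_uint32, ctypes.c_void_p
    cl.clGetPlatformIDs.restype = ctypes.c_int32
    cl.clGetPlatformIDs.argtypes = [u32, ctypes.POINTER(vp), ctypes.POINTER(u32)]
    cl.clGetPlatformInfo.restype = ctypes.c_int32
    cl.clGetPlatformInfo.argtypes = [vp, u32, ctypes.c_size_t, vp,
                                     ctypes.POINTER(ctypes.c_size_t)]
    cl.clGetDeviceIDs.restype = ctypes.c_int32
    cl.clGetDeviceIDs.argtypes = [vp, ctypes.c_uint64, u32,
                                  ctypes.POINTER(vp), ctypes.POINTER(u32)]

    print("calling clGetPlatformIDs", flush=True)
    count = u32(0)
    err = cl.clGetPlatformIDs(0, None, ctypes.byref(count))
    record("clGetPlatformIDs (count=%d)" % count.value, err)
    if err != CL_SUCCESS or count.value == 0:
        return log
    platforms = (vp * count.value)()
    err = cl.clGetPlatformIDs(count.value, platforms, None)
    record("clGetPlatformIDs (ids)", err)
    if err != CL_SUCCESS:
        return log

    for platform in platforms:
        print("calling clGetPlatformInfo", flush=True)
        name = ctypes.create_string_buffer(256)
        err = cl.clGetPlatformInfo(platform, CL_PLATFORM_NAME,
                                   len(name), name, None)
        record("clGetPlatformInfo (%r)" % name.value, err)

        print("calling clGetDeviceIDs", flush=True)
        ndev = u32(0)
        err = cl.clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, None,
                                ctypes.byref(ndev))
        record("clGetDeviceIDs (count=%d)" % ndev.value, err)
    return log


if __name__ == "__main__":
    probe()
```

Running this twice after a reboot, and again after reloading fglrx, should show whether the failure follows the same first-run pattern as clinfo and which of the three calls is involved.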

Thanks for your time.