I'm seeing a soft lockup in Linux when running any OpenCL application. My test case is clinfo; running it under strace -Ffttt shows the soft lockup occurring right after this ioctl:
135 1448488023.137973 ioctl(5, 0x4004648c, 0x7ffc737e0190) = 0
and the kernel log shows:
[ 108.563955] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [clinfo:135]
(including a stack trace).
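Decoding the ioctl request number may help narrow down which driver call is involved. The sketch below is a minimal illustration, assuming the generic Linux _IOC bit layout used on x86-64 (asm-generic/ioctl.h) and the DRM command range from drm.h. The `decode_ioctl` helper is hypothetical and written only for this purpose:

```python
# Hypothetical helper: decode a Linux ioctl request number, assuming the
# generic _IOC layout (asm-generic/ioctl.h, used on x86/x86-64):
# nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS, IOC_DIRBITS = 8, 8, 14, 2
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS

# _IOC_WRITE (1) means userspace passes data to the kernel, i.e. an _IOW ioctl.
DIR_NAMES = {0: "none", 1: "write", 2: "read", 3: "read|write"}

# DRM uses type 'd' (DRM_IOCTL_BASE); nr values in [0x40, 0xA0) are
# driver-specific commands (DRM_COMMAND_BASE / DRM_COMMAND_END in drm.h).
DRM_IOCTL_BASE = ord("d")
DRM_COMMAND_BASE = 0x40
DRM_COMMAND_END = 0xA0


def decode_ioctl(request: int) -> dict:
    """Split an ioctl request number into its _IOC fields."""
    nr = (request >> IOC_NRSHIFT) & ((1 << IOC_NRBITS) - 1)
    typ = (request >> IOC_TYPESHIFT) & ((1 << IOC_TYPEBITS) - 1)
    size = (request >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1)
    direction = (request >> IOC_DIRSHIFT) & ((1 << IOC_DIRBITS) - 1)
    info = {"dir": DIR_NAMES[direction], "type": chr(typ), "nr": nr, "size": size}
    # Flag DRM driver-specific commands and give their index relative to the base.
    if typ == DRM_IOCTL_BASE and DRM_COMMAND_BASE <= nr < DRM_COMMAND_END:
        info["drm_driver_cmd"] = nr - DRM_COMMAND_BASE
    return info


if __name__ == "__main__":
    print(decode_ioctl(0x4004648C))
```

For 0x4004648c this gives a 4-byte write (_IOW) with type 'd', the letter DRM uses, and nr 0x8c, which falls in the driver-specific range (command 0x4c relative to DRM_COMMAND_BASE). That suggests, though does not prove, that the call is handled by the GPU kernel driver rather than by the DRM core. Running strace with -y would also show which device file descriptor 5 refers to.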
I only see this on my custom platform, which uses a GX-424CC; I do not see it on my reference platform (Sapphire BP-FT3GS). I would appreciate any pointers on what to investigate. My platform runs coreboot while the Sapphire platform runs AMI BIOS, and as far as I can tell the VGA BIOS is the same on both. The one difference I have spotted is that the GPU's PCIe device has a different subsystem ID on the Sapphire board (1002:9851 for the PCIe device VID:DID and 1002:0123 for the subsystem VID:DID), but based on this thread that appears to be just Sapphire's own identifier.
It's also worth noting that clinfo returns the expected information for this platform, and other OpenCL applications appear to work. However, they all hit this soft lockup on startup, which is unacceptable for my use case.
I've attached the strace log as well as the relevant kernel log messages from the failure.
Thanks for your time.