
How to disable one GPU on a dual-GPU card in OpenCL

Question asked by drallan on Apr 15, 2018
Latest reply on Apr 17, 2018 by drallan

Is there a way to disable (hide) one GPU on a dual GPU card in OpenCL?

 

I'm using AMD 295X2 (dual 290X) GPU cards for OpenCL under Ubuntu Linux 14.04 LTS. One GPU on one card is starting to fail and often hangs OpenCL. (If I bang the card just right it works for a short while, so I suspect a BGA problem.) I want to fully disable that GPU so I don't have to remove the card and lose two GPUs.

 

What I know so far:

1. I can't simply ignore (not use) that GPU, because it hangs clinfo or clGetPlatformIDs(...) when OpenCL starts. It must be disabled or somehow skipped.

2. The two GPUs on a 295X2 are completely separate devices appearing as 2 cards and 2 PCI devices with 2 headers.

3. I can disable the bad GPU in Linux at boot by creating a file in /etc/udev/rules.d that tells Linux to remove that PCI device. After that, lspci no longer lists it.

4. Alas, even if I do disable it in Linux, clGetPlatformIDs() still sees it and hangs, so I assume OpenCL is rescanning the PCI bus?
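
For reference, one common way to remove a PCI device as in point 3 is to write 1 to that device's sysfs remove file. Below is a minimal C sketch of that step, not the exact udev rule I use. The address 0000:04:00.0 in the usage note and the function names are hypothetical placeholders, and the write needs root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/bus/pci/devices/<addr>/remove" into buf.
 * addr is a PCI address as printed by `lspci -D`, e.g. "0000:04:00.0"
 * (a placeholder, not my real device). Returns 0 on success, -1 if the
 * address is empty or the buffer is too small. */
static int pci_remove_path(const char *addr, char *buf, size_t len)
{
    assert(addr != NULL && buf != NULL);
    if (strlen(addr) == 0)
        return -1;
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/remove", addr);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Ask the kernel to detach the device by writing "1" to its remove file.
 * Returns 0 on success, -1 on any failure (no such device, not root, ...). */
static int pci_remove_device(const char *addr)
{
    char path[128];
    if (pci_remove_path(addr, path, sizeof path) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs("1", f) >= 0;
    if (fclose(f) != 0)   /* the write may only fail when flushed */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling pci_remove_device("0000:04:00.0") as root would detach that one function only. As point 4 says, this alone did not stop clGetPlatformIDs() from hanging for me.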

 

Searching for help, I found some ways to disable AMD GPUs in OpenCL, but they disable either all GPUs of the same type or all GPUs on a single card. None of them can target a single device.

 

Why there is hope:

1. After calling clGetPlatformIDs() one can easily reference or skip individual PCI devices.

2. The OverDrive (ODX) driver software (source code) scans by PCI device address, could skip a device, and does not hang even when reading the GPU temperatures.

 

Any feedback is much appreciated.
