OpenCL

ruwen
Journeyman III

OpenCL driver causing heap-corruption?

Hi,

We have a severe issue with OpenCL, and I wonder whether anyone else has seen a similar problem, whether this is a known AMD driver issue, or whether we are doing something wrong. We are using AMD Radeon RX 570 and RX 580 GPUs with OpenCL on Windows and see random crashes due to heap corruption in our application, but only when running on AMD GPUs. We have never observed the problem on NVIDIA GPUs.

We have created a minimal executable that reproduces the issue. It crashes with heap corruption after running for several hours: on average after about 4 hours, sometimes after only 30 minutes, and occasionally only after more than 24 hours. It contains an infinite loop which repeatedly reads a buffer from the GPU via clEnqueueReadBuffer. In addition, it has several threads which continuously allocate and deallocate memory, so the heap layout changes all the time and the crash appears sooner. We were not able to reproduce the crash when omitting the call to clEnqueueReadBuffer. We ran the same executable on a system with an NVIDIA card and saw no crash after running it for a whole week.
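For readers without the attachment, the allocation threads described above could look roughly like the following. This is only a sketch under assumptions: the helper name churnHeap, the block sizes, and the use of std::thread are illustrative and are not taken from the attached project.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper (not from the attached project): runs `threadCount`
// threads that allocate and free randomly sized blocks for `duration` and
// returns the total number of allocations performed.
std::uint64_t churnHeap(unsigned threadCount,
                        std::chrono::milliseconds duration,
                        std::size_t maxBlockBytes = 1u << 20)
{
    assert(maxBlockBytes > 0);

    std::atomic<bool> stop{false};
    std::atomic<std::uint64_t> allocations{0};

    auto worker = [&](unsigned seed) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<std::size_t> sizeDist(1, maxBlockBytes);
        do
        {
            // Allocate, touch, and immediately free a block so that the
            // heap layout keeps changing while the read loop runs.
            std::vector<std::uint8_t> block(sizeDist(rng));
            block.front() = 0xAB;
            block.back() = 0xCD;
            allocations.fetch_add(1, std::memory_order_relaxed);
        } while (!stop.load(std::memory_order_relaxed));
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t)
    {
        threads.emplace_back(worker, t + 1);
    }

    std::this_thread::sleep_for(duration);
    stop.store(true);
    for (auto& th : threads)
    {
        th.join();
    }

    return allocations.load();
}
```

In a reproduction like the one described, such workers would run in the background for the whole lifetime of the read loop rather than for a fixed duration.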

The issue also occurs with several older driver versions, including ones from 2018. With the latest driver release (20.1.1) it still occurs, but the average time until our minimal executable crashes has increased somewhat (now about 6 hours).

I am attaching the Visual Studio project for the example program (Visual C++ 2019), and I am also posting the main method below for reference.

Thanks for any help or hints!

Here is the code of the main method:

int main()
{
    auto mc_iWidth = 1280;
    auto mc_iHeight = 960;
    const int c_iSize = mc_iWidth * mc_iHeight;
    const int c_iBytesPerPixelRGBA = 4;

    const int numBytes = c_iSize * c_iBytesPerPixelRGBA;

    // get available platforms
    cl_uint uiPlatformCount = 0;
    auto iErrcode = clGetPlatformIDs(0, nullptr, &uiPlatformCount);
    if (iErrcode != CL_SUCCESS)
    {
        return 0;
    }

    if (!uiPlatformCount)
    {
        return 0;
    }

    std::vector<cl_platform_id> platformIds(uiPlatformCount);
    iErrcode = clGetPlatformIDs(uiPlatformCount, platformIds.data(), nullptr);
    if (iErrcode != CL_SUCCESS)
    {
        return 0;
    }

    auto platformId = platformIds[0];

    cl_uint uiDeviceCountGPU = 0;
    iErrcode = clGetDeviceIDs(platformId, CL_DEVICE_TYPE_GPU, 0, nullptr, &uiDeviceCountGPU);

    if (uiDeviceCountGPU == 0)
    {
        return 0;
    }

    std::vector<cl_device_id> gpuDeviceIDs(uiDeviceCountGPU, NULL);
    iErrcode = clGetDeviceIDs(platformId, CL_DEVICE_TYPE_GPU, uiDeviceCountGPU, gpuDeviceIDs.data(), &uiDeviceCountGPU);

    if (iErrcode != CL_SUCCESS)
    {
        return 0;
    }

    auto deviceId = gpuDeviceIDs[0];

    iErrcode = CL_SUCCESS;
    cl_context_properties props[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties)platformId, 0 };
    auto context = clCreateContext(props, (uint32_t)gpuDeviceIDs.size(), gpuDeviceIDs.data(), ContextErrorCallback, nullptr, &iErrcode);

    if (iErrcode != CL_SUCCESS)
    {
        return 0;
    }

    auto bufferHandle = clCreateBuffer(context, CL_MEM_READ_WRITE, numBytes, nullptr, &iErrcode);

    if (iErrcode != CL_SUCCESS)
    {
        std::cout << "Error" << std::endl;
        return 0;
    }

    auto queue = clCreateCommandQueueWithProperties(context, deviceId, NULL, &iErrcode);

    if (iErrcode != CL_SUCCESS)
    {
        std::cout << "Error" << std::endl;
        return 0;
    }

    for (int i = 0; i < (1 << 30); ++i)
    {
        std::vector< uint8_t > vec(numBytes);
        iErrcode = clEnqueueReadBuffer(
            queue,            // OpenCL handle (cl_command_queue)
            bufferHandle,     // OpenCL handle (cl_mem)
            CL_TRUE,          // blocking read
            0,                // offset (start reading at this position)
            numBytes,         // number of bytes to read
            vec.data(),       // target address
            0,                // number of events in the wait list
            NULL,             // event wait list
            NULL);            // event

        if (iErrcode != CL_SUCCESS)
        {
            std::cout << "Error read buffer" << std::endl;
            return 0;
        }
    }

    return 1;
}

4 Replies
dipak
Big Boss

Thank you for reporting the above issue and providing a reproducible test-case. We will look into this issue and get back to you shortly.

Thanks.

dipak
Big Boss

As far as I can see from the attached code, the application creates a few threads and allocates a large amount of memory inside each thread. The thread logic does not seem to be directly related to the runtime issue (i.e. clEnqueueReadBuffer) that you reported. So, may I assume that the issue is still reproducible if I disable that logic (using #define DO_STRESS_TEST 0)?

Thanks.

ruwen
Journeyman III

Hi dipak,

Thanks a lot for looking into this! We are quite stuck with this problem and are grateful for any support.

The reason for the additional threads is that we currently notice the corrupted heap only when the application eventually crashes during a call to a heap function (allocating or deallocating memory). Without the additional threads, this may take an extremely long time, since the memory used by the heap hardly changes (we found that the addresses used rotate between 16 different values). If, for whatever reason, these addresses are not affected, the crash will not occur. With the additional threads, the buffer address passed to clEnqueueReadBuffer is practically randomized and the error is much more likely to occur. We did verify, however, that no heap corruption occurs if we run only the allocating/deallocating threads.

Ideally we could validate the heap/memory between calls to clEnqueueReadBuffer, but we are not sure how to do this.
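One portable way to check at least the memory directly around the read target is to surround it with guard bytes and verify them after every read. The following is a minimal sketch under assumptions: the class name GuardedBuffer, the fill pattern, and the guard size are hypothetical and are not part of the attached project. It only detects writes that land immediately before or after the payload. For the heap as a whole, Windows also offers HeapValidate and the debug CRT offers _CrtCheckMemory, which are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed fill pattern for the guard regions (any fixed byte would do).
constexpr std::uint8_t kGuardPattern = 0xA5;

// Hypothetical helper: a host buffer whose payload is surrounded by guard
// regions, so that stray writes next to the payload can be detected.
class GuardedBuffer
{
public:
    explicit GuardedBuffer(std::size_t payloadBytes, std::size_t guardBytes = 4096)
        : m_guard(guardBytes),
          m_storage(payloadBytes + 2 * guardBytes, kGuardPattern)
    {
        assert(guardBytes > 0);
    }

    // Pointer to pass as the target address of the read call.
    std::uint8_t* data() { return m_storage.data() + m_guard; }

    // Number of payload bytes (excluding the guard regions).
    std::size_t size() const { return m_storage.size() - 2 * m_guard; }

    // Returns true if both guard regions still contain only the pattern.
    bool guardsIntact() const
    {
        const auto guard = static_cast<std::ptrdiff_t>(m_guard);
        auto isPattern = [](std::uint8_t b) { return b == kGuardPattern; };
        return std::all_of(m_storage.begin(), m_storage.begin() + guard, isPattern)
            && std::all_of(m_storage.end() - guard, m_storage.end(), isPattern);
    }

private:
    std::size_t m_guard;
    std::vector<std::uint8_t> m_storage;
};
```

In the read loop, a GuardedBuffer of numBytes could take the place of the std::vector, with data() passed to clEnqueueReadBuffer and guardsIntact() checked after each call. A failed check would point to a write outside the requested range at the moment it happens, instead of hours later inside a heap function.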

As a side note: in the code you will also find the #define DO_MAP. With this flag, we replaced clEnqueueReadBuffer with calls to clEnqueueMapBuffer. In that case, with this small test application, we could no longer reproduce the heap corruption. However, when making the same change in our actual application, we still see heap corruption (and do not see it on NVIDIA systems). Therefore, we believe that the problem might not be exclusive to clEnqueueReadBuffer.

Thanks and best regards

Ruwen

0 Likes

Thank you for the clarification. I've reported it to the concerned team. Once I get any feedback from them, I'll come back to you.

Thanks.
