Hi all,
I am seeing a memory leak when calling clEnqueueNDRangeKernel in an infinite loop on an AMD card. The case below uses an empty kernel, but I don't think the kernel is the cause: I have also tried matrix add/mul kernels, which produced correct results but still leaked memory.
My code looks like this:
-----------------------------------------code start------------------------------------------------------------------
do {
    for (i = 0; i < ciDeviceCount; ++i) {
        ciErrNum = clEnqueueNDRangeKernel(commandQueue, matrixEqual, 2, 0, globalWorkSize, localWorkSize,
                                          0, NULL, &GPUExecution);
        oclCheckError(ciErrNum, CL_SUCCESS);
        ciErrNum = clFinish(commandQueue);
        oclCheckError(ciErrNum, CL_SUCCESS);
    }
} while(1);
-----------------------------------------code end----------------------------------------------------------------------
My kernel is effectively empty:
-----------------------------------------code start------------------------------------------------------------------
__kernel void
matrixEqual(int m, int n)
{
    m = n;
}
-----------------------------------------code end----------------------------------------------------------------------
After many iterations, my process's memory consumption rises to about 7 GB and the process is finally killed by the Linux kernel.
/var/log/syslog shows that the kernel killed the process because the system ran out of memory:
snd kernel: [ 8779.289654] Out of memory: Kill process 5239 (myprocess) score 917 or sacrifice child
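One way to confirm that the growth tracks the loop iterations, rather than start-up allocations, is to log the process's resident set size every few thousand iterations. Below is a minimal, Linux-only sketch that reads VmRSS from /proc/self/status; `read_rss_kb` is a hypothetical helper name and is not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original post): returns the current
 * resident set size of this process in kB, read from /proc/self/status.
 * Returns -1 if the file cannot be opened or has no VmRSS line. */
long read_rss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;

    /* Scan line by line until the "VmRSS:" entry is found. */
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            break;
    }

    fclose(f);
    return kb;
}
```

Printing the returned value once every N iterations of the do/while loop should show whether memory grows steadily with the number of enqueued kernels.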
OS version:
Linux snd 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Part of clinfo:
Platform Version: OpenCL 2.0 AMD-APP (2482.3)
Platform ID: 0x7f19c67e1098
Name: Ellesmere
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2482.3
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2482.3)
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Radeon RX 570 Series
Hardware:
2 cores Intel(R) Celeron(R) CPU G3930 @ 2.90GHz
8 cards Radeon RX 570 Series
I have also attached the full code to this post.
Hi,
Thank you for reporting this.
The attached code seems incomplete. Please share a complete repro. Also please mention the driver version (say AMDGPU-Pro X.Y). If the driver is not the latest one, please try the latest driver and share your observation.
From the above code snippet, it looks like you never release the event object (GPUExecution) that each clEnqueueNDRangeKernel call returns. Unreleased events can also cause a memory leak.
Regards,
Hi dipak,
Thanks for your kind reply.
I tried your suggestion and removed GPUExecution from the clEnqueueNDRangeKernel call, but the memory leak still happens.
I would like to try your second suggestion and check and update the AMDGPU-Pro version. Could you please tell me:
1. How do I check my amdgpu driver version? I did not install it myself, and dpkg -l amdgpu-pro shows that no amdgpu-pro package is installed, but I can see its folder at /opt/amdgpu-pro.
2. The newest version I found is amdgpu-pro 18.20, on the Radeon™ Software for Linux® 18.20 Release Notes page. It says it supports only OpenCL 1.2, while my Radeon RX 570 Series card is listed as supporting OpenCL 2.0. Where can I find a driver that supports OpenCL 2.0? Do I need to install ROCm for OpenCL 2.0 development?
Below is the amdgpu-pro 18.20 support list
Below is RX 570 Series
Thanks a lot again.
Currently, AMDGPU-Pro supports OpenCL 1.2 only. That's why the RX 570 is listed as an OpenCL 1.2 device even though it supports OpenCL 2.0.
For OpenCL 2.0 kernel programming, you can choose ROCm. It provides OpenCL 2.0 compatible kernel language support with an OpenCL 1.2 compatible runtime.
Also, from the clinfo version information above, it looks like the installed driver is an older one. Please install the latest amdgpu-pro 18.20 and check. If the issue is still reproducible, please share a complete repro.
Regards,