OpenCL

nibal · ‎12-01-2020

Hi,

I am attaching cltest.tgz which contains the build directory, cltest, with the sample problem, clinfo, and val.out which is the valgrind output showing that this corruption is deep within ocl1.2. To build the test case, just run the included script:

makecl

in any Ubuntu box.

In my test case the corruption affected the out buffer (the output of the FFT splitter), so I have bracketed its addresses with printfs. In my main program it is elsewhere. If it is elsewhere with yours as well, just comment out the "NIKOS!!!" printfs.
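For readers who want to try the same bracketing idea, here is a minimal sketch in plain C. It is not the code from cltest.tgz: it swaps the address printfs for 8-byte guard words placed on both sides of the buffer, and the names guarded_buf, guarded_alloc, guarded_check, guarded_free and GUARD_WORD are all hypothetical. It only catches writes that land directly next to the buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical canary value; any unlikely 8-byte pattern would do. */
#define GUARD_WORD 0xDEADBEEFCAFEF00DULL

/* Hypothetical wrapper: payload with one 8-byte guard before and after it. */
typedef struct {
    unsigned char *raw;   /* start of the real allocation (front guard) */
    unsigned char *data;  /* payload handed to the code under test */
    size_t size;          /* payload size in bytes */
} guarded_buf;

/* Allocate n payload bytes plus the two guards. Returns 0 on success. */
static int guarded_alloc(guarded_buf *b, size_t n) {
    uint64_t g = GUARD_WORD;
    b->raw = malloc(n + 2 * sizeof g);
    if (!b->raw) return -1;
    b->data = b->raw + sizeof g;
    b->size = n;
    memcpy(b->raw, &g, sizeof g);        /* front guard */
    memcpy(b->data + n, &g, sizeof g);   /* back guard */
    return 0;
}

/* Print a tagged checkpoint line; returns 1 if either guard was overwritten. */
static int guarded_check(const guarded_buf *b, const char *tag) {
    uint64_t g = GUARD_WORD;
    int bad = memcmp(b->raw, &g, sizeof g) != 0 ||
              memcmp(b->data + b->size, &g, sizeof g) != 0;
    fprintf(stderr, "%s: buffer %p..%p %s\n", tag,
            (void *)b->data, (void *)(b->data + b->size),
            bad ? "CORRUPTED" : "ok");
    return bad;
}

/* Release the whole allocation, guards included. */
static void guarded_free(guarded_buf *b) {
    free(b->raw);
    b->raw = NULL;
    b->data = NULL;
    b->size = 0;
}
```

Calling guarded_check with a different tag before and after each suspect call narrows the corruption down to the span between the last "ok" line and the first "CORRUPTED" line.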

Reproducible: Always

Observed symptoms: When I try to free the allocated address on exit, it drops a core:

https://www.dropbox.com/s/7hzs968h8ra98pa/cltest.tgz?dl=0

nibal · ‎12-02-2020

Please disregard.

I continued the bracketing approach after I posted this and found the memory corruption to be in my code, not ocl's.

I don't quite understand why valgrind would post this stack trace, but I trust my printfs more :)

Sorry about the confusion:(

Nikos

nibal · ‎12-02-2020

Actually, I continued the investigation with the bracketed printfs. It seems that the 8-byte invalid write is in the waitForEventAndRelease call bracketed by NIKOS2 & NIKOS3. The strange thing is that it is in a loop, but it happens only once, in the second pass.

It's always reproducible. You can download the test case, valgrind output and clinfo from:

https://www.dropbox.com/s/88ug7lvqazy440a/cltest.tgz?dl=0

nibal · ‎12-04-2020

Any further calls to waitForEventAndRelease generate invalid reads and drop cores.

Seems we cannot use events in AMD's ocl1.2:(

dipak · ‎12-08-2020

As far as I can see, waitForEventAndRelease() contains these two OpenCL calls: clWaitForEvents and clReleaseEvent.

Could you please try a simple test-case like the below code snippet to see if the issue is reproducible?

    cl_event event;

    for (int i = 0; i < N; i++) {
        clEnqueue<command>(..., &event, ..); // generate an event using any command
        clWaitForEvents(1, &event);
        clReleaseEvent(event);
    }

nibal · ‎12-08-2020

Hi,

Thanks for looking into it.

New sources uploaded to https://www.dropbox.com/s/a3bxb86a49z3a21/cltest.tgz?dl=0

You just expand the archive:

-> cd cltest

-> makecl

-> cltest

cltest, the new executable, is much simpler. It contains just two CL calls, clEnqueueMapBuffer and clEnqueueUnmapMemObject, along with their clWaitForEvents and clReleaseEvent.

val.out is the valgrind output, and clinfo.txt my clinfo

setupCL and shutdownCL are still called to set up the 2 buffers and clean them up. No CL kernel is used.

The original invalid 8-byte write is still there and seems to be bracketed by the first clWaitForEvents. It happens only once.

This is always reproducible. I was not able to observe the invalid reads on subsequent events (omap), even when I looped it 1000x, and I was not able to reproduce the strange effect of it happening only on the 2nd pass.

Commenting out the loop did not generate the invalid write (setupCL and shutdownCL are clean). However, commenting out just the clWaitForEvents and clReleaseEvent generated more invalid writes. It could be that the problem is in the map call (clEnqueueMapBuffer), happens only once, and clWaitForEvents delays it just enough that it shows up there.

No crashes or cores this time, but should I be concerned about the quality of the output?

Ocl 1.2 memory corruption?