I am developing an OpenCL application under Windows and am seeing very sporadic access violations in clEnqueueReadBuffer.
My driver version is 16.3.2, but the access violations also happen with older versions.
The access violation happens in:
amdocl64!clSetKernelExecInfo+0x5c3d:
00007ffa`ce48a29d f0480fb14a08 lock cmpxchg qword ptr [rdx+8],rcx ds:00000166`b0928fe8=????????????????
Using page heap, I found that the problem is an INVALID_POINTER_WRITE_AVRF:
Attempt to write to address 00000166b0928fe8
And here is a stack trace:
0000000f`b0bfd5f0 00007ffa`ce497f57 : 00000167`58606e70 00000167`58606e70 00000161`37d60ec0 ffffffff`fffffffe : amdocl64!clSetKernelExecInfo+0x5c3d
0000000f`b0bfd620 00007ffa`ce47b77d : 00000167`58606e70 00000167`58606e70 00000161`7e747e90 00000000`00000000 : amdocl64!clSetKernelExecInfo+0x138f7
0000000f`b0bfd660 00007ffa`9afa4822 : 00000167`bb211000 00000161`1eba7e80 00000000`00000000 00000000`00000000 : amdocl64!clEnqueueReadBuffer+0x2ad
0000000f`b0bfd760 00007ffa`9afbb3e7 : 00000161`37d60ed0 00000161`7e747ea0 0000000f`b0bfd970 00007ffa`9afa3fe1 : 0x00007ffa`9afa4822
0000000f`b0bfd900 00007ffa`9afbac5a : 00000161`1e803ee0 00000161`26e97960 00000166`00000000 00000000`00000000 : 0x00007ffa`9afbb3e7
0000000f`b0bfd9e0 00007ffa`9afba3a2 : 00000161`1e803ee0 00000161`26e97918 00000167`bb211000 00000166`d59fb260 : 0x00007ffa`9afbac5a
0000000f`b0bfdcf0 00007ffa`9afb9831 : 00000161`1e59a168 00000161`1e803ee0 00000161`26e97918 00000166`cc7ddfe8 : 0x00007ffa`9afba3a2
0000000f`b0bfde00 00007ffa`9afc4f4f : 00000161`1e59a168 00000161`1e803ee0 00000161`26e97918 00000166`d59fb1a0 : 0x00007ffa`9afb9831
0000000f`b0bfdf00 00007ffa`9af98c91 : 00000161`1e803e68 00000166`cc7dded0 00000000`00000000 00000166`d59f7d10 : 0x00007ffa`9afc4f4f
0000000f`b0bfe0a0 00007ffa`9af98aba : 00000161`1e8022c0 00000166`cc7dded0 00000161`0000000c 00000161`1eba48a0 : 0x00007ffa`9af98c91
0000000f`b0bfe470 00007ffa`9a6ae9f1 : 00000161`1e82ec10 00000166`cc7dded0 00000161`0000000c 00000000`00000000 : 0x00007ffa`9af98aba
0000000f`b0bfe4a0 00007ffa`9a6ae649 : 00000161`1e82ec10 00000161`1e5f1008 00000161`1e5f10e0 00000161`1eba7df8 : 0x00007ffa`9a6ae9f1
0000000f`b0bfe670 00007ffa`f82d2cb7 : 00000161`1e832148 00007ffa`f82d30a4 00007ffa`f84a0c68 00007ffa`f93bda7a : 0x00007ffa`9a6ae649
0000000f`b0bfe6a0 00007ffa`f82fa79e : 00000161`1e8321a8 00007ffa`f93bd9d9 00007ffa`f84a0c68 00000000`00000000 : mscorlib_ni+0x502cb7
0000000f`b0bfe6e0 00007ffa`f82fa637 : 00000000`00000515 00000000`00000000 0000000f`b0bfeb00 00000161`1eba7ee0 : mscorlib_ni+0x52a79e
0000000f`b0bfe7b0 00007ffa`f82d2f5d : 00000161`1e564010 00000000`01000002 00000161`14471000 00007ffa`f1eb2fb8 : mscorlib_ni+0x52a637
0000000f`b0bfe7e0 00007ffa`f82d2628 : 00000161`1e8321a8 00000161`1e835370 00000161`1eba7e38 00000000`01000002 : mscorlib_ni+0x502f5d
0000000f`b0bfe890 00007ffa`f82fa79e : 00000161`1e564010 00000000`00000000 00000000`01000002 00000000`00000000 : mscorlib_ni+0x502628
0000000f`b0bfe8d0 00007ffa`f82fa637 : 00000000`00000000 00007ffb`06cd8778 0000000f`b0bfea30 00000161`81619be8 : mscorlib_ni+0x52a79e
0000000f`b0bfe9a0 00007ffa`f82fa5f2 : 00000161`1eba7ee0 00007ffb`00000003 00000161`81619958 00000161`14471000 : mscorlib_ni+0x52a637
0000000f`b0bfe9d0 00007ffa`f8c18a4d : 0000000f`b0bfeb78 00007ffa`f93d5912 00000000`00000012 00000000`00000000 : mscorlib_ni+0x52a5f2
0000000f`b0bfea20 00007ffa`f93a3bd3 : 00000161`1eba7f08 00000161`1eba7e38 00000000`00000000 00000000`00000001 : mscorlib_ni+0xe48a4d
0000000f`b0bfea60 00007ffa`f93a3a95 : 0000000f`b0bfed38 00007ffa`f94de619 00000000`00000000 00000000`00000000 : clr!CallDescrWorkerInternal+0x83
0000000f`b0bfeaa0 00007ffa`f93a44c6 : 00000000`00000002 0000000f`b0bfecb0 0000000f`b0bfec78 0000000f`b0bfec58 : clr!CallDescrWorkerWithHandler+0x4e
0000000f`b0bfeae0 00007ffa`f9584b15 : 0000000f`b0bff0f0 0000000f`b0bff030 00007ffa`f8481548 0000000f`b0bff030 : clr!MethodDescCallSite::CallTargetWorker+0xf8
0000000f`b0bfebe0 00007ffa`f93a4a59 : 00000000`0000a000 0000000f`b0bff030 00000161`59378840 00000000`00000000 : clr!ThreadNative::KickOffThread_Worker+0xf63d1
0000000f`b0bfee40 00007ffa`f93a495c : 0000000f`b0bff030 00007ffa`f944ec24 0000000f`b0bfef40 00007ffa`f93a364a : clr!ManagedThreadBase_DispatchInner+0x2d
0000000f`b0bfee80 00007ffa`f93a489a : 00000000`00000001 00000000`00000000 00000000`00000001 00000000`00000000 : clr!ManagedThreadBase_DispatchMiddle+0x6c
0000000f`b0bfef80 00007ffa`f93a49b7 : ffffffff`ffffffff 00000161`59378840 00000000`00000000 00000161`5938aff0 : clr!ManagedThreadBase_DispatchOuter+0x75
0000000f`b0bff010 00007ffa`f948d566 : 00000161`59378840 0000000f`00000001 00000161`5938aff0 00000000`0000001d : clr!ManagedThreadBase_FullTransitionWithAD+0x2f
0000000f`b0bff070 00007ffa`f9425472 : 00000161`59386fe0 00000161`59378840 0000000f`b0bff0c8 00007ffb`0241e1ad : clr!ThreadNative::KickOffThread+0xd6
0000000f`b0bff140 00007ffb`042e8102 : 00007ffa`f94253fc 00000161`5938aff0 00000000`00000000 00000000`00000000 : clr!Thread::intermediateThreadProc+0x7d
0000000f`b0bffe00 00007ffb`06ccc5b4 : 00007ffb`042e80e0 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x22
0000000f`b0bffe30 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x34
1) Has anyone experienced similar problems?
2) Can anyone give a hint as to what is happening in amdocl64!clSetKernelExecInfo+0x5c3d? Perhaps this could point me in the right direction for further debugging...
Check your code, you probably have a bug.
If you can't find it, please post a minimal reproducer.
I have been checking my code for days now and was hoping that someone could give a hint about what is happening in that line, so that I know what I'm searching for...
Please post some of your code so we can see what is happening. A stack trace is not enough.
Unfortunately the code is part of a quite complex image acquisition and processing software and I am not able to extract a minimal reproducer.
Can you give any hint about what is done in the last two lines of the stack trace? What is happening inside those clSetKernelExecInfo calls? Is it about events or about buffers? That would help enormously to narrow down the source of the error in my code.
I also found that the crash happens from different positions, but the lines where it fails are always in clSetKernelExecInfo. Here is another example:
00000033`fcbfcfd0 00007ff8`4f987f57 : 000001e3`0d034ef0 000001e3`0d034ef0 000001e0`30be9ec0 00000000`00000000 : amdocl64!clSetKernelExecInfo+0x5be9
00000033`fcbfd000 00007ff8`4f97a349 : 000001e3`0d034ef0 000001e3`0d034ef0 000001e0`30be9ed0 000001e0`30be9ec0 : amdocl64!clSetKernelExecInfo+0x138f7
00000033`fcbfd040 00007ff8`4f96276c : 000001e0`30be9ed0 000001e3`0d034ef0 00000000`00000000 00000000`00000000 : amdocl64!clSetKernelExecInfo+0x5ce9
00000033`fcbfd080 00007ff8`0ddeb9ce : 000001e0`12f07948 00000000`00000000 000001e0`1348f688 00000033`fcbfd1a0 : amdocl64!clFinish+0x8c
Are these lines about event invocation? Any hints?
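A note on reading the two traces: when only export symbols are available for amdocl64.dll, the debugger names each frame after the nearest preceding export. A frame such as clSetKernelExecInfo+0x138f7 is therefore most likely an internal driver function that merely sits after that export in the binary, not code belonging to clSetKernelExecInfo itself. The sketch below is a rough helper, not part of any debugger or driver tooling. It parses the symbolized tails of the frames quoted in this thread, lists the frames both crashes share, and flags frames with large offsets. The names `parse_frames`, `shared_frames`, `suspicious` and the 0x1000 threshold are made up for illustration.

```python
import re

# Matches the symbolized tail of a WinDbg frame: "module!symbol+0xoffset".
FRAME_RE = re.compile(
    r"(?P<module>[\w.]+)!(?P<symbol>[\w:]+)\+0x(?P<offset>[0-9a-fA-F]+)\s*$"
)

# Assumed threshold: an offset this far past an export is treated as a sign
# that the frame is really in some unnamed internal function.
LARGE_OFFSET = 0x1000

# Frame tails copied from the two stack traces in this thread.
TRACE_1 = """
amdocl64!clSetKernelExecInfo+0x5c3d
amdocl64!clSetKernelExecInfo+0x138f7
amdocl64!clEnqueueReadBuffer+0x2ad
"""

TRACE_2 = """
amdocl64!clSetKernelExecInfo+0x5be9
amdocl64!clSetKernelExecInfo+0x138f7
amdocl64!clSetKernelExecInfo+0x5ce9
amdocl64!clFinish+0x8c
"""


def parse_frames(trace):
    """Return (module, symbol, offset) for every symbolized frame in the text."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line.strip())
        if match:  # unresolved frames such as bare addresses are skipped
            frames.append(
                (match["module"], match["symbol"], int(match["offset"], 16))
            )
    return frames


def shared_frames(trace_a, trace_b):
    """Frames that appear, with identical offsets, in both traces."""
    return sorted(set(parse_frames(trace_a)) & set(parse_frames(trace_b)))


def suspicious(frames, module):
    """Frames in `module` whose offset from the named export is large."""
    return [f for f in frames if f[0] == module and f[2] >= LARGE_OFFSET]


if __name__ == "__main__":
    for module, symbol, offset in shared_frames(TRACE_1, TRACE_2):
        print(f"shared frame: {module}!{symbol}+{offset:#x}")
    for module, symbol, offset in suspicious(parse_frames(TRACE_1), "amdocl64"):
        print(f"likely internal function: {module}!{symbol}+{offset:#x}")
```

Both crashes pass through the same frame at +0x138f7, and the faulting frames (+0x5c3d and +0x5be9) are only 0x54 bytes apart, so the two failures probably come from the same internal driver routine reached via different API calls. Which routine that is cannot be determined from export names alone.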
Bro', we can't help you much without a minimal reproducer, in code.