Bdot

Abort in atig6txx!XopOpenLinkedAdapter+0x139e

Discussion created by Bdot on May 27, 2011
Latest reply on Jun 10, 2011 by Bdot
Memory corruption in clGetPlatformIDs?

I have an OpenCL 1.1 application (Win7/64) that works fine when running on the CPU (Core i7) of a machine whose only GPU is an nvidia card.

Running the same app on another machine (Phenom X4 955 + HD 5750) will

1. only list one platform with only the GPU - no CPU device is visible

2. often crash on the first clWaitForEvents, regardless of whether the event comes from a clEnqueue*Buffer, clEnqueueTask or clEnqueueNDRangeKernel call, or from several of them.

 

For 1.: How can I enable the CPU as an OpenCL device?

For 2.: The stack indicates an invalid pointer read:

amdocl64!clIcdGetPlatformIDsKHR+0x34eee8
amdocl64!clIcdGetPlatformIDsKHR+0x4c2fe
amdocl64!clIcdGetPlatformIDsKHR+0x2c0de
amdocl64!clIcdGetPlatformIDsKHR+0x1daa4
amdocl64!clIcdGetPlatformIDsKHR+0x39894
amdocl64!clIcdGetPlatformIDsKHR+0x3a1c2
amdocl64!clIcdGetPlatformIDsKHR+0x28ebd
amdocl64!clIcdGetPlatformIDsKHR+0x29419
amdocl64!clIcdGetPlatformIDsKHR+0x4a818
amdocl64!clIcdGetPlatformIDsKHR+0x4aa46
amdocl64!clIcdGetPlatformIDsKHR+0x4db41
kernel32!BaseThreadInitThunk+0xd
ntdll!RtlUserThreadStart+0x1d

Attempt to read from address 00000000002cfff8

FAULTING_THREAD:  0000000000000850

DEFAULT_BUCKET_ID:  INVALID_POINTER_READ

I assume that either a bug has already corrupted memory, or I am passing bad parameters. To check, I enabled page heap (gflags /p /full /enable my.exe). With page heap on, however, the app aborts in the very first CL call, clGetPlatformIDs(0, NULL, &numplatforms):

atig6txx!XopOpenLinkedAdapter+0x139e
atig6txx!SetThunkProxyBypassMode+0x1c97
aticaldd64!calddiGetExport+0x17a420
aticaldd64!calddiGetExport+0x17c0b8
aticaldd64!calddiGetExport+0x47d98
aticaldd64!calddiGetExport+0x1f4be
aticaldd64!calddiGetExport+0x16bb9
aticaldd64!calddiGetExport+0x1d9
amdocl64!clIcdGetPlatformIDsKHR+0x17d80
amdocl64!clIcdGetPlatformIDsKHR+0x1c2f9
amdocl64!clIcdGetPlatformIDsKHR+0x484b
amdocl64!clIcdGetPlatformIDsKHR+0x45a9b
amdocl64+0x2017
amdocl64!clGetPlatformInfo+0x4b
amdocl64!clGetDeviceInfo+0xbe5
OpenCL+0x1137
OpenCL!clGetExtensionFunctionAddress+0x42e
OpenCL+0x101c
OpenCL!clGetPlatformIDs+0x1c
mfacto!init_CL+0x44
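
For context, the failing call is the first half of the usual two-call pattern around clGetPlatformIDs: ask for the platform count, allocate, then fetch the IDs. Below is a minimal sketch of that pattern with the return codes checked. It is not the actual init_CL code. The helper name list_platforms, the stub fake_get_platform_ids, and the stand-in typedefs are assumptions made only so the sketch builds without the OpenCL SDK; the function pointer has the same signature as clGetPlatformIDs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for the <CL/cl.h> definitions so the sketch builds without the
   SDK; in a real build include <CL/cl.h> and delete this section. */
typedef int32_t  cl_int;
typedef uint32_t cl_uint;
struct _cl_platform_id { int unused; };
typedef struct _cl_platform_id *cl_platform_id;
#define CL_SUCCESS              0
#define CL_OUT_OF_HOST_MEMORY (-6)

/* Same signature as clGetPlatformIDs. */
typedef cl_int (*get_platform_ids_fn)(cl_uint num_entries,
                                      cl_platform_id *platforms,
                                      cl_uint *num_platforms);

/* Hypothetical helper: query the count, allocate, then fetch the IDs.
   Returns a malloc'ed array the caller must free, or NULL if there are no
   platforms or a call failed. *count and *err are always written. */
cl_platform_id *list_platforms(get_platform_ids_fn get_ids,
                               cl_uint *count, cl_int *err)
{
    cl_platform_id *ids;

    assert(get_ids != NULL && count != NULL && err != NULL);

    *count = 0;
    *err = get_ids(0, NULL, count);          /* first call: count only */
    if (*err != CL_SUCCESS || *count == 0)
        return NULL;

    ids = calloc(*count, sizeof *ids);
    if (ids == NULL) {
        *err = CL_OUT_OF_HOST_MEMORY;
        *count = 0;
        return NULL;
    }

    *err = get_ids(*count, ids, NULL);       /* second call: fill the array */
    if (*err != CL_SUCCESS) {
        free(ids);
        *count = 0;
        return NULL;
    }
    return ids;
}

/* Hypothetical stub reporting a single platform; it exists only to exercise
   the helper without a driver installed. */
static struct _cl_platform_id fake_platform;

static cl_int fake_get_platform_ids(cl_uint num_entries,
                                    cl_platform_id *platforms,
                                    cl_uint *num_platforms)
{
    if (num_platforms != NULL)
        *num_platforms = 1;
    if (platforms != NULL && num_entries >= 1)
        platforms[0] = &fake_platform;
    return CL_SUCCESS;
}
```

In a real program the first argument would simply be clGetPlatformIDs, e.g. list_platforms(clGetPlatformIDs, &n, &err). With page heap enabled on the AMD box, the abort above happens inside the first, count-only call, before any buffer of mine is passed in.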

Strangely, on the Core i7 box with the nvidia GPU, enabling page heap reveals no error. As a last test I enabled page heap for one of the sample programs, and it crashes the same way as my app. My conclusion is therefore that on the AMD box, even without page heap, clGetPlatformIDs already corrupts the heap, and the crash just happens later.

3. Is it possible to get the debug symbols for the OpenCL dlls so that the stack traces are more meaningful? Maybe there is some symbol server available?

4. Is this bug in clGetPlatformIDs already known (if it is there)? Is there a fix or workaround available? I updated the drivers: Windows reports version 8.850.0.0, and OpenCL reports driver version CAL 1.4.1385 (VM). If not, where can I file a bug report?

 
