
Bdot
Adept III

Abort in atig6txx!XopOpenLinkedAdapter+0x139e

Memory corruption in clGetPlatformIDs?

I have an OpenCL 1.1 application (Win7/64) that works fine when running on the CPU (Core i7) of one machine, where the only GPU is from NVIDIA.

Running the same app on another machine (Phenom X4 955 + HD 5750) will

1. list only one platform with only the GPU; no CPU device is visible

2. often crash on the first clWaitForEvents, no matter whether the event belongs to a clEnqueue*Buffer, clEnqueueTask, or clEnqueueNDRangeKernel call, or to several of them.

 

For 1: How can I enable the CPU as an OpenCL device?

For 2: The stack indicates an invalid pointer read:

amdocl64!clIcdGetPlatformIDsKHR+0x34eee8
amdocl64!clIcdGetPlatformIDsKHR+0x4c2fe
amdocl64!clIcdGetPlatformIDsKHR+0x2c0de
amdocl64!clIcdGetPlatformIDsKHR+0x1daa4
amdocl64!clIcdGetPlatformIDsKHR+0x39894
amdocl64!clIcdGetPlatformIDsKHR+0x3a1c2
amdocl64!clIcdGetPlatformIDsKHR+0x28ebd
amdocl64!clIcdGetPlatformIDsKHR+0x29419
amdocl64!clIcdGetPlatformIDsKHR+0x4a818
amdocl64!clIcdGetPlatformIDsKHR+0x4aa46
amdocl64!clIcdGetPlatformIDsKHR+0x4db41
kernel32!BaseThreadInitThunk+0xd
ntdll!RtlUserThreadStart+0x1d

Attempt to read from address 00000000002cfff8

FAULTING_THREAD:  0000000000000850

DEFAULT_BUCKET_ID:  INVALID_POINTER_READ

I assume that either a bug has already corrupted memory, or I am passing bad parameters. To verify that, I enabled page heap (gflags /p /enable my.exe /full). But this results in an abort in the very first CL call, clGetPlatformIDs(0, NULL, &numplatforms):

atig6txx!XopOpenLinkedAdapter+0x139e
atig6txx!SetThunkProxyBypassMode+0x1c97
aticaldd64!calddiGetExport+0x17a420
aticaldd64!calddiGetExport+0x17c0b8
aticaldd64!calddiGetExport+0x47d98
aticaldd64!calddiGetExport+0x1f4be
aticaldd64!calddiGetExport+0x16bb9
aticaldd64!calddiGetExport+0x1d9
amdocl64!clIcdGetPlatformIDsKHR+0x17d80
amdocl64!clIcdGetPlatformIDsKHR+0x1c2f9
amdocl64!clIcdGetPlatformIDsKHR+0x484b
amdocl64!clIcdGetPlatformIDsKHR+0x45a9b
amdocl64+0x2017
amdocl64!clGetPlatformInfo+0x4b
amdocl64!clGetDeviceInfo+0xbe5
OpenCL+0x1137
OpenCL!clGetExtensionFunctionAddress+0x42e
OpenCL+0x101c
OpenCL!clGetPlatformIDs+0x1c
mfacto!init_CL+0x44

Strangely, on the Core i7 box with the NVIDIA GPU, enabling page heap reveals no error. As a last test I enabled page heap for one of the sample programs, and it crashes the same way as my app. My conclusion is therefore that on my AMD box, even without page heap, clGetPlatformIDs already corrupts the heap; the crash just happens later.

3. Is it possible to get the debug symbols for the OpenCL DLLs so that the stack traces are more meaningful? Maybe there is a symbol server available?

4. Is this bug in clGetPlatformIDs (if it is one) already known? Is there a fix or workaround available? (I updated the drivers; Windows reports version 8.850.0.0, OpenCL reports driver version CAL 1.4.1385 (VM).) If not, where can I file a bug report?

 

0 Likes
8 Replies
himanshu_gautam
Grandmaster

If the CPU is not shown, it is probably an SDK installation issue. The programs failing in clGetPlatformIDs point to the same cause.

I hope you are using SDK 2.4.

0 Likes

Yes, it's 2.4. I have now uninstalled all ATI/AMD software, rebooted, and reinstalled.

No change. Still no CPU, and still the abort when enabling page heap.

However, I found the bug in my code that made me try page heap in the first place: a clCreateBuffer call with CL_MEM_USE_HOST_PTR whose size was bigger than the host memory I had allocated. With that fixed, my program now runs on the GPU (as long as I don't use page heap).
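
The buffer-size bug described above can be guarded against with a small check on the host side. The following is a minimal sketch in plain C, not the code from this thread: host_block, host_block_alloc, host_block_fits, and host_block_free are hypothetical names introduced here for illustration. The idea is to keep the allocation size next to the pointer and to verify the requested size before the pointer is handed to clCreateBuffer with CL_MEM_USE_HOST_PTR.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type: a host allocation that remembers its own size. */
typedef struct {
    void  *ptr;
    size_t bytes;
} host_block;

/* Allocate `bytes` of zeroed host memory; ptr is NULL on failure. */
static host_block host_block_alloc(size_t bytes)
{
    host_block b;
    b.ptr   = calloc(1, bytes);
    b.bytes = b.ptr ? bytes : 0;
    return b;
}

/* Return 1 if `count` elements of `elem_size` bytes fit inside the block,
   0 otherwise (also 0 when count * elem_size would overflow size_t). */
static int host_block_fits(const host_block *b, size_t count, size_t elem_size)
{
    if (b == NULL || b->ptr == NULL || elem_size == 0)
        return 0;
    if (count > (size_t)-1 / elem_size)   /* multiplication would overflow */
        return 0;
    return count * elem_size <= b->bytes;
}

/* Release the block and reset it so later checks fail safely. */
static void host_block_free(host_block *b)
{
    free(b->ptr);
    b->ptr   = NULL;
    b->bytes = 0;
}
```

Assuming n elements of type cl_uint, one would first call host_block_fits(&block, n, sizeof(cl_uint)) and only on success call clCreateBuffer(context, CL_MEM_USE_HOST_PTR, n * sizeof(cl_uint), block.ptr, &err). With CL_MEM_USE_HOST_PTR the host pointer must cover the full size passed to clCreateBuffer, so a size larger than the allocation leads to accesses beyond the host block.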

This makes the other two issues less important, yet they persist. Does anyone have an idea how to get the CPU to show up among the OpenCL devices, or what to check?

And the page heap abort is certainly a memory-handling bug: atig6txx!XopOpenLinkedAdapter should not read memory that does not belong to the application.

 

0 Likes

Bdot,

Okay, I will report it to the people concerned.

Can you give me some more details about your system: which instruction sets your AMD CPU supports, its model number, and the operating system you are using?

Thanks

0 Likes

Hi Himanshu,

Thanks for your help. My machine is an AMD Phenom(tm) II X4 955 Processor 3.2GHz, running Windows 7 64-bit. CPU features: Prefetch, 3DNow!, MMX, SSE, SSE2; caches: 64kB-512kB-6MB

The GPU is an ATI HD 5750 (Juniper); device (and driver) version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10) (CAL 1.4.1353 (VM))

At some point I'll try this on Linux, as I have an openSUSE 11.4 partition there as well. Let's see what Linux on the same box says.

Bdot

 

0 Likes

Bdot,

Thanks for the information.

It would be nice if you could also post your clinfo output (the one showing only the GPU).

0 Likes

Interesting.

clinfo shows both the GPU and the CPU device. I need to check my code to see why I only see the GPU ...

 

Also, clinfo does not like running in parallel with my OpenCL program: the machine reliably locks up when clinfo exits, and only a hard reset helps.

Anyway, here's the GPU part of the clinfo output:

 

Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing

  Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Max compute units: 10
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 4
  Native vector width double: 0
  Max clock frequency: 850Mhz
  Address bits: 32
  Max memory allocation: 134217728
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 32768
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 536870912
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 00000000015CD118
  Name: Juniper
  Vendor: Advanced Micro Devices, Inc.
  Driver version: CAL 1.4.1353 (VM)
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing

0 Likes

Bdot,

Now that is confusing. I also work with a Juniper on an AMD Athlon machine, and it works fine here. We also have a Phenom machine, but I don't remember its model.

I think your CPU is a bit old, but since SSE2 is supported it should work. So here is the current conclusion:

clinfo shows both the CPU and the GPU, but hangs the system after that.

Your program crashes on the CPU and works on the GPU.

I will report it to the AMD developers. If you can provide some code that crashes on the CPU, it would be nice to post it here. You could also check whether other samples work on the CPU.

0 Likes

Sorry for the confusion. Both the CPU and GPU devices are now available; that was my mistake. What remains is the abort mentioned in the thread title:

  • only on GPU, not on CPU
  • only if pageheap is enabled
  • reproducible with the examples

To reproduce:

  1. download and install Debugging Tools for Windows from MS
  2. F:\software\AMD APP\samples\opencl\bin\x86_64>"c:\Program Files\Debugging Tools for Windows (x64)\gflags.exe" /p /enable TemplateC.exe /full
    path: SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options
        templatec.exe: page heap enabled
  3. Run TemplateC.exe - it will crash after listing the input values. Call stack:
    atig6txx!XopOpenLinkedAdapter+0x139e
    atig6txx!SetThunkProxyBypassMode+0x1c97
    aticaldd64!calddiGetExport+0x17a420
    aticaldd64!calddiGetExport+0x17c0b8
    aticaldd64!calddiGetExport+0x47d98
    aticaldd64!calddiGetExport+0x1f4be
    aticaldd64!calddiGetExport+0x16bb9
    aticaldd64!calddiGetExport+0x1d9
    amdocl64!clIcdGetPlatformIDsKHR+0x17d80
    amdocl64!clIcdGetPlatformIDsKHR+0x1c2f9
    amdocl64!clIcdGetPlatformIDsKHR+0x484b
    amdocl64!clIcdGetPlatformIDsKHR+0x45a9b
    amdocl64+0x2017
    amdocl64!clGetPlatformInfo+0x4b
    amdocl64!clGetDeviceInfo+0xbe5
    OpenCL+0x1137
    OpenCL!clGetExtensionFunctionAddress+0x42e
    OpenCL+0x101c
    OpenCL!clGetPlatformIDs+0x1c

 

0 Likes