8 Replies Latest reply on Jun 10, 2011 10:45 PM by Bdot

    Abort in atig6txx!XopOpenLinkedAdapter+0x139e

    Bdot
      Memory corruption in clGetPlatformIDs ?

      I have an OpenCL 1.1 application (Win7/64) that works fine when running on the CPU (Core i7) of a machine whose only GPU is from NVIDIA.

      Running the same app on another machine (Phenom X4 955 + HD 5750) will

      1. only list one platform with only the GPU - no CPU device is visible

      2. often crash on the first clWaitForEvents, no matter whether the event belongs to a clEnqueue*Buffer, clEnqueueTask or clEnqueueNDRangeKernel call, or to several of them.

       

      for 1.: How can I enable the CPU as an OpenCL device?

      for 2.: The stack indicates an invalid pointer read:

      amdocl64!clIcdGetPlatformIDsKHR+0x34eee8
      amdocl64!clIcdGetPlatformIDsKHR+0x4c2fe
      amdocl64!clIcdGetPlatformIDsKHR+0x2c0de
      amdocl64!clIcdGetPlatformIDsKHR+0x1daa4
      amdocl64!clIcdGetPlatformIDsKHR+0x39894
      amdocl64!clIcdGetPlatformIDsKHR+0x3a1c2
      amdocl64!clIcdGetPlatformIDsKHR+0x28ebd
      amdocl64!clIcdGetPlatformIDsKHR+0x29419
      amdocl64!clIcdGetPlatformIDsKHR+0x4a818
      amdocl64!clIcdGetPlatformIDsKHR+0x4aa46
      amdocl64!clIcdGetPlatformIDsKHR+0x4db41
      kernel32!BaseThreadInitThunk+0xd
      ntdll!RtlUserThreadStart+0x1d

      Attempt to read from address 00000000002cfff8

      FAULTING_THREAD:  0000000000000850

      DEFAULT_BUCKET_ID:  INVALID_POINTER_READ

      I assume that either a bug has already corrupted memory or I am passing bad parameters. To verify that, I enabled page heap (gflags /p /enable my.exe /full). But this results in an abort in the very first CL call, clGetPlatformIDs(0, NULL, &numplatforms):

      atig6txx!XopOpenLinkedAdapter+0x139e
      atig6txx!SetThunkProxyBypassMode+0x1c97
      aticaldd64!calddiGetExport+0x17a420
      aticaldd64!calddiGetExport+0x17c0b8
      aticaldd64!calddiGetExport+0x47d98
      aticaldd64!calddiGetExport+0x1f4be
      aticaldd64!calddiGetExport+0x16bb9
      aticaldd64!calddiGetExport+0x1d9
      amdocl64!clIcdGetPlatformIDsKHR+0x17d80
      amdocl64!clIcdGetPlatformIDsKHR+0x1c2f9
      amdocl64!clIcdGetPlatformIDsKHR+0x484b
      amdocl64!clIcdGetPlatformIDsKHR+0x45a9b
      amdocl64+0x2017
      amdocl64!clGetPlatformInfo+0x4b
      amdocl64!clGetDeviceInfo+0xbe5
      OpenCL+0x1137
      OpenCL!clGetExtensionFunctionAddress+0x42e
      OpenCL+0x101c
      OpenCL!clGetPlatformIDs+0x1c
      mfacto!init_CL+0x44

      Strangely, on the Core i7 box with the NVIDIA GPU, enabling page heap reveals no error. As a last test I enabled page heap for one of the sample programs, and it crashes the same way as my app. My conclusion is that on the AMD box, even without page heap, clGetPlatformIDs already corrupts the heap; the crash just happens later.

      3. Is it possible to get the debug symbols for the OpenCL dlls so that the stack traces are more meaningful? Maybe there is some symbol server available?

      4. Is this bug in clGetPlatformIDs already known (if it is there)? Is there a fix or workaround available (I updated the drivers; Windows reports version 8.850.0.0, OpenCL reports driver version: CAL 1.4.1385 (VM))? If not, where could I file a bug report?
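On question 3: without symbol files a debugger can usually resolve an address only to the nearest exported function, so a frame such as amdocl64!clIcdGetPlatformIDsKHR+0x34eee8 most likely does not lie inside that function at all. The sketch below, in plain C, splits frames of the form shown above into module, symbol and offset, and flags frames whose offset is implausibly large. The names `frame`, `parse_frame` and `frame_looks_unsymbolized` are hypothetical, and the offset threshold is an arbitrary assumption, not a rule used by any debugger.

```c
#include <stdlib.h>
#include <string.h>

/* One parsed debugger frame: "module!symbol+0xOFFSET" or "module+0xOFFSET". */
typedef struct {
    char module[64];
    char symbol[128];            /* empty string when the frame has no symbol */
    unsigned long long offset;
} frame;

/* Parse one frame line. Returns 1 on success, 0 if the text does not match. */
static int parse_frame(const char *text, frame *out)
{
    const char *plus = strrchr(text, '+');
    const char *bang = strchr(text, '!');
    size_t mod_len, sym_len;
    char *end;

    if (plus == NULL || plus == text)
        return 0;
    if (bang != NULL && bang > plus)
        bang = NULL;             /* a '!' after the '+' is not a separator */

    mod_len = (size_t)((bang ? bang : plus) - text);
    sym_len = bang ? (size_t)(plus - bang - 1) : 0;
    if (mod_len == 0 || mod_len >= sizeof out->module ||
        sym_len >= sizeof out->symbol)
        return 0;

    memcpy(out->module, text, mod_len);
    out->module[mod_len] = '\0';
    memcpy(out->symbol, bang ? bang + 1 : "", sym_len);
    out->symbol[sym_len] = '\0';

    /* Base 16 accepts the optional "0x" prefix used in the traces. */
    out->offset = strtoull(plus + 1, &end, 16);
    return end != plus + 1 && *end == '\0';
}

/* Crude heuristic (an assumption): a frame with no symbol, or with an offset
   above max_offset, was probably resolved against the export table only. */
static int frame_looks_unsymbolized(const frame *f, unsigned long long max_offset)
{
    return f->symbol[0] == '\0' || f->offset > max_offset;
}
```

With a threshold of 0x10000, the frame kernel32!BaseThreadInitThunk+0xd passes, while amdocl64!clIcdGetPlatformIDsKHR+0x34eee8 and amdocl64+0x2017 are flagged. Smaller bogus offsets slip through, so this only helps to decide which frames not to trust.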

       

        • Abort in atig6txx!XopOpenLinkedAdapter+0x139e
          himanshu.gautam

          If the CPU is not shown, it is probably an SDK installation issue. The programs failing in clGetPlatformIDs point the same way.

          I hope you are using SDK 2.4.

            • Abort in atig6txx!XopOpenLinkedAdapter+0x139e
              Bdot

              Yes, it's 2.4. I have now uninstalled all ATI/AMD software, rebooted, and reinstalled.

              No change. Still no CPU, and still the abort when enabling page heap.

              However, I found the bug in my code that made me try page heap in the first place (a clCreateBuffer call with a size bigger than what I had allocated, combined with CL_MEM_USE_HOST_PTR). With that fixed, my program now runs on the GPU (as long as I don't use page heap).
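The bug described above, a buffer created over a host pointer with a size larger than the host allocation, can be guarded against with a size check before the call. This is a minimal sketch in plain C. The `host_block` type and the `host_block_alloc` and `host_block_fits` helpers are hypothetical and not part of OpenCL or of the original program. The actual clCreateBuffer call appears only in a comment, so the sketch builds without the OpenCL headers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type: a host allocation that remembers its own size. */
typedef struct {
    void  *ptr;    /* start of the host allocation */
    size_t bytes;  /* number of bytes actually allocated */
} host_block;

/* Allocate a zeroed block; ptr is NULL and bytes is 0 on failure. */
static host_block host_block_alloc(size_t bytes)
{
    host_block b;
    b.ptr = calloc(1, bytes);
    b.bytes = (b.ptr != NULL) ? bytes : 0;
    return b;
}

/* Return 1 if a buffer of `requested` bytes fits inside the allocation,
   0 otherwise. A zero-sized request is rejected as well. */
static int host_block_fits(const host_block *b, size_t requested)
{
    if (b == NULL || b->ptr == NULL || requested == 0)
        return 0;
    return requested <= b->bytes;
}

/* Intended use, with ctx, size, buf and err declared by the caller:
 *
 *   if (host_block_fits(&blk, size))
 *       buf = clCreateBuffer(ctx, CL_MEM_USE_HOST_PTR, size, blk.ptr, &err);
 */
```

Keeping the size next to the pointer means the check cannot drift away from the allocation it describes. An oversized request then fails at the call site instead of surfacing later as a crash in an unrelated call.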

              This makes the other two issues less important, yet they persist. Does anyone have an idea how to get the CPU to show up among the OpenCL devices? Or what to check?

              And the pageheap-abort is certainly a memory-handling bug.  atig6txx!XopOpenLinkedAdapter should not read memory that does not belong to the application.

               

                • Abort in atig6txx!XopOpenLinkedAdapter+0x139e
                  himanshu.gautam

                  Bdot,

                  Okay, I will report it to the people concerned.

                  Can you give me some more details about your system? Which instruction sets does your AMD CPU support, what is its model number, and which operating system are you using?

                  Thanks

                    • Abort in atig6txx!XopOpenLinkedAdapter+0x139e
                      Bdot

                      Hi Himanshu,

                      thanks for your help. My machine is an AMD Phenom(tm) II X4 955 Processor 3.2GHz, running Windows7 64-bit. CPU features: Prefetch, 3DNow!, MMX, SSE, SSE2, caches: 64kB-512kB-6MB

                      The GPU is an ATI HD 5750 (Juniper); device (and driver) version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10) (CAL 1.4.1353 (VM))

                      At some point I'll try this on Linux, as I have an OpenSuSE 11.4 partition there as well ... let's see what Linux on the same box says.

                      Bdot

                       

                        • Abort in atig6txx!XopOpenLinkedAdapter+0x139e
                          himanshu.gautam

                          Bdot,

                          Thanks for the information.

                          It would be nice if you could also post your clinfo output (showing only the GPU).

                            • Abort in atig6txx!XopOpenLinkedAdapter+0x139e
                              Bdot

                              Interesting.

                              clinfo shows both the GPU and the CPU device. I need to check why my code sees only the GPU ...

                               

                              Also, clinfo does not like running in parallel with my OpenCL program: the machine reliably locks up when clinfo exits, and only a hard reset helps then.

                              Anyway, here's the GPU part of the clinfo output:

                               

                              Number of platforms: 1
                                Platform Profile: FULL_PROFILE
                                Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
                                Platform Name: AMD Accelerated Parallel Processing
                                Platform Vendor: Advanced Micro Devices, Inc.
                                Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing

                                Platform Name: AMD Accelerated Parallel Processing
                              Number of devices: 2
                                Device Type: CL_DEVICE_TYPE_GPU
                                Device ID: 4098
                                Max compute units: 10
                                Max work items dimensions: 3
                                  Max work items[0]: 256
                                  Max work items[1]: 256
                                  Max work items[2]: 256
                                Max work group size: 256
                                Preferred vector width char: 16
                                Preferred vector width short: 8
                                Preferred vector width int: 4
                                Preferred vector width long: 2
                                Preferred vector width float: 4
                                Preferred vector width double: 0
                                Native vector width char: 16
                                Native vector width short: 8
                                Native vector width int: 4
                                Native vector width long: 2
                                Native vector width float: 4
                                Native vector width double: 0
                                Max clock frequency: 850Mhz
                                Address bits: 32
                                Max memory allocation: 134217728
                                Image support: Yes
                                Max number of images read arguments: 128
                                Max number of images write arguments: 8
                                Max image 2D width: 8192
                                Max image 2D height: 8192
                                Max image 3D width: 2048
                                Max image 3D height: 2048
                                Max image 3D depth: 2048
                                Max samplers within kernel: 16
                                Max size of kernel argument: 1024
                                Alignment (bits) of base address: 32768
                                Minimum alignment (bytes) for any datatype: 128
                                Single precision floating point capability
                                  Denorms: No
                                  Quiet NaNs: Yes
                                  Round to nearest even: Yes
                                  Round to zero: Yes
                                  Round to +ve and infinity: Yes
                                  IEEE754-2008 fused multiply-add: Yes
                                Cache type: None
                                Cache line size: 0
                                Cache size: 0
                                Global memory size: 536870912
                                Constant buffer size: 65536
                                Max number of constant args: 8
                                Local memory type: Scratchpad
                                Local memory size: 32768
                                Kernel Preferred work group size multiple: 64
                                Error correction support: 0
                                Unified memory for Host and Device: 0
                                Profiling timer resolution: 1
                                Device endianess: Little
                                Available: Yes
                                Compiler available: Yes
                                Execution capabilities:
                                  Execute OpenCL kernels: Yes
                                  Execute native function: No
                                Queue properties:
                                  Out-of-Order: No
                                  Profiling : Yes
                                Platform ID: 00000000015CD118
                                Name: Juniper
                                Vendor: Advanced Micro Devices, Inc.
                                Driver version: CAL 1.4.1353 (VM)
                                Profile: FULL_PROFILE
                                Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
                                Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing

                                • Abort in atig6txx!XopOpenLinkedAdapter+0x139e
                                  himanshu.gautam

                                  Bdot,

                                  Now that is confusing. I also work with Juniper on an AMD Athlon machine, and it works well here. We also have a Phenom machine, but I don't remember its model.

                                  I think your CPU is a bit old, but as SSE2 is supported it should work. So here is the current conclusion:

                                  clinfo shows both CPU and GPU but hangs the system after that.

                                  Your program crashes on the CPU and works on the GPU.

                                  I will report it to the AMD developers. If you can provide some code that crashes on the CPU, it would be nice if you posted it here. You can also check whether the other samples work on the CPU.

                                    • Abort in atig6txx!XopOpenLinkedAdapter+0x139e
                                      Bdot

                                      Sorry for the confusion. Both the CPU and the GPU device are available now; that was my mistake. What remains is the abort mentioned in the thread title:

                                      • only on GPU, not on CPU
                                      • only if pageheap is enabled
                                      • reproducible with the examples

                                      To reproduce:

                                      1. download and install Debugging Tools for Windows from MS
                                      2. F:\software\AMD APP\samples\opencl\bin\x86_64>"c:\Program Files\Debugging Tools for Windows (x64)\gflags.exe" /p /enable TemplateC.exe /full
                                        path: SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options
                                            templatec.exe: page heap enabled
                                      3. Run TemplateC.exe - it will crash after listing the input values. Call stack:
                                        atig6txx!XopOpenLinkedAdapter+0x139e
                                        atig6txx!SetThunkProxyBypassMode+0x1c97
                                        aticaldd64!calddiGetExport+0x17a420
                                        aticaldd64!calddiGetExport+0x17c0b8
                                        aticaldd64!calddiGetExport+0x47d98
                                        aticaldd64!calddiGetExport+0x1f4be
                                        aticaldd64!calddiGetExport+0x16bb9
                                        aticaldd64!calddiGetExport+0x1d9
                                        amdocl64!clIcdGetPlatformIDsKHR+0x17d80
                                        amdocl64!clIcdGetPlatformIDsKHR+0x1c2f9
                                        amdocl64!clIcdGetPlatformIDsKHR+0x484b
                                        amdocl64!clIcdGetPlatformIDsKHR+0x45a9b
                                        amdocl64+0x2017
                                        amdocl64!clGetPlatformInfo+0x4b
                                        amdocl64!clGetDeviceInfo+0xbe5
                                        OpenCL+0x1137
                                        OpenCL!clGetExtensionFunctionAddress+0x42e
                                        OpenCL+0x101c
                                        OpenCL!clGetPlatformIDs+0x1c