

ozdelite

AMD-only, GPU-only buffer overrun in OpenCL driver

Hello,

Using Microsoft's Application Verifier, I've detected two problems that only occur when using AMD devices in an OpenCL application (they do not occur when using NVidia or Intel OpenCL devices). Furthermore, one of the issues only occurs when using a GPU device, not a CPU device.

Here is the configuration I'm testing with. This is a machine with multiple GPUs: an NVidia card driving display #2, and an AMD card (Radeon HD 6900) driving display #1 and used as the compute device.

  CL_PLATFORM_NAME: AMD Accelerated Parallel Processing

  CL_PLATFORM_VENDOR: Advanced Micro Devices, Inc.

  CL_PLATFORM_VERSION: OpenCL 1.1 AMD-APP (831.4)

  CL_DEVICE_NAME: Cayman

  CL_DEVICE_VENDOR: Advanced Micro Devices, Inc.

  CL_DRIVER_VERSION: CAL 1.4.1646 (VM)

Both issues can be reproduced with an application that simply builds and enqueues an empty kernel, using the latest AMD APP SDK (2.6). To reproduce them yourself, please do the following:

  • Build the attached application. I've included a project file for Visual Studio 2010, but the source file should build on any system with minor changes.
  • Download and install the Microsoft Application Verifier, which is available as part of the "Microsoft Windows SDK for Windows 7 and .NET Framework 4" (http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=8279).
  • Run Application Verifier and:
    • Add the executable for the attached repro program and enable the following tests (each is a simple checkbox):
      • Basics->Heaps must be enabled to reproduce issue #1 described below.
      • Basics->TLS must be enabled to reproduce issue #2 described below.
    • Click "Save".
  • Run the application in a debugger.

Issue #1 Access violation (possible buffer overrun?)

This only occurs when using an AMD device, and only with a device of type CL_DEVICE_TYPE_GPU. If I create a CL_DEVICE_TYPE_CPU device instead, or use another vendor's device, it does not occur.

When Application Verifier's heap testing is enabled, it puts each allocation in its own page with guards before and after it, and aligns the allocation to the end of the page. In the attached sample application, when the kernel finishes executing, Application Verifier breaks into the debugger with an access violation, usually before clFinish() returns and occasionally right after. It outputs the following information to the output window:

===========================================================

VERIFIER STOP 0000000000000013: pid 0x1118: first chance access violation for current stack trace

    00000000184F8000 : Invalid address being accessed

    0000000069038331 : Code performing invalid access

    0000000006FBEBD0 : Exception record. Use .exr to display it.

    0000000006FBE6E0 : Context record. Use .cxr to display it.

===========================================================

This verifier stop is continuable.

After debugging it use `go' to continue.

===========================================================

The address being accessed ends in 0x8000, which makes it the first byte of a new page. Since all allocations are aligned to the end of a page, this suggests an access just past the end of an allocation, i.e. a buffer overrun.

Issue #2 Invalid TLS index

This issue also does not occur when using an OpenCL driver from another vendor.

This issue shows up when TLS tests are enabled for the application in Application Verifier. After the application launches, most OpenCL calls cause the verifier to break into the debugger with the following stop:

=======================================

VERIFIER STOP 00000301: pid 0x7C8: Invalid TLS index used for current stack trace.

    00000000 : Invalid TLS index.

    0000ABBA : Expected lower part of the index.

    00000000 : Not used.

    00000000 : Not used.

=======================================

This verifier stop is continuable.

After debugging it use `go' to continue.

=======================================

According to this article (http://msdn.microsoft.com/en-us/library/windows/desktop/ms686997%28v=vs.85%29.aspx), this indicates an incorrect implementation of a DLL entry point, which leads to invalid TLS indices being used.
