

madshi
Journeyman III

OpenCL 1.2 "cl_khr_dx9_media_sharing" bugs

Hey guys,

I'm trying to get the "cl_khr_dx9_media_sharing" OpenCL 1.2 extension to work. The latest AMD drivers (12.11 beta) generally support this extension and it mostly works, but there are a couple of bugs:

(1) When first calling clCreateFromDX9MediaSurfaceKHR() in thread "A" and then later calling clReleaseMemObject() in thread "B", there's a crash. When calling both in the same thread, there's no problem. Since calling both in the same thread is easy enough, this is not very important to me. Generally my impression is that AMD's OpenCL implementation is not very thread safe. But I can live with that for now.
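A minimal sketch of the "call both in the same thread" approach mentioned in (1), assuming the calls can be funneled through one dedicated worker thread. `SingleThreadExecutor` and `Invoke` are hypothetical names, not part of OpenCL or of the demo project, and the sketch contains no OpenCL calls itself; the idea is that clCreateFromDX9MediaSurfaceKHR() and clReleaseMemObject() would both be wrapped in tasks passed to `Invoke`.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical helper: runs every submitted task on one dedicated thread,
// so that resource creation and release always happen on the same thread.
class SingleThreadExecutor {
public:
    SingleThreadExecutor() : worker_([this] { Run(); }) {}

    ~SingleThreadExecutor() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    SingleThreadExecutor(const SingleThreadExecutor&) = delete;
    SingleThreadExecutor& operator=(const SingleThreadExecutor&) = delete;

    // Queues a task and blocks the caller until the worker has run it.
    void Invoke(std::function<void()> task) {
        std::packaged_task<void()> job(std::move(task));
        std::future<void> done = job.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!stopping_);  // no submissions during shutdown
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
        done.get();  // also rethrows an exception thrown by the task
    }

    std::thread::id WorkerId() const { return worker_.get_id(); }

private:
    void Run() {
        for (;;) {
            std::packaged_task<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // executed outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::packaged_task<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before Run() starts
};
```

Any thread can then call `Invoke`, and the wrapped work always runs on the worker. `Invoke` must not be called from inside a task, since the worker would wait on itself.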

(2) Show stopper: Sometimes clCreateFromDX9MediaSurfaceKHR() fails with error code -1022. This happens in the following situation:

- IDirectXVideoProcessorService.CreateSurface(NV12) succeeds -> (IDirect3DSurface9) 0x44440000
- clCreateFromDX9MediaSurfaceKHR() succeeds
- clEnqueueAcquireDX9MediaSurfacesKHR() succeeds
- clEnqueueNDRangeKernel() succeeds
- clEnqueueReleaseDX9MediaSurfacesKHR() succeeds
- clReleaseMemObject() succeeds
- ((IDirect3DSurface9) 0x44440000)->Release() succeeds
- IDirectXVideoProcessorService.CreateSurface(NV12) succeeds -> (IDirect3DSurface9) 0x44440000
- clCreateFromDX9MediaSurfaceKHR() fails with -1022

This problem occurs only if a newly created IDirect3DSurface9 by random chance gets the exact same address as an older (already released) IDirect3DSurface9 that was previously used by OpenCL. I've double checked the reference counts of all involved interfaces/objects and I'm definitely properly releasing everything. So this seems like a bug in the implementation of the "cl_khr_dx9_media_sharing" extension.
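To make the address-reuse symptom concrete, here is a toy model of one way such a failure could arise: bookkeeping keyed by the raw surface address that is not cleaned up on release. This is purely a hypothesis for illustration. `SurfaceRegistry` is an invented class, and nothing here is based on knowledge of how AMD's driver actually works.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Toy model only: a registry keyed by the raw surface address.
// It is NOT based on any knowledge of AMD's OpenCL implementation.
class SurfaceRegistry {
public:
    explicit SurfaceRegistry(bool forgetOnRelease)
        : forgetOnRelease_(forgetOnRelease) {}

    // Returns true if a wrapper could be created for this address,
    // false if the address is still registered.
    bool Create(std::uintptr_t surfaceAddress) {
        return known_.insert(surfaceAddress).second;
    }

    // Called when the wrapper for this address is released.
    void Release(std::uintptr_t surfaceAddress) {
        assert(known_.count(surfaceAddress) == 1);  // must match an earlier Create
        if (forgetOnRelease_) {
            known_.erase(surfaceAddress);
        }
        // With forgetOnRelease_ == false the stale key stays behind,
        // so a later surface at the same address is rejected.
    }

private:
    bool forgetOnRelease_;
    std::unordered_set<std::uintptr_t> known_;
};
```

In this model, create / release / create on the address 0x44440000 succeeds three times when the registry forgets released addresses, and fails on the second create when it does not, which matches the sequence listed above.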

Is this forum the right place to post such bug reports? Or is there a bug reporting system set up somewhere for developers?

Thanks!

0 Likes
16 Replies
binying
Challenger

what hardware/operating system are you using?

0 Likes

Ah sorry, should have said that:

Windows 7 x64 SP1 (Home Premium), Intel Core2 Duo T7200, AMD 7770, Catalyst 12.11 beta

0 Likes

> Is this forum the right place to post such bug reports?

Yes, it is one of the right places.

Is it possible for you to upload a simple test case?

0 Likes

I've uploaded a simple demo project with source code and compiled exe here:

http://madshi.net/OpenClBugs.rar

This project reproduces those 2 bugs I mentioned. Plus, it also contains a speed measurement, because I've found that the cl_khr_dx9_media_sharing extension is currently extremely slow. Here are the test results on my development PC:

1920x1080 8bit RGB image (Blu-Ray resolution)

(1) StretchRect: 3015 fps

(2) HLSL PixelShader pass: 2209 fps

(3) OpenCL GPU kernel: read/write image created with clCreateImage: 1686 fps

(4) same as (3), plus clEnqueueCopyImage to IDirect3DSurface9: 131 fps

(5) same as (4), plus StretchRect to IDirect3DTexture9: 92 fps

(6) OpenCL GPU kernel: read/write directly from/to IDirect3DSurface9: 120 fps

(7) same as (6), plus StretchRect to IDirect3DTexture9: 85 fps

The main purpose of the "cl_khr_dx9_media_sharing" extension is to let us developers integrate OpenCL passes into our D3D9 rendering pipeline. Personally, I want to use this for my DirectShow video renderer. For this to make any sense, the speed of a simple OpenCL GPU kernel should be roughly comparable to an HLSL pixel shader pass. If it's a tiny bit slower, I can live with that. But if you look at the numbers above, I can do HLSL pixel shader passes at 2209 frames per second with my Radeon 7770, while the simplest possible OpenCL GPU kernel currently runs at only 85 fps in my rendering pipeline when using D3D9 interop. I hope this will be sped up to near HLSL speed? Otherwise it's not really usable in real life.
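The fps figures are easier to compare as time per frame. A small sketch of that arithmetic follows; the function names are made up for this example.

```cpp
#include <cassert>

// Converts a throughput in frames per second to the time per frame in milliseconds.
double FrameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// How many times slower `slowFps` is compared to `fastFps`.
double SlowdownFactor(double fastFps, double slowFps) {
    assert(fastFps > 0.0 && slowFps > 0.0);
    return fastFps / slowFps;
}
```

With the measurements above, the HLSL pass at 2209 fps costs about 0.45 ms per frame, while the interop path at 85 fps costs about 11.8 ms per frame, a factor of roughly 26.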

OpenCL initialization is also veeeery slow on my PC. It takes about 2 seconds (guessed, not measured). I'd really like users of my software to not have to wait 2 seconds before they're able to play a video. Hopefully this can be improved, too?
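Since the 2 seconds are a guess, one way to get a real number would be to wrap the initialization in a small timer. This is a generic sketch using std::chrono; `AverageMs` is a hypothetical helper, and the callable passed to it stands in for whatever the OpenCL setup code is.

```cpp
#include <cassert>
#include <chrono>

// Runs `fn` `runs` times and returns the average wall-clock time per run
// in milliseconds. Use runs == 1 for one-off work such as initialization.
template <typename Fn>
double AverageMs(Fn&& fn, int runs) {
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const double totalMs =
        std::chrono::duration<double, std::milli>(stop - start).count();
    return totalMs / runs;
}
```

For example, `AverageMs([&] { /* OpenCL setup here */ }, 1)` would give the initialization time, and a larger `runs` value would average a per-frame operation.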

Thanks!

0 Likes

Let me find a 7770 first...

0 Likes

I am trying.

I modified a few paths of the solution. It failed to build. The error message is:

1>  All outputs are up-to-date.
1>OpenClBugs.obj : error LNK2019: unresolved external symbol _D3DXCompileShader@40 referenced in function "struct IDirect3DPixelShader9 * __cdecl CompilePixelShader(char const *)" (?CompilePixelShader@@YAPAUIDirect3DPixelShader9@@PBD@Z)
1>C:\Users\binying\Downloads\OpenClBugs\OpenClBugs\Debug\OpenClBugs.exe : fatal error LNK1120: 1 unresolved externals
1>
1>Build FAILED.
1>
1>Time Elapsed 00:00:00.15
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

0 Likes

You need to link to "d3dx9.lib" from the MS D3D9 SDK. I've uploaded the file here for your convenience:

http://madshi.net/d3dx9.rar

0 Likes

You are right. It compiles fine this time. I have two GPUs in my desktop. Can your program select the right GPU (79XX in this case) and run on it?

I am asking because the program crashes if I click the "cancel" icon for the speed measurement.

0 Likes

I'm glad to hear that. So I'm curious: Can you reproduce the 2 bugs and the performance issue? Do you see any hope to get all this fixed?

Thanks!

0 Likes

When I click "no" for bug2, an error window pops up, saying "mapping surface failed on loop pass 1".

When I click "yes" for bug1, the program runs without any output. Is this what you expected?

Again, I have two GPUs in my desktop. Can your program select the right GPU (79XX in this case) and run on it?

0 Likes

The result for "no"/bug2 is as expected. The same thing happens on my PC. And if you look at the source code, there's no reason for the loop to fail. It should run through without problems, but it does not. Which shows that there's a bug in the AMD OpenCL implementation, as far as I understand.

The result for "yes"/bug1 is unexpected. On my PC I get a crash. Every time. 100% reproducible.

What does the speed test say? Can you reproduce OpenCL being very slow compared to HLSL pixel shaders?

The test program uses "GetDesktopWindow()" as the Direct3D focus window. So whatever GPU is responsible for the desktop window will be the one the program uses. I've not tested which monitor "GetDesktopWindow()" results in, but I'd guess it's probably always the primary monitor. If you want to test a different GPU/monitor, it should probably not be hard for you to change the code. Just search for "CreateDeviceEx" and change it according to your needs.

0 Likes

> The test program uses "GetDesktopWindow()" as the Direct3D focus window...

So the program selects the correct GPU in my PC (the 7970, rather than the 7770 you used), especially since you use "if (strstr(extensions, "cl_khr_dx9_media_sharing"))". The other GPU in my PC does not support the extension.

> What does the speed test say?

As for the speed test, it crashes.
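A side note on the strstr check: the CL_DEVICE_EXTENSIONS string is a space-separated list of names, and strstr also matches substrings, so it would accept a longer name that merely starts with the one searched for. A stricter variant could compare whole tokens. This is a sketch; `HasExtension` is a hypothetical helper, not code from the demo project.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true only if `name` appears as a complete space-separated token
// in `extensions` (the format used by CL_DEVICE_EXTENSIONS).
bool HasExtension(const std::string& extensions, const std::string& name) {
    assert(!name.empty());
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

For this particular extension name the difference is unlikely to matter in practice, but the token comparison avoids false positives in general.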

0 Likes

Ok, so you were able to reproduce one of the 2 bugs, but not the other one. Are you working for AMD, btw? If so, is there a chance to get the one bug you can reproduce fixed?

I've read about an "OpenCL ticket" in another thread. Is there a way to create an official OpenCL support ticket? Do you know where/how? I can't seem to find it. Or is this thread good enough?

Would it be hard for you to check where exactly the speed test crashes? I wish I could debug that myself, but the speed test runs fine for me. You do have D3D9 updated on your PC, don't you? The speed test uses one of the d3dx9 DLLs that are not installed by default on a win7 PC. You first have to download the D3D9 web installer from Microsoft and update D3D9 to the latest build. If you don't want to do that, you could comment out the HLSL speed test and run only the other speed tests.

Thanks.

0 Likes

I already have the latest build of D3D9 installed, so I don't understand why I cannot reproduce the bugs. The speed measurement crashes in the first for loop rather than in the HLSL part:

void MeasureSpeed()
{
  ...
  OpenCL.CompileKernel();
  for (int i1 = 0; i1 < 100; i1++)
    time1 += OpenCL.WriteToRgbSurface(rgbTex);
  ...
}

Which is also unexpected, right?

I am not from AMD. But it's my pleasure to forward your message to them. You could also write emails to DevCentralSupport@amd.com

0 Likes

Message sent to AMD.

And the feedback is,

"Thank them for the post, tell them that the information has been passed to engineering."

0 Likes

Thank you, I appreciate your help.

I'm not sure why the first for loop in the speed test crashes. I don't think it should, so maybe that's another OpenCL bug... 😉   Or maybe a bug in my demo, but it works on my PC...

> So I don't understand why I cannot reproduce the bugs.

Well, at least you could reproduce one of the bugs (when pressing "no" for bug2 you get the same error as I do on my PC).

0 Likes