I have an application that uses the GPU to accelerate decoding of video content and reads back the decoded YUV data for further processing. It uses the relatively new Direct3D 11.1 video decoding APIs (ID3D11VideoDevice etc.). The API works at both feature level 11.1 (Windows 8 and 8.1) and feature level 9.3 (Windows 7 and Server 2008 R2 with the IE11 update). However, while it performs well at feature level 11.1, taking about 1 ms per frame, it is extremely slow at feature level 9.3, taking about 200 ms per frame: a 200x slowdown! I have written a small test program that demonstrates the bug clearly:
1. Create an NV12 texture with D3D11_USAGE_DEFAULT
2. Create an NV12 texture with D3D11_USAGE_STAGING
3. CopySubresourceRegion to copy from 1 to 2 (the contents of 1 don't matter)
4. Map the staging texture
5. Copy the mapped memory from the staging texture to system memory
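The steps above can be sketched roughly as follows. This is a minimal stand-alone version, assuming a default hardware adapter and a 1920x1080 frame; error handling is trimmed for brevity, and the exact texture size and timing code are my own placeholders, not taken from the actual test program:

```cpp
#include <d3d11.h>
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>
#pragma comment(lib, "d3d11.lib")

int main() {
    ID3D11Device* dev = nullptr;
    ID3D11DeviceContext* ctx = nullptr;
    if (FAILED(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                 nullptr, 0, D3D11_SDK_VERSION,
                                 &dev, nullptr, &ctx)))
        return 1;

    // Step 1: NV12 texture with D3D11_USAGE_DEFAULT.
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = 1920;
    desc.Height = 1080;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_NV12;
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    ID3D11Texture2D* defaultTex = nullptr;
    dev->CreateTexture2D(&desc, nullptr, &defaultTex);

    // Step 2: NV12 staging texture, CPU-readable.
    desc.Usage = D3D11_USAGE_STAGING;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    ID3D11Texture2D* stagingTex = nullptr;
    dev->CreateTexture2D(&desc, nullptr, &stagingTex);

    // Step 3: GPU copy default -> staging (contents are irrelevant here).
    ctx->CopySubresourceRegion(stagingTex, 0, 0, 0, 0, defaultTex, 0, nullptr);

    // Step 4: map the staging texture for CPU reads (this also drains the
    // pipeline, which is why the timing below excludes it).
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    ctx->Map(stagingTex, 0, D3D11_MAP_READ, 0, &mapped);

    // Step 5: copy the mapped data to system memory, timing only this step.
    // An NV12 mapping places the half-height interleaved UV plane below the
    // Y plane, so the subresource spans Height * 3 / 2 rows of RowPitch bytes.
    std::vector<unsigned char> sysmem(
        (size_t)mapped.RowPitch * desc.Height * 3 / 2);
    auto t0 = std::chrono::high_resolution_clock::now();
    memcpy(sysmem.data(), mapped.pData, sysmem.size());
    auto t1 = std::chrono::high_resolution_clock::now();
    printf("step 5 took %lld us\n", (long long)
           std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count());

    ctx->Unmap(stagingTex, 0);
    stagingTex->Release();
    defaultTex->Release();
    ctx->Release();
    dev->Release();
    return 0;
}
```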
The test program profiles the time taken (in microseconds) by step 5 only, so the pipeline stall caused by Map does not affect the measurement. Run it on Windows 8/8.1 and on Windows 7 respectively and you will immediately see the performance bug. Strangely, this happens only with video formats: changing the format to RGB in the same program does not reproduce the extreme latency. The same program also shows no such problem on GPUs from other vendors.
Please help investigate this. Thanks in advance.