4 Replies Latest reply on Nov 9, 2012 8:48 AM by zadig

    [solved] GPU usage is  0% during hardware decoding of h264 video.

    wl2776
      Where could the problem be?

      Hi all.

      I've recently started learning hardware accelerated video decoding and cannot find an error in my code.

      My test program reads H.264 video frames from a MOV container, decodes them, and then drops them.
      It uses FFmpeg to parse the file. I've also copied some code from the VLC player to set up HW acceleration.

      The problem is that hardware acceleration does not seem to work, although calls to the DXVA API can be seen in DXVA Checker.

      Process Explorer shows that the CPU load is the same as with pure software decoding.
      The only difference between software decoding and my "hw accelerated" decoding is that the CPU spends some time in
      kernel mode (Process Explorer shows a red line on the CPU load graph).
      When I switch off my "hw accelerated" decoding, the CPU spends all its time in user mode.

      GPU load is always 0%, both with and without "hw acceleration"; however, video memory is used when hw acceleration is on.
      DXVA Checker also shows calls to DXVA2 functions (CreateDevice, BeginFrame, GetBuffer, DecodeDeviceExecute, EndFrame, etc).

      I use an ATI Radeon HD 4550, my OS is Windows 7, and I have installed the latest drivers from the AMD site.
      My code is here: http://pastebin.com/UsgggaEN
      I use version git-d049257 of the FFmpeg libraries, downloaded from git://git.videolan.org/ffmpeg.git and compiled with gcc 4.6.1.

      #include <cstdio>
      #define COBJMACROS
      #include <windows.h>
      #include <d3d9.h>
      #include <dxva2api.h>

      extern "C" {
      #include <libavformat/avformat.h>
      #include <libavcodec/avcodec.h>
      #include <libavcodec/dxva2.h>
      }

      #define VA_DXVA2_MAX_SURFACE_COUNT (64)

      typedef struct {
          LPDIRECT3DSURFACE9 d3d;
          int refcount;
          unsigned int order;
      } vlc_va_surface_t;

      typedef struct {
          int codec_id;
          int width;
          int height;

          /* DLL */
          HINSTANCE hd3d9_dll;
          HINSTANCE hdxva2_dll;

          /* Direct3D */
          D3DPRESENT_PARAMETERS d3dpp;
          LPDIRECT3D9 d3dobj;
          D3DADAPTER_IDENTIFIER9 d3dai;
          LPDIRECT3DDEVICE9 d3ddev;

          /* Device manager */
          UINT token;
          IDirect3DDeviceManager9 *devmng;
          HANDLE device;

          /* Video service */
          IDirectXVideoDecoderService *vs;
          GUID input;
          D3DFORMAT render;

          /* Video decoder */
          DXVA2_ConfigPictureDecode cfg;
          IDirectXVideoDecoder *decoder;

          /* Option conversion */
          D3DFORMAT output;

          /* */
          struct dxva_context hw;

          /* */
          unsigned surface_count;
          unsigned surface_order;
          int surface_width;
          int surface_height;

          vlc_va_surface_t surface[VA_DXVA2_MAX_SURFACE_COUNT];
          LPDIRECT3DSURFACE9 hw_surface[VA_DXVA2_MAX_SURFACE_COUNT];
      } vlc_va_dxva2_t;

      typedef struct {
          const char *name;
          D3DFORMAT format;
          PixelFormat codec;
      } d3d_format_t;

      /* XXX Prefered format must come first */
      static const d3d_format_t d3d_formats[] = {
          { "YV12", (D3DFORMAT)MAKEFOURCC('Y','V','1','2'), PIX_FMT_YUV420P },
          { "NV12", (D3DFORMAT)MAKEFOURCC('N','V','1','2'), PIX_FMT_NV12 },
          { NULL, (D3DFORMAT)0, PIX_FMT_NONE }
      };

      /**
       * video format description
       */
      struct video_format_t {
          PixelFormat i_chroma;              /**< picture chroma */
          unsigned int i_width;              /**< picture width */
          unsigned int i_height;             /**< picture height */
          unsigned int i_x_offset;           /**< start offset of visible area */
          unsigned int i_y_offset;           /**< start offset of visible area */
          unsigned int i_visible_width;      /**< width of visible area */
          unsigned int i_visible_height;     /**< height of visible area */
          unsigned int i_bits_per_pixel;     /**< number of bits per pixel */
          unsigned int i_sar_num;            /**< sample/pixel aspect ratio */
          unsigned int i_sar_den;
          unsigned int i_frame_rate;         /**< frame rate numerator */
          unsigned int i_frame_rate_base;    /**< frame rate denominator */
          uint32_t i_rmask, i_gmask, i_bmask; /**< color masks for RGB chroma */
          int i_rrshift, i_lrshift;
          int i_rgshift, i_lgshift;
          int i_rbshift, i_lbshift;
      };

      static const d3d_format_t *D3dFindFormat(D3DFORMAT format)
      {
          for (unsigned i = 0; d3d_formats[i].name; i++) {
              if (d3d_formats[i].format == format)
                  return &d3d_formats[i];
          }
          return NULL;
      }

      static const GUID DXVA2_ModeMPEG2_MoComp = {
          0xe6a9f44b, 0x61b0,0x4563, {0x9e,0xa4,0x63,0xd2,0xa3,0xc6,0xfe,0x66}
      };
      static const GUID DXVA2_ModeMPEG2_IDCT = {
          0xbf22ad00, 0x03ea,0x4690, {0x80,0x77,0x47,0x33,0x46,0x20,0x9b,0x7e}
      };
      static const GUID DXVA2_ModeMPEG2_VLD = {
          0xee27417f, 0x5e28,0x4e65, {0xbe,0xea,0x1d,0x26,0xb5,0x08,0xad,0xc9}
      };
      static const GUID DXVA2_ModeMPEG2and1_VLD = {
          0x86695f12, 0x340e,0x4f04, {0x9f,0xd3,0x92,0x53,0xdd,0x32,0x74,0x60}
      };
      static const GUID DXVA2_ModeMPEG1_VLD = {
          0x6f3ec719, 0x3735,0x42cc, {0x80,0x63,0x65,0xcc,0x3c,0xb3,0x66,0x16}
      };

      static const GUID DXVA2_ModeH264_A = {
          0x1b81be64, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeH264_B = {
          0x1b81be65, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeH264_C = {
          0x1b81be66, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeH264_D = {
          0x1b81be67, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeH264_E = {
          0x1b81be68, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeH264_F = {
          0x1b81be69, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA_ModeH264_VLD_WithFMOASO_NoFGT = {
          0xd5f04ff9, 0x3418,0x45d8, {0x95,0x61,0x32,0xa7,0x6a,0xae,0x2d,0xdd}
      };
      static const GUID DXVADDI_Intel_ModeH264_A = {
          0x604F8E64, 0x4951,0x4c54, {0x88,0xFE,0xAB,0xD2,0x5C,0x15,0xB3,0xD6}
      };
      static const GUID DXVADDI_Intel_ModeH264_C = {
          0x604F8E66, 0x4951,0x4c54, {0x88,0xFE,0xAB,0xD2,0x5C,0x15,0xB3,0xD6}
      };
      static const GUID DXVADDI_Intel_ModeH264_E = { // DXVA_Intel_H264_ClearVideo
          0x604F8E68, 0x4951,0x4c54, {0x88,0xFE,0xAB,0xD2,0x5C,0x15,0xB3,0xD6}
      };

      static const GUID DXVA2_ModeWMV8_A = {
          0x1b81be80, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeWMV8_B = {
          0x1b81be81, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeWMV9_A = {
          0x1b81be90, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeWMV9_B = {
          0x1b81be91, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeWMV9_C = {
          0x1b81be94, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };

      static const GUID DXVA2_ModeVC1_A = {
          0x1b81beA0, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeVC1_B = {
          0x1b81beA1, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeVC1_C = {
          0x1b81beA2, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA2_ModeVC1_D = {
          0x1b81beA3, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      /* Conformity to the August 2010 update of the specification, ModeVC1_VLD2010 */
      static const GUID DXVA2_ModeVC1_D2010 = {
          0x1b81beA4, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };

      static const GUID DXVA_NoEncrypt = {
          0x1b81bed0, 0xa0c7,0x11d3, {0xb9,0x84,0x00,0xc0,0x4f,0x2e,0x73,0xc5}
      };
      static const GUID DXVA_Intel_VC1_ClearVideo = {
          0xBCC5DB6D, 0xA2B6,0x4AF0, {0xAC,0xE4,0xAD,0xB1,0xF7,0x87,0xBC,0x89}
      };
      static const GUID DXVA_nVidia_MPEG4_ASP = {
          0x9947EC6F, 0x689B,0x11DC, {0xA3,0x20,0x00,0x19,0xDB,0xBC,0x41,0x84}
      };
      static const GUID DXVA_ModeMPEG4pt2_VLD_Simple = {
          0xefd64d74, 0xc9e8,0x41d7, {0xa5,0xe9,0xe9,0xb0,0xe3,0x9f,0xa3,0x19}
      };
      static const GUID DXVA_ModeMPEG4pt2_VLD_AdvSimple_NoGMC = {
          0xed418a9f, 0x10d,0x4eda, {0x9a,0xe3,0x9a,0x65,0x35,0x8d,0x8d,0x2e}
      };
      static const GUID DXVA_ModeMPEG4pt2_VLD_AdvSimple_GMC = {
          0xab998b5b, 0x4258,0x44a9, {0x9f,0xeb,0x94,0xe5,0x97,0xa6,0xba,0xae}
      };

      static const GUID IID_IDirectXVideoDecoderService = {
          0xfc51a551, 0xd5e7, 0x11d9, {0xaf,0x55,0x00,0x05,0x4e,0x43,0xff,0x02}
      };
      static const GUID IID_IDirectXVideoAccelerationService = {
          0xfc51a550, 0xd5e7, 0x11d9, {0xaf,0x55,0x00,0x05,0x4e,0x43,0xff,0x02}
      };

      typedef struct {
          const char *name;
          const GUID& guid;
          int codec;
      } dxva2_mode_t;

      /* XXX Prefered modes must come first */
      static const dxva2_mode_t dxva2_modes[] = {
          { "MPEG-2 variable-length decoder", DXVA2_ModeMPEG2_VLD, CODEC_ID_MPEG2VIDEO },
          { "MPEG-2 & MPEG-1 variable-length decoder", DXVA2_ModeMPEG2and1_VLD, CODEC_ID_MPEG2VIDEO },
          { "MPEG-2 motion compensation", DXVA2_ModeMPEG2_MoComp, 0 },
          { "MPEG-2 inverse discrete cosine transform", DXVA2_ModeMPEG2_IDCT, 0 },

          { "MPEG-1 variable-length decoder", DXVA2_ModeMPEG1_VLD, 0 },

          { "H.264 variable-length decoder, film grain technology", DXVA2_ModeH264_F, CODEC_ID_H264 },
          { "H.264 variable-length decoder, no film grain technology", DXVA2_ModeH264_E, CODEC_ID_H264 },
          { "H.264 variable-length decoder, no film grain technology (Intel ClearVideo)",DXVADDI_Intel_ModeH264_E, CODEC_ID_H264 },
          { "H.264 variable-length decoder, no film grain technology, FMO/ASO", DXVA_ModeH264_VLD_WithFMOASO_NoFGT, CODEC_ID_H264 },
          { "H.264 inverse discrete cosine transform, film grain technology", DXVA2_ModeH264_D, 0 },
          { "H.264 inverse discrete cosine transform, no film grain technology", DXVA2_ModeH264_C, 0 },
          { "H.264 inverse discrete cosine transform, no film grain technology (Intel)", DXVADDI_Intel_ModeH264_C, 0 },
          { "H.264 motion compensation, film grain technology", DXVA2_ModeH264_B, 0 },
          { "H.264 motion compensation, no film grain technology", DXVA2_ModeH264_A, 0 },
          { "H.264 motion compensation, no film grain technology (Intel)", DXVADDI_Intel_ModeH264_A, 0 },

          { "Windows Media Video 8 motion compensation", DXVA2_ModeWMV8_B, 0 },
          { "Windows Media Video 8 post processing", DXVA2_ModeWMV8_A, 0 },

          { "Windows Media Video 9 IDCT", DXVA2_ModeWMV9_C, 0 },
          { "Windows Media Video 9 motion compensation", DXVA2_ModeWMV9_B, 0 },
          { "Windows Media Video 9 post processing", DXVA2_ModeWMV9_A, 0 },

          { "VC-1 variable-length decoder", DXVA2_ModeVC1_D, CODEC_ID_VC1 },
          { "VC-1 variable-length decoder", DXVA2_ModeVC1_D, CODEC_ID_WMV3 },
          { "VC-1 variable-length decoder", DXVA2_ModeVC1_D2010, CODEC_ID_VC1 },
          { "VC-1 variable-length decoder", DXVA2_ModeVC1_D2010, CODEC_ID_WMV3 },
          { "VC-1 inverse discrete cosine transform", DXVA2_ModeVC1_C, 0 },
          { "VC-1 motion compensation", DXVA2_ModeVC1_B, 0 },
          { "VC-1 post processing", DXVA2_ModeVC1_A, 0 },

          { "VC-1 variable-length decoder (Intel)", DXVA_Intel_VC1_ClearVideo, 0 },

          { "MPEG-4 Part 2 nVidia bitstream decoder", DXVA_nVidia_MPEG4_ASP, 0 },
          { "MPEG-4 Part 2 variable-length decoder, Simple Profile", DXVA_ModeMPEG4pt2_VLD_Simple, 0 },
          { "MPEG-4 Part 2 variable-length decoder, Simple&Advanced Profile, no global motion compensation", DXVA_ModeMPEG4pt2_VLD_AdvSimple_NoGMC, 0 },
          { "MPEG-4 Part 2 variable-length decoder, Simple&Advanced Profile, global motion compensation", DXVA_ModeMPEG4pt2_VLD_AdvSimple_GMC, 0 },

          { NULL, GUID_NULL, 0 }
      };

      static const dxva2_mode_t *Dxva2FindMode(const GUID& guid)
      {
          for (unsigned i = 0; dxva2_modes[i].name; i++) {
              if (IsEqualGUID(dxva2_modes[i].guid, guid))
                  return &dxva2_modes[i];
          }
          return NULL;
      }

      static int D3dCreateDevice(vlc_va_dxva2_t *);
      static void D3dDestroyDevice(vlc_va_dxva2_t *);
      //static char *DxDescribe(vlc_va_dxva2_t *);

      static int D3dCreateDeviceManager(vlc_va_dxva2_t *);
      static void D3dDestroyDeviceManager(vlc_va_dxva2_t *);

      static int DxCreateVideoService(vlc_va_dxva2_t *);
      static void DxDestroyVideoService(vlc_va_dxva2_t *);
      static int DxFindVideoServiceConversion(vlc_va_dxva2_t *, GUID *input, D3DFORMAT *output);

      static int DxCreateVideoDecoder(vlc_va_dxva2_t *, int codec_id, const video_format_t *fmt);
      static void DxDestroyVideoDecoder(vlc_va_dxva2_t *);
      static int DxResetVideoDecoder(vlc_va_dxva2_t *);

      static void DxCreateVideoConversion(vlc_va_dxva2_t *);
      static void DxDestroyVideoConversion(vlc_va_dxva2_t *);

      static void Close(vlc_va_dxva2_t *external)
      {
          vlc_va_dxva2_t *va = external;

          DxDestroyVideoConversion(va);
          DxDestroyVideoDecoder(va);
          DxDestroyVideoService(va);
          D3dDestroyDeviceManager(va);
          D3dDestroyDevice(va);

          if (va->hdxva2_dll)
              FreeLibrary(va->hdxva2_dll);
          if (va->hd3d9_dll)
              FreeLibrary(va->hd3d9_dll);

          free(va);
      }

      vlc_va_dxva2_t *vlc_va_NewDxva2(int codec_id)
      {
          vlc_va_dxva2_t *va = (vlc_va_dxva2_t *)calloc(1, sizeof(*va));
          if (!va)
              return NULL;

          va->codec_id = codec_id;

          /* Load dll*/
          va->hd3d9_dll = LoadLibrary(TEXT("D3D9.DLL"));
          if (!va->hd3d9_dll) {
              av_log(NULL, AV_LOG_ERROR, "cannot load d3d9.dll\n");
              goto error;
          }
          va->hdxva2_dll = LoadLibrary(TEXT("DXVA2.DLL"));
          if (!va->hdxva2_dll) {
              av_log(NULL, AV_LOG_ERROR, "cannot load dxva2.dll\n");
              goto error;
          }
          av_log(NULL, AV_LOG_INFO, "DLLs loaded\n");

          /* */
          if (D3dCreateDevice(va)) {
              av_log(NULL, AV_LOG_ERROR, "Failed to create Direct3D device\n");
              goto error;
          }
          av_log(NULL, AV_LOG_INFO, "D3dCreateDevice succeed\n");

          if (D3dCreateDeviceManager(va)) {
              av_log(NULL, AV_LOG_ERROR, "D3dCreateDeviceManager failed\n");
              goto error;
          }

          if (DxCreateVideoService(va)) {
              av_log(NULL, AV_LOG_ERROR, "DxCreateVideoService failed\n");
              goto error;
          }

          /* */
          if (DxFindVideoServiceConversion(va, &va->input, &va->render)) {
              av_log(NULL, AV_LOG_ERROR, "DxFindVideoServiceConversion failed\n");
              goto error;
          }

          /* TODO print the hardware name/vendor for debugging purposes */
          return va;

      error:
          Close(va);
          return NULL;
      }

      /**
       * It creates a Direct3D device usable for DXVA 2
       */
      static int D3dCreateDevice(vlc_va_dxva2_t *va)
      {
          /* */
          typedef LPDIRECT3D9 (WINAPI *Create9func)(UINT SDKVersion);
          Create9func Create9 = (Create9func )GetProcAddress(va->hd3d9_dll, TEXT("Direct3DCreate9"));
          if (!Create9) {
              av_log(NULL, AV_LOG_ERROR, "Cannot locate reference to Direct3DCreate9 ABI in DLL");
              return -1;
          }

          /* */
          LPDIRECT3D9 d3dobj;
          d3dobj = Create9(D3D_SDK_VERSION);
          if (!d3dobj) {
              av_log(NULL, AV_LOG_ERROR, "Direct3DCreate9 failed");
              return -1;
          }
          va->d3dobj = d3dobj;

          /* */
          D3DADAPTER_IDENTIFIER9 *d3dai = &va->d3dai;
          if (FAILED(IDirect3D9_GetAdapterIdentifier(va->d3dobj, D3DADAPTER_DEFAULT, 0, d3dai))) {
              av_log(NULL, AV_LOG_WARNING, "IDirect3D9_GetAdapterIdentifier failed");
              ZeroMemory(d3dai, sizeof(*d3dai));
          }

          /* */
          D3DPRESENT_PARAMETERS *d3dpp = &va->d3dpp;
          ZeroMemory(d3dpp, sizeof(*d3dpp));
          d3dpp->Flags = D3DPRESENTFLAG_VIDEO;
          d3dpp->Windowed = TRUE;
          d3dpp->hDeviceWindow = NULL;
          d3dpp->SwapEffect = D3DSWAPEFFECT_DISCARD;
          d3dpp->MultiSampleType = D3DMULTISAMPLE_NONE;
          d3dpp->PresentationInterval = D3DPRESENT_INTERVAL_DEFAULT;
          d3dpp->BackBufferCount = 0; /* FIXME what to put here */
          d3dpp->BackBufferFormat = D3DFMT_X8R8G8B8; /* FIXME what to put here */
          d3dpp->BackBufferWidth = 0;
          d3dpp->BackBufferHeight = 0;
          d3dpp->EnableAutoDepthStencil = FALSE;

          /* Direct3D needs a HWND to create a device, even without using ::Present
             this HWND is used to alert Direct3D when there's a change of focus window.
             For now, use GetShellWindow, as it looks harmless */
          LPDIRECT3DDEVICE9 d3ddev;
          if (FAILED(IDirect3D9_CreateDevice(d3dobj, D3DADAPTER_DEFAULT,
                                             D3DDEVTYPE_HAL, GetShellWindow(),
                                             D3DCREATE_SOFTWARE_VERTEXPROCESSING |
                                             D3DCREATE_MULTITHREADED,
                                             d3dpp, &d3ddev))) {
              av_log(NULL, AV_LOG_ERROR, "IDirect3D9_CreateDevice failed\n");
              return -1;
          }
          va->d3ddev = d3ddev;

          return 0;
      }

      /**
       * It releases a Direct3D device and its resources.
       */
      static void D3dDestroyDevice(vlc_va_dxva2_t *va)
      {
          if (va->d3ddev)
              IDirect3DDevice9_Release(va->d3ddev);
          if (va->d3dobj)
              IDirect3D9_Release(va->d3dobj);
      }

      /**
       * It describes our Direct3D object
      static char *DxDescribe(vlc_va_dxva2_t *va)
      {
          static const struct {
              unsigned id;
              char name[32];
          } vendors [] = {
              { 0x1002, "ATI" },
              { 0x10DE, "NVIDIA" },
              { 0x8086, "Intel" },
              { 0x5333, "S3 Graphics" },
              { 0, "" }
          };
          D3DADAPTER_IDENTIFIER9 *id = &va->d3dai;

          const char *vendor = "Unknown";
          for (int i = 0; vendors[i].id != 0; i++) {
              if (vendors[i].id == id->VendorId) {
                  vendor = vendors[i].name;
                  break;
              }
          }

          char *description;
          if (asprintf(&description, "DXVA2 (%.*s, vendor %d(%s), device %d, revision %d)",
                       sizeof(id->Description), id->Description,
                       id->VendorId, vendor, id->DeviceId, id->Revision) < 0)
              return NULL;
          return description;
      }
      */

      /**
       * It creates a Direct3D device manager
       */
      static int D3dCreateDeviceManager(vlc_va_dxva2_t *va)
      {
          typedef HRESULT (WINAPI *CreateDeviceManager9_func)(UINT *pResetToken, IDirect3DDeviceManager9 **);
          CreateDeviceManager9_func CreateDeviceManager9 =
              (CreateDeviceManager9_func)GetProcAddress(va->hdxva2_dll, TEXT("DXVA2CreateDirect3DDeviceManager9"));

          if (!CreateDeviceManager9) {
              av_log(NULL, AV_LOG_ERROR, "cannot load function\n");
              return -1;
          }
          av_log(NULL, AV_LOG_INFO, "OurDirect3DCreateDeviceManager9 Success!\n");

          UINT token;
          IDirect3DDeviceManager9 *devmng;
          if (FAILED(CreateDeviceManager9(&token, &devmng))) {
              av_log(NULL, AV_LOG_ERROR, " OurDirect3DCreateDeviceManager9 failed\n");
              return -1;
          }

          HRESULT hr = devmng->ResetDevice(va->d3ddev, token);
          if (FAILED(hr)) {
              av_log(NULL, AV_LOG_ERROR, "IDirect3DDeviceManager9_ResetDevice failed: %08x", (unsigned)hr);
              return -1;
          }
          devmng->AddRef();
          va->token = token;
          va->devmng = devmng;
          av_log(NULL, AV_LOG_INFO, "obtained IDirect3DDeviceManager9\n");

          return 0;
      }

      /**
       * It destroys a Direct3D device manager
       */
      static void D3dDestroyDeviceManager(vlc_va_dxva2_t *va)
      {
          if (va->devmng)
              va->devmng->Release();
      }

      /**
       * It creates a DirectX video service
       */
      static int DxCreateVideoService(vlc_va_dxva2_t *va)
      {
          typedef HRESULT (WINAPI *CreateVideoService_func)(IDirect3DDevice9 *, REFIID riid, void **ppService);
          CreateVideoService_func CreateVideoService =
              (CreateVideoService_func)GetProcAddress(va->hdxva2_dll, TEXT("DXVA2CreateVideoService"));

          if (!CreateVideoService) {
              av_log(NULL, AV_LOG_ERROR, "cannot load function\n");
              return 4;
          }
          av_log(NULL, AV_LOG_INFO, "DXVA2CreateVideoService Success!\n");

          HRESULT hr;

          HANDLE device;
          hr = va->devmng->OpenDeviceHandle(&device);
          if (FAILED(hr)) {
              av_log(NULL, AV_LOG_ERROR, "OpenDeviceHandle failed\n");
              return -1;
          }
          va->device = device;

          IDirectXVideoDecoderService *vs;
          hr = va->devmng->GetVideoService(device, IID_IDirectXVideoDecoderService, (void **)&vs);
          if (FAILED(hr)) {
              av_log(NULL, AV_LOG_ERROR, "GetVideoService failed\n");
              return -1;
          }
          va->vs = vs;

          return 0;
      }

      /**
       * It destroys a DirectX video service
       */
      static void DxDestroyVideoService(vlc_va_dxva2_t *va)
      {
          if (va->device)
              va->devmng->CloseDeviceHandle(va->device);
          if (va->vs)
              va->vs->Release();
      }

      /**
       * Find the best suited decoder mode GUID and render format.
       */
      static int DxFindVideoServiceConversion(vlc_va_dxva2_t *va, GUID *input, D3DFORMAT *output)
      {
          /* Retrieve supported modes from the decoder service */
          UINT input_count = 0;
          GUID *input_list = NULL;
          if (FAILED(va->vs->GetDecoderDeviceGuids(&input_count, &input_list))) {
              av_log(NULL, AV_LOG_ERROR, "IDirectXVideoDecoderService_GetDecoderDeviceGuids failed\n");
              return -1;
          }
          for (unsigned i = 0; i < input_count; i++) {
              const GUID &g = input_list[i];
              const dxva2_mode_t *mode = Dxva2FindMode(g);
              if (mode) {
                  av_log(NULL, AV_LOG_INFO, "- '%s' is supported by hardware\n", mode->name);
              } else {
                  av_log(NULL, AV_LOG_WARNING, "- Unknown GUID = %08X-%04x-%04x-XXXX\n",
                         (unsigned)g.Data1, g.Data2, g.Data3);
              }
          }

          /* Try all supported mode by our priority */
          for (unsigned i = 0; dxva2_modes[i].name; i++) {
              const dxva2_mode_t *mode = &dxva2_modes[i];
              if (!mode->codec || mode->codec != va->codec_id)
                  continue;

              /* */
              bool is_suported = false;
              for (unsigned count = 0; !is_suported && count < input_count; count++) {
                  const GUID &g = input_list[count];
                  is_suported = IsEqualGUID(mode->guid, g) == 0;
              }
              if (!is_suported)
                  continue;

              /* */
              av_log(NULL, AV_LOG_DEBUG, "Trying to use '%s' as input\n", mode->name);
              UINT output_count = 0;
              D3DFORMAT *output_list = NULL;
              if (FAILED(va->vs->GetDecoderRenderTargets(
                             mode->guid, &output_count, &output_list))) {
                  av_log(NULL, AV_LOG_ERROR, "IDirectXVideoDecoderService_GetDecoderRenderTargets failed\n");
                  continue;
              }
              for (unsigned j = 0; j < output_count; j++) {
                  const D3DFORMAT f = output_list[j];
                  const d3d_format_t *format = D3dFindFormat(f);
                  if (format) {
                      av_log(NULL, AV_LOG_DEBUG, "%s is supported for output\n", format->name);
                  } else {
                      av_log(NULL, AV_LOG_DEBUG, "%d is supported for output (%4.4s)\n", f, (const char*)&f);
                  }
              }

              /* */
              for (unsigned j = 0; d3d_formats[j].name; j++) {
                  const d3d_format_t *format = &d3d_formats[j];

                  /* */
                  bool is_suported = false;
                  for (unsigned k = 0; !is_suported && k < output_count; k++) {
                      is_suported = format->format == output_list[k];
                  }
                  if (!is_suported)
                      continue;

                  /* We have our solution */
                  av_log(NULL, AV_LOG_DEBUG, "Using '%s' to decode to '%s'\n", mode->name, format->name);
                  *input = mode->guid;
                  *output = format->format;
                  CoTaskMemFree(output_list);
                  CoTaskMemFree(input_list);
                  return 0;
              }
              CoTaskMemFree(output_list);
          }
          CoTaskMemFree(input_list);
          return -1;
      }

      /**
       * It creates a DXVA2 decoder using the given video format
       */
      static int DxCreateVideoDecoder(vlc_va_dxva2_t *va, int codec_id, const video_format_t *fmt)
      {
          /* */
          av_log(NULL, AV_LOG_DEBUG, "DxCreateVideoDecoder id %d %dx%d\n",
                 codec_id, fmt->i_width, fmt->i_height);

          va->width = fmt->i_width;
          va->height = fmt->i_height;

          /* Allocates all surfaces needed for the decoder */
          va->surface_width = (fmt->i_width + 15) & ~15;
          va->surface_height = (fmt->i_height + 15) & ~15;
          switch (codec_id) {
          case CODEC_ID_H264:
              va->surface_count = 16 + 1;
              break;
          default:
              va->surface_count = 2 + 1;
              break;
          }
          LPDIRECT3DSURFACE9 surface_list[VA_DXVA2_MAX_SURFACE_COUNT];
          if (FAILED(va->vs->CreateSurface(va->surface_width,
                                           va->surface_height,
                                           va->surface_count - 1,
                                           va->render,
                                           D3DPOOL_DEFAULT,
                                           0,
                                           DXVA2_VideoDecoderRenderTarget,
                                           surface_list,
                                           NULL))) {
              av_log(NULL, AV_LOG_ERROR, "IDirectXVideoAccelerationService_CreateSurface failed\n");
              va->surface_count = 0;
              return -1;
          }
          for (unsigned i = 0; i < va->surface_count; i++) {
              vlc_va_surface_t *surface = &va->surface[i];
              surface->d3d = surface_list[i];
              surface->refcount = 0;
              surface->order = 0;
          }
          av_log(NULL, AV_LOG_DEBUG, "IDirectXVideoAccelerationService_CreateSurface succeed with %d surfaces (%dx%d)\n",
                 va->surface_count, fmt->i_width, fmt->i_height);

          /* */
          DXVA2_VideoDesc dsc;
          ZeroMemory(&dsc, sizeof(dsc));
          dsc.SampleWidth = fmt->i_width;
          dsc.SampleHeight = fmt->i_height;
          dsc.Format = va->render;
          if (fmt->i_frame_rate > 0 && fmt->i_frame_rate_base > 0) {
              dsc.InputSampleFreq.Numerator = fmt->i_frame_rate;
              dsc.InputSampleFreq.Denominator = fmt->i_frame_rate_base;
          } else {
              dsc.InputSampleFreq.Numerator = 0;
              dsc.InputSampleFreq.Denominator = 0;
          }
          dsc.OutputFrameFreq = dsc.InputSampleFreq;
          dsc.UABProtectionLevel = FALSE;
          dsc.Reserved = 0;

          /* FIXME I am unsure we can let unknown everywhere */
          DXVA2_ExtendedFormat *ext = &dsc.SampleFormat;
          ext->SampleFormat = 0;//DXVA2_SampleUnknown;
          ext->VideoChromaSubsampling = 0;//DXVA2_VideoChromaSubsampling_Unknown;
          ext->NominalRange = 0;//DXVA2_NominalRange_Unknown;
          ext->VideoTransferMatrix = 0;//DXVA2_VideoTransferMatrix_Unknown;
          ext->VideoLighting = 0;//DXVA2_VideoLighting_Unknown;
          ext->VideoPrimaries = 0;//DXVA2_VideoPrimaries_Unknown;
          ext->VideoTransferFunction = 0;//DXVA2_VideoTransFunc_Unknown;

          /* List all configurations available for the decoder */
          UINT cfg_count = 0;
          DXVA2_ConfigPictureDecode *cfg_list = NULL;
          if (FAILED(va->vs->GetDecoderConfigurations(va->input,
                                                      &dsc,
                                                      NULL,
                                                      &cfg_count,
                                                      &cfg_list))) {
              av_log(NULL, AV_LOG_ERROR, "IDirectXVideoDecoderService_GetDecoderConfigurations failed\n");
              return -1;
          }
          av_log(NULL, AV_LOG_DEBUG, "we got %d decoder configurations\n", cfg_count);

          /* Select the best decoder configuration */
          int cfg_score = 0;
          for (unsigned i = 0; i < cfg_count; i++) {
              const DXVA2_ConfigPictureDecode *cfg = &cfg_list[i];

              /* */
              av_log(NULL, AV_LOG_DEBUG, "configuration[%d] ConfigBitstreamRaw %d\n",
                     i, cfg->ConfigBitstreamRaw);

              /* */
              int score;
              if (cfg->ConfigBitstreamRaw == 1)
                  score = 1;
              else if (codec_id == CODEC_ID_H264 && cfg->ConfigBitstreamRaw == 2)
                  score = 2;
              else
                  continue;
              if (IsEqualGUID(cfg->guidConfigBitstreamEncryption, DXVA_NoEncrypt))
                  score += 16;

              if (cfg_score < score) {
                  va->cfg = *cfg;
                  cfg_score = score;
              }
          }
          CoTaskMemFree(cfg_list);
          if (cfg_score <= 0) {
              av_log(NULL, AV_LOG_ERROR, "Failed to find a supported decoder configuration\n");
              return -1;
          }

          /* Create the decoder */
          IDirectXVideoDecoder *decoder;
          if (FAILED(va->vs->CreateVideoDecoder(va->input,
                                                &dsc,
                                                &va->cfg,
                                                surface_list,
                                                va->surface_count,
                                                &decoder))) {
              av_log(NULL, AV_LOG_ERROR, "IDirectXVideoDecoderService_CreateVideoDecoder failed\n");
              return -1;
          }
          va->decoder = decoder;
          av_log(NULL, AV_LOG_DEBUG, "IDirectXVideoDecoderService_CreateVideoDecoder succeed\n");
          return 0;
      }

      static void DxDestroyVideoDecoder(vlc_va_dxva2_t *va)
      {
          if (va->decoder)
              va->decoder->Release();
          va->decoder = NULL;

          for (unsigned i = 0; i < va->surface_count; i++)
              va->surface[i].d3d->Release();
          va->surface_count = 0;
      }

      static int DxResetVideoDecoder(vlc_va_dxva2_t *va)
      {
          av_log(NULL, AV_LOG_ERROR, "DxResetVideoDecoder unimplemented\n");
          return -1;
      }

      static void DxCreateVideoConversion(vlc_va_dxva2_t *va)
      {
          switch (va->render) {
          case MAKEFOURCC('N','V','1','2'):
              va->output = (D3DFORMAT)MAKEFOURCC('Y','V','1','2');
              break;
          default:
              va->output = va->render;
              break;
          }
          // CopyInitCache(&va->surface_cache, va->surface_width);
      }

      static void DxDestroyVideoConversion(vlc_va_dxva2_t *va)
      {
          // CopyCleanCache(&va->surface_cache);
      }

      static int Setup(vlc_va_dxva2_t *external, void **hw, PixelFormat *chroma,
                       int width, int height)
      {
          vlc_va_dxva2_t *va = external;

          if (va->width == width && va->height == height && va->decoder)
              goto ok;

          /* */
          DxDestroyVideoConversion(va);
          DxDestroyVideoDecoder(va);

          *chroma = PIX_FMT_NONE;
          if (width <= 0 || height <= 0)
              return -1;

          /* FIXME transmit a video_format_t by VaSetup directly */
          video_format_t fmt;
          memset(&fmt, 0, sizeof(fmt));
          fmt.i_width = width;
          fmt.i_height = height;

          if (DxCreateVideoDecoder(va, va->codec_id, &fmt))
              return -1;

          /* */
          va->hw.decoder = va->decoder;
          va->hw.cfg = &va->cfg;
          va->hw.surface_count = va->surface_count;
          va->hw.surface = va->hw_surface;
          for (unsigned i = 0; i < va->surface_count; i++)
              va->hw.surface[i] = va->surface[i].d3d;

          /* */
          DxCreateVideoConversion(va);

          /* */
      ok:
          *hw = &va->hw;
          const d3d_format_t *output = D3dFindFormat(va->output);
          *chroma = output->codec;

          return 0;
      }

      static int Get(vlc_va_dxva2_t *external, AVFrame *ff)
      {
          vlc_va_dxva2_t *va = external;

          /* Check the device */
          HRESULT hr = va->devmng->TestDevice(va->device);
          if (hr == DXVA2_E_NEW_VIDEO_DEVICE) {
              if (DxResetVideoDecoder(va))
                  return -1;
          } else if (FAILED(hr)) {
              av_log(NULL, AV_LOG_ERROR, "IDirect3DDeviceManager9_TestDevice %u", (unsigned)hr);
              return -1;
          }

          /* Grab an unused surface, in case none are, try the oldest
           * XXX using the oldest is a workaround in case a problem happens with ffmpeg */
          unsigned i, old;
          for (i = 0, old = 0; i < va->surface_count; i++) {
              vlc_va_surface_t *surface = &va->surface[i];

              if (!surface->refcount)
                  break;

              if (
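
A side note on the mode-selection loop in DxFindVideoServiceConversion() in the listing above: IsEqualGUID returns nonzero when the two GUIDs match, so `is_suported = IsEqualGUID(mode->guid, g) == 0` appears to become true on the first GUID that does not match. The intended check is probably "is this preferred mode among the GUIDs the driver reported". Below is a minimal, portable sketch of that check. `Guid`, `guid_equal`, `mode_is_supported` and `pick_mode` are hypothetical stand-ins invented for illustration; they are not part of the posted code or of the Windows SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the Windows GUID struct so the sketch builds anywhere (assumption).
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// Same meaning as IsEqualGUID: true when the two GUIDs match.
static bool guid_equal(const Guid &a, const Guid &b) {
    return a.data1 == b.data1 && a.data2 == b.data2 && a.data3 == b.data3 &&
           std::memcmp(a.data4, b.data4, sizeof a.data4) == 0;
}

// Hypothetical helper: is `wanted` among the GUIDs the driver reported?
static bool mode_is_supported(const Guid &wanted, const std::vector<Guid> &reported) {
    for (const Guid &g : reported) {
        if (guid_equal(wanted, g))
            return true;   // stop on the first match, not on the first mismatch
    }
    return false;
}

// Returns the index of the first preferred mode that the driver reports, or -1.
static int pick_mode(const std::vector<Guid> &preferred, const std::vector<Guid> &reported) {
    for (std::size_t i = 0; i < preferred.size(); i++) {
        if (mode_is_supported(preferred[i], reported))
            return static_cast<int>(i);
    }
    return -1;
}
```

With the posted `== 0` form, an unsupported mode can slip through this filter and is then only rejected later if GetDecoderRenderTargets fails for it, which may be why the program still ends up on a working mode.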

        • GPU usage is  0% during hardware decoding of h264 video.
          wl2776

          I've searched through MSDN, Google, and other forums. It seems that everything I do is correct. The video card is rather old; it was produced in 2007. Could it be that I can't get hardware decoding with it and everything is emulated in software?

          • GPU usage is  0% during hardware decoding of h264 video.
            Tekamd

            I have the latest Catalyst drivers (11.10, plus the 11.11 preview 3 fix). I just loaded an MPEG4 video in 64-bit Media Player Classic (MPC) with the 64-bit K-Lite Codec Pack, and Catalyst shows a sustained 10% GPU utilization (640x480 video, AAC LC audio at 59.6 Kbps, 44.1 kHz), rising to 13% when I move the player around while the movie is playing. It is an FLV, though, so maybe you should try with a .flv too; get 'atube' to download a video from YouTube.

            Good Luck.

            • [solved] GPU usage is  0% during hardware decoding of h264 video.
              wl2776

              It works. GPU usage rose above 0% when I ran several copies of the application in parallel. It also appears that my video card (ATI Radeon HD 4550) is ineffective at decoding small video (704x576), as the CPU load is the same as with pure software decoding. However, it greatly reduces CPU load when decoding 1280x720 video.

              • Re: [solved] GPU usage is  0% during hardware decoding of h264 video.
                zadig

                Hello,

                 

                I used your code to implement a DXVA2 H.264 decoder. It works very well with a GeForce GTX 560 Ti (on 64-bit Windows 7). It uses 'H.264 variable-length decoder, no film grain technology' to decode to 'NV12'. This is fine. But if I swap in my ATI card (XFX Radeon HD 6850), decoding fails on the fifth decode call. It uses the same decoder. When I debug, the call to avcodec_decode_video2(...) always returns 0 (no error), but starting from the 5th frame it prints these two error messages:

                [h264 @ 05cdf6e0] Failed to execute

                [h264 @ 05cdf6e0] hardware accelerator failed to decode picture

                The first message is from dxva2.c (in avcodec, function ff_dxva2_common_end_frame(...)):

                    if (FAILED(IDirectXVideoDecoder_Execute(ctx->decoder, &exec))) {
                        av_log(avctx, AV_LOG_ERROR, "Failed to execute\n");
                        result = -1;
                    }

                IDirectXVideoDecoder_Execute fails with error (0x8007000E): Out of memory or system resources.
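
For readers unfamiliar with the error value: 0x8007000E is E_OUTOFMEMORY, that is, the Win32 error ERROR_OUTOFMEMORY (14) wrapped in an HRESULT with FACILITY_WIN32 (7). The portable sketch below splits an HRESULT into its fields using the same masks as the HRESULT_FACILITY and HRESULT_CODE macros in winerror.h. `HresultParts` and `split_hresult` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a 32-bit HRESULT, extracted with the masks used by winerror.h.
struct HresultParts {
    bool     failure;   // bit 31 set means failure (what FAILED() tests)
    unsigned facility;  // (hr >> 16) & 0x1fff, as HRESULT_FACILITY does
    unsigned code;      // hr & 0xffff, as HRESULT_CODE does
};

// Hypothetical helper: decompose a raw HRESULT value.
static HresultParts split_hresult(std::uint32_t hr) {
    HresultParts p;
    p.failure  = (hr >> 31) != 0;
    p.facility = (hr >> 16) & 0x1FFFu;
    p.code     = hr & 0xFFFFu;
    return p;
}
```

Applied to 0x8007000E this gives failure = true, facility = 7 and code = 14. The value says the driver or runtime reported a resource shortage; it does not say which resource ran out.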

                I noticed that, starting from the 5th frame, the call to avcodec_decode_video2() calls the custom function ffmpeg_ReleaseFrameBuf(...). I tried to release the surface myself, but without success. In the code, all I can see is that you reuse the same surfaces over and over, so I really don't understand the source of the failure, since nothing is released in DirectX. The error definitely occurs when ffmpeg_ReleaseFrameBuf() starts being called.
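
To make the surface reuse concrete, here is a small portable model of the bookkeeping suggested by `vlc_va_surface_t` (`refcount`, `order`) and by the comment in Get() ("Grab an unused surface, in case none are, try the oldest"). It is an assumption about how the truncated part of the listing behaves, not a copy of it, and it contains no Direct3D calls. `Slot`, `SurfacePool`, `acquire` and `release` are hypothetical names. In this model a release callback only drops a reference count; the underlying surface object stays allocated for the lifetime of the decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pool entry; mirrors the refcount/order fields of vlc_va_surface_t.
struct Slot {
    int      refcount = 0;
    unsigned order    = 0;
};

class SurfacePool {
public:
    explicit SurfacePool(std::size_t count) : slots_(count) { assert(count > 0); }

    // Pick the first unused slot; if every slot is in use, fall back to the
    // one handed out longest ago (smallest order value).
    std::size_t acquire() {
        std::size_t pick = 0;
        for (std::size_t i = 0; i < slots_.size(); i++) {
            if (slots_[i].refcount == 0) { pick = i; break; }
            if (slots_[i].order < slots_[pick].order) pick = i;
        }
        slots_[pick].refcount = 1;
        slots_[pick].order    = next_order_++;
        return pick;
    }

    // What a release-buffer callback would do in this model: mark the slot
    // reusable without destroying anything.
    void release(std::size_t index) {
        assert(index < slots_.size());
        if (slots_[index].refcount > 0)
            slots_[index].refcount--;
    }

private:
    std::vector<Slot> slots_;
    unsigned          next_order_ = 0;
};
```

If the real code follows this pattern, nothing is returned to DirectX when a frame is released, so the failure on the fifth frame would have to come from somewhere other than a surface being freed too early. That is a guess based on this model and not a diagnosis.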

                 

                Since it might have been a memory issue, I also tried reducing the frame size (from 1024x768 to 256x192), but it made no difference. The card has at least 700 MB of memory, and it is far from being fully used.

                 

                I know my problem is very specific, but if someone has any clue why DirectX fails when LibAV starts releasing the frames after calling avcodec_decode_video2() 5 times, I would be very grateful.

                 

                Thank you very much