12 Replies Latest reply on Mar 10, 2015 11:36 PM by shifan3

    SimpleEncode sample runs very slow on Radeon R9 200 Series

    shifan3

      Hello,

      I recently downloaded Media SDK 1.1 Beta and compiled the SimpleEncode sample on one of my computers, which has an AMD Radeon R9 200 Series (driver 14.501.1003.0) and an AMD A10-6700 APU with Radeon HD Graphics.

      I had it convert a YUV file to H.264 with -c exampleConfigLowLatency.cfg, and it was extremely slow.

      I tracked its progress: it ran at about 1-2 frames/second, roughly 50 times slower than normal (another sample run on an Intel/NVIDIA machine with their SDKs and the same input file).

       

      The CPU was about 30% busy, and the GPU was barely used.

       

      Does the SDK not support this series of video cards yet, or do I have to change some value in the config file?

      Any information will be helpful.

       

      Thanks

        • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
          gioz68

          Hello

           

          Same issue here with a Radeon E8860. I get 3.2 fps converting an HD YUV file to H.264; each frame takes 310 to 330 ms to encode.
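
As a quick sanity check on these figures, a per-frame time converts to a frame rate as 1000 divided by the time in milliseconds. The helper name below is hypothetical and not part of the SDK; this is only a sketch of the arithmetic.

```python
def fps_from_frame_time(ms_per_frame: float) -> float:
    """Convert a per-frame time in milliseconds to frames per second."""
    if ms_per_frame <= 0:
        raise ValueError("frame time must be positive")
    return 1000.0 / ms_per_frame


# Reported range in this post: 310 to 330 ms per frame.
fastest = fps_from_frame_time(310.0)
slowest = fps_from_frame_time(330.0)
print(f"{slowest:.2f} to {fastest:.2f} fps")
```

The resulting range is consistent with the roughly 3.2 fps quoted above.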

           

          I have installed the Clock Tool. The results are as follows:

          Engine Clock: 299 MHz, rises immediately to 625 MHz when running the encoding sample

          Memory: 150 MHz, rises to 1125 MHz when running the encoding sample

          UVD: 100 MHz with UVD Status IDLE

          VCE: 0 MHz with status N/A

           

          GPU activity level rises from 0.0% to approximately 37% during execution

           

          Any hints will be helpful

           

          Thank you

            • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
              amit.agarwal

              Hi,

               

               

              Could you run the Capability Manager and share the logs with us?

               

               

              The Simple Encoder sample supports encoding YUV frames to H.264 on all SI and later platforms. Your platform is supported, so it should have worked.

               

               

              It looks like the encoding is not being scheduled on the VCE.

               

               

              In addition to the Capability Manager logs, could you also share the logs that show up on the console when the Simple Encoder is executed?

               

               

              Please share the logs and we will get back to you with a solution.

               

               

              Thanks

                • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
                  shifan3

                  We noticed that 1.1 was released recently, so we tried the latest version.

                  The samples have changed a lot, but pipelineEncoder seems to be the closest one: it accepts a YUV source and outputs an H.264 file.

                  The encoding rate increased to 20 fps (it was 4 fps on 1.1 Beta), but that is still not acceptable. We tested the MFT transcoding example provided in the 1.0 release and got a transcoding speed of 45 fps on the same machine, so my point is that there must be something wrong with the AMF version.

                  Here's our capabilityManager.exe output and a screenshot of pipelineEncoder running.

                   

                   

                  DX9: List of adapters:

                            0: Device ID: 6811 [AMD Radeon R9 200 Series]

                  DX9 : Choosen Device 0: Device ID: 6811 [AMD Radeon R9 200 Series]

                  Querying video decoder capabilities...

                      Codec AMFVideoDecoderUVD_MJPEG is Not supported

                      Codec AMFVideoDecoderUVD_MPEG4 is Not supported

                      Codec AMFVideoDecoderUVD_H264_AVC is Not supported

                      Codec AMFVideoDecoderUVD_MPEG2 is Not supported

                  Querying video encoder capabilities...

                      Codec AMFVideoEncoderVCE_AVC

                          Acceleration Type:Hardware-accelerated

                          number of supported profiles:3

                              66

                              77

                              100

                          number of supported levels:15

                              10

                              11

                              12

                              13

                              20

                              21

                              22

                              30

                              31

                              32

                              40

                              41

                              42

                              50

                              51

                          number of supported Rate Control Metheds:4

                              0

                              1

                              2

                              3

                          Number of temporal Layers:1

                          Max Supported Job Priority:2

                          IsBPictureSupported:0

                   

                          Max Number of streams supported:16

                          Encoder input:

                              Width: [64-1920]

                              Height: [64-1920]

                              Vertical alignment: 32 lines.

                              Interlaced support: NO

                              Total of 6 pixel format(s) supported:

                                  0: NV12 (native)

                                  1: YUV420P

                                  2: YV12

                                  3: BGRA

                                  4: RGBA

                                  5: ARGB

                              Total of 4 memory type(s) supported:

                                  0: DX9 (native)

                                  1: OPENCL

                                  2: OPENGL

                                  3: HOST

                          Encoder output:

                              Width: [64-1920]

                              Height: [64-1920]

                              Vertical alignment: 32 lines.

                              Interlaced support: NO

                              Total of 1 pixel format(s) supported:

                                  0: NV12 (native)

                              Total of 4 memory type(s) supported:

                                  0: DX9 (native)

                                  1: OPENCL

                                  2: OPENGL

                                  3: HOST

                      Codec AMFVideoEncoderVCE_SVC

                          Acceleration Type:Hardware-accelerated

                          number of supported profiles:3

                              66

                              77

                              100

                          number of supported levels:15

                              10

                              11

                              12

                              13

                              20

                              21

                              22

                              30

                              31

                              32

                              40

                              41

                              42

                              50

                              51

                          number of supported Rate Control Metheds:4

                              0

                              1

                              2

                              3

                          Number of temporal Layers:3

                          Max Supported Job Priority:2

                          IsBPictureSupported:0

                   

                          Max Number of streams supported:16

                          Encoder input:

                              Width: [64-1920]

                              Height: [64-1920]

                              Vertical alignment: 32 lines.

                              Interlaced support: NO

                              Total of 6 pixel format(s) supported:

                                  0: NV12 (native)

                                  1: YUV420P

                                  2: YV12

                                  3: BGRA

                                  4: RGBA

                                  5: ARGB

                              Total of 4 memory type(s) supported:

                                  0: DX9 (native)

                                  1: OPENCL

                                  2: OPENGL

                                  3: HOST

                          Encoder output:

                              Width: [64-1920]

                              Height: [64-1920]

                              Vertical alignment: 32 lines.

                              Interlaced support: NO

                              Total of 1 pixel format(s) supported:

                                  0: NV12 (native)

                              Total of 4 memory type(s) supported:

                                  0: DX9 (native)

                                  1: OPENCL

                                  2: OPENGL

                                  3: HOST

                  Querying video converter capabilities...

                          Converter input:

                              Width: [32-4096]

                              Height: [32-4096]

                              Vertical alignment: 2 lines.

                              Interlaced support: NO

                              Total of 6 pixel format(s) supported:

                                  0: NV12 (native)

                                  1: YV12 (native)

                                  2: BGRA (native)

                                  3: ARGB (native)

                                  4: RGBA (native)

                                  5: YUV420P (native)

                              Total of 4 memory type(s) supported:

                                  0: DX9 (native)

                                  1: OPENCL (native)

                                  2: OPENGL (native)

                                  3: HOST

                          Converter output:

                              Width: [32-4096]

                              Height: [32-4096]

                              Vertical alignment: 2 lines.

                              Interlaced support: NO

                              Total of 6 pixel format(s) supported:

                                  0: NV12 (native)

                                  1: YV12 (native)

                                  2: BGRA (native)

                                  3: ARGB (native)

                                  4: RGBA (native)

                                  5: YUV420P (native)

                              Total of 4 memory type(s) supported:

                                  0: DX9 (native)

                                  1: OPENCL (native)

                                  2: OPENGL (native)

                                  3: HOST

                  PASS

                    • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
                      amit.agarwal

                      Hi,

                       

                       

                      Yes, we announced the availability of the Media SDK v1.1 GA version.

                      http://developer.amd.com/tools-and-sdks/media-sdk/

                      http://developer.amd.com/community/blog/2015/01/29/media-sdk-v1-1-now-available/

                       

                      We are glad you tried samples from the latest release.

                       

                       

                      Regarding the Capability Manager log: it looks like H.264 decoding is not supported, so you will not be able to run the transcoding sample here.
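
For anyone comparing several machines, the decoder status can be pulled out of a saved Capability Manager log with a few lines of script. This is a minimal sketch that assumes the log lines look exactly like the ones pasted in this thread ("Codec <name> is Not supported" or "Codec <name> is Hardware-accelerated"); the function name is hypothetical and not part of the SDK.

```python
import re

# Pattern based on the decoder lines quoted in this thread, for example
# "Codec AMFVideoDecoderUVD_H264_AVC is Not supported".
CODEC_LINE = re.compile(
    r"Codec\s+(\S+)\s+is\s+(Not supported|Hardware-accelerated)"
)


def decoder_support(log_text: str) -> dict:
    """Map each codec name found in the log to True if hardware-accelerated."""
    result = {}
    for match in CODEC_LINE.finditer(log_text):
        name, status = match.groups()
        result[name] = status == "Hardware-accelerated"
    return result


sample_log = """
Querying video decoder capabilities...
    Codec AMFVideoDecoderUVD_MJPEG is Not supported
    Codec AMFVideoDecoderUVD_H264_AVC is Not supported
"""
support = decoder_support(sample_log)
print(support["AMFVideoDecoderUVD_H264_AVC"])
```

Encoder entries in the log use a different layout (an "Acceleration Type:" line under the codec name), so this pattern deliberately matches only the decoder lines.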

                       

                       

                      Second, regarding the encoder FPS: could you run the 'SimpleEncoder' sample?

                      This sample measures the pure encoding performance and prints the result.

                       

                       

                      Kindly share the generated output.

                       

                       

                      Also, could you share the encoder configuration file you used when running the pipelineEncoder sample?

                       

                       

                      Thanks

                        • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
                          geniusyou

                          I have the same problem.

                          DX11: List of adapters:

                                    0: Device ID: 6810 [AMD Radeon R9 200 Series]

                          DX11 : Choosen Device 0: Device ID: 6810 [AMD Radeon R9 200 Serie

                          InitDX11() created HW DX11.1 device

                          InitDX11() created HW DX11 device

                          Querying video decoder capabilities...

                                  Codec AMFVideoDecoderUVD_MJPEG is Not supported

                                  Codec AMFVideoDecoderUVD_MPEG4 is Hardware-accelerated

                                          Decoder input:

                                                  Width: [32-2048]

                                                  Height: [32-2048]

                                                  Vertical alignment: 32 lines.

                                                  Interlaced support: YES

                                                  Total of 0 pixel format(s) supported:

                                                  Total of 1 memory type(s) supported:

                                                          0: HOST (native)

                                          Decoder output:

                                                  Width: [32-2048]

                                                  Height: [32-2048]

                                                  Vertical alignment: 32 lines.

                                                  Interlaced support: YES

                                                  Total of 3 pixel format(s) supported:

                                                          0: NV12 (native)

                                                          1: BGRA

                                                          2: RGBA

                                                  Total of 1 memory type(s) supported:

                                                          0: DX11 (native)

                                  Codec AMFVideoDecoderUVD_H264_AVC is Hardware-accelerated

                                          Decoder input:

                                                  Width: [32-2048]

                                                  Height: [32-2048]

                                                  Vertical alignment: 32 lines.

                                                  Interlaced support: YES

                                                  Total of 0 pixel format(s) supported:

                                                  Total of 1 memory type(s) supported:

                                                          0: HOST (native)

                                          Decoder output:

                                                  Width: [32-2048]

                                                  Height: [32-2048]

                                                  Vertical alignment: 32 lines.

                                                  Interlaced support: YES

                                                  Total of 3 pixel format(s) supported:

                                                          0: NV12 (native)

                                                          1: BGRA

                                                          2: RGBA

                                                  Total of 1 memory type(s) supported:

                                                          0: DX11 (native)

                                  Codec AMFVideoDecoderUVD_MPEG2 is Hardware-accelerated

                                          Decoder input:

                                                  Width: [32-2048]

                                                  Height: [32-2048]

                                                  Vertical alignment: 32 lines.

                                                  Interlaced support: YES

                                                  Total of 0 pixel format(s) supported:

                                                  Total of 1 memory type(s) supported:

                                                          0: HOST (native)

                                          Decoder output:

                                                  Width: [32-2048]

                                                  Height: [32-2048]

                                                  Vertical alignment: 32 lines.

                                                  Interlaced support: YES

                                                  Total of 3 pixel format(s) supported:

                                                          0: NV12 (native)

                                                          1: BGRA

                                                          2: RGBA

                                                  Total of 1 memory type(s) supported:

                                                          0: DX11 (native)

                          Querying video encoder capabilities...

                                  Codec AMFVideoEncoderVCE_AVC

                                          Acceleration Type:Hardware-accelerated

                                          number of supported profiles:3

                                                  66

                                                  77

                                                  100

                                          number of supported levels:15

                                                  10

                                                  11

                                                  12

                                                  13

                                                  20

                                                  21

                                                  22

                                                  30

                                                  31

                                                  32

                                                  40

                                                  41

                                                  42

                                                  50

                                                  51

                                          number of supported Rate Control Metheds:4

                                                  0

                                                  1

                                                  2

                                                  3

                                          Number of temporal Layers:1

                                          Max Supported Job Priority:2

                                          IsBPictureSupported:0

                           

                           

                                          Max Number of streams supported:16

                                          Encoder input:

                                                  Width: [64-1920]

                                                  Height: [64-1920]

                                                  Vertical alignment: 32 lines.

                                                  Interlaced support: NO

                                                  Total of 6 pixel format(s) supported:

                                                          0: NV12 (native)

                                                          1: YUV420P

                                                          2: YV12

                                                          3: BGRA

                                                          4: RGBA

                                                          5: ARGB

                                                  Total of 4 memory type(s) supported:

                                                          0: DX11 (native)

                                                          1: OPENCL

                                                          2: OPENGL

                                                          3: HOST

                                          Encoder output:

                                                  Width: [64-1920]

                                                  Height: [64-1920]

                                                  Vertical alignment: 32 lines.

                                                  Interlaced support: NO

                                                  Total of 1 pixel format(s) supported:

                                                          0: NV12 (native)

                                                  Total of 4 memory type(s) supported:

                                                          0: DX11 (native)

                                                          1: OPENCL

                                                          2: OPENGL

                                                          3: HOST

                                  Codec AMFVideoEncoderVCE_SVC

                                          Acceleration Type:Hardware-accelerated

                                          number of supported profiles:3

                                                  66

                                                  77

                                                  100

                                          number of supported levels:15

                                                  10

                                                  11

                                                  12

                                                  13

                                                  20

                                                  21

                                                  22

                                                  30

                                                  31

                                                  32

                                                  40

                                                  41

                                                  42

                                                  50

                                                  51

                                          number of supported Rate Control Metheds:4

                                                  0

                                                  1

                                                  2

                                                  3

                                          Number of temporal Layers:3

                                          Max Supported Job Priority:2

                                          IsBPictureSupported:0

                           

                           

                                          Max Number of streams supported:16

                                          Encoder input:

                                                  Width: [64-1920]

                                                  Height: [64-1920]

                                                  Vertical alignment: 32 lines.

                                                  Interlaced support: NO

                                                  Total of 6 pixel format(s) supported:

                                                          0: NV12 (native)

                                                          1: YUV420P

                                                          2: YV12

                                                          3: BGRA

                                                          4: RGBA

                                                          5: ARGB

                                                  Total of 4 memory type(s) supported:

                                                          0: DX11 (native)

                                                          1: OPENCL

                                                          2: OPENGL

                                                          3: HOST

                                          Encoder output:

                                                  Width: [64-1920]

                                                  Height: [64-1920]

                                                  Vertical alignment: 32 lines.

                                                  Interlaced support: NO

                                                  Total of 1 pixel format(s) supported:

                                                          0: NV12 (native)

                                                  Total of 4 memory type(s) supported:

                                                          0: DX11 (native)

                                                          1: OPENCL

                                                          2: OPENGL

                                                          3: HOST

                          Querying video converter capabilities...

                                          Converter input:

                                                  Width: [32-4096]

                                                  Height: [32-4096]

                                                  Vertical alignment: 2 lines.

                                                  Interlaced support: NO

                                                  Total of 6 pixel format(s) supported:

                                                          0: NV12 (native)

                                                          1: YV12 (native)

                                                          2: BGRA (native)

                                                          3: ARGB (native)

                                                          4: RGBA (native)

                                                          5: YUV420P (native)

                                                  Total of 4 memory type(s) supported:

                                                          0: DX11 (native)

                                                          1: OPENCL (native)

                                                          2: OPENGL (native)

                                                          3: HOST

                                          Converter output:

                                                  Width: [32-4096]

                                                  Height: [32-4096]

                                                  Vertical alignment: 2 lines.

                                                  Interlaced support: NO

                                                  Total of 6 pixel format(s) supported:

                                                          0: NV12 (native)

                                                          1: YV12 (native)

                                                          2: BGRA (native)

                                                          3: ARGB (native)

                                                          4: RGBA (native)

                                                          5: YUV420P (native)

                                                  Total of 4 memory type(s) supported:

                                                          0: DX11 (native)

                                                          1: OPENCL (native)

                                                          2: OPENGL (native)

                                                          3: HOST

                          PASS

                           

                          Config file is much like the sample config.

                          Also, with the latest SDK the output files always have ReFrames: 4 frames, whereas 1.1 Beta is OK.

                            • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
                              amit.agarwal

                              Hi,

                               

                               

                              What you have shared is the CapabilityManager log.

                               

                               

                              Could you run the 'SimpleEncoder' sample and share the output? This sample measures the pure encoding performance and prints the result.

                               

                              Thanks

                              • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
                                amit.agarwal

                                Hi,


                                Another point to note is the difference between the Simple Encoder and Pipeline Encoder samples.


                                Simple Encoder

                                - Encodes raw video content to generate compressed H.264 Elementary stream

                                - Sample uses native AMF APIs

                                - Input is created within the sample itself depending on the input memory type.  In case of DX9 the input is a square box moving across a background.  In case of DX11 the input is alternating different colored backgrounds

                                - The sample prints Latency, Average encode time and Average time in ms to write one encoded frame to the file

                                - Here the average encoding time is the PURE encoding time, that is, the time the VCE takes to encode one frame

                                 

                                Pipeline Encoder

                                - Encodes user specified raw video to generate compressed H.264 Elementary Stream

                                - Sample uses Pipeline-based APIs. "Pipeline" is a framework built on the AMF native APIs

                                - Natively (without internal conversion), the encoder supports only NV12 as input. If the user passes any other format (BGRA, ARGB, RGBA, YV12, YUV420P), it is converted by the internal converter before being submitted to the encoder block. Input file formats: NV12, YUV420P, BGRA, ARGB, RGBA, YV12 frames

                                - In the case of the Pipeline Encoder, because of the nature of the pipeline framework, the performance measure is the sum of “Time to read one frame” + “Color Conversion” + “Encoding Time” + “File Write Time”. This is why you are getting low performance numbers when running the Pipeline Encoder sample.
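
To illustrate how the summed stages lower the reported frame rate, here is a small sketch. The stage timings are hypothetical values chosen only for illustration, not measurements from either sample, and the function name is not part of the SDK. It assumes the four stages are counted one after another, as described in the last point above.

```python
def pipeline_fps(read_ms: float, convert_ms: float,
                 encode_ms: float, write_ms: float) -> float:
    """Frame rate when the per-frame cost is the sum of all four stages."""
    total_ms = read_ms + convert_ms + encode_ms + write_ms
    if total_ms <= 0:
        raise ValueError("total frame time must be positive")
    return 1000.0 / total_ms


# Hypothetical per-frame timings in milliseconds (illustration only).
whole_pipeline = pipeline_fps(read_ms=5.0, convert_ms=10.0,
                              encode_ms=10.0, write_ms=0.0)
encode_only = pipeline_fps(read_ms=0.0, convert_ms=0.0,
                           encode_ms=10.0, write_ms=0.0)
print(f"pipeline: {whole_pipeline:.1f} fps, encode only: {encode_only:.1f} fps")
```

With these made-up numbers the same encoder reports 40 fps through the pipeline but 100 fps when only the encode stage is timed.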

                                 

                                For TRUE encoding performance of the HW on AMF, we would suggest you execute Simple Encoder sample.


                                Thanks

                                  • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
                                    geniusyou

                                    Hi

                                     

                                    Thanks for the information that AMF natively supports only nv12.

                                     

                                    I did another test with Pipeline Encoder:

                                    I'm using the same config file and the same 1920*1080 input file.

                                    • 1. With yv12 input:

                                    Average (Max, fr#) Encode Latency: 24.7 ms (50.1 ms frame# 0)

                                    Average (Max) Frame size: 79967 bytes (576728 bytes)

                                    Frames processed: 3600 Frame process time: 24.7ms FPS: 40.4

                                    Average FPS: 40.4 Total Time: 89086.3

                                    • 2. With yv12 input, converted to nv12 outside AMF:

                                    Average (Max, fr#) Encode Latency: 14.8 ms (45.7 ms frame# 2612)

                                    Average (Max) Frame size: 79967 bytes (576728 bytes)

                                    Frames processed: 3600 Frame process time: 19.9ms FPS: 50.3

                                    Average FPS: 50.3 Total Time: 71682.5

                                     

                                    So there seems to be a performance problem with the color conversion inside AMF.
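As a quick check of the numbers above, the per-frame cost can be recomputed from the totals. This is a small sketch that assumes "Total Time" is given in milliseconds (the log does not label the unit, but that reading is consistent with 3600 frames and the reported FPS); the function names are made up for illustration.

```cpp
#include <cassert>

// Average time per frame in milliseconds, given the total run time in ms.
double MsPerFrame(double totalMs, int frames)
{
    assert(frames > 0);
    return totalMs / frames;
}

// Average frames per second, given the total run time in ms.
double FramesPerSecond(double totalMs, int frames)
{
    assert(totalMs > 0.0);
    return frames * 1000.0 / totalMs;
}
```

With the logged values, `MsPerFrame(89086.3, 3600)` is roughly 24.7 ms and `MsPerFrame(71682.5, 3600)` is roughly 19.9 ms, so the internal conversion path costs a little under 5 ms more per frame.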

                                     

                                    And yes, SimpleEncoder is much faster, with an Encode Latency of about 10ms.

                                    But I expect that copying from host memory to DX memory would also cost me some time, so I haven't tried it.

                                      • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
                                        amit.agarwal

                                        Hi,

                                         

                                         

                                        The drop in performance is due to the following reason.

                                         

                                         

                                        Basically, if OpenCL is NOT initialized, which is the case, Color Converter uses CPU for processing.  If OpenCL is initialized, Color Converter uses GPU for processing.

                                         

                                         

                                        And the drop in performance is due to Color Conversion happening on the CPU.

                                         

                                         

                                        Here are the steps to initialize OpenCL in the Pipeline Encoder use-case, to ensure GPU Color Conversion

                                         

                                         

                                        1. EncodePipeline.cpp --> Function: EncodePipeline::Init

                                         

                                         

                                        Add code to initialize OpenCL (the m_deviceOpenCL.Init() and m_pContext->InitOpenCL() blocks that follow the DX11 branch)

                                         

                                         

                                            else if (engineStr == L"DX11")
                                            {
                                                engineMemoryType = amf::AMF_MEMORY_DX11;
                                                res = m_deviceDX11.Init(adapterID, false);
                                                if (res != AMF_OK)
                                                {
                                                    LOG(m_pLogFile, "%s %s %d \n ", "m_deviceDX11.Init() failed @",
                                                                    __FILE__, __LINE__);
                                                    CHECK_AMF_ERROR_RETURN(res, L"m_deviceDX11.Init() failed");
                                                }
                                                res = m_pContext->InitDX11(m_deviceDX11.GetDevice());
                                                if (res != AMF_OK)
                                                {
                                                    LOG(m_pLogFile, "%s %s %d \n ", "m_pContext->InitDX11() failed @",
                                                                    __FILE__, __LINE__);
                                                    CHECK_AMF_ERROR_RETURN(res, L"m_pContext->InitDX11() failed");
                                                }
                                            }

                                            res = m_deviceOpenCL.Init(m_deviceDX9.GetDevice(), m_deviceDX11.GetDevice(), NULL, NULL);
                                            if (res != AMF_OK)
                                            {
                                                LOG(m_pLogFile, "%s %s %d \n ", "m_deviceOpenCL.Init() failed @",
                                                                __FILE__, __LINE__);
                                                CHECK_AMF_ERROR_RETURN(res, L"m_deviceOpenCL.Init() failed");
                                            }

                                            res = m_pContext->InitOpenCL(m_deviceOpenCL.GetCommandQueue());
                                            if (res != AMF_OK)
                                            {
                                                LOG(m_pLogFile, "%s %s %d \n ", "m_pContext->InitOpenCL() failed @",
                                                                __FILE__, __LINE__);
                                                CHECK_AMF_ERROR_RETURN(res, L"m_pContext->InitOpenCL() failed");
                                            }

                                         

                                        2. EncodePipeline.h

                                         

                                         

                                        Add the DeviceOpenCL.h include after the existing DeviceDX11.h include

                                         

                                         

                                        #include "DeviceDX11.h"

                                        #include "DeviceOpenCL.h"

                                         

                                         

                                        3. In the project file, add DeviceOpenCL.h and DeviceOpenCL.cpp; otherwise the build will fail with linker errors.

                                         

                                         

                                        Common Files/inc/DeviceOpenCL.h

                                        Common Files/src/DeviceOpenCL.cpp

                                         

                                         

                                        Also, in the Configuration file, update QualityPreset to SPEED.

                                         

                                         

                                        Please update us on the performance you observe with these changes.

                                         

                                         

                                        NOTE: 

                                        The overall performance observed is NOT the pure VCE encoding performance BUT the sum of the VCE encoding time plus the time for:

                                        1. reading RAW stream data (even for RAM disk)

                                        2. color space conversion from YUV420 to NV12

                                        3. File write time

                                         

                                         

                                        And feel free to ping us for any clarifications.

                                         

                                         

                                        Thanks

                                          • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
                                            geniusyou

                                            Hi:

                                            The test result is fine with OpenCL initialized.

                                            • 1. With yv12 input with OpenCL initialized:

                                            Average (Max, fr#) Encode Latency: 16.5 ms (30.8 ms frame# 0)

                                            Average (Max) Frame size: 79967 bytes (576728 bytes)

                                            Frames processed: 3600 Frame process time: 19.4ms FPS: 51.6

                                            Average FPS: 51.6 Total Time: 69890.8

                                             

                                            QualityPreset is already set to Speed!

                                             

                                            By "yv12 input, converted to nv12 outside AMF" I mean making the following change:


                                            else if (std_string == L"yv12")

                                            {

                                                   ret = amf::AMF_SURFACE_NV12;

                                            }

                                             

                                            case amf::AMF_SURFACE_NV12:

                                                    YV12ToNV12PicCopy(m_frame.GetData(), m_stride, m_height, pDstBits,

                                                                    dstStride, valignment);

                                                    /*NV12PicCopy(m_frame.GetData(), m_stride, m_height, pDstBits, dstStride,

                                                                    valignment);*/

                                             

                                             

                                            static void YV12ToNV12PicCopy(const amf_uint8 *src, amf_int32 srcStride,
                                                            amf_int32 srcHeight, amf_uint8 *dst, amf_int32 dstStride,
                                                            amf_int32 dstHeight)
                                            {
                                                // Y plane: copied unchanged
                                                PlaneCopy(src, srcStride, srcHeight, dst, dstStride, dstHeight);

                                                // Chroma: YV12 stores the V plane right after Y, then the U plane.
                                                // NV12 expects one plane of interleaved U,V byte pairs after Y.
                                                amf_int32 srcYSize = srcHeight * srcStride;
                                                const amf_uint8* pSrcVPtr = src + srcYSize;
                                                const amf_uint8* pSrcUPtr = pSrcVPtr + (srcYSize >> 2);
                                                amf_uint8* pDstUVPtr = dst + dstHeight * dstStride;
                                                amf_int32 srcChromaStride = srcStride >> 1;
                                                amf_int32 srcChromaHeight = srcHeight >> 1;

                                                // One interleaved destination row per source chroma row
                                                for (int j = 0; j < srcChromaHeight; j++)
                                                {
                                                    amf_uint8* pTempDst = pDstUVPtr + j * dstStride;
                                                    for (int i = 0; i < srcChromaStride; i++)
                                                    {
                                                        *pTempDst++ = *pSrcUPtr++;
                                                        *pTempDst++ = *pSrcVPtr++;
                                                    }
                                                }
                                            }
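For anyone who wants to check the byte order outside the sample, here is a minimal standalone sketch of the same YV12 to NV12 repacking. It makes assumptions the sample code does not: the frame is tightly packed (no stride or alignment padding), width and height are even, and `Yv12ToNv12` is a name made up for this illustration rather than anything from AMF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a tightly packed YV12 frame (Y plane, then V plane, then U plane)
// into a tightly packed NV12 frame (Y plane, then interleaved U,V pairs).
std::vector<std::uint8_t> Yv12ToNv12(const std::vector<std::uint8_t>& src,
                                     std::size_t width, std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t ySize = width * height;
    const std::size_t chromaSize = ySize / 4;  // each chroma plane is (w/2) x (h/2)
    assert(src.size() == ySize + 2 * chromaSize);

    std::vector<std::uint8_t> dst(src.size());

    // The Y plane is identical in both formats.
    std::copy_n(src.begin(), ySize, dst.begin());

    const std::uint8_t* v = src.data() + ySize;  // V comes first in YV12
    const std::uint8_t* u = v + chromaSize;      // U follows V
    std::uint8_t* uv = dst.data() + ySize;       // NV12 interleaved plane

    for (std::size_t i = 0; i < chromaSize; ++i)
    {
        uv[2 * i] = u[i];      // U byte first
        uv[2 * i + 1] = v[i];  // then V byte
    }
    return dst;
}
```

Feeding it a tiny 4x2 frame with distinct values in each plane makes it easy to confirm that U lands before V in the interleaved output.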

                                             

                                            This is also a CPU conversion, but it raises the FPS from 40 to 50, which is a difference of about 5ms per frame!

                                             

                                            We are using the AMD Mpeg Filter right now and want to switch to AMF, but in our tests the performance isn't improved at all; if anything, it is slightly slower.

                                            Also, older AMD cards can't use AMF at all, so the question is: why should I use this new thing?

                            • Re: SimpleEncode sample runs very slow on Radeon R9 200 Series
                              shifan3

                              We ran some tests with nv12 input and got 70+ fps encoding speed, which is quite fine for us.

                              Thanks for your suggestions