8 Replies Latest reply on Jun 7, 2013 4:27 AM by himanshu.gautam

    How to deal alpha blending with opencl efficiently

    freedayman

      Hi Everyone,

      Does anyone know how to do alpha blending efficiently with OpenCL? I have ported the host code to OpenCL, but the performance is very poor. Does anybody have any advice?

        • Re: How to deal alpha blending with opencl efficiently
          himanshu.gautam

          You should give more information: CPU, GPU, SDK version, driver version, OS, and whether it is 32-bit or 64-bit.

          Then add information about the algorithm (maybe a wiki link) and your ported code.

            • Re: How to deal alpha blending with opencl efficiently
              freedayman

              Hi Himanshu,

              I am working on a video effects project that uses OpenCL GPU acceleration. Below is the OpenCL kernel code for alpha blending. Can you give me some suggestions or guidance on optimizing it?

              Platform info: CPU: Intel E7300, GPU: AMD Radeon HD 7850, OS: Windows 7 32-bit, SDK: AMD APP 2.8

               

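              // Scales all four 8-bit channels of a packed pixel by uAlpha / 256 (with rounding),
              // handling two channels per 32-bit multiply.
              uint applyAlphaToPixel(uint uPixel, uint uAlpha)
              {
                  uint uColor = uPixel;

                  // Channels in bits 0-7 and 16-23.
                  uPixel &= 0x00FF00FF;
                  uPixel *= uAlpha;
                  uPixel += 0x00800080;
                  uPixel &= 0xFF00FF00;
                  uPixel >>= 8;

                  // Channels in bits 8-15 and 24-31, shifted down first so the products fit.
                  uColor &= 0xFF00FF00;
                  uColor >>= 8;
                  uColor *= uAlpha;
                  uColor += 0x00800080;
                  uColor &= 0xFF00FF00;

                  uColor |= uPixel;
                  return uColor;
              }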

              __kernel void alphaBlend_kernel(__global uint* pTarget, __global uint* pSource, const uint width, const uint height)
              {
                  uint gx = get_global_id(0);
                  uint gy = get_global_id(1);
                  if (gx >= width || gy >= height)
                      return;

                  uint id = gy * width + gx;
                  uint tc = pTarget[id];
                  uint sc = pSource[id];
                  uint sa = sc >> 24;  // source alpha is the top byte

                  if (sa == 0x0) {
                      pTarget[id] = tc;  // fully transparent source: target unchanged
                  } else if (sa == 0xFF) {
                      pTarget[id] = sc;  // fully opaque source: copy it
                  } else {
                      sc = applyAlphaToPixel(sc, sa);
                      tc = applyAlphaToPixel(tc, (0xFF ^ sa));  // 0xFF ^ sa == 255 - sa
                      pTarget[id] = sc + tc;
                  }
              }

                • Re: How to deal alpha blending with opencl efficiently
                  jeff_golds

                  Can you use images instead?  That would definitely simplify the amount of work required to alpha blend.

                   

                  Edit: Well you wouldn't be able to do it in-place, which could be a problem.

                    • Re: How to deal alpha blending with opencl efficiently
                      freedayman

                      Hi Jeff Golds,

                      How would I use OpenCL images for the alpha blend? Can you suggest some kernel code?

                      Would it look like the following?

                       

                      const sampler_t imageSampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST;

                      __kernel void imageAlphaBlend_kernel(__write_only image2d_t targetImage, __read_only image2d_t sourceImage1, __read_only image2d_t sourceImage2, const uint width, const uint height)
                      {
                          const uint gx = get_global_id(0);
                          const uint gy = get_global_id(1);
                          if (gx >= width || gy >= height)
                              return;

                          int2 imageuv = (int2)(gx, gy);
                          float4 color1 = read_imagef(sourceImage1, imageSampler, imageuv);
                          float4 color2 = read_imagef(sourceImage2, imageSampler, imageuv);

                          // mix(x, y, a) = x + (y - x) * a, so color1 is weighted by its own alpha
                          // (assuming a normalized format such as CL_UNORM_INT8, where .w is in [0, 1]).
                          float4 dColor = mix(color2, color1, color1.w);
                          write_imagef(targetImage, imageuv, dColor);
                      }

                    • Re: How to deal alpha blending with opencl efficiently
                      himanshu.gautam

                                uint sa = sc >> 24;
                                if (sa == 0x0) {
                                    pTarget[id] = tc;
                                } else if (sa == 0xFF) {
                                    pTarget[id] = sc;
                                } else {
                                    sc = applyAlphaToPixel(sc, sa);
                                    tc = applyAlphaToPixel(tc, (0xFF ^ sa));
                                    pTarget[id] = sc + tc;
                                }

                       

                      The code above branches on the value of "sa". On a GPU, work-items in the same wavefront that take different branches are executed one branch after another, so performance will vary with how often neighbouring pixels diverge.

                      Also, if you are using images, spatial caching will help performance.

                      By the way, have you seen any performance improvement with images?
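One way to sidestep the branching concern is a per-channel blend that needs no special cases for alpha 0 and 255. The sketch below is plain C so it can be checked on the CPU; it is not the kernel from this thread, and the names `blend_channel` and `blend_pixel` are made up for illustration. It relies on the well-known shift trick for dividing by 255 with rounding, which returns the target unchanged for alpha 0 and the source unchanged for alpha 255.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 8-bit channel: round((s*a + t*(255 - a)) / 255) without a division.
   a == 0 yields t and a == 255 yields s, so no branches are needed. */
static uint32_t blend_channel(uint32_t s, uint32_t t, uint32_t a)
{
    assert(s <= 255 && t <= 255 && a <= 255);
    uint32_t v = s * a + t * (255u - a) + 128u;  /* +128 rounds to nearest */
    return (v + (v >> 8)) >> 8;                  /* divide by 255 using shifts */
}

/* Blend two packed 32-bit pixels; source alpha is the top byte of src,
   matching the layout used by the kernel above. */
static uint32_t blend_pixel(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xFFu;
        uint32_t t = (dst >> shift) & 0xFFu;
        out |= blend_channel(s, t, a) << shift;
    }
    return out;
}
```

In an OpenCL kernel the same arithmetic should carry over with `uint` in place of `uint32_t` and the `assert` dropped. Whether it is actually faster than the branchy version on a given GPU would have to be measured.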

                        • Re: How to deal alpha blending with opencl efficiently
                          freedayman

                          Hi Himanshu,

                          Using images would need a third texture, and I would have to copy memory between buffers and images. Would performance still improve with images?

                          I now use uchar4 instead of uint: I read a uchar4 into a float4, do the alpha blending in float, and store the result back as uchar4. The branch is gone, but the performance still cannot catch up with the Intel AVX counterpart using OpenMP. The new OpenCL kernel is below. Can you see what could be done to improve performance further?

                          __kernel void alphaBlend_kernel(__global uchar4* pTarget, __global uchar4* pSource, const uint workSize)
                          {
                              uint id = get_global_id(0);
                              if (id >= workSize)
                                  return;

                              uchar4 ut = pTarget[id];
                              uchar4 us = pSource[id];
                              float4 ft = convert_float4(ut);
                              float4 fs = convert_float4(us);

                              // fa = 1 - alpha, so mix(fs, ft, fa) = fs * alpha + ft * (1 - alpha).
                              float fa  = (255.0f - fs.w) / 255.0f;

                              // convert_uchar4 truncates toward zero (the default for float-to-integer conversions).
                              pTarget[id] = convert_uchar4(mix(fs, ft, fa));
                          }
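A side note on the store at the end of the kernel above: `convert_uchar4` without a rounding suffix rounds toward zero, so blended values are always truncated. The plain C sketch below mirrors the per-channel arithmetic on the CPU to show the difference between truncating and rounding. The helper names are hypothetical, and `store_rounded` only approximates what `convert_uchar4_sat_rte` would do in OpenCL C (it rounds half up rather than half to even).

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel above for one channel:
   mix(fs, ft, fa) = fs + (ft - fs) * fa, with fa = (255 - alpha) / 255. */
static float blend_float(uint8_t s, uint8_t t, uint8_t alpha)
{
    float fa = (255.0f - (float)alpha) / 255.0f;
    return (float)s + ((float)t - (float)s) * fa;
}

/* Mirrors the default conversion: the fractional part is discarded. */
static uint8_t store_truncated(float v)
{
    assert(v >= 0.0f && v <= 255.0f);  /* a plain cast is undefined out of range */
    return (uint8_t)v;
}

/* Saturate, then round half up; a CPU stand-in for a _sat_rte conversion. */
static uint8_t store_rounded(float v)
{
    if (v < 0.0f) v = 0.0f;
    if (v > 255.0f) v = 255.0f;
    return (uint8_t)(v + 0.5f);
}
```

For example, a source channel of 0 over a target channel of 100 with alpha 100 blends to roughly 60.78, which is stored as 60 when truncated and as 61 when rounded. This affects output quality only; it is not expected to change performance.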


                            • Re: How to deal alpha blending with opencl efficiently
                              himanshu.gautam

                              A few suggestions:

                              What is your problem size (the global size for the kernel)? Is it large enough to saturate the GPU?

                              If you are launching too many threads, you can consider giving each thread more work. This can save some scheduling overhead, but in most cases it will not help much.

                              Your kernel looks very simple, so I do not expect a VGPR bottleneck, but you can check the profiling counter output using CodeXL.

                              You should also check the optimization flags that can be passed to clBuildProgram when compiling the kernel (for example -cl-mad-enable or -cl-fast-relaxed-math).
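The suggestion about giving each thread more work could look roughly like the following. This is a CPU simulation in plain C, not OpenCL code: `work_item` plays the role of the kernel body, `launch` stands in for the NDRange launch, and `touch` is a placeholder for the blend, all hypothetical names. Each simulated work-item strides through the buffer by the global size, so neighbouring work-items still touch neighbouring elements on every pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder per-pixel operation; the real blend would go here. */
static uint32_t touch(uint32_t dst, uint32_t src)
{
    return dst + src;
}

/* Body of one simulated work-item: handles elements id, id + global_size, ... */
static void work_item(size_t id, size_t global_size,
                      uint32_t *target, const uint32_t *source, size_t work_size)
{
    assert(global_size > 0);
    for (size_t i = id; i < work_size; i += global_size)
        target[i] = touch(target[i], source[i]);
}

/* Serial stand-in for launching global_size work-items. */
static void launch(size_t global_size,
                   uint32_t *target, const uint32_t *source, size_t work_size)
{
    for (size_t id = 0; id < global_size; ++id)
        work_item(id, global_size, target, source, work_size);
}
```

In a real kernel, `id` and `global_size` would come from `get_global_id(0)` and `get_global_size(0)`, and the host would launch fewer work-items than there are pixels. As noted above, the gain from this is often small.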