
freedayman
Adept I

How to do alpha blending efficiently with OpenCL

Hi Everyone,

Does anyone know how to do alpha blending efficiently with OpenCL? I have ported the host code to OpenCL, but the performance is too poor. Does anybody have any advice?

himanshu_gautam
Grandmaster

You should give more information: CPU, GPU, SDK version, driver version, OS, and whether it is 32-bit or 64-bit.

Then add information about the algorithm (maybe a wiki link) and your ported code.

0 Likes

Hi Himanshu,

I am working on a video effects project with OpenCL GPU acceleration. Below is the OpenCL kernel code for alpha blending. Can you give me some suggestions or guide me in optimizing the kernel?

Platform info: CPU: Intel E7300, GPU: AMD Radeon HD 7850, OS: Windows 7 32-bit, SDK: AMD APP 2.8

/* Scale all four 8-bit channels of a packed pixel by uAlpha / 256,
   handling two channels per multiply. */
uint applyAlphaToPixel(uint uPixel, uint uAlpha)
{
    uint uColor = uPixel;

    /* Channels in bits 0-7 and 16-23. */
    uPixel &= 0x00FF00FF;
    uPixel *= uAlpha;
    uPixel += 0x00800080;
    uPixel &= 0xFF00FF00;
    uPixel >>= 8;

    /* Channels in bits 8-15 and 24-31. */
    uColor &= 0xFF00FF00;
    uColor >>= 8;
    uColor *= uAlpha;
    uColor += 0x00800080;
    uColor &= 0xFF00FF00;

    uColor |= uPixel;
    return uColor;
}

__kernel void alphaBlend_kernel(__global uint* pTarget, __global uint* pSource, const uint width, const uint height)
{
    uint gx = get_global_id(0);
    uint gy = get_global_id(1);
    if (gx >= width || gy >= height)
        return;

    uint id = gy * width + gx;
    uint tc = pTarget[id];
    uint sc = pSource[id];
    uint sa = sc >> 24;   /* source alpha lives in the top byte */

    if (sa == 0x0) {
        pTarget[id] = tc;
    } else if (sa == 0xFF) {
        pTarget[id] = sc;
    } else {
        sc = applyAlphaToPixel(sc, sa);
        tc = applyAlphaToPixel(tc, (0xFF ^ sa));
        pTarget[id] = sc + tc;
    }
}
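
For anyone following the bit tricks in applyAlphaToPixel, here is a minimal plain C port that can be run on the CPU. It is only a sketch to check the arithmetic: the name apply_alpha_to_pixel and the assert are mine, not from the post above, and it assumes the same packed 8-bit-per-channel layout.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C port of applyAlphaToPixel (hypothetical name), for checking the
   arithmetic on the CPU. Each channel becomes (c * alpha + 128) >> 8,
   i.e. roughly c * alpha / 255, computed two channels per multiply. */
static uint32_t apply_alpha_to_pixel(uint32_t pixel, uint32_t alpha)
{
    assert(alpha <= 0xFF);

    /* Channels in bits 0-7 and 16-23. Each product is at most
       255 * 255 + 128 = 65153, so it fits in its own 16-bit lane. */
    uint32_t lo = pixel & 0x00FF00FFu;
    lo *= alpha;
    lo += 0x00800080u;               /* add 128 per lane for rounding */
    lo = (lo & 0xFF00FF00u) >> 8;    /* keep the high byte of each lane */

    /* Channels in bits 8-15 and 24-31, shifted down before multiplying. */
    uint32_t hi = (pixel & 0xFF00FF00u) >> 8;
    hi *= alpha;
    hi += 0x00800080u;
    hi &= 0xFF00FF00u;               /* results are already in place */

    return hi | lo;
}
```

For example, apply_alpha_to_pixel(0x80402010, 0x80) gives 0x40201008, so every channel is halved. Note that apply_alpha_to_pixel(0xFFFFFFFF, 0xFF) gives 0xFEFEFEFE rather than 0xFFFFFFFF, because the shift divides by 256 instead of 255. That is presumably why the kernel special-cases sa == 0xFF.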

0 Likes

Can you use images instead? That would definitely reduce the amount of work required to alpha blend.

Edit: Well, you wouldn't be able to do it in place, which could be a problem.

0 Likes

Hi Jeff Golds,

How would I use OpenCL images for the alpha blend? Can you suggest some kernel code?

Would it look like the following?

const sampler_t imageSampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST;

__kernel void imageAlphaBlend_kernel(__write_only image2d_t targetImage, __read_only image2d_t sourceImage1, __read_only image2d_t sourceImage2, const uint width, const uint height)
{
    const uint gx = get_global_id(0);
    const uint gy = get_global_id(1);
    if (gx >= width || gy >= height)
        return;

    int2 imageuv = (int2)(gx, gy);
    float4 color1 = read_imagef(sourceImage1, imageSampler, imageuv);
    float4 color2 = read_imagef(sourceImage2, imageSampler, imageuv);
    float4 dColor = mix(color2, color1, color1.w);
    write_imagef(targetImage, imageuv, dColor);
}

0 Likes

Yes, that looks like it should do the trick. mix() compiles to a few ops, but certainly far fewer than all the integer ops you had previously. It is annoying that you have to do the blend out of place.
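
To make the mix() call concrete: OpenCL defines mix(x, y, a) as x + (y - x) * a per component. Below is a rough plain C sketch of what the kernel's blend line computes for one pixel. The float4 struct, mix1 and blend_over are stand-ins I made up for illustration, and it assumes the colors are already normalized to [0, 1], as read_imagef returns for normalized 8-bit image formats.

```c
#include <assert.h>

/* Stand-in for OpenCL's float4; w holds alpha. */
typedef struct { float x, y, z, w; } float4;

/* Scalar version of OpenCL mix(x, y, a) = x + (y - x) * a. */
static float mix1(float x, float y, float a)
{
    return x + (y - x) * a;
}

/* Equivalent of mix(dst, src, src.w): every channel, alpha included,
   is blended with the source alpha as the weight. */
static float4 blend_over(float4 src, float4 dst)
{
    float a = src.w;
    assert(a >= 0.0f && a <= 1.0f);

    float4 out = {
        mix1(dst.x, src.x, a),
        mix1(dst.y, src.y, a),
        mix1(dst.z, src.z, a),
        mix1(dst.w, src.w, a)
    };
    return out;
}
```

With a source of (1, 0, 0, 0.25) over a destination of (0, 0, 1, 1), this yields (0.25, 0, 0.75, 0.8125): one subtract, one multiply and one add per channel.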

0 Likes

    uint sa = sc >> 24;

    if (sa == 0x0) {
        pTarget[id] = tc;
    } else if (sa == 0xFF) {
        pTarget[id] = sc;
    } else {
        sc = applyAlphaToPixel(sc, sa);
        tc = applyAlphaToPixel(tc, (0xFF ^ sa));
        pTarget[id] = sc + tc;
    }

The code above branches on the value of "sa". Work-items in the same wavefront that take different branches are executed one path after another, so your performance will vary with how much the alpha values of neighbouring pixels differ.

Also, if you are using images, spatial caching will help performance.

By the way, have you seen any performance improvement with images?
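
One way to drop the branches while keeping the sa == 0 and sa == 0xFF cases exact is to divide by 255 properly instead of shifting by 8. Here is a minimal sketch in plain C so it can be checked on the CPU; mul_div255 and blend_pixel are hypothetical names, not from the kernel above, and the rounding trick is the well-known (t + (t >> 8)) >> 8 form, not something the original code uses.

```c
#include <assert.h>
#include <stdint.h>

/* round(x * a / 255) for x, a in 0..255, without a division. */
static uint32_t mul_div255(uint32_t x, uint32_t a)
{
    assert(x <= 255 && a <= 255);
    uint32_t t = x * a + 128;
    return (t + (t >> 8)) >> 8;
}

/* Branch-free blend of one packed pixel: src * sa + dst * (255 - sa),
   per channel, with the source alpha taken from the top byte.
   The alpha channel is blended like the color channels, as in the
   original kernel. */
static uint32_t blend_pixel(uint32_t src, uint32_t dst)
{
    uint32_t sa = src >> 24;
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xFF;
        uint32_t d = (dst >> shift) & 0xFF;
        /* The two rounded terms never sum to more than 255,
           so nothing spills into the next channel. */
        out |= (mul_div255(s, sa) + mul_div255(d, 255 - sa)) << shift;
    }
    return out;
}
```

Because the division is exact, a fully opaque source returns the source unchanged and a fully transparent one returns the destination unchanged, so the two special cases fall out of the general path. Whether this is actually faster than the branchy version on a given GPU is something you would have to measure.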

0 Likes

Hi Himanshu,

Using images will need a third texture, and I would have to copy memory between the buffer and the image. Will performance really improve with images?

I now use uchar4 instead of uint: I read a uchar4 into a float4, do the alpha blending in float, and store the result back as uchar4. The branch is gone, but the performance still cannot catch up with the Intel AVX counterpart using OpenMP. The new OpenCL kernel is below; can you see what could be done to improve performance further?

__kernel void alphaBlend_kernel(__global uchar4* pTarget, __global uchar4* pSource, const uint workSize)
{
    uint id = get_global_id(0);
    if (id >= workSize)
        return;

    uchar4 ut = pTarget[id];
    uchar4 us = pSource[id];
    float4 ft = convert_float4(ut);
    float4 fs = convert_float4(us);
    float fa  = (255.0f - fs.w) / 255.0f;
    pTarget[id] = convert_uchar4(mix(fs, ft, fa));
}
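
A side note on the last line of that kernel: as far as I know, a plain convert_uchar4 rounds toward zero by default, so blended values are always rounded down, while convert_uchar4_sat_rte rounds to nearest and clamps. The plain C sketch below only imitates the two behaviours for a single channel so the difference can be seen on the CPU. All three function names are made up, and the rounding helper uses round-half-up as an approximation of the OpenCL round-to-nearest-even mode.

```c
#include <assert.h>
#include <stdint.h>

/* Per-channel version of the kernel's math: mix(s, d, fa) with
   fa = (255 - alpha) / 255. Inputs are in the 0..255 range. */
static float blend_channel(float s, float d, float alpha255)
{
    float fa = (255.0f - alpha255) / 255.0f;
    return s + (d - s) * fa;
}

/* Imitates the default float-to-uchar conversion for in-range
   input: the fraction is simply dropped. */
static uint8_t to_u8_truncate(float v)
{
    assert(v >= 0.0f && v < 256.0f);
    return (uint8_t)v;
}

/* Approximates a saturating, round-to-nearest conversion:
   clamp to 0..255 (NaN maps to 0), then round half up. */
static uint8_t to_u8_sat_round(float v)
{
    if (!(v > 0.0f))
        return 0;
    if (v >= 255.0f)
        return 255;
    return (uint8_t)(v + 0.5f);
}
```

For instance, a blended value of 10.75 is stored as 10 by the truncating version and as 11 by the rounding one. It does not affect speed much, but it can explain small differences from a CPU reference.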


A few suggestions:

What is your problem size (the global size for the kernel)? Is it enough to saturate the GPU?

If you are launching too many threads, you can consider giving each thread more work. This can save some scheduling overhead, but in most cases it will not help much.

Your kernel looks very simple, so I do not expect a VGPR bottleneck, but you can check the profiling counter output using CodeXL.

You should also check the optimization flags that can be passed to clBuildProgram when compiling the kernel.

0 Likes