Why is long vector addition barely accelerated?

Question asked by aokomoriuta on Oct 2, 2012
Latest reply on Oct 6, 2012 by settle

I'm trying to accelerate arithmetic on vectors that have more than 100,000 elements.

First, I wrote the following .cl code:

//! Add each element: result = left + C * right
//! \param result vector which the result is stored to
//! \param left added vector
//! \param right adding vector
//! \param C coefficient for adding vector
//! (Real is assumed to be a typedef for float or double defined elsewhere)
__kernel void AddVectorVector(
    __global Real* result,
    __global const Real* left,
    __global const Real* right,
    const Real C)
{
    // get element index
    int i = get_global_id(0);

    // add each element
    result[i] = left[i] + C * right[i];
}

But this is very slow, even slower than running on a single CPU.
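The single-CPU version used for the comparison is not shown in this post. As a minimal sketch of what such a baseline might look like, here is a plain C loop that computes the same thing as the kernel. The function name `add_vector_vector_cpu` and the `float` typedef for `Real` are assumptions made for illustration, not taken from the original code.

```c
#include <stddef.h>

/* Assumption: Real is single precision; the original post does not say. */
typedef float Real;

/* Hypothetical CPU baseline: result[i] = left[i] + C * right[i]
   for every i in [0, n). One loop iteration does the work of one
   work-item in the AddVectorVector kernel. */
void add_vector_vector_cpu(Real *result,
                           const Real *left,
                           const Real *right,
                           Real C,
                           size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        result[i] = left[i] + C * right[i];
    }
}
```

Each element needs two loads, one store, and only one multiply and one add, so timing this loop against the kernel (with and without the host-to-device copies) gives a like-for-like comparison.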


How can I optimize this code? Or is this a limitation of the GPU?

OS: Windows 7

CPU: Xeon E3-1245

GPU: Radeon HD 6950