vload4 vs. 4 individual memory accesses: bank conflicts

Question asked by boxerab on Aug 6, 2014
Latest reply on Sep 22, 2014 by dipak

What is the advantage of vload4 over 4 single memory accesses?

Suppose I am loading data from local memory. Below are two kernels; the second should exhibit no bank conflicts.

Does the first have bank conflicts? If one vload4 is executed per clock, then there should be conflicts within a half-wavefront.

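// localBuffer is assumed to be a __local int buffer passed in by the host.
__kernel void kernel1(__local int *localBuffer) {
    // vload4(offset, p) reads p[4*offset] .. p[4*offset+3], so the offset is
    // the work-item index itself, not the index times 4.
    int4 test = vload4(get_global_id(0), localBuffer);
}
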
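// localBuffer is assumed to be a __local int buffer passed in by the host.
__kernel void kernel2(__local int *localBuffer) {
    int4 test;
    int start = get_global_id(0) * 4;  // first of this work-item's four elements
    test.x = localBuffer[start];
    test.y = localBuffer[start + 1];
    test.z = localBuffer[start + 2];
    test.w = localBuffer[start + 3];
}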