Wide memory access

Question asked by josopait on Nov 28, 2012
Latest reply on Nov 29, 2012 by settle

Hello,

Is it possible to get wide memory accesses without declaring the buffer as float4?

I have a kernel similar to this:

 

__kernel void test(__global float* uav0)
{
  int r = get_global_id(0);
  uav0[r] = 0;
  uav0[r+1] = 0;
  uav0[r+2] = 0;
  uav0[r+3] = 0;
}

 

The assembly generated from this looks suboptimal: each memory access gets its own tbuffer_store_format_x instruction. I could work around this by declaring the buffer as float4, but I don't want to do that. As I understand it, the program would be considerably faster if the four values were written with a single tbuffer_store_format_xyzw instruction, even if the index is not guaranteed to be aligned. Is there a way to enforce this? I am surprised that the compiler does not perform this optimization on its own.
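
For reference, one way to express the four writes as a single vector store while keeping the buffer declared as float* is the OpenCL C built-in vstore4. This is only a sketch of the idea: vstore4 requires the address to be aligned to float (4 bytes), not to float4, so an arbitrary index r is allowed. Whether the compiler actually turns it into one tbuffer_store_format_xyzw is an assumption that depends on the compiler version and hardware, so the generated ISA would need to be checked.

```c
// OpenCL C sketch: same stores as the kernel above, written as one vector store.
__kernel void test(__global float* uav0)
{
  int r = get_global_id(0);

  // Writes uav0[r], uav0[r+1], uav0[r+2], uav0[r+3] in one call.
  // vstore4(data, offset, p) stores to p + offset*4, so offset 0 with
  // the pointer uav0 + r targets exactly elements r .. r+3.
  vstore4((float4)(0.0f), 0, uav0 + r);
}
```

The matching built-in for reads is vload4, which has the same element-type alignment requirement.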

 

Thanks for any help

Ingo Josopait
