
Tomy
Journeyman III

about scatter/gather.

Hi, All.

I am writing kernels that behave like gather and scatter operations.

However, the scatter kernel doesn't work correctly.

 

example program:

kernel void sub(float a<>, out float b<>){

           b = a - 100;

}

kernel void gather(float test_index<>, float input1D[], out float gather_data<>){

          gather_data = input1D[test_index];

}

kernel void scatter(float test_index<>, float gather_data<>, out float4 output1D[]){

          output1D[test_index] = gather_data;

}

......

          for(i=0;i<4;i++){

                     int index = i * Width;

                     test_index[i] = index;

          }

......

          float stream_index<4>;

          float gather<4>;

          float temp<4>;

          float input1D<Length>;

         

          streamRead(input1D, input_data);

          streamRead(stream_index, test_index);

          gather(stream_index, input1D, gather);

          sub(gather, temp);

          scatter(stream_index,temp,input1D);

          streamWrite(input1D, input_data);

Running this program gives the following result:

./test -x 4 -y 4

input1D:

0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00

test_index:

0 4 8 12

gather:

0.00 4.00 8.00 12.00

temp:

-100.00 -96.00 -92.00 -88.00

input_data:

-100.00 -100.00 -100.00 -100.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00

 

The scatter kernel behaves strangely.

I expected the following result:

input_data:

-100.00 1.00 2.00 3.00 -96.00 5.00 6.00 7.00 -92.00 9.00 10.00 11.00 -88.00 13.00 14.00 15.00

But I don't know the cause of this problem.

Do you know what is causing it?

Sorry for my poor English.

Thank you.

Ceq
Journeyman III

Hi Tomy, the results you get aren't wrong.

You have defined input1D as a float stream (float1), but the scatter kernel uses it as a float4 stream.
As a float4 stream, input1D becomes { ( 0 1 2 3), ( 4 5 6 7), ( 8 9 10 11), (12 13 14 15) }
Now the scatter kernel performs:
input1D[0] = temp[0] // Assigns -100 to the first 4 floats
input1D[4] = ... // No such element exists. I'm not sure what happens internally with this assignment; it looks like it is omitted.
input1D[8] = ... // same case

Scatter only works on 128 bit data types, so you can't just change the kernel to float1.
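
To make the explanation above concrete, here is a small Python model (not Brook+ code) of a scatter whose output is viewed as float4 elements. It rests on two assumptions taken from the observed output rather than from any documentation: a scalar assigned to a float4 fills all four components, and writes to indices past the last float4 element are silently dropped. The function name `scatter_as_float4` is made up for this sketch.

```python
def scatter_as_float4(flat, indices, values):
    """Model a scatter whose output stream is viewed as float4 elements."""
    # Regroup the flat floats into 4-wide elements, as a float4 view would.
    elements = [flat[i:i + 4] for i in range(0, len(flat), 4)]
    for idx, val in zip(indices, values):
        if 0 <= idx < len(elements):
            # Assumption: a scalar assigned to a float4 fills all four lanes.
            elements[idx] = [val] * 4
        # Assumption: indices past the last float4 element are dropped.
    return [x for element in elements for x in element]


input_1d = [float(i) for i in range(16)]
result = scatter_as_float4(input_1d, [0, 4, 8, 12],
                           [-100.0, -96.0, -92.0, -88.0])
print(result)
```

Under those assumptions only index 0 lands inside the four float4 elements, so the first four floats become -100 and the rest stay unchanged, which is the pattern Tomy reported.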

You could convert your input streams to float4, a kernel like this should work (don't forget to change allocation size):

kernel void f1tof4(float a<>, out float4 b<> ) { b = a; }

You also can use masked writes:

kernel void scatter(float i<>, float a<>, out float4 b[]) {
float m = i % 4.0f;
float d = i / 4.0f;
if(m == 0) b.x = a; else
if(m == 1) b.y = a; else
if(m == 2) b.z = a; else
if(m == 3) b.w = a;
}
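
As a rough illustration of what the masked write is meant to achieve, the Python model below (again not Brook+ code) splits each scalar index into a float4 element number and a component number, then overwrites only that component. It assumes the floored quotient of `index / 4` selects the element, which the kernel above computes as `d` but does not visibly use; `masked_scatter` is a hypothetical name.

```python
def masked_scatter(flat, indices, values):
    """Write one lane of a float4 element per scalar index (model only)."""
    elements = [flat[i:i + 4] for i in range(0, len(flat), 4)]
    for idx, val in zip(indices, values):
        # Quotient picks the float4 element, remainder picks x/y/z/w.
        element, lane = divmod(int(idx), 4)
        # Only the selected lane changes; the other three keep their values.
        elements[element][lane] = val
    return [x for e in elements for x in e]


masked = masked_scatter([float(i) for i in range(16)],
                        [0, 4, 8, 12],
                        [-100.0, -96.0, -92.0, -88.0])
print(masked)
```

With the indices and values from the original post, this model yields the result Tomy expected: positions 0, 4, 8 and 12 are replaced and everything else is untouched.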

-------------------------------------------------------------

EDIT: To AMD: the switch statement isn't supported. If this is expected, please document it.


Tomy
Journeyman III

Hi Ceq, I'm sorry for the late reply.

 

Thank you very much for your advice!

Your advice got the program working.

However, it didn't work correctly when the index values are less than 4, and

when the number of indices is less than 4.

So I worked around this problem with the following kernel:

 

kernel void scatter(float index<>, float input<>, out float4 out[]){

                   float m = index%4.0f;

                   float d = index/4.0f;

                  if(m==0 || d==0)           out.x=input;

                  else if(m==1 || d==0.25)  out.y=input;

                  else if(m==2 || d==0.5)    out.z=input;

                  else if(m==3 || d==0.75)   out.w=input;

}

However, it still doesn't work correctly when the index value is 0.

example:

./test -x 8 -y 8

input1D:

0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00

test_index:

0.00 1.00 2.00 3.00 4.00

gather_data:

0.00 1.00 2.00 3.00 4.00 -2.00

temp:

-100.0 -101.0 -102.0 -103.0 -104.0 -2.00

input_data:

-2.00 -101.0 -102.0 -103.0 -104.0 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00

 

I can't solve this problem.

Do you know what this problem is?

 Thank you.

Ceq
Journeyman III

Well, I'm not sure, but I think you should avoid having several processors write fields of the same float4
location. Depending on the implementation, it could lead to a race condition and undefined behaviour.
It would be better to change your streams and kernels so that only one thread writes each location.
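
One way to picture "only one thread writes each location" is to group the pending scalar writes by the float4 element they fall into and then update each element exactly once. The Python sketch below models that idea on the host side; it is not a Brook+ kernel, and the function name and the grouping step are assumptions made for illustration.

```python
def scatter_one_writer_per_element(flat, indices, values):
    """Apply scalar writes so that each float4 element is written only once."""
    # Group the pending writes by the float4 element they fall into.
    pending = {}
    for idx, val in zip(indices, values):
        element, lane = divmod(int(idx), 4)
        pending.setdefault(element, {})[lane] = val

    out = list(flat)
    # One pass per touched element stands in for "one thread per location".
    for element, lanes in pending.items():
        base = element * 4
        merged = [lanes.get(lane, out[base + lane]) for lane in range(4)]
        out[base:base + 4] = merged
    return out


merged_result = scatter_one_writer_per_element(
    [float(i) for i in range(16)],
    [0, 1, 2, 3, 4],
    [-100.0, -101.0, -102.0, -103.0, -104.0],
)
print(merged_result)
```

Here indices 0 to 3 share one float4 element, so their four values are merged into a single write, and index 4 produces a second, separate write.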

On the other hand, it looks like the number of threads isn't determined by the index
stream or the data stream, but by the output (both input streams are resized). You can find
more about this topic here:

http://forums.amd.com/forum/me...id=328&threadid=98317

Tomy,
The problem here is that scatter/gather are 128-bit operations, i.e. every thread writes out four 32-bit values with each index. So, as Ceq mentioned, you can possibly have multiple threads writing to the same location. There is a way around this, but it requires write masking, which is shown in the samples/language/IL/[gather|scatter]_IL samples. This, however, requires CAL instead of Brook+.
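
The effect of unmasked 128-bit writes can be sketched with a toy Python model. It assumes, purely for illustration, that every thread builds its float4 from the same stale copy of the element and then stores all four components, so whichever store lands last overwrites the others. This is a simplification, not a description of how the hardware schedules writes, and `full_width_scatter` is a hypothetical name.

```python
def full_width_scatter(flat, writes, order):
    """Each write replaces a whole float4; only one lane carries new data."""
    snapshot = list(flat)  # what every thread read before any write landed
    out = list(flat)
    for k in order:        # 'order' is the order in which the stores land
        idx, val = writes[k]
        element, lane = divmod(idx, 4)
        base = element * 4
        vec = snapshot[base:base + 4]  # stale copy of the whole element
        vec[lane] = val
        out[base:base + 4] = vec       # 128-bit store touches all four lanes
    return out


data = [0.0, 1.0, 2.0, 3.0]
writes = [(0, -100.0), (1, -101.0), (2, -102.0), (3, -103.0)]
forward = full_width_scatter(data, writes, [0, 1, 2, 3])
backward = full_width_scatter(data, writes, [3, 2, 1, 0])
print(forward)
print(backward)
```

In this model only the last store survives, and the final contents depend on the order in which the stores land, which is why masking the write down to a single component matters.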
