How to split a job across more than one GPU and then combine the results?

Discussion created by riza.guntur on Jul 31, 2009
Latest reply on Aug 4, 2009 by hagen

I have the attached kernel. Since the dataset is rather small after processing (the output is about one fifth of the input), I plan to increase the size and also split the job across more than one GPU.

My questions are:

1. How do I combine the result data from different GPUs? Should I first copy it, using the assign() operation, to the GPU that will later process the result data, and then use a kernel to concatenate the data into a larger stream?

2. Is there any other way to do something similar faster?

Thank you

kernel void max_min_mean(float2 input[][], out float4 output<>)
{
    int2 index = instance().xy;

    // Each output element summarizes five consecutive input rows.
    int i0 = 5*index.y;
    int i1 = i0 + 1;
    int i2 = i0 + 2;
    int i3 = i0 + 3;
    int i4 = i0 + 4;

    float mean;
    float temp0 = input[i0][index.x].x;
    float temp1 = input[i1][index.x].x;
    float temp2 = input[i2][index.x].x;
    float temp3 = input[i3][index.x].x;
    float temp4 = input[i4][index.x].x;

    float temp_max = temp0;
    float temp_min = temp0;

    // Running maximum over the five samples.
    temp_max = (temp_max>temp1)?temp_max:temp1;
    temp_max = (temp_max>temp2)?temp_max:temp2;
    temp_max = (temp_max>temp3)?temp_max:temp3;
    temp_max = (temp_max>temp4)?temp_max:temp4;

    // Running minimum over the five samples.
    temp_min = (temp_min<temp1)?temp_min:temp1;
    temp_min = (temp_min<temp2)?temp_min:temp2;
    temp_min = (temp_min<temp3)?temp_min:temp3;
    temp_min = (temp_min<temp4)?temp_min:temp4;

    mean = 0.2f*(temp0+temp1+temp2+temp3+temp4);

    // Pack mean, max, min, and the .y value of the first row.
    output = float4(mean,temp_max,temp_min,input[i0][index.x].y);
}
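
To make the split-and-combine idea concrete, here is a rough host-side sketch in plain C++. It is not Brook+ API code. The names `Float2`, `Float4`, `max_min_mean_ref`, `split_rows`, and `run_split` are hypothetical, and the per-device work is a CPU stand-in for launching the kernel on one GPU. It assumes the kernel is meant to reduce rows 5*y to 5*y+4 into one output row, and that each slice's result is copied back to the host, where it is written at its row offset in one combined buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };

// CPU reference for the kernel: every output row summarizes 5 consecutive
// input rows (row-major layout, `rows` x `width`, rows a multiple of 5).
std::vector<Float4> max_min_mean_ref(const Float2* input,
                                     std::size_t rows, std::size_t width) {
    assert(rows % 5 == 0);
    std::vector<Float4> out((rows / 5) * width);
    for (std::size_t oy = 0; oy < rows / 5; ++oy) {
        for (std::size_t x = 0; x < width; ++x) {
            const Float2& first = input[(5 * oy) * width + x];
            float mx = first.x, mn = first.x, sum = first.x;
            for (std::size_t k = 1; k < 5; ++k) {
                float v = input[(5 * oy + k) * width + x].x;
                mx = std::max(mx, v);
                mn = std::min(mn, v);
                sum += v;
            }
            out[oy * width + x] = Float4{0.2f * sum, mx, mn, first.y};
        }
    }
    return out;
}

// A contiguous block of output rows assigned to one device.
struct RowRange { std::size_t first_out_row, out_rows; };

// Split the output rows as evenly as possible. Splitting on output rows
// keeps each group of 5 input rows together on the same device.
std::vector<RowRange> split_rows(std::size_t out_rows, std::size_t devices) {
    assert(devices > 0);
    std::vector<RowRange> parts;
    std::size_t base = out_rows / devices, extra = out_rows % devices;
    std::size_t start = 0;
    for (std::size_t d = 0; d < devices; ++d) {
        std::size_t n = base + (d < extra ? 1 : 0);
        parts.push_back(RowRange{start, n});
        start += n;
    }
    return parts;
}

// Process each slice independently (stand-in for one GPU per slice) and
// write its result at the matching offset of the combined host buffer.
std::vector<Float4> run_split(const std::vector<Float2>& input,
                              std::size_t rows, std::size_t width,
                              std::size_t devices) {
    std::vector<Float4> combined((rows / 5) * width);
    for (const RowRange& p : split_rows(rows / 5, devices)) {
        const Float2* slice = input.data() + p.first_out_row * 5 * width;
        std::vector<Float4> part =
            max_min_mean_ref(slice, p.out_rows * 5, width);
        std::copy(part.begin(), part.end(),
                  combined.data() + p.first_out_row * width);
    }
    return combined;
}
```

Under these assumptions the slices never overlap, so combining is just a copy at the right offset and needs no extra concatenation kernel. Whether that beats gathering everything onto one GPU depends on the transfer costs of the actual setup.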