riza.guntur

How to split a job across more than one GPU and then combine the results?

Discussion created by riza.guntur on Jul 31, 2009
Latest reply on Aug 4, 2009 by hagen

I have the attached kernel. Since the dataset is rather small after processing (about one fifth of the input), I plan to increase its size and to split the job across more than one GPU.

My question is:

1. How do I combine the result data that sits on different GPUs? Should I first copy it with the assign() operation to the GPU that will later process the results, and then use a kernel to concatenate the data into a larger stream?

2. Is there any other way to do something similar faster?

Thank you

kernel void max_min_mean(float2 input[][], out float4 output<>)
{
    int2 index = instance().xy;

    // Each output row reduces five consecutive input rows: 5*y .. 5*y+4.
    // Use i0 + k rather than ++i0, which would also modify i0 itself.
    int i0 = 5*index.y;
    int i1 = i0 + 1;
    int i2 = i0 + 2;
    int i3 = i0 + 3;
    int i4 = i0 + 4;

    float mean;
    float temp0 = input[i0][index.x].x;
    float temp1 = input[i1][index.x].x;
    float temp2 = input[i2][index.x].x;
    float temp3 = input[i3][index.x].x;
    float temp4 = input[i4][index.x].x;

    // Running maximum and minimum over the five samples.
    float temp_max = temp0;
    float temp_min = temp0;
    temp_max = (temp_max>temp1)?temp_max:temp1;
    temp_max = (temp_max>temp2)?temp_max:temp2;
    temp_max = (temp_max>temp3)?temp_max:temp3;
    temp_max = (temp_max>temp4)?temp_max:temp4;
    temp_min = (temp_min<temp1)?temp_min:temp1;
    temp_min = (temp_min<temp2)?temp_min:temp2;
    temp_min = (temp_min<temp3)?temp_min:temp3;
    temp_min = (temp_min<temp4)?temp_min:temp4;

    mean = 0.2f*(temp0+temp1+temp2+temp3+temp4);

    // Output: (mean, max, min, .y of the first row in the group).
    output = float4(mean,temp_max,temp_min,input[i0][index.x].y);
}
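To make the split-and-combine idea concrete, here is a minimal CPU-only sketch in C++. It is not GPU code and does not use any stream API. It assumes each output row depends only on its own five input rows, so the output rows can be cut into independent chunks (one per device) and the partial results joined on the host by plain concatenation. The names Float2, Float4, maxMinMean, processChunk and splitAndCombine are hypothetical, and processChunk only stands in for one kernel launch on one device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-ins for the kernel's float2 / float4 types.
struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };  // mean, max, min, tag

// CPU reference for one output element: reduces input rows
// 5*outRow .. 5*outRow+4 in column col (row-major layout assumed).
Float4 maxMinMean(const std::vector<Float2>& input, std::size_t width,
                  std::size_t outRow, std::size_t col) {
    const std::size_t base = 5 * outRow;
    const Float2 first = input[base * width + col];
    float sum = 0.0f, mx = first.x, mn = first.x;
    for (std::size_t k = 0; k < 5; ++k) {
        const float v = input[(base + k) * width + col].x;
        sum += v;
        mx = std::max(mx, v);
        mn = std::min(mn, v);
    }
    return {0.2f * sum, mx, mn, first.y};
}

// Computes output rows [rowBegin, rowEnd). In a real setup this would be
// one kernel launch on one device; here it is a plain CPU loop.
std::vector<Float4> processChunk(const std::vector<Float2>& input,
                                 std::size_t width,
                                 std::size_t rowBegin, std::size_t rowEnd) {
    std::vector<Float4> part;
    part.reserve((rowEnd - rowBegin) * width);
    for (std::size_t r = rowBegin; r < rowEnd; ++r)
        for (std::size_t c = 0; c < width; ++c)
            part.push_back(maxMinMean(input, width, r, c));
    return part;
}

// Splits the output rows into deviceCount contiguous chunks, processes
// each chunk independently, and concatenates the parts in row order.
std::vector<Float4> splitAndCombine(const std::vector<Float2>& input,
                                    std::size_t width, std::size_t inputRows,
                                    std::size_t deviceCount) {
    assert(deviceCount > 0 && inputRows % 5 == 0);
    const std::size_t outRows = inputRows / 5;
    std::vector<Float4> combined;
    combined.reserve(outRows * width);
    for (std::size_t d = 0; d < deviceCount; ++d) {
        const std::size_t begin = outRows * d / deviceCount;
        const std::size_t end = outRows * (d + 1) / deviceCount;
        const std::vector<Float4> part = processChunk(input, width, begin, end);
        combined.insert(combined.end(), part.begin(), part.end());
    }
    return combined;
}
```

Because the chunks are contiguous row ranges, combining is just appending each part at its row offset, so under these assumptions no separate concatenation kernel is needed. Whether the copy is better done on the host or between devices depends on the runtime and is not something this sketch can answer.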
