Is there any way to optimize this kernel? [FIXED ALGORITHM]

Discussion created by riza.guntur on Jul 17, 2009
Latest reply on Jul 20, 2009 by riza.guntur

Okay, this is where the real problem begins. Here is the code; the .br file and 480x16.txt are in the next post.

So far, this code is notoriously slow, and I don't know whether finding the minimum or maximum would produce a correct result if the reduced output stream has been used more than once.
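To make the concern about reusing a reduced output concrete, here is a minimal CPU-side sketch in plain C++. It is not Brook+ code, and it does not claim that Brook+ reduce kernels actually fold into the old contents of the output stream; that behaviour is only assumed here for illustration. The function names `reduce_min_into` and `reduce_min_fresh` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical CPU stand-in for a min reduction that folds into an existing
// accumulator, the way a reused output element might behave (assumption).
float reduce_min_into(float accumulator, const std::vector<float>& values)
{
    for (float v : values) {
        accumulator = std::min(accumulator, v);
    }
    return accumulator;
}

// Same reduction, but seeded with the identity of min (+infinity) so that
// an earlier result cannot leak into the new one.
float reduce_min_fresh(const std::vector<float>& values)
{
    assert(!values.empty());
    return reduce_min_into(std::numeric_limits<float>::infinity(), values);
}
```

If the accumulator still holds 0.2 from an earlier pass and the new values are 0.7 and 0.9, `reduce_min_into` returns the stale 0.2, while `reduce_min_fresh` returns 0.7. For a max reduction the matching seed would be negative infinity.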

With the CPU backend, the speed feels the same as with the CAL backend on this problem.

#include "brookgenfiles/percobaan_pertama.h"
#include "brook/Device.h"
#include <iostream>
#include <iomanip>
#include <fstream>
#include <cstring>  // memset
#include <cstdlib>  // exit
#include <cstdio>   // getchar

using namespace std;
using namespace brook;

int main(int argc, char* argv[])
{
    unsigned int deviceCount = 0;
    Device* device = getDevices("cal", &deviceCount);

    unsigned int jumlahData = 480;           // number of training inputs
    unsigned int jumlahDataSatuOutput = 80;  // number of inputs per output target
    unsigned int jumlahDiSatuGrup = 5;       // number of inputs in one group
    unsigned int jumlahDimensi = 16;         // number of dimensions
    unsigned int jumlahOutput = jumlahData / jumlahDataSatuOutput;
    unsigned int yA = jumlahData;
    unsigned int yB = yA / jumlahDiSatuGrup;
    unsigned int yC = 1; // how many last columns to be ignored in input file

    unsigned int streamSize[] = {jumlahDimensi, yA};
    unsigned int streamSizeReduce[] = {jumlahDimensi, yB};
    unsigned int streamSizeReduceRef[] = {jumlahDimensi, jumlahOutput};
    unsigned int streamSizeMinOfVecCluster[] = {1, jumlahOutput};
    unsigned int streamSizeMaxOfMin[] = {1, 1};

    float alpha = 0.05f;
    float elta = 1.1f;
    unsigned short rank[3] = {0, 1, 2};
    int num_of_epoch = 1000;

    float4 *arr0 = new float4[jumlahDimensi * yA];
    memset(arr0, 0, jumlahDimensi * yA * sizeof(float4));

    ifstream inFile;
    inFile.open("480x16.txt");
    if (!inFile)
    {
        cout << "Unable to open file";
        exit(1); // terminate with error
    }

    for (unsigned int i = 0; i < yA; i++) // reading from file
    {
        for (unsigned int j = 0; j < jumlahDimensi + yC; j++)
        {
            unsigned int index = i * jumlahDimensi + j;
            float temp;
            if ((inFile >> temp) && (j < jumlahDimensi))
            {
                arr0[index].x = temp; // read input
                // expected output target for row i, from 1 to jumlahOutput
                int tempOutput = i / jumlahDataSatuOutput + 1;
                arr0[index].w = (float) tempOutput; // store the expected target
            }
        }
    }
    inFile.close();

    Stream<float4> input(rank[2], streamSize);               // input training stream
    Stream<float4> input_max_min(rank[2], streamSizeReduce); // y for max, z for min
    Stream<float4> fuzzy_number(rank[2], streamSizeReduce);  // x for median
    Stream<float4> vec_ref(rank[2], streamSizeReduceRef);    // reference vector clusters

    // myu streams:
    //   myu.x is the myu value at that position
    //   myu.y is the expected output of the input
    //   myu.z is the cluster where vec_ref is located (the expected output of vec_ref)
    Stream<float4> myu(rank[2], streamSizeReduceRef);

    // smallest myu in each reference vector cluster, calculated against fuzzy_number
    Stream<float4> myu_min(rank[2], streamSizeMinOfVecCluster);
    // biggest of the smallest myu
    Stream<float4> myu_max_of_min(rank[2], streamSizeMaxOfMin);

    streamRead(input, arr0); // copy raw data to the input stream

    // Find max and min, and copy the expected output along with them.
    max_min(input, input_max_min);

    // 1. Fuzzify the training input: find the min, max, and median of each
    //    group of five, for each of the 16 dimensions, over 480 training
    //    inputs (80 samples per target). I've done it with reduction.
    fuzzify(input_max_min, fuzzy_number); // find median

    // 2. Pick the reference vectors.
    create_reference_vector(fuzzy_number, vec_ref); // put max into the reference stream

    // 3. Calculate the myu of one training input against the reference
    //    vectors. There are three cases: the medians are equal, the input
    //    median is bigger than the reference vector median, or it is smaller.
    //    A further case applies if the two do not intersect. I do this for
    //    each corresponding dimension of each cluster.
    for (int epoch = 0; epoch < num_of_epoch; epoch++)
    {
        for (int i = 0; i < (int) yB; i++)
        {
            myufy(i, fuzzy_number, vec_ref, myu);

            // 4. Find the winner cluster: take the smallest myu in each
            //    cluster, then the biggest of those. The cluster with the
            //    biggest of the smallest is the winner.
            minimum_myu_cluster(myu, myu_min);
            max_of_min_myu(myu_min, myu_max_of_min);

            // myu_min and myu_max_of_min need to be reinitialized after each
            // epoch so that the reduction gives a correct result and is not
            // compared against the previous values. But how?
            // There is no need to reinitialize myu because it gets overwritten.

            // 5. Move the reference vector based on similarity. Three cases
            //    apply to myu_max_of_min: its myu value is 0, the winner
            //    cluster is positioned the same as the expected output, or it
            //    is different. Depending on the case, the cluster median (as a
            //    whole) is shifted nearer or farther, while the min and max
            //    are stretched out.
            // move_vector_reference (I haven't written it yet)
        } // 6. Steps 3 to 5 are applied to all training fuzzy numbers
          //    (16 fuzzy vectors) in parallel using kernels.
    } // 7. Steps 3 to 6 are iterated 1000 times.

    // I can't swizzle so far. Is there any chance of swizzling here?
    // Actually, the last elements in myu, myu_min, and myu_max_of_min are empty.

    delete[] arr0;
    getchar();
    return 0;
}
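Step 4 in the comments above (smallest myu per cluster, then the biggest of those) can be sketched on the CPU as a reference for checking the output of the two reduction kernels. This is only a sketch: the `Winner` type, the `find_winner` name, and the `myu[cluster][dimension]` layout are assumptions made for illustration and do not match the float4 stream layout in the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: index of the winning cluster and its score.
struct Winner {
    std::size_t cluster;
    float myu;
};

// myu[c][d] is the membership value of the current input against reference
// cluster c in dimension d (an assumed layout, not the stream layout).
Winner find_winner(const std::vector<std::vector<float>>& myu)
{
    assert(!myu.empty());
    Winner best{0, 0.0f};
    bool first = true;
    for (std::size_t c = 0; c < myu.size(); ++c) {
        assert(!myu[c].empty());
        // Smallest myu across the dimensions of this cluster.
        float cluster_min = *std::min_element(myu[c].begin(), myu[c].end());
        // Keep the cluster whose smallest myu is the largest seen so far.
        if (first || cluster_min > best.myu) {
            best = Winner{c, cluster_min};
            first = false;
        }
    }
    return best;
}
```

For three clusters with values {0.9, 0.1}, {0.6, 0.5}, and {0.4, 0.8}, the per-cluster minima are 0.1, 0.5, and 0.4, so the winner is cluster 1 with a myu of 0.5. Because every call starts from a fresh state, nothing carries over from one training input to the next.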