Question about SKA result interpretation

Discussion created by edwin.cini on Nov 28, 2010
Latest reply on Dec 1, 2010 by himanshu.gautam

I've just started using OpenCL for my university thesis, which attempts to implement university timetabling on a GPU using a genetic algorithm.

I'm now working on the initial parts of the system. So far I test only with SKA (AMD Stream KernelAnalyzer), to get a feel for the syntax and for the code's performance. However, I don't know how to interpret the overall performance results. For example, is the SKA report below a good or a bad one? Can it be improved?

In my code I tried to balance execution paths as much as possible, but at times this has lowered the Throughput figure. I'm attaching my code for any comments or criticism. It is far from complete and only the tip of the iceberg...


//----------------------- SKA results -----------------------
// Name        : Radeon HD 5770
// GPR         : 17
// Scratch Reg : 0
// Min         : 1.00
// Max         : 182070.00
// Avg         : 1548.89
// ALU         : 43
// Fetch       : 5
// Write       : 3
// Est Cycles  : 1548.89
// ALU:Fetch   : 1.19
// BottleNeck  : Global Write
// %s\Clock    : 0.01
// Throughput  : 9 M Threads\Sec
// CF          : 37
//----------------------- OpenCL code -----------------------

/* data structure for timetable configuration */
typedef struct {
    int TimeslotCount;
    int ExtendedSlotCount;
} TimetableConfig;

/* data structure for problem specific configuration */
typedef struct {
    int RoomCount;
    int RoomFeatureCount;
    int StudentCount;
    int EventCount;
} ProblemConfig;

/* data structure for room description */
typedef struct {
    int  Id;                // room id [0..(room count - 1)]
    int  Capacity;          // seating capacity
    int  AvailableFeatures; // 32-bit vector indicating available facilities
    long ValidTimeSlots;    // 64-bit vector indicating usable timeslots
} Room;

/* data structure for event description */
typedef struct {
    int  Id;                // event id [0..(event count - 1)]
    int  RequiredFeatures;  // 32-bit vector indicating required facilities
    int  AttendanceCount;   // number of attendees
    long ValidTimeslots;    // 64-bit vector indicating usable timeslots
} Event;

/* data structure for student description (typedef now actually names the type) */
typedef struct {
    int Id;                 // student id [0..(student count - 1)]
} Student;

// Create matrix of possible placements and placement count for each event.
// __read_only/__write_only are access qualifiers for image objects only,
// so buffers use const (input) and non-const (output) pointers instead.
kernel void computeEventPlacements(
    __constant TimetableConfig* timetableConfig,  // specifies number of timeslots
    __constant ProblemConfig*   problemConfig,    // specifies problem instance
    __global const Event*       eventList,        // list of event structures
    __global const Room*        roomList,         // list of room structures
    __global int*               eventPlaceMatrix, // assumed: EventCount blocks of RoomCount x ExtendedSlotCount ints
    __global int2*              eventPlaceCount)  // (event id, number of usable places)
{
    int id = get_global_id(0);

    // get local copy of config for faster access
    __private int roomCount         = problemConfig->RoomCount;
    __private int timeslotCount     = timetableConfig->TimeslotCount;
    __private int extendedSlotCount = timetableConfig->ExtendedSlotCount;

    // skip work-items beyond the last event (global size may be rounded up)
    if (id >= problemConfig->EventCount)
        return;

    __global const Event* pe = &eventList[id];

    // each event writes its own block; without this offset every
    // work-item would overwrite the same matrix elements
    __global int* placeBlock = eventPlaceMatrix + id * roomCount * extendedSlotCount;

    // read event fields once instead of on every loop iteration
    __private ulong validTimeslots   = (ulong)pe->ValidTimeslots;
    __private int   requiredFeatures = pe->RequiredFeatures;
    __private int   attendanceCount  = pe->AttendanceCount;

    __private ulong timeslotMask = 0x01; // 64-bit mask used for timeslot testing
    __private int   placeCount   = 0;    // accumulated privately, written to global memory once

    // scan timeslots
    for (int t = 0; t < timeslotCount; t++) {
        // scan rooms
        for (int r = 0; r < roomCount; r++) {
            __global const Room* pr = &roomList[r];
            // check if room is suitable; '&' binds more loosely than '!=' and '==',
            // so the mask tests need their own parentheses
            bool roomSuitable = ((validTimeslots & timeslotMask) != 0) &&
                                (pr->Capacity >= attendanceCount) &&
                                ((pr->AvailableFeatures & requiredFeatures) == requiredFeatures);
            int i = roomSuitable ? 1 : 0;
            placeCount += i;
            placeBlock[t + (r * extendedSlotCount)] = i;
        }
        timeslotMask <<= 1;
    }

    eventPlaceCount[id] = (int2)(id, placeCount);

    // at this stage we have:
    // * a matrix showing which places we can use
    // * a vector showing the number of places we can use (sum of matrix elements)
}