Huge performance difference between XP and Vista?

Discussion created by lasagna on Aug 3, 2009
Latest reply on Aug 4, 2009 by riza.guntur

Hello all,

I recently moved a project I am working on from 32-bit Windows XP (9.2 drivers) to 64-bit Vista. So far my application does some filtering, followed by a sort of a 4096 by 4096 stream using an algorithm I modified from the AMD samples (bitonic sort). The sorting step in XP is quite fast; I can call it a few times per second. In Vista, however, sorting one stream can take up to two minutes and locks up my entire system while it runs.

I have tried adding more RAM (XP and Vista are on the same PC), which helped noticeably, but performance is still nowhere near that of XP. I tried going 32-bit in Vista, but that changed nothing at all. Using the 9.2, 9.6 or 9.7 drivers did not help either.

This is the code I'm using, which is identical in XP and of course very similar to AMD's bitonic sort. I'm not sure why I'm suddenly seeing such a slowdown when it works flawlessly under XP with older drivers. (If the code seems bloated, it is because I had to modify some of it to get brcc to compile it correctly.)

 

Kind regards and many thanks in advance!

 

CODE:

    Stream<pair> SortStream(brook::Stream<pair> &input)
    {
        /* create two buffers to switch around with */
        unsigned int dimensions[] = {4096, 4096};
        Stream<pair> stmBuffer1(2, dimensions);
        Stream<pair> stmBuffer2(2, dimensions);

        int stage;
        unsigned int flip = 0;
        float2 maxvalue = float2(4096.0f, 4096.0f);

        /* read data into sorting buffers */
        stmBuffer1.assign(input);
        stmBuffer2.assign(input);

        /* begin sorting (stage = 24) */
        for(stage = 1; stage <= 24; stage++)
        {
            unsigned int step = 0;
            float segWidth = (float)pow(2.0f, (int)stage);

            for(step = 1; step <= stage; ++step)
            {
                /* calculate offset */
                float offset = (float)pow(2.0f, (int)(stage - step));

                if(!flip)
                {
                    BitonicSort(stmBuffer1, stmBuffer2, segWidth, offset, 2 * offset, maxvalue);
                }
                else
                {
                    BitonicSort(stmBuffer2, stmBuffer1, segWidth, offset, 2 * offset, maxvalue);
                }

                flip ^= 0x01;
            }
        }

        /* return sorted stream */
        if(flip)
        {
            return stmBuffer2;
        }
        else
        {
            return stmBuffer1;
        }
    }

KERNEL:

    kernel void BitonicSort(pair input[][], out pair output<>, float stageWidth,
                            float offset, float twoOffset, float2 maxvalue)
    {
        float2 idx1 = (float2)instance().xy;
        float2 idx2;
        float idx;
        float sign, dir;
        float diff1, diff2;
        pair max, min;

        idx = idx1.x + maxvalue.x * idx1.y;

        /* compare to element above or below */
        sign = ( fmod(idx, twoOffset) < offset) ? 1.0f : -1.0f;

        /* arrow direction in the bitonic search algorithm */
        dir = ( fmod( floor(idx/stageWidth), 2.0f) == 0.0f) ? 1.0f : -1.0f;

        /* calculate the index of the second location */
        idx2.x = idx1.x + (sign * offset);
        idx2.y = idx1.y + floor(idx2.x / maxvalue.x);
        idx2.x = fmod(idx2.x, maxvalue.x);

        if(idx2.x < 0.0f)
        {
            idx2.x += maxvalue.x;
        }

        /* difference variables (swizzling compilation fails at 1.3 I havent gotten around to trying it in 1.4 yet) */
        diff1 = input[idx1.y][idx1.x].difference;
        diff2 = input[idx2.y][idx2.x].difference;

        /* compare differences for max & min values */
        if (diff1 > diff2)
        {
            max = input[idx1.y][idx1.x];
        }
        else
        {
            max = input[idx2.y][idx2.x];
        }

        if (diff1 < diff2)
        {
            min = input[idx1.y][idx1.x];
        }
        else
        {
            min = input[idx2.y][idx2.x];
        }

        /* output correct value */
        if (sign == dir)
        {
            output = min;
        }
        else
        {
            output = max;
        }
    }
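
For anyone who wants to check the pass structure without a GPU, here is a rough CPU-side sketch of the same network in plain C++. It is only an illustration under some assumptions: it sorts a flat array of floats rather than a 2D stream of pair, it uses integer indices instead of the kernel's float arithmetic, and the names bitonicPass and bitonicSortCpu are made up for this sketch (they are not part of Brook+ or the AMD samples). It returns the number of passes, which for the 24 stages above works out to 1 + 2 + ... + 24 = 300 kernel calls per sort, each over 4096 * 4096 elements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One pass of the network: every output element i reads itself and one partner.
// Mirrors the kernel's sign/dir logic, but with integer indices (an assumption
// made for this sketch; the kernel above works on floats).
void bitonicPass(const std::vector<float>& in, std::vector<float>& out,
                 std::size_t segWidth, std::size_t offset) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        bool lower = (i % (2 * offset)) < offset;    // kernel: sign == +1
        bool ascending = ((i / segWidth) % 2) == 0;  // kernel: dir == +1
        std::size_t j = lower ? i + offset : i - offset;
        float lo = in[i] < in[j] ? in[i] : in[j];
        float hi = in[i] < in[j] ? in[j] : in[i];
        out[i] = (lower == ascending) ? lo : hi;     // kernel: sign == dir -> min
    }
}

// Sorts data ascending using two ping-pong buffers, like stmBuffer1/stmBuffer2.
// The size must be a power of two. Returns the number of passes executed.
std::size_t bitonicSortCpu(std::vector<float>& data) {
    std::size_t n = data.size();
    assert((n & (n - 1)) == 0 && "size must be a power of two");

    std::vector<float> a = data, b(n);
    std::size_t passes = 0;
    for (std::size_t segWidth = 2; segWidth <= n; segWidth *= 2) {
        for (std::size_t offset = segWidth / 2; offset >= 1; offset /= 2) {
            bitonicPass(a, b, segWidth, offset);
            std::swap(a, b);  // the buffer just written becomes the next input
            ++passes;
        }
    }
    data = a;
    return passes;
}
```

Calling bitonicSortCpu on a vector of 8 elements should run 6 passes (1 + 2 + 3), and a vector of 2^24 elements should run 300, so the per-pass overhead of a kernel call gets multiplied by 300 for each sort.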
