lasagna
Journeyman III

Huge performance difference between XP and Vista?

Hello all,

Recently I moved a project I am working on from WinXP 32 (with the 9.2 drivers) to Vista 64. So far my application does some filtering followed by a sorting algorithm on a 4096 by 4096 stream, which I modified from the AMD samples (bitonic sort). The sorting step in XP is really quite fast; I can call it a few times per second. In Vista, however, sorting one stream can take up to two minutes and locks up my entire system while it runs.

I have tried adding more RAM (XP and Vista are on the same PC), which helped noticeably, but it is still nowhere near XP performance. I tried going 32-bit in Vista but that changed nothing at all. Using the 9.2, 9.6 or 9.7 drivers did not help either.
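
To put the RAM observation in numbers, here is a rough footprint estimate. It is only a sketch: the sort routine in this post holds the input stream plus two working buffers, but the definition of `pair` is not shown, so the 8-byte element size used in the note below is an assumption, and `streamBytes` is a hypothetical helper name.

```cpp
#include <cstdint>

// Bytes needed for `streams` buffers of width x height elements,
// each element taking `elemBytes` bytes (the element size is an assumption).
std::uint64_t streamBytes(std::uint64_t width, std::uint64_t height,
                          std::uint64_t elemBytes, std::uint64_t streams) {
    return width * height * elemBytes * streams;
}
```

With an assumed 8-byte element, `streamBytes(4096, 4096, 8, 3)` comes to 402,653,184 bytes, i.e. 384 MiB for the three streams, before counting any copies the runtime or driver may keep.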

This is the code I'm using, which is identical in XP and of course very similar to AMD's bitonic sort. I'm not sure why I'm suddenly seeing such a slowdown when it works flawlessly under XP with older drivers. (If the code seems bloated, that's because I had to modify some of it to get brcc to compile it correctly.)

 

Kind regards and many thanks in advance!

 

CODE:

    Stream<pair> SortStream(brook::Stream<pair> &input)
    {
        /* create two buffers to switch around with */
        unsigned int dimensions[] = {4096, 4096};
        Stream<pair> stmBuffer1(2, dimensions);
        Stream<pair> stmBuffer2(2, dimensions);
        int stage;
        unsigned int flip = 0;
        float2 maxvalue = float2(4096.0f, 4096.0f);

        /* read data into sorting buffers */
        stmBuffer1.assign(input);
        stmBuffer2.assign(input);

        /* begin sorting (stage = 24) */
        for(stage = 1; stage <= 24; stage++)
        {
            unsigned int step = 0;
            float segWidth = (float)pow(2.0f, (int)stage);

            for(step = 1; step <= stage; ++step)
            {
                /* calculate offset */
                float offset = (float)pow(2.0f, (int)(stage - step));

                if(!flip)
                {
                    BitonicSort(stmBuffer1, stmBuffer2, segWidth, offset, 2 * offset, maxvalue);
                }
                else
                {
                    BitonicSort(stmBuffer2, stmBuffer1, segWidth, offset, 2 * offset, maxvalue);
                }
                flip ^= 0x01;
            }
        }

        /* return sorted stream */
        if(flip)
        {
            return stmBuffer2;
        }
        else
        {
            return stmBuffer1;
        }
    }

KERNEL:

    kernel void BitonicSort(pair input[][], out pair output<>, float stageWidth,
                            float offset, float twoOffset, float2 maxvalue)
    {
        float2 idx1 = (float2)instance().xy;
        float2 idx2;
        float idx;
        float sign, dir;
        float diff1, diff2;
        pair max, min;

        idx = idx1.x + maxvalue.x * idx1.y;

        /* compare to element above or below */
        sign = ( fmod(idx, twoOffset) < offset) ? 1.0f : -1.0f;

        /* arrow direction in the bitonic search algorithm */
        dir = ( fmod( floor(idx/stageWidth), 2.0f) == 0.0f) ? 1.0f : -1.0f;

        /* calculate the index of the second location */
        idx2.x = idx1.x + (sign * offset);
        idx2.y = idx1.y + floor(idx2.x / maxvalue.x);
        idx2.x = fmod(idx2.x, maxvalue.x);
        if(idx2.x < 0.0f)
        {
            idx2.x += maxvalue.x;
        }

        /* difference variables (swizzling compilation fails at 1.3 I havent gotten around to trying it in 1.4 yet) */
        diff1 = input[idx1.y][idx1.x].difference;
        diff2 = input[idx2.y][idx2.x].difference;

        /* compare differences for max & min values */
        if (diff1 > diff2) { max = input[idx1.y][idx1.x]; }
        else               { max = input[idx2.y][idx2.x]; }

        if (diff1 < diff2) { min = input[idx1.y][idx1.x]; }
        else               { min = input[idx2.y][idx2.x]; }

        /* output correct value */
        if (sign == dir) { output = min; }
        else             { output = max; }
    }
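
For anyone wanting to check the kernel logic independently of the driver, a CPU reference can help. The following is a minimal sketch, not the poster's code: it mirrors the `sign`/`dir` index math above in plain C++. `Pair`, `bitonicPassCpu` and `bitonicSortCpu` are hypothetical names, and the `payload` field is an assumption, since only `difference` appears in the post. The partner element is addressed by linear index, which is equivalent to the kernel's 2D wrap-around for power-of-two sizes.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the post's `pair`; only `difference` is known,
// `payload` is an assumption for illustration.
struct Pair {
    float difference;
    int payload;
};

// One compare-exchange pass over a width x height grid, using the same
// sign/dir rules as the kernel in the post.
void bitonicPassCpu(const std::vector<Pair>& in, std::vector<Pair>& out,
                    int width, int height, float stageWidth, float offset) {
    const float twoOffset = 2.0f * offset;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float idx = x + static_cast<float>(width) * y;
            // +1: partner lies `offset` above; -1: partner lies `offset` below
            const float sign = (std::fmod(idx, twoOffset) < offset) ? 1.0f : -1.0f;
            // +1: this segment sorts ascending; -1: descending
            const float dir =
                (std::fmod(std::floor(idx / stageWidth), 2.0f) == 0.0f) ? 1.0f : -1.0f;
            const std::size_t self = static_cast<std::size_t>(idx);
            const std::size_t partner = static_cast<std::size_t>(idx + sign * offset);
            const Pair& a = in[self];
            const Pair& b = in[partner];
            const Pair& mx = (a.difference > b.difference) ? a : b;
            const Pair& mn = (a.difference < b.difference) ? a : b;
            out[self] = (sign == dir) ? mn : mx;
        }
    }
}

// Runs every stage for width * height == 2^logN elements, sorting `data`
// ascending by `difference`. Returns the number of passes executed.
int bitonicSortCpu(std::vector<Pair>& data, int width, int height, int logN) {
    std::vector<Pair> scratch(data.size());
    int passes = 0;
    for (int stage = 1; stage <= logN; ++stage) {
        const float stageWidth = std::ldexp(1.0f, stage);  // 2^stage
        for (int step = 1; step <= stage; ++step) {
            const float offset = std::ldexp(1.0f, stage - step);  // 2^(stage-step)
            bitonicPassCpu(data, scratch, width, height, stageWidth, offset);
            data.swap(scratch);  // ping-pong, like the two stream buffers
            ++passes;
        }
    }
    return passes;
}
```

The pass count is logN * (logN + 1) / 2, so the 24 stages in the post amount to 300 kernel launches per sort, each touching all 4096 x 4096 elements. Running the reference on a small grid (say 8 x 4 with logN = 5) and comparing against the GPU output is one way to rule out a correctness problem before looking at per-launch overhead.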

0 Likes
5 Replies
riza_guntur
Journeyman III

It's not only you; every Brook+ program I tried from the examples runs worse in Vista. Dunno why. I think it's either because I'm using 64-bit Vista, so pointers get significantly bigger and eat a lot of bandwidth, or because of PowerPlay.

Does anybody know a better reason?

0 Likes

The 64-bit version shouldn't be slower. I've tested WinXP 32 vs WinXP 64 and GPU kernels run at about the same speed, maybe a bit slower, but in my case my CPU code runs faster on x64, so I still get an improvement.

I didn't try Vista however, because I got sick of security questions when building programs with gcc and uninstalled it.

0 Likes

Originally posted by: Ceq The 64-bit version shouldn't be slower. I've tested WinXP 32 vs WinXP 64 and GPU kernels run at about the same speed, maybe a bit slower, but in my case my CPU code runs faster on x64, so I still get an improvement.

I didn't try Vista however, because I got sick of security questions when building programs with gcc and uninstalled it.

As for the Vista security questions, you could turn off UAC. That should make you feel a lot better about Vista.

0 Likes

Originally posted by: riza.guntur It's not only you; every Brook+ program I tried from the examples runs worse in Vista. Dunno why. I think it's either because I'm using 64-bit Vista, so pointers get significantly bigger and eat a lot of bandwidth, or because of PowerPlay.

Does anybody know a better reason?

I have just finished installing Windows Vista Business 32-bit and the performance is the same. It's a shame that I'll be stuck with XP. I may have to try CUDA sometime, because performance gaps like these are just not acceptable.

0 Likes

There's no shame in staying with XP.

My constraint is that I don't have XP 64 at the moment.

Going by the measured FLOPS, Vista reaches only about 1/3 of XP's performance with ATI Stream: the optimized matrix multiplication peaks at 173 GFLOPS on Vista vs 450 GFLOPS on XP. The funny thing is that in XP the performance reaches its peak with quite a small dataset.

0 Likes