I have written a kernel that performs a dynamic programming routine, specifically targeting the GCN architecture. Recently, I tried to optimize it by getting rid of the if-else constructs and replacing them with select. The modified kernel works fine on my HD 7970 GPUs, with some improvement in speed, but strangely the same kernel does not work correctly on the HD 7750 GPUs.
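To make the kind of change concrete, here is a minimal sketch in plain C, not my actual kernel. It emulates the semantics of the OpenCL `select(a, b, c)` built-in as I understand them from the specification: the scalar form returns `c ? b : a`, while the vector form tests only the most significant bit of each component of `c`. The function names and the `int32_t` element type are made up for illustration, and I do not know whether this difference has anything to do with my problem.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy original: ordinary C semantics, any nonzero cond counts as true. */
int32_t pick_branch(int32_t a, int32_t b, int32_t cond) {
    if (cond) return b;
    return a;
}

/* Emulates scalar OpenCL select(a, b, c): yields c ? b : a.
   Note the argument order: the "false" value comes first. */
int32_t select_scalar(int32_t a, int32_t b, int32_t c) {
    return c ? b : a;
}

/* Emulates one component of vector OpenCL select(a, b, c):
   only the most significant bit of c is tested. */
int32_t select_vector_lane(int32_t a, int32_t b, int32_t c) {
    return ((uint32_t)c >> 31) ? b : a;
}

/* Small self-check showing where the three variants agree and differ. */
void select_self_check(void) {
    /* Condition 1 (scalar "true"): branch and scalar select agree. */
    assert(pick_branch(10, 20, 1) == select_scalar(10, 20, 1));
    /* Condition 1 has its MSB clear, so the vector form picks a. */
    assert(select_vector_lane(10, 20, 1) == 10);
    /* Condition -1 (all bits set, vector "true") picks b. */
    assert(select_vector_lane(10, 20, -1) == 20);
}
```

In other words, a condition value of 1 behaves as "true" in the if-else and scalar forms but as "false" in the vector form, and swapping the first two arguments inverts the choice.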
By "not working" I mean the following: the output of the kernel is a huge table of values. After each kernel execution I verify it against a sequential implementation on the CPU. The HD 7970 results are always correct, but the results from the HD 7750 are somewhere between 60% and 90% correct. For example, 4,193,984 out of 4,194,304 values pass verification.
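The check is roughly of the following shape. This is only a sketch in plain C under assumptions: the function name `count_mismatches`, the `int` element type, and the `max_report` parameter are hypothetical and not taken from my real code. It counts the entries that differ from the CPU reference and prints the first few failing indices, which can help show whether the errors cluster in particular regions of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Compares the table read back from the GPU with the CPU reference.
   Returns the number of mismatching entries and prints at most
   max_report of them as (index, got, expected). */
size_t count_mismatches(const int *gpu, const int *cpu,
                        size_t n, size_t max_report) {
    assert(gpu != NULL && cpu != NULL);
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        if (gpu[i] != cpu[i]) {
            if (bad < max_report) {
                printf("mismatch at %zu: got %d, expected %d\n",
                       i, gpu[i], cpu[i]);
            }
            ++bad;
        }
    }
    return bad;
}
```

The number of passing entries is then `n - count_mismatches(gpu, cpu, n, 16)`, where 16 is an arbitrary cap on how many mismatches get printed.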
Again, the ONLY thing I did was replace if-else with select in the kernel. Could anyone please shed some light on this strange behavior? I can provide the kernel code if necessary. Many thanks.