I have two versions of a kernel. In KSA, one shows ALU:Fetch = 0.88 and the other ALU:Fetch = 1.07. KSA reports higher throughput and higher threads/clock for the 1.07 version, yet my program actually runs faster with the 0.88 version.
Going by the "closer to 1" theory, I would also expect the 1.07 version to be the faster one, not the 0.88 version.
I'm very curious why this could be.
Could it be that KSA is not accounting for some added instructions?
This doesn't really make sense to me: KSA's estimates and the measured runtime contradict each other. This might be easier to see with a runtime profiler.
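Without a profiler, one way to check that the runtime difference is real and not measurement noise is to time each version several times and compare medians. The following is only a minimal sketch in Python. The names `median_runtime` and `compare` are made up for illustration, and the two callables passed in are hypothetical placeholders for whatever launches each kernel version and blocks until it finishes; none of this is part of KSA or any SDK.

```python
import statistics
import time


def median_runtime(launch, warmup=3, repeats=20):
    """Return the median wall-clock time (seconds) of calling launch().

    `launch` is a hypothetical placeholder: it should start one kernel
    version and return only after the kernel has finished.
    """
    # Warm-up calls are not timed, so one-time costs (compilation,
    # first-use allocation) do not skew the samples.
    for _ in range(warmup):
        launch()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        launch()
        samples.append(time.perf_counter() - start)

    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(samples)


def compare(launch_a, launch_b, warmup=3, repeats=20):
    """Time both versions and return (median_a, median_b) in seconds."""
    t_a = median_runtime(launch_a, warmup, repeats)
    t_b = median_runtime(launch_b, warmup, repeats)
    return t_a, t_b
```

If the medians still favor the 0.88 version over many repeats, the difference is probably not a timing artifact, which would point back at KSA's estimate rather than at the measurement.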
Also, KSA locks up whenever I try to print either the object or the source. I don't have any problems printing from any other application.