I have a few questions and notes that I would like to confirm with everyone/AMD:
1. Is anyone seeing a performance difference on a multi-GPU system with CFX on versus off?
2. Is it beneficial to split a kernel into multiple smaller kernels if the ALU:Fetch ratio of each smaller kernel is 1.00 while the larger kernel's ALU:Fetch ratio is lower, say ~0.83? In my case, the results were worse with the multiple smaller kernels than with the single larger one, even though the ALU:Fetch ratio was "better" for the smaller ones. Why is that?
3. How expensive is branching? It seems pretty expensive to me, but the KSA, while it does give you the CF instructions, doesn't really go into their cost.
4. It's possible to write in Brook+ and then hand-tweak your IL kernels, right? This makes sense to me; I just haven't tried it yet. I'm asking because it seems that certain optimizations do nothing for the ALU:Fetch ratio reported by the KSA, but they DO reduce the number of IL instructions.
That is all for now. I eagerly await responses.