Optimum performance

Discussion created by jaidotsh on Jul 18, 2011
Latest reply on Jul 19, 2011 by jaidotsh

I wanted to know whether my code (CSR matrix multiplication) will give optimum performance if I use both of the optimizations below together.

1. float to float4 (current implementation)

2. Blocking (yet to add, i.e., grouping into warp-sized blocks)

Does the second optimization matter much in terms of performance?
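
To make the second item concrete, here is a minimal CPU-side sketch of what I understand by warp-sized grouping, assuming the operation is a CSR matrix times a dense vector. It is not my actual kernel: the `CsrMatrix` struct, the `spmv_lanes` function, and the `kLanes = 32` group size are all names and values made up for illustration. Each row's nonzeros are split across 32 strided partial sums, the way the lanes of one warp would share a row on a GPU, and the partial sums are then reduced to one value per row.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container; field names are illustrative only.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets into col_idx/values
    std::vector<std::size_t> col_idx;  // column index of each nonzero
    std::vector<float> values;         // value of each nonzero
};

// Assumed group size: 32 matches an NVIDIA warp; other hardware may differ.
constexpr std::size_t kLanes = 32;

// Computes y = A * x. Each row is handled by kLanes emulated "lanes":
// lane i visits nonzeros i, i + kLanes, i + 2 * kLanes, ... of that row.
std::vector<float> spmv_lanes(const CsrMatrix& a, const std::vector<float>& x) {
    std::vector<float> y(a.rows, 0.0f);
    for (std::size_t r = 0; r < a.rows; ++r) {
        float partial[kLanes] = {};  // one accumulator per lane
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            for (std::size_t k = a.row_ptr[r] + lane; k < a.row_ptr[r + 1]; k += kLanes) {
                partial[lane] += a.values[k] * x[a.col_idx[k]];
            }
        }
        // Stand-in for the in-warp reduction a GPU kernel would perform.
        float sum = 0.0f;
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            sum += partial[lane];
        }
        y[r] = sum;
    }
    return y;
}
```

One thing the sketch makes visible: a row with fewer than 32 nonzeros leaves most lanes with nothing to do, so I would expect the benefit of this grouping to depend on how many nonzeros the rows have.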