
Multi-GPU broken with SDK 2.5

Discussion created by gat3way on Aug 6, 2011
Latest reply on Nov 16, 2011 by kphillisjr

Previously we had the GPU_USE_SYNC_OBJECTS environment variable, and it apparently no longer works. The spinlocks are back in the runtime, along with the 100% CPU usage problem, and performance drops. Thank you, but I am sticking with 2.4 until that's solved.

bitselect() is still not mapped to BFI_INT. Why?
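
For anyone following along, here is a minimal sketch in plain C of what bitselect(a, b, c) computes for 32-bit scalars, per the OpenCL definition: each result bit comes from b where c has a 1 and from a where c has a 0. The function names bitselect_ref and bitselect_xor are just illustrative, they are not part of any SDK. This is the pattern one would hope the compiler folds into a single bitfield-insert instruction.

```c
#include <stdint.h>

/* Reference semantics of OpenCL bitselect(a, b, c) on 32-bit scalars:
   result bit = bit of b where c is 1, bit of a where c is 0. */
uint32_t bitselect_ref(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Equivalent formulation using XOR; gives the same result for all inputs. */
uint32_t bitselect_xor(uint32_t a, uint32_t b, uint32_t c)
{
    return a ^ ((a ^ b) & c);
}
```

Both forms need several ALU operations when written out by hand, which is why the mapping to one instruction matters in bit-heavy kernels.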

The BFE_UINT optimization mentioned in the docs is, for some reason, slower when it operates on values from __local memory: additional MOV instructions are generated, and MOV+BFE is slower than LSHR+AND. As a result, some of my kernels are now slower.
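
To make clear which pattern is meant, here is a rough sketch in plain C of the shift-and-mask bit-field extract that corresponds to the LSHR+AND pair. The helper name extract_bits and its parameters are hypothetical, chosen only for illustration; the actual instructions emitted depend on the compiler and SDK version.

```c
#include <stdint.h>

/* Extract `width` bits of `value` starting at bit `offset` (offset < 32).
   The shift corresponds to LSHR and the mask to AND; a bit-field-extract
   instruction would do both in one step. */
uint32_t extract_bits(uint32_t value, unsigned offset, unsigned width)
{
    /* Avoid shifting by 32, which is undefined behavior in C. */
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> offset) & mask;
}
```

For example, extract_bits(x, 8, 8) pulls out the second-lowest byte of x, which is the kind of operation that shows up all over hashing kernels.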

Offline compilation is now broken too.

I am rather disappointed :(
