As far as I know, there are 32 dword-sized banks. Period.
If you read 64 consecutive dwords, it takes 2 cycles to process: each of the 32 banks serves 2 dwords, every bank is busy, and there are no conflicts. That's the fastest the LDS can go.
In the second example you read 64 consecutive float2s. Those are first split into dwords (128 in total), so every LDS bank handles 4 reads. It takes 4 cycles, and because there are no bank conflicts, all the banks are busy the whole time.
Both examples use the LDS at max utilization. The latter just has 2x as much data to move.
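To check the arithmetic, here is a rough sketch of the bank mapping in Python. It is a simplified model, not a description of the actual hardware: it assumes a 64-lane wavefront, that dword index `d` lands in bank `d % 32`, and that each bank serves one dword per cycle. The names `lds_cycles`, `NUM_BANKS`, `BANK_WIDTH` and `WAVE_SIZE` are made up for the illustration.

```python
from collections import Counter

NUM_BANKS = 32   # 32 dword-sized banks, as stated above
BANK_WIDTH = 4   # bytes per bank entry (one dword)
WAVE_SIZE = 64   # assumed number of lanes reading at once

def lds_cycles(elem_bytes, num_lanes=WAVE_SIZE):
    """Estimate cycles for num_lanes lanes reading consecutive elements.

    Simplified model (assumption): every element is split into dwords,
    dword d maps to bank d % NUM_BANKS, and a bank serves one dword per
    cycle, so the busiest bank determines the cycle count.
    Returns (cycles, number_of_banks_used).
    """
    dwords_per_elem = elem_bytes // BANK_WIDTH
    total_dwords = num_lanes * dwords_per_elem
    hits = Counter(d % NUM_BANKS for d in range(total_dwords))
    return max(hits.values()), len(hits)

# First example: 64 consecutive dwords (4 bytes each)
cycles_dword, banks_dword = lds_cycles(4)

# Second example: 64 consecutive float2s (8 bytes each)
cycles_float2, banks_float2 = lds_cycles(8)

print(cycles_dword, banks_dword)
print(cycles_float2, banks_float2)
```

Under these assumptions both cases touch all 32 banks evenly, so the cycle count is just total dwords divided by 32. The float2 case takes twice as long only because it moves twice the data.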