Since questions about Linpack have come up in the forums before, I want to point out that we have just released the Linpack code that was run on LOEWE-CSC to put it at #22 in the November 2010 Top 500 list. We also published the DGEMM implementation for Cypress-type GPUs that we used, along with some documentation. Note, though, that it is written in CAL, not OpenCL. The DGEMM reaches about 623 Gflops on 2 Magny-Cours CPUs + 1 AMD Radeon HD 5870.
You can grab everything from http://code.compeng.uni-frankfurt.de.