Thanks for trying out HPL with ACML 6!
May I ask whether you are running the double-precision HPL or the single-precision HPL (MHPL)? I ask because the Kaveri GPU has a pretty good single-precision peak of 737 GFLOPS, but its double-precision performance is no better than the CPU's peak FLOPS. (AnandTech Portal | Floating point peak performance of Kaveri and other recent AMD and Intel chips)
I have run both single- and double-precision HPL on Kaveri and was able to get bigger numbers. One thing I did was make sure "NB" is really large (e.g., 1024) and "NBMIN" is large (e.g., 64). This is because the GPU computes more efficiently when the matrix is not too thin or tall. In fact, in the Lua script (/Spectre/gemm.lua) you can see the logic: if any of m, n, or k (which equals NB) is smaller than 64, the computation always falls back to the CPU. Can you share your choices of N and NB?
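For reference, the relevant lines of an HPL.dat might look like the fragment below. The values shown are only illustrative choices following the suggestion above (NB of 1024, NBMIN of 64); N should be picked to fill your available memory, and the rest of the file is unchanged:

```
1            # of NBs
1024         NBs
1            # of NBMINs
64           NBMINs
```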
Smarter memory management is definitely beneficial for the HPL benchmark. By default, ACML currently copies memory back and forth between the CPU and GPU. In this beta release there is a way to enable "USE_HOST_PTR" by assigning "2" to "memalloc_choice" inside /Spectre/gemm.lua. Note that this "hack" is under-tested, but I think it will let you allocate more memory on the host (and thus use a bigger N).
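Concretely, the change in gemm.lua would look roughly like this (the exact surrounding code may differ in your copy of the script; check it before editing):

```lua
-- inside /Spectre/gemm.lua:
-- 2 selects the under-tested USE_HOST_PTR allocation path
memalloc_choice = 2
```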
Looking at the log file, you might find that lda, ldb, and ldc are quite large while m, n, and k are much smaller. OpenCL actually has an API (clEnqueueReadBufferRect) that copies only the useful part of the memory (memalloc_choice = 3 in the Lua file), which lets N be much larger. However, there is a run-time bug related to this API on Kaveri in the current driver. I have filed an internal bug ticket, and it is fixed in the internal drivers; I believe the fix will reach the public driver soon.
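To illustrate why the rect copy matters when lda is much larger than m: a plain CPU-side sketch of the same idea is below. The function name `copy_panel` is my own for illustration; it packs only the useful m×n sub-panel out of a column-major buffer with leading dimension lda, rather than transferring the whole lda×n slab, which is what clEnqueueReadBufferRect achieves on the device side via its row-pitch arguments:

```c
#include <stddef.h>
#include <string.h>

/* Copy the m x n sub-panel of a column-major matrix whose columns
 * are spaced lda elements apart (lda >= m) into a tightly packed
 * destination. Only m*n doubles move, not lda*n. */
static void copy_panel(double *dst, const double *src,
                       size_t m, size_t n, size_t lda)
{
    for (size_t col = 0; col < n; ++col)
        memcpy(dst + col * m,    /* packed destination column  */
               src + col * lda,  /* strided source column      */
               m * sizeof(double));
}
```

With lda = 1024 and m = 64, the packed copy moves 1/16th of the data a naive full-buffer transfer would.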
Thanks for the detailed reply. I think the problem may be because I am using double precision. But can you tell me where to find the MHPL you mentioned? I just went and downloaded HPL 2.1.
Let's try single precision first; if I still have problems, we can look at the other points you mentioned.