Using ACML-GPU and HD4870x2 to run HPL

Hi, all!

I want to use the ACML-GPU library and an HD4870x2 graphics card to run High-Performance Linpack (HPL).

When NB <= 478, ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N) is equal to zero and the check passes.

But when N > 1000 and NB > 478, ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N) is not equal to zero: the result is wrong and the check fails!
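
For reference, here is a minimal Python sketch of the scaled residual quoted above, in case it helps to recompute the check outside HPL on a small system. The function names are hypothetical and not part of HPL or ACML-GPU. Two assumptions: eps is taken as Python's double-precision machine epsilon, which may differ from the value HPL uses internally by a constant factor, and the 16.0 threshold is the value found in the sample HPL.dat, so the test is normally "residual below threshold" rather than "residual exactly zero".

```python
import sys


def inf_norm_matrix(a):
    # Infinity norm of a matrix: largest absolute row sum.
    return max(sum(abs(v) for v in row) for row in a)


def inf_norm_vector(v):
    # Infinity norm of a vector: largest absolute entry.
    return max(abs(t) for t in v)


def scaled_residual(a, x, b):
    # ||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N)
    n = len(b)
    eps = sys.float_info.epsilon  # assumption: stand-in for HPL's eps
    r = [sum(a[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    denom = eps * (inf_norm_matrix(a) * inf_norm_vector(x) + inf_norm_vector(b)) * n
    return inf_norm_vector(r) / denom


def passes(residual, threshold=16.0):
    # assumption: 16.0 is the threshold from the sample HPL.dat
    return residual < threshold


if __name__ == "__main__":
    a = [[2.0, 0.0], [0.0, 4.0]]
    b = [2.0, 4.0]
    print(passes(scaled_residual(a, [1.0, 1.0], b)))  # exact solution
    print(passes(scaled_residual(a, [1.0, 1.5], b)))  # deliberately wrong solution
```

A wrong solution vector gives a residual many orders of magnitude above the threshold, which is the kind of failure described for NB > 478.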

I want to know why NB cannot be greater than 478, and how I can increase my Linpack benchmark score.