Update: the problem seems to have been introduced in version 4.3.0. The new features section of 4.3.0 mentions: "Level 1 BLAS routines have been tuned for AMD Istanbul processors. Routines affected include xDOT, xCOPY, xAXPY, and xSCAL routines."
My small dcopy test case works with ACML 4.2.0 but fails with 4.3.0.
I think this will be considered a bug. The problem is caused by an intermediate result that multiplies N by the element size. Any time this overflows a 32-bit integer the routine will fail. This problem will happen in any of the Level 1 copy routines, at different values of N depending on the size of an array element.
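To make the thresholds concrete, here is a rough sketch of the arithmetic. It assumes the intermediate is a signed 32-bit integer holding N times the element size in bytes; the routine-to-size table uses the standard BLAS element sizes, and `first_overflowing_n` is a hypothetical helper name, not anything from ACML.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

# Element sizes in bytes for the Level 1 copy routines
# (single, double, single complex, double complex).
ELEMENT_SIZES = {"scopy": 4, "dcopy": 8, "ccopy": 8, "zcopy": 16}


def first_overflowing_n(elem_size):
    """Smallest N for which N * elem_size no longer fits in a signed 32-bit int.

    Assumption: the failing intermediate is exactly N * elem_size.
    """
    return INT32_MAX // elem_size + 1


for routine, size in ELEMENT_SIZES.items():
    print(f"{routine}: product overflows for N >= {first_overflowing_n(size)}")
```

Under that assumption, dcopy would start failing at N = 2^28 elements, which is a 2 GiB array.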
The test does work if you build and link with the 64-bit integer library, and that may be considered a workaround.
For all of the BLAS routines, at some point arrays are too large to use 32-bit address computation and it is necessary to use the 64-bit integer libraries. We can change the copy routines to raise the size at which this occurs. As you point out, it used to work!
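As an illustration only, the difference between a 32-bit and a 64-bit intermediate can be simulated like this. The names `wrap_signed` and `byte_count` are hypothetical, and two's-complement wraparound is an assumption about how the overflow shows up; the actual ACML internals are not shown in this thread.

```python
def wrap_signed(value, bits):
    """Reduce value to a signed two's-complement integer of the given width."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


def byte_count(n, elem_size, bits):
    """Simulate computing n * elem_size in an integer of the given width."""
    return wrap_signed(n * elem_size, bits)


n = 2**28  # number of double precision elements, i.e. a 2 GiB array

# With a 32-bit intermediate the byte count wraps to a negative number.
print(byte_count(n, 8, 32))

# With a 64-bit intermediate the same N gives the correct byte count,
# even though N itself still fits in a 32-bit integer.
print(byte_count(n, 8, 64))
```

With a 64-bit intermediate, the limit under these assumptions becomes the range of N itself rather than N times the element size.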
This will be resolved in our upcoming 5.3 release.
Since I can't link ScaLAPACK, which is an integer*4 build, against the integer*8 version of the ACML library, the workaround isn't really an option for me.
Interestingly, the MKL bug only affects Opteron processors, and it also appeared after some optimization of the ?copy codes in version 10.2. So I wondered, do ACML and MKL share some code?
Anyway, thanks for the info, I'll keep an eye out for 5.3.
I tested with ACML 5.3.0 and everything works now, thank you!