I want to compare computation times between BLAS (with OpenMP) and clAmdBlas, and I have run into a difficulty:
I use the function clAmdBlasDtrsm() and compare it with dtrsm() from BLAS, and everything works with up to around 7500 right-hand sides.
I also tested the computed solutions, and they were equal.
When I try more (e.g. 10000 right-hand sides), I get error code -1021, which I cannot find in cl.h or in clAmdBlas.h (or anywhere else).
The original BLAS dtrsm() has no problems with the same input.
Could you help me? What does this error code mean, and what is the reason for the failure?
The important part of the code (as far as I can tell) is the following:
bufferA = clCreateBuffer(ctx, CL_MEM_READ_ONLY, N * lda * sizeof(*AA), NULL, &err);
bufferX = clCreateBuffer(ctx, CL_MEM_READ_WRITE, m * n * sizeof(*BB), NULL, &err);
err = clEnqueueWriteBuffer(queue, bufferA, CL_TRUE, 0, N * lda * sizeof(*AA), AA, 0, NULL, NULL);
err = clEnqueueWriteBuffer(queue, bufferX, CL_TRUE, 0, m * n * sizeof(*BB), BB, 0, NULL, NULL);
err = clAmdBlasDtrsm(order, side, uplo, transA, diag, m, n, alpha, bufferA, lda, bufferX, ldb, numcommandqueues, &queue, 0, NULL, &event);
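One thing I notice in this snippet is that err is overwritten by every call, so a failure in clCreateBuffer() or clEnqueueWriteBuffer() would go unseen before clAmdBlasDtrsm() runs. I have not confirmed that this is related to the -1021, but to rule out a buffer that is simply too large, a minimal sketch of a size check could look like the one below. The helper names matrix_bytes and fits_single_allocation are hypothetical (they are not part of OpenCL or clAmdBlas), and max_alloc stands for the value the device reports for CL_DEVICE_MAX_MEM_ALLOC_SIZE via clGetDeviceInfo().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of OpenCL or clAmdBlas): computes
 * rows * cols * elem_size in size_t. Returns 0 on success and -1 if the
 * product would overflow size_t, instead of silently wrapping around. */
int matrix_bytes(size_t rows, size_t cols, size_t elem_size, size_t *out)
{
    if (rows != 0 && cols > SIZE_MAX / rows)
        return -1;
    size_t count = rows * cols;
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return -1;
    *out = count * elem_size;
    return 0;
}

/* Hypothetical helper: returns 1 if a buffer of `bytes` fits into a single
 * device allocation and 0 otherwise. `max_alloc` is assumed to be the value
 * queried with clGetDeviceInfo(..., CL_DEVICE_MAX_MEM_ALLOC_SIZE, ...). */
int fits_single_allocation(size_t bytes, unsigned long long max_alloc)
{
    return (unsigned long long)bytes <= max_alloc;
}
```

The idea would be to call matrix_bytes(m, n, sizeof(*BB), &bytes) before creating bufferX, compare the result against the device limit, and check err directly after each clCreateBuffer() and clEnqueueWriteBuffer() call rather than only after clAmdBlasDtrsm().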
Thanks in advance for your help.