4 Replies Latest reply on May 2, 2013 8:45 PM by void_ptr

    BLAS for Bolt?


      clMagma has BLAS, but its API uses host CL objects. Is there any prospect of getting BLAS functions wrapped in a nice BOLT API? Is there a technical reason why this couldn't or shouldn't be done? Would AMD be willing to accept open-source contributions to BOLT providing this?


        • Re: BLAS for Bolt?

          Hi void_ptr,

          AMD does have a clAMDBlas library. Follow this link.


          Any prospect for getting BLAS functions wrapped in a nice BOLT API?

          >>> Currently we have no such plans. BOLT is definitely intended to be user friendly.

          AMD willing to accept open-source contributions to BOLT providing this?

          >>> We are willing to accept open-source contributions for functions along the lines of the Standard Template Library. As for BLAS and LAPACK routines, it would be better if they lived in a different namespace rather than in the bolt:: namespace.
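
          To illustrate the namespace point, here is a minimal sketch of how a BLAS-style routine could live alongside BOLT without touching bolt::. The namespace name "boltblas" and the host-side SAXPY are purely hypothetical illustrations, not part of any AMD library; a real contribution would dispatch to the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical namespace "boltblas" -- NOT part of BOLT or clAmdBlas.
// Illustrates keeping BLAS-style wrappers out of the bolt:: namespace,
// as suggested above. Host-side reference implementation only.
namespace boltblas {

// SAXPY: y = alpha * x + y, on plain host containers.
void saxpy(float alpha, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = alpha * x[i] + y[i];
}

}  // namespace boltblas
```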



            • Re: BLAS for Bolt?

              Thanks for the reply.


              The one thing I would say is: I think GPU acceleration delivers its biggest wins in numerical methods, such as those built on matrix multiplies. I'm a bit disappointed to hear that there are no plans to work toward some way of exploiting GPUs for these kinds of algorithms, one that hides the tedious OpenCL host API in the same way that BOLT hides it for the operations it covers.

                • Re: BLAS for Bolt?

                  I would like to reiterate that AMD has a BLAS library. It can be downloaded from the link below.

                  AMD APPML, as mentioned in my previous post.

                  It provides OpenCL acceleration and is user friendly, hiding most of the OpenCL calls inside the library.


                    • Re: BLAS for Bolt?

                      Hi Rbanger,

                      Yes, I understood that. My point is this: clAmdBlas has an API involving host OpenCL objects, e.g.:


                      clAmdBlasStatus clAmdBlasSgemmEx(
                          clAmdBlasOrder order,
                          clAmdBlasTranspose transA, clAmdBlasTranspose transB,
                          size_t M, size_t N, size_t K,
                          cl_float alpha, const cl_mem A, size_t offA, size_t lda,
                          const cl_mem B, size_t offB, size_t ldb,
                          cl_float beta, cl_mem C, size_t offC, size_t ldc,
                          cl_uint numCommandQueues, cl_command_queue *commandQueues,
                          cl_uint numEventsInWaitList, const cl_event *eventWaitList,
                          cl_event *events)


                      thus putting the burden on the programmer of creating and managing OpenCL buffers, command queues, events, et cetera. This is exactly the burden that BOLT manages to hide from the programmer. I was merely pointing out that extending BOLT, or some similar framework, to include linear algebra functionality, where GPUs have the biggest benefit, would be a big win. At least in my case. Maybe mine is exceptional and not worth your effort, and I know AMD and contributors can't do everything at once. I just wanted to be understood correctly.
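
                      For comparison, a container-based call site could collapse those 22 arguments to a handful. The sketch below is purely hypothetical: "device_matrix" and "gemm" are invented names, and plain host storage stands in for the cl_mem buffer a real implementation would own and manage internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical BOLT-style container. A real version would own the cl_mem
// buffer, command queue, and events; here host memory stands in for them.
template <typename T>
struct device_matrix {
    std::size_t rows, cols;
    std::vector<T> data;  // row-major; stand-in for device storage
    device_matrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), data(r * c) {}
    T& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    T  operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C = alpha*A*B + beta*C. The buffers, offsets, leading dimensions, queues,
// and events of clAmdBlasSgemmEx are absorbed by the container type, so the
// call site shrinks to one line: gemm(alpha, A, B, beta, C);
template <typename T>
void gemm(T alpha, const device_matrix<T>& A, const device_matrix<T>& B,
          T beta, device_matrix<T>& C) {
    assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t j = 0; j < B.cols; ++j) {
            T acc = 0;
            for (std::size_t k = 0; k < A.cols; ++k)
                acc += A(i, k) * B(k, j);
            C(i, j) = alpha * acc + beta * C(i, j);
        }
}
```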