I have a question: is the sample code in the following blog article still available?

I am developing a linear equation solver for AMD GPUs using clBLAS.

Currently, however, 90% of the computation time is spent in clBLAS's Ddot, so I need a faster dot product routine.

Alternatively, if you know of a good library for this, please let me know.