I've just posted a new blog post on Merkle tree hashing with SHA256 at its core, using OpenCL 2.0.
In it, I talk about why SHA256 by itself is not suitable for GPU compute, and how we can implement Merkle tree hashing on a GPU for big files and get very good throughput. Please check it out if you have a minute: http://developer.amd.com/community/blog/2015/05/29/merkle-tree-hashing-using-opencl-2-0/
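For anyone who wants the basic idea before reading the post, here is a minimal CPU-side sketch in Python. It is not the OpenCL code from the blog. SHA256 is sequential within a single message, because each 64-byte block is compressed on top of the state left by the previous one. A Merkle tree gets around that by hashing fixed-size leaves independently and then combining the digests pairwise. The function name `merkle_root`, the 1024-byte default leaf size, and the rule of promoting an unpaired node unchanged are assumptions made for illustration; the blog may make different choices.

```python
import hashlib


def merkle_root(data: bytes, leaf_size: int = 1024) -> bytes:
    """Return a SHA-256 Merkle root over `data` split into fixed-size leaves.

    Illustrative sketch only: the leaf size and the odd-node rule are assumptions.
    """
    if leaf_size <= 0:
        raise ValueError("leaf_size must be positive")

    # Split the input into leaves; empty input is treated as one empty leaf.
    chunks = [data[i:i + leaf_size] for i in range(0, len(data), leaf_size)] or [b""]

    # Leaf hashes do not depend on each other, so this is the step that
    # could be spread across many GPU work-items.
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]

    # Combine digests pairwise until a single root remains.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                # Unpaired node: promoted unchanged (an assumption; some
                # schemes duplicate the last node instead).
                next_level.append(level[i])
        level = next_level
    return level[0]
```

Note that the root is not equal to the plain SHA256 of the whole file, so both sides have to agree on the leaf size and the combining rule.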
Interesting. I know from experience that yescrypt also employs an initial key setup phase, which is 32-way if memory serves. To be honest, I'm not sure this truly applies to yescrypt in general, since my experience is with the yescrypt implementation used by the "bsty" cryptocurrency.
Nonetheless, if it's good for Solar Designer, it's good for me.
Please add data points on the bar chart.
Ravi can jump in, but in this case we don't have anything public. He and I talked about that. Here's the story behind the story.
The blog is based on work done for an internal technical conference. AMD engineers do neat stuff, then share it with each other for cross-pollination and learning. I reviewed the various projects and presentations from that conference, identified this one as being really interesting, and prevailed upon Ravi and his management to get the blog done. HOWEVER... because it was an internal project, we'd have to do some significant work to clean up the code before we could release it to the public. Engineering priorities meant that was not going to happen. My choice, as the guy in charge of finding and publishing this stuff, was either to publish nothing or to publish something without source code. I chose the latter. It's not my ideal: whenever we do something, I want source code, because I know you want source code. But this was a "better than nothing" call.
If you can live with hashing 80-byte blocks with only the last 4 bytes changing, I've published a cryptocurrency miner under the MIT license. It's pretty much the opposite of what's going on here, but you still get to look at various hashing algorithms. Some hashing algorithms have interesting properties.
AFAIK, those are the only kernels for the GCN architecture written in "GPU style" (multiple SIMD lanes cooperating through LDS) instead of just throwing CPU code at the thing. I'm not claiming they're the best, however. I'm inclined to believe the host code is the cleanest around, and I wrote the kernel code for readability without compromising performance too much.
So it's a different thing, but if you want to get your feet wet, that's something you can build and mess with.
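To make the "80-byte blocks with only the last 4 bytes changing" case concrete, here is a rough Python sketch. It is not taken from the miner mentioned above. It assumes double SHA256, a little-endian 4-byte nonce, and a little-endian comparison against a target, which are the Bitcoin-style conventions; the miner covers other algorithms, and `scan_nonces` is a made-up name. The point is only that the 76 constant bytes can be absorbed once and that state reused for every nonce.

```python
import hashlib
import struct


def scan_nonces(header76: bytes, target: int, start: int = 0, count: int = 1 << 16):
    """Try `count` nonces; return (nonce, digest) for the first hash <= target, else None.

    Hypothetical helper for illustration, not code from any real miner.
    """
    if len(header76) != 76:
        raise ValueError("expected the 76 constant bytes of an 80-byte block")

    # Absorb the constant prefix once. The first 64 bytes are compressed here,
    # so that work is not repeated for each nonce.
    prefix = hashlib.sha256(header76)

    for nonce in range(start, start + count):
        h = prefix.copy()  # reuse the saved state
        h.update(struct.pack("<I", nonce & 0xFFFFFFFF))  # last 4 bytes (assumed little-endian)
        digest = hashlib.sha256(h.digest()).digest()  # second SHA256 pass (assumption)
        if int.from_bytes(digest, "little") <= target:
            return nonce, digest
    return None
```

On a GPU each work-item would typically take its own nonce, which is why this workload is embarrassingly parallel, while hashing one large file is not unless it is restructured as a tree.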