

wolf0
Adept I

14.9 and higher OpenCL performance

I've tried dozens of tweaks to my Groestl implementation, and it refuses to get anywhere near as fast as the pre-14.9 code was. The pre-14.9 code is just butchered on 14.9 - a rewrite makes it better, but the performance is still dismal. Are the changes in 14.9+ permanent, meaning I have to avoid upgrading for the foreseeable future?

0 Likes
11 Replies
dipak
Big Boss

Sorry, I'm unable to follow your question. Do you mean that your code runs slower on Catalyst driver 14.9 (and higher) than on older versions?

0 Likes

Yes - by a rather large amount. A rewrite helps, but I can't seem to restore it to anywhere near the original speed.

0 Likes

Can you please provide a sample test case so that we can reproduce the issue at our end and, if required, forward it to the concerned team?

Regards,

0 Likes

The code here shows pretty much the exact same issue: https://dl.dropboxusercontent.com/u/40353042/Diamond/groestlcoin-v1.cl

0 Likes

Thanks for sharing the link. However, only kernel code is available there. A corresponding host code that calls the kernel is required to run and analyze it. I'm not familiar with the code. Can you please provide the corresponding host code? Please also mention your setup details.

Regards,

0 Likes

Sorry, that code is for SGMiner 5 - an altcoin mining application. The host code is on GitHub, here: https://github.com/sgminer-dev/sgminer - the file shown should replace kernel/groestlcoin.cl in the source tree.

For the exact command line/settings, I'd need to know which GPU it's going to run on.

As for my setup, I have almost every GCN card - well, all the chips, at least. 2x270X, 3x7950, 3x280X, 1x285, and 3x290X. Same performance drop on every one.

EDIT: If you want settings for all the cards I have, just let me know.

0 Likes

Hi,

I'm just curious. Does the Groestl algorithm have a big memory footprint (like Litecoin's)?

0 Likes

No, not in the same way. Litecoin's algorithm uses a scratchpad that cannot fit in LDS, so it goes in global memory. Groestl, at least in these implementations, uses rather large lookup tables, but they do fit in LDS.
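To put rough numbers on that comparison, here is a back-of-the-envelope sketch. The assumptions are mine, not taken from the linked kernel: Litecoin's scrypt parameters are N=1024, r=1, the Groestl tables are laid out as eight 256-entry tables of 64-bit words (a common table-driven layout; the actual kernel may use fewer tables plus rotations), and GCN exposes 64 KiB of LDS per compute unit with at most 32 KiB per work-group.

```python
KIB = 1024

# GCN limits assumed here: 64 KiB of LDS per compute unit,
# of which a single work-group may allocate at most 32 KiB.
GCN_LDS_PER_CU = 64 * KIB
GCN_LDS_PER_WORKGROUP = 32 * KIB


def scrypt_scratchpad_bytes(n=1024, r=1):
    """Size of scrypt's scratchpad: N blocks of 128*r bytes each.

    Defaults are Litecoin's parameters (N=1024, r=1). The scratchpad is
    needed per hash, i.e. per work-item.
    """
    return 128 * r * n


def lookup_table_bytes(num_tables=8, entries=256, entry_bytes=8):
    """Size of a set of lookup tables shared by a whole work-group.

    The default layout (8 tables x 256 entries x 8 bytes) is an assumption,
    not a measurement of the linked kernel.
    """
    return num_tables * entries * entry_bytes


def fits_in_lds(nbytes, limit=GCN_LDS_PER_WORKGROUP):
    """True if an allocation of nbytes fits within the given LDS limit."""
    return nbytes <= limit


scratchpad = scrypt_scratchpad_bytes()
tables = lookup_table_bytes()
print("scrypt scratchpad per hash:", scratchpad // KIB, "KiB")
print("Groestl tables per work-group:", tables // KIB, "KiB")
print("tables fit in LDS:", fits_in_lds(tables))
print("one scratchpad fits in a CU's LDS:", fits_in_lds(scratchpad, GCN_LDS_PER_CU))
```

Under these assumptions a single scrypt scratchpad is already larger than the entire LDS of a compute unit, while the Groestl tables take up only part of one work-group's allowance.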

0 Likes

I finally got some free time, and I'd like to try optimizing this algorithm in asm myself.

Maybe I can beat OpenCL, or even if not, then learn new things.

But I need some help to start: Would you please send me a complete test_case for your kernel?

- the kernel source

- the global/local dimensions of the kernel

- all the kernel input parameters. It must be a case where it actually finds a Groestlcoin hash.

I need your help as I'm too lame/lazy to set up a working C environment, and sgminer is also a complicated one. I only want to fiddle with the kernel itself, and that's why I need a good test_case.
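One way the test case requested above could be written down is as a small self-describing record. This is only a sketch: the class name, field names, and file layout are hypothetical and are not something sgminer produces. The only rule it checks that comes from OpenCL itself is that, in OpenCL 1.x, the global work size must be evenly divisible by the local work size.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class KernelTestCase:
    """Hypothetical record of everything needed to replay one kernel launch."""
    kernel_source: str            # path to the .cl file
    global_size: int              # total number of work-items
    local_size: int               # work-items per work-group
    inputs: dict = field(default_factory=dict)  # argument name -> hex-encoded bytes
    expected_nonce: int = 0       # nonce the kernel is expected to report

    def validate(self):
        """Raise ValueError if the launch dimensions or inputs are unusable."""
        if self.global_size <= 0 or self.local_size <= 0:
            raise ValueError("work sizes must be positive")
        if self.global_size % self.local_size != 0:
            raise ValueError("global_size must be a multiple of local_size")
        for name, hex_value in self.inputs.items():
            bytes.fromhex(hex_value)  # raises ValueError on malformed hex

    def to_json(self):
        """Serialize the test case so it can be shared as a single file."""
        return json.dumps(asdict(self), indent=2)


# Placeholder values only; a real test case would carry real block data
# and a nonce that is known to produce a valid hash.
example = KernelTestCase(
    kernel_source="groestlcoin.cl",
    global_size=1024,
    local_size=256,
    inputs={"block": "00" * 80},
    expected_nonce=0,
)
example.validate()
print(example.to_json())
```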

0 Likes
nan
Adept II

Hi,

could you give some performance estimates? My R9 290@stock delivers ~24 Mhashes/s (1 hash basically evaluates PERM_BIG_P(), PERM_BIG_Q() and PERM_BIG_P() again) with my own kernel using Catalyst 14.9.

-- NaN
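The "P, Q, then P again" count above matches how the Groestl specification processes a single message block: one compression step, `P(h ^ m) ^ Q(m) ^ h`, followed by the output transformation `P(x) ^ x`, truncated to the trailing bytes. The sketch below shows only that data flow. The permutations are passed in as parameters and the function names are my own; this is not a working Groestl and not the code from either kernel in this thread.

```python
from typing import Callable

Permutation = Callable[[bytes], bytes]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def groestl_one_block(h0: bytes, m: bytes,
                      perm_p: Permutation, perm_q: Permutation,
                      out_len: int = 64) -> bytes:
    """Data flow of a single-block Groestl hash: P, Q, then P again.

    h0 is the initial chaining value and m the (already padded) message
    block; both are 128 bytes for the 512-bit variant. perm_p and perm_q
    stand in for the real permutations.
    """
    # Compression function: f(h, m) = P(h ^ m) ^ Q(m) ^ h
    h1 = xor_bytes(xor_bytes(perm_p(xor_bytes(h0, m)), perm_q(m)), h0)
    # Output transformation: keep the trailing out_len bytes of P(h1) ^ h1
    final = xor_bytes(perm_p(h1), h1)
    return final[-out_len:]


# Toy stand-ins, used only to exercise the data flow. They are NOT the
# Groestl permutations.
def toy_p(state: bytes) -> bytes:
    return state[1:] + state[:1]   # rotate left by one byte


def toy_q(state: bytes) -> bytes:
    return state[::-1]             # reverse the byte order


digest = groestl_one_block(bytes(128), bytes(range(128)), toy_p, toy_q)
print(len(digest))
```

Counting calls in this flow gives two evaluations of P and one of Q per block, so nearly all of a kernel's time goes into those three permutations.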

0 Likes

On 14.9, the kernel I linked has the speed cut by over half - it was around 26MH/s before 14.9.

0 Likes