Archives Discussions

enerc77 · ‎12-14-2015

Hi,

I'm a BOINC developer (mainly Einstein@Home) and I'm evaluating the benefits of Kaveri HSA for astronomy research algorithms. We already have an OpenCL implementation, but HSA should bring a significant improvement in memory transfers and synchronization.

So far I have been unable to pass the vector_copy test. Here is my config: A10 7870K (Godavari), Debian with the 4.2 kernel, A78 chipset, IOMMU enabled. kfd_check_installation.sh reports PASS.

I tried the compiled HSA-Runtime-Reference-Source, hsa-runtime_1.0.3_amd64.deb, and the gabbayo runtime. Either I get stuck in hsa_queue_create (it returns an error), or I get as far as "Determine the agents ISA", where hsa_agent_get_info returns an error.

I can't see any helpful error in the dynamic kernel debug logs.

So my question: which kernel should I use, and which HSA runtime?

Thanks,

pinform · ‎12-14-2015

White-listed you, so you can post in any of the AMD Developer forums.

Moved post to the appropriate developer forum.

--Prasad

bsp2020 · ‎12-15-2015

You should stick with the files from HSA Foundation · GitHub if you want to work with HSA. I don't think the upstream kernel is compatible with the runtime/libraries released on GitHub yet. So, instead of using the Debian 4.2 kernel, try HSAFoundation/HSA-Drivers-Linux-AMD · GitHub

Brian

enerc77 · ‎01-03-2016

Thanks bsp2020.

By reorganizing the algorithms (some computations are faster on the CPU), I was able to get a 25% improvement over the Catalyst/OpenCL 1.2 code. Not bad at all!

It's more than the vector_copy example, since there are about 2000 lines of OpenCL code in those algorithms.

I found some problems:

- 2D dispatch does not seem to work: get_global_id(1) always returned 0. It wasn't a big issue to convert to 1D, since the maximum dimension size is very large.

- hsa_signal_wait_acquire does not always work. I used HSA_ENABLE_INTERRUPT=no to prevent a freeze, but the program frequently sat waiting and doing nothing. Sometimes I was lucky and it ran for 15 minutes; other times it stopped within a few seconds. So I settled on this workaround:

while ((ret = hsa_signal_load_acquire(signal)) == 1) { nanosleep(&t, NULL); }

with a 100µs delay.

That workaround always works. I could have the program running for hours without a glitch.
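For anyone who wants to try the same polling pattern without an HSA setup, here is a minimal sketch in plain C. It is not the real runtime: fake_signal_t and fake_signal_load_acquire are hypothetical stand-ins for hsa_signal_t and hsa_signal_load_acquire, and the max_polls limit is an assumption added here so the loop cannot spin forever if the kernel never completes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for an HSA signal: just an atomic counter. */
typedef struct { _Atomic int64_t value; } fake_signal_t;

/* Hypothetical stand-in for hsa_signal_load_acquire(). */
static int64_t fake_signal_load_acquire(fake_signal_t *s) {
    return atomic_load_explicit(&s->value, memory_order_acquire);
}

/* Poll until the signal no longer equals `busy`, sleeping delay_ns
 * between reads. Gives up after max_polls reads (an assumption, not
 * part of the original workaround). Returns the last value read. */
static int64_t poll_signal(fake_signal_t *s, int64_t busy,
                           long delay_ns, long max_polls) {
    struct timespec t = { .tv_sec = 0, .tv_nsec = delay_ns };
    int64_t v;
    long polls = 0;
    /* Parentheses around the assignment matter: without them, v would
     * receive the result of the comparison, not the signal value. */
    while ((v = fake_signal_load_acquire(s)) == busy && ++polls < max_polls) {
        nanosleep(&t, NULL);
    }
    return v;
}
```

With a 100µs delay this would be called as poll_signal(&sig, 1, 100000L, max_polls); if the returned value is still 1, the limit was hit before the signal changed.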

What I find very nice about HSA is that if you make a mistake in your kernels and you overflow, it is trapped by signal 11 (SIGSEGV) instead of corrupting graphics memory. This saves hours of debugging when you can't find it in CPU emulation.

Now I'm just waiting for the libraries to be released for the stock kernel, and then let's go crunching...
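On the first problem listed above (get_global_id(1) always returning 0), one common way to fall back to 1D is a row-major mapping, sketched below in plain C. This is an assumption about how such a conversion could look, not necessarily what was done here; idx2d_t, flatten and unflatten are hypothetical names, and in a kernel gid would come from get_global_id(0) with a global size of width * height.

```c
#include <stddef.h>

/* Hypothetical 2D index pair recovered from a flat work-item id. */
typedef struct { size_t x; size_t y; } idx2d_t;

/* Row-major: recover (x, y) from a flat id over a grid `width` wide. */
static idx2d_t unflatten(size_t gid, size_t width) {
    idx2d_t p = { gid % width, gid / width };
    return p;
}

/* Inverse mapping: flat id for a given (x, y). */
static size_t flatten(idx2d_t p, size_t width) {
    return p.y * width + p.x;
}
```

The division and modulo cost a little per work-item, but the kernel body can keep using x and y as before.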


Which HSA runtime when using stock Linux kernel?