
mrrvlad
Adept I

Re: Feedback discussion: How is AMD doing for developers?

Thanks for asking,

I can only comment on my limited experience with OpenCL on GCN cards, coming from CUDA.

Kudos: very good hardware that is easy to optimize for most of the time.

Issues: The OpenCL compiler is not an "optimizing" one: there is no way to control or optimize register usage. Several times I ended up with a kernel that compiles to 60 registers on a V7 card and 65 registers on a V6 card. This caused an occupancy difference and a performance difference of 20% that was sometimes not easy to fix, since one can only guess where the registers are being used. It should not be the developer's job to hunt for these small optimizations every time they make a change; Nvidia solves this with the maxrregcount parameter. And in general, it always seems like a kernel uses more registers than it "should", judging by the code.
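To illustrate why a five-register difference can matter, here is a minimal sketch of the usual vector-register occupancy estimate for a GCN SIMD. The hardware figures (256 VGPRs per lane, at most 10 resident wavefronts per SIMD, allocation in blocks of 4) are assumptions taken from public GCN documentation, not from this thread, and the function name `waves_per_simd` is hypothetical. The model ignores scalar registers and local memory, which can also limit occupancy.

```python
# Rough VGPR-limited occupancy model for one GCN SIMD.
# All three constants are assumed GCN figures, not values from the post.
VGPRS_PER_LANE = 256      # vector registers available to each SIMD lane
MAX_WAVES_PER_SIMD = 10   # hardware cap on resident wavefronts per SIMD
ALLOC_GRANULARITY = 4     # VGPRs are handed out in blocks of 4


def waves_per_simd(vgprs_per_work_item: int) -> int:
    """Estimate how many wavefronts fit on a SIMD given a kernel's VGPR count."""
    if not 1 <= vgprs_per_work_item <= VGPRS_PER_LANE:
        raise ValueError("VGPR count must be between 1 and 256")
    # Round the request up to the allocation granularity.
    blocks = -(-vgprs_per_work_item // ALLOC_GRANULARITY)
    allocated = blocks * ALLOC_GRANULARITY
    # The register file is shared by all resident wavefronts.
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // allocated)


if __name__ == "__main__":
    for regs in (60, 65):  # the two register counts quoted above
        print(regs, "VGPRs ->", waves_per_simd(regs), "waves per SIMD")
    # 60 VGPRs -> 4 waves per SIMD
    # 65 VGPRs -> 3 waves per SIMD
```

Under these assumptions, the 60-register build keeps four wavefronts resident per SIMD while the 65-register build keeps only three, so a handful of registers is enough to cross an occupancy boundary.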

Suggestion: if AMD is serious about GPU compute, it should invest in its compiler and tools to make development easier.

jtrudeau
Staff

Re: Feedback discussion: How is AMD doing for developers?

Vladimir Tankovich

Thanks for the feedback. I cannot confirm/discuss future roadmaps and what we may/may not do with respect to investing in tools. But this is most excellent feedback. The good news is, I know PRECISELY where to send the input on optimizing the compiler. In addition, the coming HSA tools and compiler are open source, so that should help.

Thanks again.

mrrvlad
Adept I

Re: Feedback discussion: How is AMD doing for developers?

I'd be happy to provide more detailed feedback, and to describe several other (smaller) issues with the compiler, to interested parties, and to spend some time iterating if needed.

Please contact me directly for this.

One of the smaller issues is function inlining: OpenCL "should" be inlining functions, but in fact when I copy/paste the function code into the call site, the register count decreases by a few registers and perf also decreases slightly.

boxerab
Challenger

Re: Feedback discussion: How is AMD doing for developers?

I really like your idea for a "Board Farm". I am currently developing my app on an HD 7700; it would be very useful to see performance on a newer, more powerful card such as the 290x.

sharpneli
Journeyman III

Re: Feedback discussion: How is AMD doing for developers?

Hi,

Kudos:

Really love how you're pushing low-level APIs, even if they are still not available for public use.

Quite easy to get 50% or so device utilization in OpenCL.

Complaints:

Nigh impossible to go over 50%: register usage! Already mentioned above, but it's really painful to tune performance when the compiler goes "It's better to save 4 clocks on recomputing this value than use one register less and thus increase occupancy!". A good register rematerialization pass that takes device occupancy into account is a must! Even if the HSA compiler is going to be open source, the backend that does register allocation for GCN is probably going to be closed, so we cannot do this ourselves.

Linux drivers. They always require a kernel/Xorg that is around 9 months old. It's absolutely impossible to keep up to date. Your biggest competitor in the GPU arena is not perfect, but still far, far better. And with great Linux drivers they were able to push a new GPU onto Android, as it actually uses the same driver stack as their desktop Linux drivers. If you ever want to go into the mobile space you really need to get this fixed. I mean seriously. Plug a GCN-based GPU into your fancy new ARM-based CPUs and voila, you have an Android tablet SoC! Also, whereas your competitor's Linux drivers are around 20% faster in OpenGL than their Windows drivers (due to no WDDM hampering them), your Linux drivers are maybe 20% slower than your Windows drivers.

In addition, to use OpenCL you must have X running, which is kinda ridiculous. How can you tell a customer who spends a few million on a massive FirePro cluster that they have to run an X server?

Suggestions:

Employ one or two engineers full time to work on just keeping your Linux drivers up to date. Currently it seems to be treated as a separate project, as in "Implement support for latest Ubuntu", and then completely abandoned until a new iteration of $popular_distro comes out. It's relatively low-intensity work, so you could have a working driver out a few days after a new kernel/Xorg release.

Focus on OpenCL 2.0 really hard. To get on par with current CUDA, one really must have dynamic parallelism, and the work_group_reduce/broadcast functions in good shape, using the GCN register shuffles internally.

nan
Adept II

Re: Feedback discussion: How is AMD doing for developers?

Hi,

Kudos: The quality of the drivers and OpenCL support has increased in the last few years, and there is at least some documentation available!

  • The OpenCL compiler has bugs. It's not possible for normal people to submit bug reports. Other people mentioned above that the register allocation of the OpenCL compiler is not optimal, i.e. the compiler often does not reuse registers and large kernels tend to use many registers. Limiting the number of available registers manually seems to be a good idea, because then the compiler can output code for a machine with n registers. My guess is that finding a good value for n automatically is difficult.
  • Better documentation (of compute capabilities): AMD documents vary in quality, e.g. a whole section in the R1100 ISA Book was copied from a document describing a VLIW architecture (this made no sense at all), and the OpenCL Programming Guide contains optimization hints for CPUs in the Southern Islands chapter. Also, these non-technical documents often contradict other AMD documents. It would be helpful if the compute capabilities were better documented, e.g. the throughput of 64-bit integer operations in a meaningful form such as # of operations/cycle/PE, or its inverse for better readability. The Programming Guide lacks some information about global memory access on Southern Islands. The architecture of Hawaii isn't documented at all, and the function of the ACEs isn't described either (only some marketing bs). I randomly found a post in the forum which states that command queues are assigned consecutively to ACEs.
  • Sorry, but the fglrx driver architecture is broken and requires an older X server. Additionally, the OS often tries to load the wrong OpenGL libraries (or other libraries) after a system update, and reinstalling fglrx does not help. Other AMD tools seem to be badly tested on Linux before their release and do not work at all.
  • The OpenCL performance of new drivers has regressed in the past (even in the last few months, e.g. with Catalyst 14.4). That should not happen with such severity after years of development.

-- NaN

jtrudeau
Staff

Re: Feedback discussion: How is AMD doing for developers?

Captured. Thanks for taking the time. I'm still gathering info, but about to start seeding the various developer teams with this feedback.

atcl
Adept I

Re: Feedback discussion: How is AMD doing for developers?

Hi,
here is something quick and easy: on the ACML download page, the user guide linked there is still for version 5.3, yet for versions >=6.0 an updated user guide is part of the download. Could the updated user guide also be linked on the download page? Also, on the matter of the ACML user guide, it would be helpful if the PDF contained (hyper-)links, e.g. for the table of contents or other references.

Thank you

jtrudeau
Staff

Re: Feedback discussion: How is AMD doing for developers?

Yeah, that should have already happened. I think we noticed that a week or two ago. Thanks for applying the boot. Let me go ping the people responsible. Kinda fell off my radar after I alerted them to the problem.

jtrudeau
Staff

Re: Feedback discussion: How is AMD doing for developers?

And fixed. THANKS!