Hi community and developers,
I will be purchasing an AMD Ryzen 5 1600 CPU and associated hardware within the next 2-5 months. The question is whether it will be feasible to pair it with a Vega-based GPU for deep learning (DL) applications, or whether NVIDIA is the way to go. NVIDIA with cuDNN is currently the de facto answer: all major DL frameworks have built their GPU compute paths on that library, and that simplicity makes an NVIDIA card very appealing.
For AMD, as far as I can see, there are three possible paths forward:
1.) A DL framework (e.g. TensorFlow) with OpenCL 1.2 support
2.) A DL framework with OpenCL 2.2 support
3.) MIOpen + ROCm
My current evaluation:
1.) The use and development of OpenCL 1.2 solutions has proceeded tepidly since 2015. TensorFlow, Torch, and Caffe have some limited, early-stage support. TensorFlow is still being actively developed; the other frameworks have stopped further support. In all cases, the code is not highly optimized.
2.) I have not seen anyone develop OpenCL 2.2-based DL solutions for any of the frameworks, so this is not an option at this time.
3.) There is talk of MIOpen becoming the cuDNN equivalent, with a targeted release in H1 2017. The problem: I have not seen the library published anywhere. It is therefore questionable if/when MIOpen will be incorporated into TensorFlow, Theano, Torch, Caffe, etc.
My questions are simple:
For reference, I was recently successful in compiling tensorflow-opencl for my A10-7850K, as you can read here. I have shared my insights with other people on Stack Overflow. As mentioned in the links above, performance using OpenCL 1.2 has been a bit disappointing on my current setup:
Following are some numbers for training 1 epoch on the CIFAR10 data set for MY SETUP (A10-7850K with iGPU). Your mileage will almost certainly vary!
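For anyone who wants to reproduce this kind of measurement, here is a minimal timing sketch. Note that `train_one_epoch` is a hypothetical placeholder, not a real TensorFlow call; substitute your framework's actual training loop:

```python
import time

def train_one_epoch(steps, step_time_s):
    """Hypothetical stand-in for one pass over CIFAR10.
    It merely sleeps to simulate `steps` training steps."""
    for _ in range(steps):
        time.sleep(step_time_s)

start = time.time()
train_one_epoch(steps=10, step_time_s=0.001)  # replace with your real epoch
elapsed = time.time() - start
print(f"1 epoch took {elapsed:.2f} s")
```

The same wall-clock measurement works regardless of whether the backend is CUDA, OpenCL, or CPU-only, which is what makes the epoch times comparable across setups.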
I attribute the above to the following factors:
You can see that in this particular case performance is worse (about 5x worse). It may be a bit simplistic, but the iGPU has roughly 1/8 the performance of an RX 480. If I extrapolate a bit, 5800/8 ≈ 725 s/epoch, which would still be only slightly faster than a CPU. I mention TensorFlow specifically because its OpenCL support appears to be the most developed among the frameworks. With an NVIDIA GPU, the same calculations would currently complete much faster.
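The back-of-the-envelope extrapolation above, spelled out in code. Both numbers come from my own setup and my own rough 1/8 assumption; this is not a benchmark of an actual RX 480:

```python
# Measured on my A10-7850K iGPU with the OpenCL 1.2 TensorFlow path.
igpu_seconds_per_epoch = 5800

# My rough assumption: an RX 480 has ~8x the compute of this iGPU.
assumed_speedup = 8

rx480_estimate = igpu_seconds_per_epoch / assumed_speedup
print(f"estimated RX 480 time: ~{rx480_estimate:.0f} s/epoch")  # ~725 s/epoch
```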
Regardless of the current performance, what is the realistic path forward to get much better performance on AMD hardware within the next 2-5 months?
P.S.: Given the growth of machine learning/deep learning, should there not be a separate forum for this?
P.P.S.: I am also looking for a forum/support group for exploring ROCm (and, in the future, MIOpen). Can someone point me to one? I have not yet found a central place where people experimenting with these technologies gather to talk about challenges and solutions.
Message was edited by: Anthony Le (fixed spelling errors, added some memory bandwidth details, added P.S. and P.P.S.)
The absolute lack of awareness of and response to this and related topics from the community, community managers, and/or devs has led me to pick up an NVIDIA GTX 1080 Ti card at this time. Here is some feedback to consider for the future:
1.) Although machine learning can be done in OpenCL (albeit not optimally), machine learning, and especially deep learning, does NOT equal OpenCL. As people who research AMD products are probably aware, AMD is building its own stack (e.g. ROCm and MIOpen). It is therefore a mistake on the part of the community managers to file this topic here.
2.) I will again strongly suggest that, in preparation for the release of machine learning applications on AMD hardware, a separate channel/forum be set up for this purpose. Do NOT lump machine learning in with OpenCL!
3.) In spite of potential NDAs, it would have been nice for a developer in this realm to post anything helpful and NDA-compatible on this topic. As it stands, it seems as if neither the AMD community, nor the moderators, nor the devs/researchers are even aware of what machine learning/deep learning is. I say it seems so, but I am of course assuming that the devs and researchers at AMD know very well what they are doing, probably much better than I do. I am simply appealing to this community to be more (pro)active in this regard.
I still want AMD to succeed and will continue to monitor activity on this topic in this forum. I still have a Kaveri APU with which I can experiment a bit. In this case, the timelines just did not match up.