1 Reply Latest reply on Jun 2, 2017 11:57 AM by thornhale

    What is the most robust approach to deep learning on AMD hardware?

    thornhale

      Hi community and developers,

       

I will be purchasing an AMD Ryzen 5 1600 CPU and associated hardware within 2-5 months. The question is whether it will be feasible to purchase a Vega-based GPU for DL applications or whether NVIDIA is the way to go. NVIDIA with cuDNN is currently the de facto answer: all major DL frameworks have developed GPU compute using that library. This simplicity makes the choice of an NVIDIA card very appealing.

       

For AMD, there are, as far as I can see, 3 possible paths forward:

       

1.) DL framework (e.g. TensorFlow) with OpenCL 1.2 support

2.) DL framework with OpenCL 2.2 support

3.) MIOpen + ROCm

       

      My current evaluation:

       

1.) The use/development of OpenCL 1.2 solutions has proceeded tepidly since 2015. TensorFlow, Torch, and Caffe have seen some limited, early-stage development. TensorFlow is still being developed; other frameworks have stopped further support. In all cases, the code is not highly optimized.

2.) I have not seen anyone develop OpenCL 2.2-based DL solutions for any of the DL frameworks, so this is not an option at this time.

3.) There is talk of MIOpen being the cuDNN equivalent, with a release goal of H1 2017. The problem: I have not seen any such library anywhere. It is therefore questionable if/when MIOpen will be incorporated into TF, TH, Torch, Caffe, etc.

       

      My questions are simple:

       

• Given limited time, I am trying to determine the most robust way to pursue deep learning on AMD hardware. Should I invest time trying to get TensorFlow to work with OpenCL?
• Would my time be better spent waiting for MIOpen? (Option 2 does not seem to work at all at this time.) Will these options be available within the above-mentioned time frames?
• Is this picture likely to change between now and July or August? The thought of using an APU + additional dGPU is very appealing, but without software support I cannot foresee myself going with AMD GPUs.
• Or perhaps this is one of those use-case-dependent situations, in which case I am wondering: when is it better to use HIP vs. HCC vs. OpenCL? And how does MIOpen fit in here?

       

       

For reference, I was recently successful in compiling tensorflow-opencl for my A10-7850K, as you can read here. I shared my insights with other people on Stack Overflow. As mentioned in the links above, performance using OpenCL 1.2 has been a bit disappointing on my current setup:

       

Following are some numbers for computing one epoch on the CIFAR10 data set for MY SETUP (A10-7850K with iGPU). Your mileage will almost certainly vary!

       

• TensorFlow (via pip install): ~1700 s/epoch
• TensorFlow (w/ SSE + AVX): ~1100 s/epoch
• TensorFlow (w/ OpenCL & iGPU): ~5800 s/epoch
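For a quick comparison, the measured times above can be converted into relative speedups with a short script (the numbers are the ones from my runs on this one machine; the configuration labels are my own shorthand, not official names):

```python
# Epoch times measured above (seconds per epoch) on the A10-7850K setup.
# These are illustrative numbers for one machine, not general benchmarks.
epoch_times = {
    "TensorFlow (pip, plain CPU)": 1700,
    "TensorFlow (SSE + AVX)": 1100,
    "TensorFlow (OpenCL, iGPU)": 5800,
}

baseline = epoch_times["TensorFlow (pip, plain CPU)"]
for config, seconds in epoch_times.items():
    speedup = baseline / seconds  # > 1.0 means faster than the plain pip build
    print(f"{config}: {seconds} s/epoch, {speedup:.2f}x vs plain pip build")
```

This makes the ordering explicit: the SSE + AVX build is about 1.5x faster than the plain pip build, while the OpenCL iGPU path is roughly 3x slower than it.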

       

      I attribute the above to the following factors:

       

• The iGPU only has 1 GB of memory. This leads to a lot of copying back and forth between CPU and GPU. (OpenCL 1.2 does not yet support passing data via pointers/shared virtual memory; instead, data has to be copied back and forth.)
• The iGPU only has 512 stream processors and 32 GB/s of memory bandwidth, which in this case is slower than 4 CPU cores using the SSE4 + AVX instruction sets.
• The development of tensorflow-opencl is in its beginning stages, and a lot of optimizations in SYCL etc. have not been done yet.

       

You can see that in this particular case, performance is worse (about 5x worse than the optimized CPU build). It may be a bit simplistic, but the iGPU has roughly 1/8 the performance of an RX 480. If I extrapolate, 5800/8 ≈ 725 s/epoch, which is still only slightly faster than the CPU. I mention TensorFlow specifically because it appears to be the most developed of the frameworks in terms of OpenCL support. With an NVIDIA GPU, the calculations would currently complete much faster.
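The extrapolation above is simple arithmetic; here it is spelled out (the 1/8 performance ratio is my own rough assumption, not a measurement):

```python
# Extrapolate the measured iGPU epoch time to a hypothetical RX 480,
# assuming the iGPU has ~1/8 the throughput of an RX 480 (rough assumption).
igpu_epoch_s = 5800            # measured: one CIFAR10 epoch on the iGPU via OpenCL
cpu_epoch_s = 1100             # measured: CPU build with SSE + AVX
igpu_to_rx480_ratio = 8        # assumed relative throughput

rx480_epoch_s = igpu_epoch_s / igpu_to_rx480_ratio
print(f"Estimated RX 480: {rx480_epoch_s:.0f} s/epoch")          # ~725 s/epoch
print(f"Speedup over CPU: {cpu_epoch_s / rx480_epoch_s:.2f}x")   # only ~1.5x
```

Even granting the generous assumption that software overheads scale away, a discrete AMD card would only be modestly faster than the CPU under the current OpenCL 1.2 path.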

       

      Regardless of the current performance, what is the realistic path forward to get much better performance on AMD hardware within the next 2-5 months?

       

P.S.: Given the growth of machine learning/deep learning, should there not be a separate forum for this?

P.P.S.: I am also looking for a forum/support group for exploring ROCm (and, in the future, MIOpen). Can someone point me to any such group? I have not yet found a central place where people experimenting with these technologies gather to talk about challenges and/or solutions.

       

Message was edited by: Anthony Le. Spelling errors fixed; some memory bandwidth details added. Added P.S.

       

Message was edited by: Anthony Le. Added P.P.S.

        • Re: What is the most robust approach to deep learning on AMD hardware?
          thornhale

The absolute lack of awareness and response on this and related topics from the community, community managers, and/or devs has led me to pick up an NVIDIA 1080 Ti card at this time. Here is some feedback to consider for the future:

           

1.) Although machine learning can be done in OpenCL (albeit not optimally), machine learning, and especially deep learning, does NOT equal OpenCL. As people who research AMD products are probably aware, AMD is building its own stack (e.g. ROCm and MIOpen). Therefore, it is an error on the part of the community managers to put this topic here.

2.) I will again strongly suggest that, in preparation for the release of machine learning applications on AMD hardware, a separate channel/forum be set up for this purpose. Do NOT lump machine learning in with OpenCL!

3.) In spite of potential NDAs, it would have been nice for some developer in this realm to post anything helpful and NDA-compatible on this topic. As it stands, it seems that neither the AMD community, nor the moderators, nor the devs/researchers are even aware of what machine learning/deep learning is. I say it seems so, but I am of course assuming that the devs and researchers at AMD know very well what they are doing, probably much better than I do. I am just appealing to this community to be more (pro)active in this regard.

           

I still want AMD to succeed and will continue to monitor activity concerning this topic on this forum. I still have a Kaveri APU with which I can experiment a bit. In this case, the timelines just did not match up.