Hi community and developers,
I will be purchasing an AMD Ryzen 5 1600 CPU and associated hardware within 2-5 months. The question is whether it will be feasible to purchase a Vega-based GPU for DL applications or whether NVIDIA is the way to go. NVIDIA with cuDNN is currently the de facto answer: all major DL frameworks have built their GPU compute on that library, and this simplicity makes the choice of an NVIDIA card very appealing.
For AMD, there are as far as I can see 3 possible paths forward:
1.) DL framework (e.g. TensorFlow) with OpenCL 1.2 support
2.) DL framework with OpenCL 2.2 support
3.) MIOpen + ROCm
My current evaluation:
1.) The use/development of OpenCL 1.2 solutions has proceeded tepidly since 2015. TensorFlow, Torch, and Caffe have some limited, early-stage development; TensorFlow is still being actively developed, while the other frameworks have stopped further support. In all cases, the code is not highly optimized.
2.) I have not seen anyone develop OpenCL 2.2-based DL solutions for any of the DL frameworks, so this is not an option at this time.
3.) There is talk of MIOpen becoming the cuDNN equivalent, with a release goal of H1 2017. The problem: I have not seen any such library anywhere. It is therefore questionable if/when MIOpen will be incorporated into TF, TH, Torch, Caffe, etc.
My questions are simple:
- Given limited time, I am trying to determine the most robust way to pursue deep learning on AMD hardware. Should I invest time trying to get TensorFlow to work with OpenCL?
- Would my time be better spent waiting for MIOpen? (Option 2 does not seem to work at all at this time.) Will these options be available within the above-mentioned time frame?
- Is this picture likely to change between now and July or August? The thought of using an APU plus an additional dGPU is very appealing, but without software support I cannot foresee myself going with AMD GPUs.
- Or perhaps this is use-case dependent. In that case I am wondering: when is it better to use HIP vs. HCC vs. OpenCL? And how does MIOpen fit in here?
For reference, I was recently successful in compiling tensorflow-opencl for my A10-7850K, as you can read here. I shared my insights with other people on Stack Overflow. As mentioned in those links, the performance using OpenCL 1.2 has been a bit disappointing on my current setup:
Following are some numbers for calculating 1 epoch using the CIFAR10 data set for MY SETUP (A10-7850 with iGPU). Your mileage will almost certainly vary!
- Tensorflow (via pip install): ~ 1700 s/epoch
- Tensorflow (w/ SSE + AVX): ~ 1100 s/epoch
- Tensorflow (w/ opencl & iGPU): ~ 5800 s/epoch
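For anyone comparing their own numbers, per-epoch wall-clock time can be measured with a simple wrapper. Here is a minimal Python sketch; `train_one_epoch` is a hypothetical stand-in for whatever framework call actually runs the epoch (in my case a TensorFlow training loop), not real framework code:

```python
import time

def train_one_epoch(batches):
    # Stand-in for the real framework call that runs one epoch.
    # Here it just touches every batch so the sketch is runnable.
    total = 0
    for batch in batches:
        total += sum(batch)
    return total

# Fake "CIFAR10-like" data: a list of small numeric batches.
batches = [[i, i + 1, i + 2] for i in range(100)]

start = time.perf_counter()
train_one_epoch(batches)
elapsed = time.perf_counter() - start
print(f"epoch time: {elapsed:.4f} s/epoch")
```

The s/epoch figures above are this kind of wall-clock measurement, averaged over a full pass through the data set.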
I attribute the above to the following factors:
- The iGPU only has 1 GB of memory. This leads to a lot of copying back and forth between CPU and GPU. (OpenCL 1.2 does not yet support passing data via pointers; instead, data has to be copied back and forth.)
- The iGPU only has 512 stream processors and 32 GB/s memory bandwidth, which in this case is slower than the 4 CPU cores using the SSE4 + AVX instruction sets.
- The development of tensorflow-opencl is in its beginning stages, and a lot of optimizations in SYCL etc. have not been done yet.
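As a rough sanity check on the copy-overhead factor, here is a back-of-the-envelope Python sketch. The per-batch data size and batch count are illustrative assumptions, not measurements; only the 32 GB/s figure comes from my hardware. Even under generous assumptions, the raw transfer time per epoch comes out small, which suggests the real cost of the missing pointer passing may lie in the synchronization and stalls around each copy rather than in bandwidth alone:

```python
# Back-of-the-envelope estimate of host<->device copy cost per epoch.
# All workload numbers below are illustrative assumptions.
BANDWIDTH_GBS = 32.0      # iGPU memory bandwidth quoted above (GB/s)
BATCH_MB = 50.0           # assumed data moved per batch, in MB
BATCHES_PER_EPOCH = 500   # assumed number of batches per epoch
COPIES_PER_BATCH = 2      # copy in and copy out (no pointer sharing)

bytes_per_epoch = BATCH_MB * 1e6 * BATCHES_PER_EPOCH * COPIES_PER_BATCH
copy_seconds = bytes_per_epoch / (BANDWIDTH_GBS * 1e9)
print(f"~{copy_seconds:.1f} s/epoch spent on raw copies")
```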
You can see that in this particular case performance is worse (about 5x worse than the optimized CPU build). It may be a bit simplistic, but the iGPU has roughly 1/8 the performance of an RX 480. Extrapolating, 5800/8 ≈ 725 s/epoch, which would still be only slightly faster than the CPU. I mention TensorFlow specifically because it appears to be the most developed of the frameworks in terms of OpenCL support. With an NVIDIA GPU, the calculations would currently complete much faster.
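The extrapolation above spelled out as plain arithmetic (the 8x factor is my rough assumption about the iGPU vs. an RX 480, not a benchmark):

```python
# Sanity check of the RX 480 extrapolation in the text.
igpu_s_per_epoch = 5800   # measured OpenCL time on the A10-7850K iGPU
speedup_rx480 = 8         # assumed: iGPU ~ 1/8 of an RX 480
cpu_s_per_epoch = 1100    # measured CPU time (SSE + AVX build)

rx480_estimate = igpu_s_per_epoch / speedup_rx480
print(f"estimated RX 480: ~{rx480_estimate:.0f} s/epoch "
      f"vs CPU {cpu_s_per_epoch} s/epoch")
```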
Regardless of the current performance, what is the realistic path forward to get much better performance on AMD hardware within the next 2-5 months?
P.S.: Given the growth of Machine Learning/Deep Learning, should there not be a separate forum for this?
P.P.S.: I am also looking for a forum/support group for exploration of ROCm (and, in the future, MIOpen). Can someone point me to any such group? I have not yet found a central place where people experimenting with these technologies gather to talk about challenges and/or solutions.
Message was edited by: Anthony Le Spelling errors, some memory bandwidth details added. Added PS
Message was edited by: Anthony Le Added PPS.