
 

[Originally posted on 09/08/17 by Albert J. De Vera]

 

Deep Learning, an advanced form of machine learning, has generated a lot of interest because of its wide range of applications on complex data sets. Current technologies and the availability of very large amounts of complex data have made analytics on such data more tractable.

 

Deep learning algorithms are built on deep neural networks, and GPUs are now being used in deep learning applications because they provide many processing units. These processing units carry out the computations of a neural network on the data in parallel. Neural networks can therefore scale and improve the extraction of information from data.

 

ROCm and The AMD Deep Learning Stack

The AMD Deep Learning Stack is the result of AMD’s initiative to enable DL applications using their GPUs such as the Radeon Instinct product line. Currently, deep learning frameworks such as Caffe, Torch, and TensorFlow are being ported and tested to run on the AMD DL stack. Supporting these frameworks is MIOpen, AMD’s open-source deep learning library built for the Radeon Instinct line of compute accelerators.

 

AMD’s ROCm platform serves as the foundation of this DL stack. ROCm enables the seamless integration of the CPU and GPU for high performance computing (HPC) and ultra-scale class computing. To achieve this, ROCm is built for language independence and takes advantage of the Heterogeneous System Architecture (HSA) Runtime API [3]. This is the basis of the ROCr System Runtime, a thin user-mode API providing access to graphics hardware driven by the AMDGPU driver and the ROCk kernel driver.

 


 

For now, OS support for ROCm is limited to Ubuntu 14.04, Ubuntu 16.04, and Fedora 23. For these operating systems, AMD provides a modified Linux version 4.6 kernel with patches to the HSA kernel driver (amdkfd) and the AMDGPU (amdgpu) kernel driver currently in the mainline Linux kernel [5].

 

Using Docker With The AMD Deep Learning Stack

 

Docker Containers

Software containers isolate an application and its dependencies from other software installed on the host. Each container abstracts the underlying operating system while keeping its own resources (filesystem, memory, CPU) and environment separate from those of other containers.

 

In contrast to virtual machines, all containers running on the same host share a single operating system without the need to virtualize a complete machine with its own OS. This makes software containers perform much faster than virtual machines because of the lack of overhead from the guest OS and the hypervisor.

 

Docker is the most popular software container platform today. It is available for Linux, macOS, and Microsoft Windows. Docker containers can run under any OS with the Docker platform installed [6].

 

Installing Docker and The AMD Deep Learning Stack

The ROCm-enabled Linux kernel and the ROCk driver, together with other needed kernel modules, must be installed on all hosts that run Docker containers. This is because the containers do not have the kernel installed inside them. Instead, the containers share the host kernel [7].

 

The installation procedure described here is for Ubuntu 16.04. Ubuntu 16.04 is currently the most tested OS for ROCm.

 

Installing ROCm

The first step is to install ROCm and the ROCm kernel on each host. The procedure described below is based on the instructions found at https://rocm.github.io/install.html.

 

Grab and install the GPG key for the repository:

wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -

 

You should get the message ‘OK’. You can check if it’s there using apt-key:

apt-key list

 

In /etc/apt/sources.list.d, create a file named rocm.list and place the following line in it:

deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main
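
The repository entry above can also be written from the shell. The following is a minimal sketch, not part of the original procedure: the function name write_rocm_repo and its optional directory argument are illustrative assumptions, and the function only writes the line shown above into a file named rocm.list.

```shell
# Hypothetical helper (not part of ROCm): write the ROCm apt source entry.
# The target directory defaults to /etc/apt/sources.list.d; pass another
# directory as the first argument to try it out without root privileges.
write_rocm_repo() {
    repo_dir="${1:-/etc/apt/sources.list.d}"
    printf '%s\n' 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' \
        > "$repo_dir/rocm.list"
}
```

When run as root, calling write_rocm_repo without arguments writes /etc/apt/sources.list.d/rocm.list. Passing a scratch directory first lets you inspect the generated file before touching the system configuration.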

 

Update the repository information by running ‘apt update’. If you get a warning because of the key signature, you may ignore it since the repository administrator will update this in the future.

 

Install the ROCm Runtime software stack using ‘apt install rocm’:

 

[root@pegasus ~]# apt install rocm

Reading package lists… Done

Building dependency tree

Reading state information… Done

 

 

The following packages were automatically installed and are no longer required:

hcblas hcfft hcrng miopengemm

Use ‘sudo apt autoremove’ to remove them.

The following additional packages will be installed:

hcc hip_hcc linux-headers-4.11.0-kfd-compute-rocm-rel-1.6-148 linux-image-4.11.0-kfd-compute-rocm-rel-1.6-148 rocm-dev

rocm-device-libs rocm-profiler rocm-smi rocm-utils

 

 

Suggested packages:

linux-firmware-image-4.11.0-kfd-compute-rocm-rel-1.6-148

 

 

The following NEW packages will be installed:

hcc hip_hcc linux-headers-4.11.0-kfd-compute-rocm-rel-1.6-148 linux-image-4.11.0-kfd-compute-rocm-rel-1.6-148 rocm rocm-dev

rocm-device-libs rocm-profiler rocm-smi rocm-utils

0 upgraded, 10 newly installed, 0 to remove and 0 not upgraded.

Need to get 321 MB of archives.

After this operation, 1,934 MB of additional disk space will be used.

Do you want to continue? [Y/n]

Get:1 http://repo.radeon.com/rocm/apt/debian xenial/main amd64 rocm-utils amd64 1.0.0 [30.7 kB]

Get:2 http://repo.radeon.com/rocm/apt/debian xenial/main amd64 hcc amd64 1.0.17312 [255 MB]

Get:3 http://repo.radeon.com/rocm/apt/debian xenial/main amd64 hip_hcc amd64 1.2.17305 [876 kB]

Get:4 http://repo.radeon.com/rocm/apt/debian xenial/main amd64 linux-headers-4.11.0-kfd-compute-rocm-rel-1.6-148 amd64 4.11.0-kfd-compute-rocm-rel-1.6-148-1 [10.8 MB]

Get:5 http://repo.radeon.com/rocm/apt/debian xenial/main amd64 linux-image-4.11.0-kfd-compute-rocm-rel-1.6-148 amd64 4.11.0-kfd-compute-rocm-rel-1.6-148-1 [46.5 MB]

Get:6 http://repo.radeon.com/rocm/apt/debian xenial/main amd64 rocm-device-libs amd64 0.0.1 [587 kB]

Get:7 http://repo.radeon.com/rocm/apt/debian xenial/main amd64 rocm-smi amd64 1.0.0-25-gbdb99b4 [8,158 B]

Get:8 http://repo.radeon.com/rocm/apt/debian xenial/main amd64 rocm-profiler amd64 5.1.6400 [7,427 kB]

Get:9 http://repo.radeon.com/rocm/apt/debian xenial/main amd64 rocm-dev amd64 1.6.148 [902 B]

Get:10 http://repo.radeon.com/rocm/apt/debian xenial/main amd64 rocm amd64 1.6.148 [1,044 B]

Fetched 321 MB in 31s (10.1 MB/s)

Selecting previously unselected package rocm-utils.

(Reading database … 254059 files and directories currently installed.)

Preparing to unpack …/rocm-utils_1.0.0_amd64.deb …

Unpacking rocm-utils (1.0.0) …

Selecting previously unselected package hcc.

Preparing to unpack …/hcc_1.0.17312_amd64.deb …

Unpacking hcc (1.0.17312) …

Selecting previously unselected package hip_hcc.

Preparing to unpack …/hip%5fhcc_1.2.17305_amd64.deb …

Unpacking hip_hcc (1.2.17305) …

Selecting previously unselected package linux-headers-4.11.0-kfd-compute-rocm-rel-1.6-148.

Preparing to unpack …/linux-headers-4.11.0-kfd-compute-rocm-rel-1.6-148_4.11.0-kfd-compute-rocm-rel-1.6-148-1_amd64.deb …

Unpacking linux-headers-4.11.0-kfd-compute-rocm-rel-1.6-148 (4.11.0-kfd-compute-rocm-rel-1.6-148-1) …

Selecting previously unselected package linux-image-4.11.0-kfd-compute-rocm-rel-1.6-148.

Preparing to unpack …/linux-image-4.11.0-kfd-compute-rocm-rel-1.6-148_4.11.0-kfd-compute-rocm-rel-1.6-148-1_amd64.deb …

Unpacking linux-image-4.11.0-kfd-compute-rocm-rel-1.6-148 (4.11.0-kfd-compute-rocm-rel-1.6-148-1) …

Selecting previously unselected package rocm-device-libs.

Preparing to unpack …/rocm-device-libs_0.0.1_amd64.deb …

Unpacking rocm-device-libs (0.0.1) …

Selecting previously unselected package rocm-smi.

Preparing to unpack …/rocm-smi_1.0.0-25-gbdb99b4_amd64.deb …

Unpacking rocm-smi (1.0.0-25-gbdb99b4) …

Selecting previously unselected package rocm-profiler.

Preparing to unpack …/rocm-profiler_5.1.6400_amd64.deb …

Unpacking rocm-profiler (5.1.6400) …

Selecting previously unselected package rocm-dev.

Preparing to unpack …/rocm-dev_1.6.148_amd64.deb …

Unpacking rocm-dev (1.6.148) …

Selecting previously unselected package rocm.

Preparing to unpack …/rocm_1.6.148_amd64.deb …

Unpacking rocm (1.6.148) …

Setting up rocm-utils (1.0.0) …

Setting up hcc (1.0.17312) …

Setting up hip_hcc (1.2.17305) …

Setting up linux-headers-4.11.0-kfd-compute-rocm-rel-1.6-148 (4.11.0-kfd-compute-rocm-rel-1.6-148-1) …

Setting up linux-image-4.11.0-kfd-compute-rocm-rel-1.6-148 (4.11.0-kfd-compute-rocm-rel-1.6-148-1) …

update-initramfs: Generating /boot/initrd.img-4.11.0-kfd-compute-rocm-rel-1.6-148

W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.

Generating grub configuration file …

Found linux image: /boot/vmlinuz-4.11.0-kfd-compute-rocm-rel-1.6-148

Found initrd image: /boot/initrd.img-4.11.0-kfd-compute-rocm-rel-1.6-148

Found linux image: /boot/vmlinuz-4.4.0-93-generic

Found initrd image: /boot/initrd.img-4.4.0-93-generic

Found memtest86+ image: /memtest86+.elf

Found memtest86+ image: /memtest86+.bin

done

Setting up rocm-device-libs (0.0.1) …

Setting up rocm-smi (1.0.0-25-gbdb99b4) …

Setting up rocm-profiler (5.1.6400) …

Setting up rocm-dev (1.6.148) …

Setting up rocm (1.6.148) …

KERNEL=="kfd", MODE="0666"

 

 

Reboot the server. Make sure that the Linux ROCm kernel is running:

 

Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.11.0-kfd-compute-rocm-rel-1.6-148 x86_64)

 

* Documentation: https://help.ubuntu.com

* Management: https://landscape.canonical.com

* Support: https://ubuntu.com/advantage

 

0 packages can be updated.

0 updates are security updates.
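
Besides reading the login banner, the running kernel can be checked from the shell. This is a small sketch under one stated assumption: that ROCm kernels carry the string kfd-compute-rocm in their release name, as the 4.11.0-kfd-compute-rocm-rel-1.6-148 kernel shown in this article does. The function name is_rocm_kernel is made up for illustration.

```shell
# Hypothetical helper: succeed if a kernel release string looks like a
# ROCm kernel. With no argument it checks the running kernel (uname -r).
# Assumption: ROCm kernels carry "kfd-compute-rocm" in their release name,
# as in the 4.11.0-kfd-compute-rocm-rel-1.6-148 kernel installed above.
is_rocm_kernel() {
    release="${1:-$(uname -r)}"
    case "$release" in
        *kfd-compute-rocm*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the result for the running kernel.
if is_rocm_kernel; then
    echo "ROCm kernel is running: $(uname -r)"
else
    echo "Not a ROCm kernel: $(uname -r)"
fi
```

If the check reports a different kernel after the reboot, the boot loader may have selected another entry, such as the 4.4.0-93-generic image that also appears in the grub output above.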

 

Test if your installation works with this sample program:

 

cd /opt/rocm/hsa/sample

make

./vector_copy

 

You should get an output similar to this:

 

Initializing the hsa runtime succeeded.

Checking finalizer 1.0 extension support succeeded.

Generating function table for finalizer succeeded.

Getting a gpu agent succeeded.

Querying the agent name succeeded.

The agent name is gfx803.

Querying the agent maximum queue size succeeded.

The maximum queue size is 131072.

Creating the queue succeeded.

“Obtaining machine model” succeeded.

“Getting agent profile” succeeded.

Create the program succeeded.

Adding the brig module to the program succeeded.

Query the agents isa succeeded.

Finalizing the program succeeded.

Destroying the program succeeded.

Create the executable succeeded.

Loading the code object succeeded.

Freeze the executable succeeded.

Extract the symbol from the executable succeeded.

Extracting the symbol from the executable succeeded.

Extracting the kernarg segment size from the executable succeeded.

Extracting the group segment size from the executable succeeded.

Extracting the private segment from the executable succeeded.

Creating a HSA signal succeeded.

Finding a fine grained memory region succeeded.

Allocating argument memory for input parameter succeeded.

Allocating argument memory for output parameter succeeded.

Finding a kernarg memory region succeeded.

Allocating kernel argument memory buffer succeeded.

Dispatching the kernel succeeded.

Passed validation.

Freeing kernel argument memory buffer succeeded.

Destroying the signal succeeded.

Destroying the executable succeeded.

Destroying the code object succeeded.

Destroying the queue succeeded.

Freeing in argument memory buffer succeeded.

Freeing out argument memory buffer succeeded.

Shutting down the runtime succeeded.

 

Installing Docker

We are installing the Docker Community Edition (also called Docker CE) on the host by using Docker’s apt repository. Our procedure is based on documentation published by Docker [8]. There may be some slight differences from the original documentation. Note that the installation is done as the superuser. You can also use sudo to install Docker.

 

First, remove old versions of Docker:

apt remove docker docker-engine

 

If they are not installed, you will simply get a message that they are missing.

 

Install the following prerequisite packages using apt:

 

apt-transport-https

ca-certificates

curl

software-properties-common

 

Add the Docker GPG key to your host:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

The GPG fingerprint should be 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88. Use the command

 

apt-key fingerprint 0EBFCD88

to verify this.
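
The comparison can also be scripted rather than done by eye. The sketch below is an illustration only: has_fingerprint is a hypothetical helper name, and it assumes nothing about the listing format beyond the fingerprint appearing somewhere in the text, since all whitespace is stripped before comparing.

```shell
# Expected fingerprint of the Docker repository key, as given above.
DOCKER_FPR="9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88"

# Hypothetical helper: read key listing text on stdin and succeed if it
# contains the expected fingerprint. All whitespace is removed from both
# sides first, so the spacing used in the listing does not matter.
has_fingerprint() {
    expected=$(printf '%s' "$1" | tr -d '[:space:]')
    tr -d '[:space:]' | grep -qF "$expected"
}
```

A possible use is: apt-key fingerprint 0EBFCD88 | has_fingerprint "$DOCKER_FPR" && echo "fingerprint OK".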

 

Now add the repository information:

add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"

 

Finally, issue the command ‘apt update’.

 

Install Docker CE with ‘apt install docker-ce’. After the installation is complete, verify that Docker is properly installed and configured by running ‘docker run hello-world’.

 

Running ROCm Docker Images

AMD provides a Docker image of the ROCm software framework [9]. The image can be pulled from the official Docker repository:

 

sudo docker pull rocm/rocm-terminal

The image is about 1.5 GB in size and contains the necessary libraries to run ROCm-based applications. Create a container out of this image and look at the installed software in /opt/rocm:

 

sudo docker run -it --rm --device=/dev/kfd rocm/rocm-terminal

You can check for the ROCm libraries using ldconfig:

 

ldconfig -NXv

The command above should list all the libraries in the library path including the ROCm libraries.

 

The ROCm-docker source is available from GitHub:

 

mkdir ~/tmp

cd ~/tmp

git clone https://github.com/RadeonOpenCompute/ROCm-docker.git

Creating A ROCm Application Docker Image

We can use the rocm/rocm-terminal Docker image to build our own ROCm application Docker image. In the following examples, we use a couple of the sample applications that come with the ROCm development package. One of them is /opt/rocm/hip/samples/1_Utils/hipInfo.

 

Assuming the host has the complete ROCm development tools, we just do the following:

cd /opt/rocm/hip/samples/1_Utils/hipInfo

make

 

The make command produces a binary called hipInfo.

 

If the compiler complains because of a missing shared library called libsupc++, we will need to install that somewhere in the host’s library path. In our case, we shall place the shared library in /usr/local/lib and make sure that ldconfig can find it. You can simply create a shared library from the installed static library /usr/lib/gcc/x86_64-linux-gnu/4.8/libsupc++.a:

 

 

mkdir -p ~/tmp/libsupc++

cd ~/tmp/libsupc++

ar x /usr/lib/gcc/x86_64-linux-gnu/4.8/libsupc++.a

ls -l *.o

gcc -shared -o libsupc++.so *.o

sudo cp -p libsupc++.so /usr/local/lib/

sudo ldconfig -v

Make sure that /usr/local/lib is seen by ldconfig. You may have to specify this directory in /etc/ld.so.conf.d if it is not found. Simply add a file named local_lib.conf with the line /usr/local/lib by itself.
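
Adding that configuration file can be sketched as a small shell function. The name add_local_lib_conf and its optional directory argument are assumptions made for illustration; the function only writes the single line /usr/local/lib into local_lib.conf.

```shell
# Hypothetical helper: register /usr/local/lib with the dynamic linker by
# writing local_lib.conf. The configuration directory defaults to
# /etc/ld.so.conf.d; pass another directory to test without root.
add_local_lib_conf() {
    conf_dir="${1:-/etc/ld.so.conf.d}"
    printf '%s\n' /usr/local/lib > "$conf_dir/local_lib.conf"
}
```

After running it as root without arguments, run ldconfig again so the cache is rebuilt. Something like ldconfig -p | grep -F 'libsupc++' should then list the new shared library.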

 

 

Check the output of hipInfo by running it. You should get something like this (it will be slightly different from the literal output below depending on what type of GPU configuration you have):

 

 

$ ./hipInfo

compiler: hcc version=1.0.17312-d1f4a8a-19aa706-56b5abe, workweek (YYWWD) = 17312

--------------------------------------------------------------------------------

device# 0

Name: Device 67df

pciBusID: 1

pciDeviceID: 0

multiProcessorCount: 36

maxThreadsPerMultiProcessor: 2560

isMultiGpuBoard: 1

clockRate: 1303 Mhz

memoryClockRate: 2000 Mhz

memoryBusWidth: 256

clockInstructionRate: 1000 Mhz

totalGlobalMem: 8.00 GB

maxSharedMemoryPerMultiProcessor: 8.00 GB

totalConstMem: 16384

sharedMemPerBlock: 64.00 KB

regsPerBlock: 0

warpSize: 64

l2CacheSize: 0

computeMode: 0

maxThreadsPerBlock: 1024

maxThreadsDim.x: 1024

maxThreadsDim.y: 1024

maxThreadsDim.z: 1024

maxGridSize.x: 2147483647

maxGridSize.y: 2147483647

maxGridSize.z: 2147483647

major: 2

minor: 0

concurrentKernels: 1

arch.hasGlobalInt32Atomics: 1

arch.hasGlobalFloatAtomicExch: 1

arch.hasSharedInt32Atomics: 1

arch.hasSharedFloatAtomicExch: 1

arch.hasFloatAtomicAdd: 0

arch.hasGlobalInt64Atomics: 1

arch.hasSharedInt64Atomics: 1

arch.hasDoubles: 1

arch.hasWarpVote: 1

arch.hasWarpBallot: 1

arch.hasWarpShuffle: 1

arch.hasFunnelShift: 0

arch.hasThreadFenceSystem: 0

arch.hasSyncThreadsExt: 0

arch.hasSurfaceFuncs: 0

arch.has3dGrid: 1

arch.hasDynamicParallelism: 0

peers:

non-peers: device#0

memInfo.total: 8.00 GB

memInfo.free: 7.75 GB (97%)

 

Now that hipInfo is compiled and has been tested, let us create a Docker image with it. Create a directory for building an image with Docker.

 

mkdir ~/tmp/my_rocm_hipinfo

cd ~/tmp/my_rocm_hipinfo

 

Copy the necessary files for the Docker image to run properly:

 

cp -p /usr/local/lib/libsupc++.so . # If hipInfo needs this

cp -p /opt/rocm/hip/samples/1_Utils/hipInfo/hipInfo .

Create a file named Dockerfile in the current directory. It should contain this:

 

FROM rocm/rocm-terminal:latest

COPY libsupc++.so /usr/local/lib/

COPY hipInfo /usr/local/bin/

RUN sudo ldconfig

 

USER rocm-user

WORKDIR /home/rocm-user

ENV PATH "${PATH}:/opt/rocm/bin:/usr/local/bin"

ENTRYPOINT ["hipInfo"]

 

Build the Docker image:

 

sudo docker build -t my_rocm_hipinfo .

Create and run a container based on the new image:

 

 

sudo docker run --rm --device="/dev/kfd" my_rocm_hipinfo

The device /dev/kfd is the interface to the kernel fusion driver (amdkfd). You should get output similar to what you saw when you ran the hipInfo binary directly on the host.
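
If the container cannot see the GPU, it may help to confirm first that /dev/kfd exists on the host and is accessible. The check below is a sketch only; device_usable is a hypothetical helper name, and it tests nothing more than that the path is a character device the current user can read and write.

```shell
# Hypothetical helper: succeed if the given path (default /dev/kfd) is a
# character device that the current user can read and write.
device_usable() {
    dev="${1:-/dev/kfd}"
    [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]
}

# Report the result for /dev/kfd on this host.
if device_usable; then
    echo "/dev/kfd is available"
else
    echo "/dev/kfd is missing or not accessible" >&2
fi
```

A missing device usually suggests that the host is not running the ROCm kernel, since the container relies on the host kernel for the driver.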

 

 

Without the --rm parameter, the container persists after it exits, so you can start it again later. For example, create a container with an explicit name:

 

 

sudo docker run --device="/dev/kfd" --name nifty_hugle my_rocm_hipinfo

The Docker container persists after it exits, which you can confirm by listing all containers:

 

 

sudo docker ps -a

The output should list the nifty_hugle container created from the my_rocm_hipinfo image.

 

 

Now, try this command and you should see the output from hipInfo again:

 

 

sudo docker start -i nifty_hugle

The second Docker image we shall create will contain the sample binary called vector_copy. The source is in /opt/rocm/hsa/sample. As done with hipInfo, use make to build the binary. Note that this binary also depends on the files with the .brig extension to run.

 

 

We do the following before we build the image:

 

 

mkdir ~/tmp/my_rocm_vectorcopy

cd ~/tmp/my_rocm_vectorcopy

mkdir vector_copy

cp -p /usr/local/lib/libsupc++.so . # Do this if necessary

cd vector_copy

cp -p /opt/rocm/hsa/sample/vector_copy .

cp -p /opt/rocm/hsa/sample/vector_copy*.brig .

cd .. # Back to ~/tmp/my_rocm_vectorcopy

For our Dockerfile, we have this:

 

 

FROM rocm/rocm-terminal:latest

COPY libsupc++.so /usr/local/lib/

RUN sudo mkdir /usr/local/vector_copy

COPY vector_copy/* /usr/local/vector_copy/

RUN sudo ldconfig

 

 

USER rocm-user

ENV PATH "${PATH}:/opt/rocm/bin:/usr/local/vector_copy"

WORKDIR /usr/local/vector_copy

ENTRYPOINT ["vector_copy"]

 

 

 

 

 

Building the Docker image for vector_copy should be familiar by now.
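
For reference, the build and run steps used for hipInfo can be wrapped in one small function and reused for vector_copy. This is a sketch, not the author's script: build_and_run and the DOCKER variable are illustrative assumptions, and the image name my_rocm_vectorcopy simply follows the directory name used above.

```shell
# Hypothetical helper: build an image from a build context and run it once
# with access to /dev/kfd. The docker command comes from the DOCKER
# variable (default "sudo docker"); set DOCKER=echo to print the commands
# instead of running them.
build_and_run() {
    image="$1"
    context="${2:-.}"
    docker_cmd="${DOCKER:-sudo docker}"
    # $docker_cmd is left unquoted on purpose so "sudo docker" splits into two words.
    $docker_cmd build -t "$image" "$context" &&
        $docker_cmd run --rm --device=/dev/kfd "$image"
}
```

From ~/tmp/my_rocm_vectorcopy, calling build_and_run my_rocm_vectorcopy would build the image and run it once. Setting DOCKER=echo first is a cheap way to review the two commands before executing them.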

 

 

As an exercise, run the Docker image to see what output you get. Try with or without --rm and with the ‘docker start’ command.

 

 

 

 

 

For our last example, we shall use a Docker container for the Caffe deep learning framework. We are going to use the HIP port of Caffe, which can target both AMD ROCm and Nvidia CUDA devices [10]. HIP enables the conversion of CUDA code to portable C++. For more information on HIP, see https://github.com/ROCm-Developer-Tools/HIP.

 

 

 

 

 

Let us pull the hip-caffe image from the Docker registry:

 

 

docker pull intuitionfabric/hip-caffe

Test the image by running a device query on the AMD GPUs:

 

 

sudo docker run --name my_caffe -it --device=/dev/kfd --rm \
intuitionfabric/hip-caffe ./build/tools/caffe device_query -gpu all

You should get an output similar to the one below. Note that your output may differ due to your own host configuration.

I0831 19:05:30.814853 1 caffe.cpp:138] Querying GPUs all

I0831 19:05:30.815135 1 common.cpp:179] Device id: 0

I0831 19:05:30.815145 1 common.cpp:180] Major revision number: 2

I0831 19:05:30.815148 1 common.cpp:181] Minor revision number: 0

I0831 19:05:30.815153 1 common.cpp:182] Name: Device 67df

I0831 19:05:30.815158 1 common.cpp:183] Total global memory: 8589934592

I0831 19:05:30.815178 1 common.cpp:184] Total shared memory per block: 65536

I0831 19:05:30.815192 1 common.cpp:185] Total registers per block: 0

I0831 19:05:30.815196 1 common.cpp:186] Warp size: 64

I0831 19:05:30.815201 1 common.cpp:188] Maximum threads per block: 1024

I0831 19:05:30.815207 1 common.cpp:189] Maximum dimension of block: 1024, 1024, 1024

I0831 19:05:30.815210 1 common.cpp:192] Maximum dimension of grid: 2147483647, 2147483647, 2147483647

I0831 19:05:30.815215 1 common.cpp:195] Clock rate: 1303000

I0831 19:05:30.815219 1 common.cpp:196] Total constant memory: 16384

I0831 19:05:30.815223 1 common.cpp:200] Number of multiprocessors: 36

 

 

 

Let us now run Caffe in a container. We begin by creating a container for this purpose.

 

 

 

 

 

sudo docker run -it --device=/dev/kfd --rm intuitionfabric/hip-caffe

Once the above command is executed, you should be inside the container. Run the MNIST example from there.

First, get the raw MNIST data:

 

 

./data/mnist/get_mnist.sh

Make sure you format the data for Caffe:

 

 

./examples/mnist/create_mnist.sh

Once that’s done, proceed with training the network:

 

 

./examples/mnist/train_lenet.sh

You should get an output similar to this:

 

 

I0831 18:43:19.290951 37 caffe.cpp:217] Using GPUs 0

I0831 18:43:19.291165 37 caffe.cpp:222] GPU 0: Device 67df

I0831 18:43:19.294853 37 solver.cpp:48] Initializing solver from parameters:

test_iter: 100

test_interval: 500

base_lr: 0.01

display: 100

max_iter: 10000

lr_policy: “inv”

gamma: 0.0001

power: 0.75

momentum: 0.9

weight_decay: 0.0005

snapshot: 5000

snapshot_prefix: “examples/mnist/lenet”

solver_mode: GPU

device_id: 0

net: “examples/mnist/lenet_train_test.prototxt”

train_state {

level: 0

stage: “”

}

I0831 18:43:19.294972 37 solver.cpp:91] Creating training net from net file: examples/mnist/lenet_train_test.prototxt

I0831 18:43:19.295145 37 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer mnist

I0831 18:43:19.295169 37 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy

I0831 18:43:19.295181 37 net.cpp:58] Initializing net from parameters:

name: “LeNet”

state {

phase: TRAIN

level: 0

stage: “”

}

layer {

name: “mnist”

type: “Data”

top: “data”

top: “label”

include {

phase: TRAIN

}

transform_param {

scale: 0.00390625

}

data_param {

source: “examples/mnist/mnist_train_lmdb”

batch_size: 64

backend: LMDB

}

}

layer {

name: “conv1”

type: “Convolution”

bottom: “data”

top: “conv1”

param {

lr_mult: 1

}

param {

lr_mult: 2

}

convolution_param {

num_output: 20

kernel_size: 5

stride: 1

weight_filler {

type: “xavier”

}

bias_filler {

type: “constant”

}

}

}

....

layer {

name: “loss”

type: “SoftmaxWithLoss”

bottom: “ip2”

bottom: “label”

top: “loss”

}

I0831 18:43:19.295332 37 layer_factory.hpp:77] Creating layer mnist

I0831 18:43:19.295426 37 net.cpp:100] Creating Layer mnist

I0831 18:43:19.295444 37 net.cpp:408] mnist -> data

I0831 18:43:19.295478 37 net.cpp:408] mnist -> label

I0831 18:43:19.304414 40 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb

I0831 18:43:19.304760 37 data_layer.cpp:41] output data size: 64,1,28,28

I0831 18:43:19.305835 37 net.cpp:150] Setting up mnist

I0831 18:43:19.305842 37 net.cpp:157] Top shape: 64 1 28 28 (50176)

I0831 18:43:19.305848 37 net.cpp:157] Top shape: 64 (64)

I0831 18:43:19.305851 37 net.cpp:165] Memory required for data: 200960

I0831 18:43:19.305874 37 layer_factory.hpp:77] Creating layer conv1

I0831 18:43:19.305907 37 net.cpp:100] Creating Layer conv1

I0831 18:43:19.305912 37 net.cpp:434] conv1 <- data

I0831 18:43:19.305940 37 net.cpp:408] conv1 -> conv1

I0831 18:43:19.314159 37 cudnn_conv_layer.cpp:259] Before miopenConvolution*GetWorkSpaceSize

I0831 18:43:19.319051 37 cudnn_conv_layer.cpp:295] After miopenConvolution*GetWorkSpaceSize

I0831 18:43:19.319625 37 cudnn_conv_layer.cpp:468] Before miopenFindConvolutionForwardAlgorithm

I0831 18:43:19.927783 37 cudnn_conv_layer.cpp:493] fwd_algo_[0]: 1

I0831 18:43:19.927809 37 cudnn_conv_layer.cpp:494] workspace_fwd_sizes_[0]:57600

I0831 18:43:19.928071 37 cudnn_conv_layer.cpp:500] Before miopenFindConvolutionBackwardWeightsAlgorithm

....

I0831 18:43:23.296785 37 net.cpp:228] mnist does not need backward computation.

I0831 18:43:23.296789 37 net.cpp:270] This network produces output loss

I0831 18:43:23.296799 37 net.cpp:283] Network initialization done.

I0831 18:43:23.296967 37 solver.cpp:181] Creating test net (#0) specified by net file: examples/mnist/lenet_train_test.prototxt

I0831 18:43:23.296985 37 net.cpp:322] The NetState phase (1) differed from the phase (0) specified by a rule in layer mnist

I0831 18:43:23.296995 37 net.cpp:58] Initializing net from parameters:

name: “LeNet”

state {

phase: TEST

}

layer {

name: “mnist”

type: “Data”

top: “data”

top: “label”

include {

phase: TEST

}

transform_param {

scale: 0.00390625

}

data_param {

source: “examples/mnist/mnist_test_lmdb”

batch_size: 100

backend: LMDB

}

}

....

I0831 18:44:12.620506 37 solver.cpp:404] Test net output #1: loss = 0.0299084 (* 1 = 0.0299084 loss)

I0831 18:44:12.624415 37 solver.cpp:228] Iteration 9000, loss = 0.011652

I0831 18:44:12.624441 37 solver.cpp:244] Train net output #0: loss = 0.011652 (* 1 = 0.011652 loss)

I0831 18:44:12.624449 37 sgd_solver.cpp:106] Iteration 9000, lr = 0.00617924

I0831 18:44:13.055759 37 solver.cpp:228] Iteration 9100, loss = 0.0061008

I0831 18:44:13.055778 37 solver.cpp:244] Train net output #0: loss = 0.0061008 (* 1 = 0.0061008 loss)

I0831 18:44:13.055800 37 sgd_solver.cpp:106] Iteration 9100, lr = 0.00615496

I0831 18:44:13.497696 37 solver.cpp:228] Iteration 9200, loss = 0.00277705

I0831 18:44:13.497715 37 solver.cpp:244] Train net output #0: loss = 0.00277706 (* 1 = 0.00277706 loss)

I0831 18:44:13.497720 37 sgd_solver.cpp:106] Iteration 9200, lr = 0.0061309

I0831 18:44:13.941920 37 solver.cpp:228] Iteration 9300, loss = 0.0111398

I0831 18:44:13.941941 37 solver.cpp:244] Train net output #0: loss = 0.0111398 (* 1 = 0.0111398 loss)

I0831 18:44:13.941946 37 sgd_solver.cpp:106] Iteration 9300, lr = 0.00610706

I0831 18:44:14.386647 37 solver.cpp:228] Iteration 9400, loss = 0.0179196

I0831 18:44:14.386667 37 solver.cpp:244] Train net output #0: loss = 0.0179195 (* 1 = 0.0179195 loss)

I0831 18:44:14.386672 37 sgd_solver.cpp:106] Iteration 9400, lr = 0.00608343

I0831 18:44:14.828459 37 solver.cpp:337] Iteration 9500, Testing net (#0)

I0831 18:44:14.983165 37 solver.cpp:404] Test net output #0: accuracy = 0.9884

I0831 18:44:14.983183 37 solver.cpp:404] Test net output #1: loss = 0.0393952 (* 1 = 0.0393952 loss)

I0831 18:44:14.987198 37 solver.cpp:228] Iteration 9500, loss = 0.00496538

I0831 18:44:14.987211 37 solver.cpp:244] Train net output #0: loss = 0.00496537 (* 1 = 0.00496537 loss)

I0831 18:44:14.987217 37 sgd_solver.cpp:106] Iteration 9500, lr = 0.00606002

I0831 18:44:15.433176 37 solver.cpp:228] Iteration 9600, loss = 0.00308157

I0831 18:44:15.433193 37 solver.cpp:244] Train net output #0: loss = 0.00308157 (* 1 = 0.00308157 loss)

I0831 18:44:15.433200 37 sgd_solver.cpp:106] Iteration 9600, lr = 0.00603682

I0831 18:44:15.878787 37 solver.cpp:228] Iteration 9700, loss = 0.00220143

I0831 18:44:15.878806 37 solver.cpp:244] Train net output #0: loss = 0.00220143 (* 1 = 0.00220143 loss)

I0831 18:44:15.878813 37 sgd_solver.cpp:106] Iteration 9700, lr = 0.00601382

I0831 18:44:16.321408 37 solver.cpp:228] Iteration 9800, loss = 0.0108761

I0831 18:44:16.321426 37 solver.cpp:244] Train net output #0: loss = 0.0108761 (* 1 = 0.0108761 loss)

I0831 18:44:16.321432 37 sgd_solver.cpp:106] Iteration 9800, lr = 0.00599102

I0831 18:44:16.765200 37 solver.cpp:228] Iteration 9900, loss = 0.00478531

I0831 18:44:16.765219 37 solver.cpp:244] Train net output #0: loss = 0.00478531 (* 1 = 0.00478531 loss)

I0831 18:44:16.765226 37 sgd_solver.cpp:106] Iteration 9900, lr = 0.00596843

I0831 18:44:17.204908 37 solver.cpp:454] Snapshotting to binary proto file examples/mnist/lenet_iter_10000.caffemodel

I0831 18:44:17.208767 37 sgd_solver.cpp:273] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_10000.solverstate

I0831 18:44:17.211735 37 solver.cpp:317] Iteration 10000, loss = 0.0044067

I0831 18:44:17.211750 37 solver.cpp:337] Iteration 10000, Testing net (#0)

I0831 18:44:17.364528 37 solver.cpp:404] Test net output #0: accuracy = 0.9902

I0831 18:44:17.364547 37 solver.cpp:404] Test net output #1: loss = 0.0303562 (* 1 = 0.0303562 loss)

I0831 18:44:17.364552 37 solver.cpp:322] Optimization Done.

I0831 18:44:17.364555 37 caffe.cpp:254] Optimization Done.
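
A long training log like this is easier to summarize with a small filter. The sketch below pulls the last reported test accuracy out of a saved log; final_accuracy is a hypothetical helper name, and the pattern it searches for is taken from the "Test net output #0: accuracy" lines shown above.

```shell
# Hypothetical helper: read a Caffe training log on stdin and print the
# last reported test accuracy (the final field of the matching line).
final_accuracy() {
    grep 'Test net output #0: accuracy = ' | tail -n 1 | awk '{ print $NF }'
}
```

Assuming the log was saved to a file, for example with ./examples/mnist/train_lenet.sh 2>&1 | tee train.log (Caffe typically writes its log to standard error, and train.log is just an example name), running final_accuracy < train.log on the run shown above would print 0.9902.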

 

 

Conclusion

In this article, we provided you with a guide on how to use AMD’s ROCm framework with Docker container technology. This should serve as a good jumpstart to begin your Deep Learning development using AMD’s platform.

 

Docker has become an essential technology in containing the complexity of Deep Learning development. Deep Learning frameworks and tools have many dependencies. Leveraging Docker to isolate these dependencies within a Linux container leads not only to greater reliability and robustness but also to greater agility and flexibility. Many frameworks and tools are emerging, and it is best practice to have a robust solution for managing these disparate parts. Docker containers have become a standard practice in Deep Learning, and this technology is well supported by AMD’s ROCm framework.