MacBook Pro の Radeon GPU で Keras を使って MNIST を高速に学習させる方法

2019年12月現在、MacBook Pro の Radeon GPU で Keras を使って MNIST を高速に学習させることができました。忘れないよう環境構築手順等を記載します。Plaid-ML という機械学習ライブラリをインストールし Keras のバックエンドにそれを指定するとできることを確認しました。

1. 参考文献
2. 環境構築
3. Keras で GPU を使用するために追加するコード
4. MNIST の学習
5. まとめ

1 参考文献

以下の記事を参考にしています。

Tensorflowのdockerを使ってみる（macOS）

qiita.com

PlaidML Kerasでやっていく #TokyoR 73

PlaidML Kerasでやっていく #TokyoR 73 from Akifumi Eguchi

www.slideshare.net

Plaid ML Installation Instructions macOS

plaidml.github.io

Plaid ML の詳細はこちら

github.com

2 環境構築

環境構築手順について記載します。 Plaid ML Installation Instructions macOS に記載の手順の通りです。

2.1 前提条件

実際に試した環境は Anaconda の Python3.7 が入っているバージョンをインストールした Macbook Pro です。以下の手順ではコマンドが ~/anaconda3/bin/pip のように長くなっていますが、これは私のPCで Anaconda へのパスを設定していない状態のためです。

2.2 "plaidml-keras" のインストール

以下のコマンドを実行します。

~/anaconda3/bin/pip install -U plaidml-keras

Collecting plaidml-keras
  Downloading https://files.pythonhosted.org/packages/49/a4/cb4c18eb0d5ec72e69b0cda53a8dcf9310d02060d2398672d0ac52deb394/plaidml_keras-0.6.4-py2.py3-none-any.whl
Collecting plaidml (from plaidml-keras)
  Downloading https://files.pythonhosted.org/packages/b6/84/2ba8babe2f0a3eb5ec95f5ac569fa9b8299e76b4dcccf197bf4336032142/plaidml-0.6.4-py2.py3-none-macosx_10_10_x86_64.whl (31.0MB)
     |████████████████████████████████| 31.0MB 3.4MB/s 
Collecting keras==2.2.4 (from plaidml-keras)
  Downloading https://files.pythonhosted.org/packages/5e/10/aa32dad071ce52b5502266b5c659451cfd6ffcbf14e6c8c4f16c0ff5aaab/Keras-2.2.4-py2.py3-none-any.whl (312kB)
     |████████████████████████████████| 317kB 5.4MB/s 
Requirement already satisfied, skipping upgrade: six in ./anaconda3/lib/python3.7/site-packages (from plaidml-keras) (1.12.0)
Requirement already satisfied, skipping upgrade: numpy in ./anaconda3/lib/python3.7/site-packages (from plaidml->plaidml-keras) (1.17.2)
Requirement already satisfied, skipping upgrade: cffi in ./anaconda3/lib/python3.7/site-packages (from plaidml->plaidml-keras) (1.12.3)
Collecting enum34>=1.1.6 (from plaidml->plaidml-keras)
  Downloading https://files.pythonhosted.org/packages/af/42/cb9355df32c69b553e72a2e28daee25d1611d2c0d9c272aa1d34204205b2/enum34-1.1.6-py3-none-any.whl
Requirement already satisfied, skipping upgrade: h5py in ./anaconda3/lib/python3.7/site-packages (from keras==2.2.4->plaidml-keras) (2.9.0)
Collecting keras-preprocessing>=1.0.5 (from keras==2.2.4->plaidml-keras)
  Downloading https://files.pythonhosted.org/packages/28/6a/8c1f62c37212d9fc441a7e26736df51ce6f0e38455816445471f10da4f0a/Keras_Preprocessing-1.1.0-py2.py3-none-any.whl (41kB)
     |████████████████████████████████| 51kB 7.8MB/s 
Collecting keras-applications>=1.0.6 (from keras==2.2.4->plaidml-keras)
  Downloading https://files.pythonhosted.org/packages/71/e3/19762fdfc62877ae9102edf6342d71b28fbfd9dea3d2f96a882ce099b03f/Keras_Applications-1.0.8-py3-none-any.whl (50kB)
     |████████████████████████████████| 51kB 9.4MB/s 
Requirement already satisfied, skipping upgrade: pyyaml in ./anaconda3/lib/python3.7/site-packages (from keras==2.2.4->plaidml-keras) (5.1.2)
Requirement already satisfied, skipping upgrade: scipy>=0.14 in ./anaconda3/lib/python3.7/site-packages (from keras==2.2.4->plaidml-keras) (1.3.1)
Requirement already satisfied, skipping upgrade: pycparser in ./anaconda3/lib/python3.7/site-packages (from cffi->plaidml->plaidml-keras) (2.19)
Installing collected packages: enum34, plaidml, keras-preprocessing, keras-applications, keras, plaidml-keras
Successfully installed enum34-1.1.6 keras-2.2.4 keras-applications-1.0.8 keras-preprocessing-1.1.0 plaidml-0.6.4 plaidml-keras-0.6.4

2.3 初期設定コマンド

以下のコマンドを実行します。

~/anaconda3/bin/plaidml-setup

PlaidML Setup (0.6.4)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the Apache License 2.0
 

Default Config Devices:
   metal_intel(r)_uhd_graphics_630.0 : Intel(R) UHD Graphics 630 (Metal)
   metal_amd_radeon_pro_5500m.0 : AMD Radeon Pro 5500M (Metal)

Experimental Config Devices:
   llvm_cpu.0 : CPU (LLVM)
   opencl_amd_radeon_pro_5500m_compute_engine.0 : AMD AMD Radeon Pro 5500M Compute Engine (OpenCL)
   opencl_intel_uhd_graphics_630.0 : Intel Inc. Intel(R) UHD Graphics 630 (OpenCL)
   opencl_cpu.0 : Intel CPU (OpenCL)
   metal_intel(r)_uhd_graphics_630.0 : Intel(R) UHD Graphics 630 (Metal)
   metal_amd_radeon_pro_5500m.0 : AMD Radeon Pro 5500M (Metal)

Using experimental devices can cause poor performance, crashes, and other nastiness.

以下の画面で y を選択します。

Enable experimental device support? (y,n)[n]:y

以下の画面で 6 を選択します。

Multiple devices detected (You can override by setting PLAIDML_DEVICE_IDS).
Please choose a default device:

   1 : llvm_cpu.0
   2 : opencl_amd_radeon_pro_5500m_compute_engine.0
   3 : opencl_intel_uhd_graphics_630.0
   4 : opencl_cpu.0
   5 : metal_intel(r)_uhd_graphics_630.0
   6 : metal_amd_radeon_pro_5500m.0

Default device? (1,2,3,4,5,6)[1]:6

以下の画面で y を選択します。

Selected device:
    metal_amd_radeon_pro_5500m.0

Almost done. Multiplying some matrices...
Tile code:
  function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
Whew. That worked.

Save settings to /Users/kei/.plaidml? (y,n)[y]:y

Success! が表示されることを確認します。

Success!

3 Keras で GPU を使用するために追加するコード

GPU を使用する場合は以下のように、コードを先頭に追加します。

import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
import keras

4 MNIST の学習

動作確認として Keras 公式サイトの MNIST のニューラルネットを学習させます。

https://keras.io/examples/mnist_cnn/

上記サイトに記載のコードの一番上に、 Keras で GPU を使用するために追加するコードを追加しただけです。ソースコード全体は以下になります。

import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
        x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
        x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
        input_shape = (1, img_rows, img_cols)
else:
        x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
        x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
        input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                                 activation='relu',
                                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
                          optimizer=keras.optimizers.Adadelta(),
                          metrics=['accuracy'])

model.fit(x_train, y_train,
                  batch_size=batch_size,
                  epochs=epochs,
                  verbose=1,
                  validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

出力は以下になります。処理速度は 15秒/Epoch くらいです。

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 5s 0us/step
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
INFO:plaidml:Opening device "metal_amd_radeon_pro_5500m.0"
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 17s 282us/step - loss: 0.2692 - acc: 0.9164 - val_loss: 0.0545 - val_acc: 0.9825
Epoch 2/12
60000/60000 [==============================] - 13s 219us/step - loss: 0.0891 - acc: 0.9738 - val_loss: 0.0398 - val_acc: 0.9864
Epoch 3/12
60000/60000 [==============================] - 13s 219us/step - loss: 0.0648 - acc: 0.9807 - val_loss: 0.0351 - val_acc: 0.9886
Epoch 4/12
60000/60000 [==============================] - 13s 219us/step - loss: 0.0549 - acc: 0.9836 - val_loss: 0.0340 - val_acc: 0.9878
Epoch 5/12
60000/60000 [==============================] - 13s 217us/step - loss: 0.0469 - acc: 0.9859 - val_loss: 0.0317 - val_acc: 0.9891
Epoch 6/12
60000/60000 [==============================] - 13s 219us/step - loss: 0.0405 - acc: 0.9875 - val_loss: 0.0298 - val_acc: 0.9904
Epoch 7/12
60000/60000 [==============================] - 13s 219us/step - loss: 0.0364 - acc: 0.9887 - val_loss: 0.0276 - val_acc: 0.9900
Epoch 8/12
60000/60000 [==============================] - 13s 216us/step - loss: 0.0338 - acc: 0.9899 - val_loss: 0.0314 - val_acc: 0.9898
Epoch 9/12
60000/60000 [==============================] - 13s 216us/step - loss: 0.0299 - acc: 0.9908 - val_loss: 0.0272 - val_acc: 0.9913
Epoch 10/12
60000/60000 [==============================] - 13s 215us/step - loss: 0.0276 - acc: 0.9912 - val_loss: 0.0263 - val_acc: 0.9914
Epoch 11/12
60000/60000 [==============================] - 13s 218us/step - loss: 0.0266 - acc: 0.9917 - val_loss: 0.0268 - val_acc: 0.9910
Epoch 12/12
60000/60000 [==============================] - 13s 219us/step - loss: 0.0256 - acc: 0.9920 - val_loss: 0.0272 - val_acc: 0.9922
Test loss: 0.02717138020992279
Test accuracy: 0.9922