
Benchmarking TensorFlow v2.3.0 against PlaidML on a Mac mini (execution log)

Posted at 2020-07-29



  • mnist_mlp.py (customized)

    framework                 CPU load (iostat us + sy)   elapsed time
    TensorFlow v2.3.0 (CPU)   89 %                        16.170 sec
    PlaidML + Keras (GPU)     42 %                        23.334 sec

  • mnist_cnn.py (customized)

    framework                 CPU load (iostat us + sy)   elapsed time
    TensorFlow v2.3.0 (CPU)   92 %                        188.279 sec
    PlaidML + Keras (GPU)     37 %                        316.005 sec
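The elapsed times in these tables can be turned into a single slowdown factor per script. Here is a minimal Python sketch that uses only the numbers from the tables. The dictionary layout and the `slowdown` helper are illustrative names, not part of the benchmark scripts.

```python
# Elapsed wall-clock times (seconds) copied from the summary tables.
elapsed = {
    "mnist_mlp.py": {"tensorflow_cpu": 16.170, "plaidml_gpu": 23.334},
    "mnist_cnn.py": {"tensorflow_cpu": 188.279, "plaidml_gpu": 316.005},
}


def slowdown(times: dict) -> float:
    """How many times longer the PlaidML (GPU) run took than the TensorFlow (CPU) run."""
    return times["plaidml_gpu"] / times["tensorflow_cpu"]


for script, times in elapsed.items():
    print(f"{script}: PlaidML took {slowdown(times):.2f}x as long as TensorFlow")
```

On this machine, that works out to roughly 1.44x for the MLP and 1.68x for the CNN.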

measure : MLP

  • using keras/examples/mnist_mlp.py (customized)

TensorFlow v2.3.0 (CPU)

  • time : 16.170s
(tf2) $ time python3 mnist_mlp.py
60000 train samples
10000 valid samples
2020-07-29 13:26:05.609231: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f9143ee1640 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-29 13:26:05.609262: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Model: "sequential"
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 512)               401920    
dropout (Dropout)            (None, 512)               0         
dense_1 (Dense)              (None, 512)               262656    
dropout_1 (Dropout)          (None, 512)               0         
dense_2 (Dense)              (None, 10)                5130      
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
469/469 [==============================] - 11s 23ms/step - loss: 0.2473 - accuracy: 0.9233 - val_loss: 0.1034 - val_accuracy: 0.9680
Valid loss: 0.10344783961772919
Valid acc.: 0.9679999947547913

real	0m16.170s
user	0m31.537s
sys 	0m4.165s
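The Param # column in the MLP summary follows directly from the layer sizes. A Dense layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases, and Dropout has no parameters. The following small Python check assumes the usual 28 × 28 = 784 flattened MNIST input, since the input layer itself does not appear in the summary. The helper name is illustrative.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


# Layer sizes read off the MLP summary; 784 = 28 * 28 flattened pixels (assumed input).
layers = [("dense", 784, 512), ("dense_1", 512, 512), ("dense_2", 512, 10)]

counts = {name: dense_params(n_in, n_out) for name, n_in, n_out in layers}
total = sum(counts.values())  # Dropout layers contribute 0 parameters

print(counts)  # {'dense': 401920, 'dense_1': 262656, 'dense_2': 5130}
print(total)   # 669706
```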
  • iostat : CPU load : 89 %
$ iostat 5
              disk0       cpu    load average
    KB/t  tps  MB/s  us sy id   1m   5m   15m
    4.00    0  0.00  19  8 73  4.74 3.39 2.58
   24.89    9  0.22  56 12 33  4.68 3.40 2.59
    0.00    0  0.00  75 14 11  4.79 3.44 2.61
    5.33    1  0.00  61 11 28  4.89 3.48 2.63
   21.97   47  1.00  13  8 79  4.73 3.47 2.63
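The CPU load figure appears to be the highest us + sy value (user plus system CPU percentage) across the 5-second iostat samples. For this run, the 75 + 14 row gives 89 %. The Python sketch below shows that reading. It assumes the nine-column layout printed by macOS iostat above (KB/t, tps, MB/s, us, sy, id, then three load averages). The function name is hypothetical.

```python
def peak_cpu_load(iostat_output: str) -> int:
    """Return the largest us + sy value among the iostat data rows."""
    peak = 0
    for line in iostat_output.splitlines():
        fields = line.split()
        # Data rows have 9 numeric fields; header rows ("disk0", "KB/t ...") are skipped.
        if len(fields) != 9 or not fields[0][0].isdigit():
            continue
        us, sy = int(fields[3]), int(fields[4])
        peak = max(peak, us + sy)
    return peak


# Samples copied from the TensorFlow MLP run above.
sample = """
              disk0       cpu    load average
    KB/t  tps  MB/s  us sy id   1m   5m   15m
    4.00    0  0.00  19  8 73  4.74 3.39 2.58
   24.89    9  0.22  56 12 33  4.68 3.40 2.59
    0.00    0  0.00  75 14 11  4.79 3.44 2.61
    5.33    1  0.00  61 11 28  4.89 3.48 2.63
   21.97   47  1.00  13  8 79  4.73 3.47 2.63
"""

print(peak_cpu_load(sample))  # 89
```

For the TensorFlow CNN run, a strict maximum would give 93 % (a single 84 + 9 sample), slightly above the 92 % reported. The figures therefore look more like the sustained peak than a one-sample maximum.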

PlaidML v0.6.4 (GPU) and Keras v2.2.4

  • device : opencl_amd_ati_radeon_hd_6630m.0
  • time : 23.334s

(tf2) $ time python3 mnist_mlp.py
60000 train samples
10000 valid samples
INFO:plaidml:Opening device "opencl_amd_ati_radeon_hd_6630m.0"
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 512)               401920    
dropout_1 (Dropout)          (None, 512)               0         
dense_2 (Dense)              (None, 512)               262656    
dropout_2 (Dropout)          (None, 512)               0         
dense_3 (Dense)              (None, 10)                5130      
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 18s 306us/step - loss: 0.2518 - acc: 0.9220 - val_loss: 0.0986 - val_acc: 0.9714
Valid loss: 0.09862979149818421
Valid acc.: 0.9714

real	0m23.334s
user	0m17.709s
sys 	0m6.655s
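The real/user/sys lines from `time` offer a second view of CPU usage: (user + sys) / real is the average number of CPU cores kept busy during the run. Here is a minimal Python sketch using the figures from the two MLP runs. The parsing helper is illustrative and handles only the XmY.ZZZs format shown in these logs.

```python
import re


def to_seconds(value: str) -> float:
    """Convert a bash `time` value such as '0m31.537s' or '8m30.347s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unexpected time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def busy_cores(real: str, user: str, sys_time: str) -> float:
    """Average number of CPU cores in use: CPU time divided by wall-clock time."""
    return (to_seconds(user) + to_seconds(sys_time)) / to_seconds(real)


# real / user / sys copied from the two mnist_mlp.py runs.
tf_mlp = busy_cores("0m16.170s", "0m31.537s", "0m4.165s")
plaidml_mlp = busy_cores("0m23.334s", "0m17.709s", "0m6.655s")

print(f"TensorFlow (CPU): {tf_mlp:.2f} cores")
print(f"PlaidML (GPU):    {plaidml_mlp:.2f} cores")
```

Applied to the CNN runs, the same helper gives about 2.87 cores for TensorFlow and 0.96 for PlaidML.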
  • iostat : CPU load : 42 %
$ iostat 5
              disk0       cpu    load average
    KB/t  tps  MB/s  us sy id   1m   5m   15m
   45.59    8  0.37   5  6 90  1.70 2.14 2.31
   25.30    9  0.21  18 10 72  1.89 2.17 2.32
   29.01   16  0.45  29 10 62  1.97 2.19 2.32
    4.00    0  0.00  27 12 61  1.98 2.18 2.32
    0.00    0  0.00  27 12 61  2.06 2.19 2.33
    4.00    0  0.00  29 12 59  1.97 2.17 2.32
   14.74   48  0.69  29 13 58  1.97 2.17 2.32
    0.00    0  0.00  13  5 82  2.06 2.19 2.32
    4.00    0  0.00  12  5 83  1.97 2.17 2.31

measure : CNN

  • using keras/examples/mnist_cnn.py (customized)

TensorFlow v2.3.0 (CPU)

  • time : 188.279s
(tf2) $ time python3 mnist_cnn.py
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 valid samples
2020-07-29 16:23:59.387600: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fce05190fc0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-29 16:23:59.387652: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Model: "sequential"
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
conv2d_1 (Conv2D)            (None, 24, 24, 64)        18496     
average_pooling2d (AveragePo (None, 12, 12, 64)        0         
dropout (Dropout)            (None, 12, 12, 64)        0         
flatten (Flatten)            (None, 9216)              0         
dense (Dense)                (None, 128)               1179776   
dropout_1 (Dropout)          (None, 128)               0         
dense_1 (Dense)              (None, 10)                1290      
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
469/469 [==============================] - 153s 327ms/step - loss: 2.2950 - accuracy: 0.1257 - val_loss: 2.2737 - val_accuracy: 0.2647
Valid loss: 2.273723840713501
Valid acc.: 0.2646999955177307

real	3m8.279s
user	8m30.347s
sys 	0m29.370s
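The CNN summary can be checked the same way. A 3 × 3 "valid" convolution shrinks each side by 2, and a 2 × 2 average pooling halves it. A Conv2D layer has kernel_h × kernel_w × in_channels × filters weights plus one bias per filter. The customized script is not shown, so this Python sketch assumes 3 × 3 kernels and 2 × 2 pooling, which are the values that reproduce the shapes and counts in the summary.

```python
def conv2d_params(kh: int, kw: int, c_in: int, filters: int) -> int:
    """Kernel weights for every (input channel, filter) pair plus one bias per filter."""
    return kh * kw * c_in * filters + filters


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


side = 28                        # MNIST input: 28 x 28 x 1
side = side - 3 + 1              # conv2d, 3x3 'valid' -> 26
conv1 = conv2d_params(3, 3, 1, 32)
side = side - 3 + 1              # conv2d_1, 3x3 'valid' -> 24
conv2 = conv2d_params(3, 3, 32, 64)
side = side // 2                 # average_pooling2d, 2x2 -> 12
flat = side * side * 64          # flatten -> 9216
fc1 = dense_params(flat, 128)
fc2 = dense_params(128, 10)

total = conv1 + conv2 + fc1 + fc2
print(side, flat)                # 12 9216
print(conv1, conv2, fc1, fc2)    # 320 18496 1179776 1290
print(total)                     # 1199882
```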
  • iostat : CPU load : 92 %
$ iostat 5
              disk0       cpu    load average
    KB/t  tps  MB/s  us sy id   1m   5m   15m
   56.38  179  9.87   6  6 87  2.18 2.48 2.45
   15.38  217  3.26   7  6 87  2.32 2.50 2.46
   30.12  211  6.22   9  8 83  2.30 2.49 2.46
   64.89   56  3.53  42  8 50  2.83 2.60 2.50
    6.00    0  0.00  83  8  8  3.09 2.66 2.52
   12.80    5  0.06  83  8  8  3.40 2.73 2.54
    8.00    1  0.00  83  8  8  3.53 2.77 2.56
   21.78   13  0.27  84  8  8  3.73 2.82 2.58
    0.00    0  0.00  83  8  8  3.83 2.86 2.59
    4.00    0  0.00  84  8  8  3.92 2.89 2.60
    0.00    0  0.00  84  8  8  4.01 2.93 2.62
   13.80    6  0.08  84  8  8  4.25 2.99 2.64
   20.96   14  0.29  84  8  8  4.31 3.03 2.66
   18.07   23  0.41  84  8  8  4.52 3.09 2.68
    0.00    0  0.00  84  8  8  6.00 3.42 2.80
   38.34    8  0.31  84  8  8  5.84 3.43 2.81
    0.00    0  0.00  84  8  8  6.01 3.51 2.84
    0.00    0  0.00  84  8  7  6.17 3.58 2.87
   16.00    0  0.00  84  8  7  6.40 3.67 2.90
   18.50   17  0.30  84  8  8  6.45 3.73 2.93
              disk0       cpu    load average
    KB/t  tps  MB/s  us sy id   1m   5m   15m
    5.33    1  0.00  84  8  8  6.33 3.75 2.94
    4.00    0  0.00  84  8  8  6.14 3.75 2.95
    6.40    2  0.01  84  8  8  6.21 3.81 2.97
    0.00    0  0.00  84  8  8  6.11 3.83 2.98
    0.00    0  0.00  84  8  8  6.26 3.89 3.01
   21.03   14  0.30  84  8  8  6.32 3.95 3.03
    0.00    0  0.00  84  8  8  6.38 4.00 3.06
   35.60    8  0.28  84  8  8  6.19 4.00 3.06
    0.00    0  0.00  84  8  8  6.17 4.03 3.08
   80.00    0  0.02  84  9  8  6.40 4.11 3.11
    0.00    0  0.00  84  8  8  6.04 4.08 3.11
   18.29   14  0.25  84  8  8  6.12 4.12 3.13
    8.33    2  0.02  81  8 12  5.87 4.11 3.13
    4.00    1  0.00  85  7  8  5.80 4.12 3.14
   17.28   31  0.53  81  8 11  5.90 4.17 3.16
   37.58   40  1.47  71  8 21  5.66 4.15 3.16
    9.14    1  0.01   4  6 90  5.21 4.08 3.14
   20.53   17  0.34   3  6 91  4.87 4.03 3.13

PlaidML v0.6.4 (GPU) and Keras v2.2.4

  • time : 316.005s
(tf2) $ time python3 mnist_cnn.py
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 valid samples
INFO:plaidml:Opening device "opencl_amd_ati_radeon_hd_6630m.0"
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
conv2d_2 (Conv2D)            (None, 24, 24, 64)        18496     
average_pooling2d_1 (Average (None, 12, 12, 64)        0         
dropout_1 (Dropout)          (None, 12, 12, 64)        0         
flatten_1 (Flatten)          (None, 9216)              0         
dense_1 (Dense)              (None, 128)               1179776   
dropout_2 (Dropout)          (None, 128)               0         
dense_2 (Dense)              (None, 10)                1290      
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 298s 5ms/step - loss: 0.3072 - acc: 0.9063 - val_loss: 0.0794 - val_acc: 0.9753
Valid loss: 0.07935381038188934
Valid acc.: 0.9753

real	5m16.005s
user	4m50.810s
sys 	0m11.139s
  • iostat : CPU load : 37 %
$ iostat 5
              disk0       cpu    load average
    KB/t  tps  MB/s  us sy id   1m   5m   15m
   15.44   27  0.41   6  7 87  2.13 2.60 2.40
    4.00    0  0.00   4  6 90  2.12 2.59 2.39
    4.00    0  0.00  20  7 73  2.27 2.61 2.40
    0.00    0  0.00  27  4 69  2.33 2.62 2.41
    4.00    0  0.00  27  4 69  2.62 2.67 2.43
    0.00    0  0.00  28  4 68  2.65 2.68 2.43
   20.46   25  0.50  26  4 70  2.60 2.67 2.43
    0.00    0  0.00  27  4 69  2.79 2.71 2.44
   23.30   11  0.26  27  4 69  3.05 2.76 2.46
   32.07   12  0.37  28  4 68  2.96 2.75 2.46
   36.24   10  0.36  29  6 65  3.13 2.79 2.47
    6.00    1  0.00  27  4 69  3.04 2.77 2.47
   16.55   36  0.58  27  4 68  3.03 2.78 2.47
    4.00    0  0.00  28  5 68  2.95 2.76 2.47
    4.00    0  0.00  28  5 67  2.95 2.77 2.47
   31.86   17  0.53  27  4 69  2.88 2.75 2.47
   14.05    8  0.12  29  6 64  2.89 2.76 2.47
   20.72    8  0.16  27  4 69  2.90 2.76 2.48
   22.78   29  0.65  28  4 68  2.82 2.75 2.47
   26.55    2  0.06  27  3 69  2.76 2.74 2.47
              disk0       cpu    load average
    KB/t  tps  MB/s  us sy id   1m   5m   15m
   27.79   12  0.31  27  4 70  2.70 2.72 2.47
   22.35    3  0.07  28  4 69  2.64 2.71 2.46
   36.31   10  0.36  30  7 62  2.67 2.72 2.47
   34.19   13  0.43  30  4 66  2.86 2.75 2.48
   21.58   31  0.66  31  6 64  2.87 2.76 2.49
   18.50    2  0.03  28  5 67  2.96 2.78 2.49
    4.00    0  0.00  26  3 70  2.88 2.76 2.49
   21.76    7  0.14  27  4 69  2.89 2.77 2.49
   27.33    1  0.03  27  4 69  2.82 2.75 2.49
   35.08    5  0.18  28  4 68  2.75 2.74 2.49
   16.68   28  0.45  28  5 67  2.77 2.75 2.49
    4.00    0  0.00  27  4 69  2.87 2.77 2.50
    9.33    1  0.01  27  3 70  2.80 2.75 2.50
    0.00    0  0.00  26  4 70  2.74 2.74 2.49
   41.42   12  0.48  28  4 68  2.84 2.76 2.50
   35.20    1  0.03  28  5 68  3.09 2.81 2.52
   19.91   12  0.24  27  4 69  3.16 2.83 2.53
    0.00    0  0.00  26  3 71  3.07 2.82 2.53
    9.27   23  0.20  27  4 70  2.98 2.81 2.52
    0.00    0  0.00  26  3 71  2.98 2.81 2.53
              disk0       cpu    load average
    KB/t  tps  MB/s  us sy id   1m   5m   15m
    4.00    0  0.00  27  4 70  2.99 2.81 2.53
    0.00    0  0.00  26  3 71  2.99 2.82 2.53
   28.24    8  0.23  26  4 70  2.99 2.82 2.53
    0.00    0  0.00  26  3 70  2.91 2.80 2.53
    4.00    0  0.00  27  4 69  2.84 2.79 2.53
   34.93    8  0.28  27  4 69  2.77 2.78 2.52
    0.00    0  0.00  26  3 71  2.63 2.75 2.51
    0.00    0  0.00  26  3 71  2.58 2.74 2.51
   22.71   10  0.23  26  4 70  2.53 2.72 2.51
    0.00    0  0.00  26  3 71  2.49 2.71 2.50
    4.00    1  0.00  26  4 70  2.45 2.70 2.50
    4.00    1  0.00  27  4 70  2.41 2.69 2.50
    4.00    1  0.00  26  4 70  2.46 2.69 2.50
   27.88   21  0.57  26  4 70  2.50 2.70 2.50
   28.75    8  0.22  26  4 70  2.54 2.70 2.51
    4.00    0  0.00  26  4 70  2.58 2.71 2.51
    4.00    1  0.00  26  3 71  2.61 2.71 2.51
    4.00    1  0.00  26  3 71  2.56 2.70 2.51
    4.00    0  0.00  26  3 71  2.60 2.70 2.51
    4.00    0  0.00  28  4 68  2.55 2.69 2.51
              disk0       cpu    load average
    KB/t  tps  MB/s  us sy id   1m   5m   15m
   37.43    1  0.05  26  5 68  2.50 2.68 2.50
    4.00    1  0.00  26  4 70  2.78 2.73 2.52
    4.00    1  0.00  26  5 68  2.72 2.72 2.52
   33.07    9  0.29  24  7 69  2.90 2.76 2.53
   14.00    2  0.02  26  6 68  2.99 2.78 2.54
    7.47    3  0.02  16  5 79  2.83 2.75 2.53
   17.47   17  0.29   2  6 92  2.68 2.72 2.52
    0.00    0  0.00   2  6 92  2.47 2.68 2.51

setup log

(tf2) $ plaidml-setup 

PlaidML Setup (0.6.4)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the Apache License 2.0

Default Config Devices:
   No devices.

Experimental Config Devices:
   llvm_cpu.0 : CPU (LLVM)
   opencl_amd_ati_radeon_hd_6630m.0 : AMD ATI Radeon HD 6630M (OpenCL)
   opencl_cpu.0 : Intel CPU (OpenCL)

Using experimental devices can cause poor performance, crashes, and other nastiness.

Enable experimental device support? (y,n)[n]:y

Multiple devices detected (You can override by setting PLAIDML_DEVICE_IDS).
Please choose a default device:

   1 : llvm_cpu.0
   2 : opencl_amd_ati_radeon_hd_6630m.0
   3 : opencl_cpu.0

Default device? (1,2,3)[1]:2

Selected device:

Almost done. Multiplying some matrices...
Tile code:
  function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
Whew. That worked.

Save settings to /Users/nobi/.plaidml? (y,n)[y]:

error log

AttributeError: module 'tensorflow' has no attribute 'get_default_graph'
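This AttributeError is what standalone Keras v2.2.4 raises when it loads its TensorFlow backend on TensorFlow 2.x. In TensorFlow 2.x, `tf.get_default_graph` no longer exists and survives only as `tf.compat.v1.get_default_graph`. Assuming that is the cause here, one way to keep the Keras scripts on PlaidML is to select the backend through environment variables before `keras` is imported. The sketch below is minimal: `KERAS_BACKEND` is the variable Keras reads at import time, and `PLAIDML_DEVICE_IDS` is the override named in the setup log. The `use_plaidml` helper is hypothetical.

```python
import os


def use_plaidml(device_id: str = "opencl_amd_ati_radeon_hd_6630m.0") -> dict:
    """Point standalone Keras at PlaidML and pin the PlaidML device.

    Must run before `import keras`, because Keras picks its backend at import time.
    """
    settings = {
        # Load PlaidML's Keras backend instead of the TensorFlow backend.
        "KERAS_BACKEND": "plaidml.keras.backend",
        # Override the default device saved by plaidml-setup (ID from the setup log).
        "PLAIDML_DEVICE_IDS": device_id,
    }
    os.environ.update(settings)
    return settings


settings = use_plaidml()
print(os.environ["KERAS_BACKEND"])
# import keras  # must come after use_plaidml(); requires the plaidml-keras package
```

Exporting the same two variables in the shell before `time python3 mnist_mlp.py` should have the same effect.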



