Benchmark usage

This part contains the usage of MACE benchmark tools.

Overview

As mentioned in the previous part, there are two kinds of benchmark tools, one for operator and the other for model.

Operator Benchmark

Operator Benchmark is used for test and optimize the performance of specific operator.

Usage

For CMake users:

python tools/python/run_target.py \
    --target_abi=armeabi-v7a --target_socs=all --target_name=mace_cc_benchmark \
    --filter=.*BM_CONV.*

or for Bazel users:

python tools/bazel_adb_run.py --target="//test/ccbenchmark:mace_cc_benchmark" \
    --run_target=True  --args="--filter=.*BM_CONV.*"

Output

Benchmark                                                    Time(ns) Iterations Input(MB/s)   GMACPS
------------------------------------------------------------------------------------------------------
MACE_BM_CONV_2D_1_1024_7_7_K1x1S1D1_SAME_1024_float_CPU       1759129        479     114.09      29.21
MACE_BM_CONV_2D_1_1024_7_7_K1x1S1D1_SAME_1024_float_GPU       4031301        226      49.79      12.75
MACE_BM_CONV_2D_1_1024_7_7_K1x1S1D1_SAME_1024_half_GPU        3996357        266      25.11      12.86
MACE_BM_CONV_2D_1_1024_7_7_K1x1S1D1_SAME_1024_uint8_t_CPU      914994       1093      54.84      56.15

Explanation

Options Usage
Benchmark Benchmark unit name.
Time Time of one round.
Iterations the number of iterations to run, which is between 10 and 1000,000,000. the value is calculated based on the strategy total run time does not exceed 1s.
Input The bandwidth of dealing with input. the unit is MB/s.
GMACPS The speed of running MACs(multiply-accumulation). the unit is G/s.

Model Benchmark

Model Benchmark is used for test and optimize the performance of your model. This tool could record the running time of the model and the detailed running information of each operator of your model.

Usage

For CMake users:

python tools/python/run_model.py --config=/path/to/your/model_deployment.yml --benchmark

or for Bazel users:

python tools/converter.py run --config=/path/to/your/model_deployment.yml --benchmark

Output

I statistics.cc:343 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I statistics.cc:343                                                                                      Sort by Run Order
I statistics.cc:343 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I statistics.cc:343 |         Op Type |  Start | First | Avg(ms) |     % |    cdf% | GMACPS | Stride |   Pad |    Filter Shape |   Output Shape | Dilation |                                               name |
I statistics.cc:343 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I statistics.cc:343 |       Transpose |  0.000 | 0.102 |   0.100 | 0.315 |   0.315 |  0.000 |        |       |                 |  [1,3,224,224] |          |                                              input |
I statistics.cc:343 |          Conv2D |  0.107 | 1.541 |   1.570 | 4.943 |   5.258 |  6.904 |  [2,2] |  SAME |      [32,3,3,3] | [1,32,112,112] |    [1,1] |             MobilenetV1/MobilenetV1/Conv2d_0/Relu6 |
I statistics.cc:343 | DepthwiseConv2d |  1.724 | 0.936 |   0.944 | 2.972 |   8.230 |  3.827 |  [1,1] |  SAME |      [1,32,3,3] | [1,32,112,112] |    [1,1] |   MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu6 |
I statistics.cc:343 |         Softmax | 32.835 | 0.039 |   0.042 | 0.131 |  99.996 |  0.000 |        |       |                 |       [1,1001] |          |                    MobilenetV1/Predictions/Softmax |
I statistics.cc:343 |        Identity | 32.880 | 0.001 |   0.001 | 0.004 | 100.000 |  0.000 |        |       |                 |       [1,1001] |          | mace_output_node_MobilenetV1/Predictions/Reshape_1 |
I statistics.cc:343 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I statistics.cc:343
I statistics.cc:343 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I statistics.cc:343                                                                              Sort by Computation Time
I statistics.cc:343 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I statistics.cc:343 | Op Type |  Start | First | Avg(ms) |     % |   cdf% | GMACPS | Stride |  Pad |    Filter Shape |   Output Shape | Dilation |                                              name |
I statistics.cc:343 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I statistics.cc:343 |  Conv2D | 30.093 | 2.102 |   2.198 | 6.922 |  6.922 | 23.372 |  [1,1] | SAME | [1024,1024,1,1] |   [1,1024,7,7] |    [1,1] | MobilenetV1/MobilenetV1/Conv2d_13_pointwise/Relu6 |
I statistics.cc:343 |  Conv2D |  7.823 | 2.115 |   2.164 | 6.813 | 13.735 | 23.747 |  [1,1] | SAME |   [128,128,1,1] |  [1,128,56,56] |    [1,1] |  MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Relu6 |
I statistics.cc:343 |  Conv2D | 15.859 | 2.119 |   2.109 | 6.642 | 20.377 | 24.358 |  [1,1] | SAME |   [512,512,1,1] |  [1,512,14,14] |    [1,1] |  MobilenetV1/MobilenetV1/Conv2d_7_pointwise/Relu6 |
I statistics.cc:343 |  Conv2D | 23.619 | 2.087 |   2.096 | 6.599 | 26.976 | 24.517 |  [1,1] | SAME |   [512,512,1,1] |  [1,512,14,14] |    [1,1] | MobilenetV1/MobilenetV1/Conv2d_10_pointwise/Relu6 |
I statistics.cc:343 |  Conv2D | 26.204 | 2.081 |   2.093 | 6.590 | 33.567 | 24.549 |  [1,1] | SAME |   [512,512,1,1] |  [1,512,14,14] |    [1,1] | MobilenetV1/MobilenetV1/Conv2d_11_pointwise/Relu6 |
I statistics.cc:343 |  Conv2D | 21.038 | 2.036 |   2.091 | 6.585 | 40.152 | 24.569 |  [1,1] | SAME |   [512,512,1,1] |  [1,512,14,14] |    [1,1] |  MobilenetV1/MobilenetV1/Conv2d_9_pointwise/Relu6 |
I statistics.cc:343 |  Conv2D | 18.465 | 2.034 |   2.082 | 6.554 | 46.706 | 24.684 |  [1,1] | SAME |   [512,512,1,1] |  [1,512,14,14] |    [1,1] |  MobilenetV1/MobilenetV1/Conv2d_8_pointwise/Relu6 |
I statistics.cc:343 |  Conv2D |  2.709 | 1.984 |   2.058 | 6.482 | 53.188 | 12.480 |  [1,1] | SAME |     [64,32,1,1] | [1,64,112,112] |    [1,1] |  MobilenetV1/MobilenetV1/Conv2d_1_pointwise/Relu6 |
I statistics.cc:343 |  Conv2D | 12.220 | 1.788 |   1.901 | 5.986 | 59.174 | 27.027 |  [1,1] | SAME |   [256,256,1,1] |  [1,256,28,28] |    [1,1] |  MobilenetV1/MobilenetV1/Conv2d_5_pointwise/Relu6 |
I statistics.cc:343 |  Conv2D |  0.107 | 1.541 |   1.570 | 4.943 | 64.117 |  6.904 |  [2,2] | SAME |      [32,3,3,3] | [1,32,112,112] |    [1,1] |            MobilenetV1/MobilenetV1/Conv2d_0/Relu6 |
I statistics.cc:343 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I statistics.cc:343
I statistics.cc:343 ----------------------------------------------------------------------------------------------
I statistics.cc:343                                        Stat by Op Type
I statistics.cc:343 ----------------------------------------------------------------------------------------------
I statistics.cc:343 |         Op Type | Count | Avg(ms) |      % |    cdf% |        MACs | GMACPS | Called times |
I statistics.cc:343 ----------------------------------------------------------------------------------------------
I statistics.cc:343 |          Conv2D |    15 |  24.978 | 78.693 |  78.693 | 551,355,392 | 22.074 |           15 |
I statistics.cc:343 | DepthwiseConv2d |    13 |   6.543 | 20.614 |  99.307 |  17,385,984 |  2.657 |           13 |
I statistics.cc:343 |       Transpose |     1 |   0.100 |  0.315 |  99.622 |           0 |  0.000 |            1 |
I statistics.cc:343 |         Pooling |     1 |   0.072 |  0.227 |  99.849 |           0 |  0.000 |            1 |
I statistics.cc:343 |         Softmax |     1 |   0.041 |  0.129 |  99.978 |           0 |  0.000 |            1 |
I statistics.cc:343 |         Squeeze |     1 |   0.006 |  0.019 |  99.997 |           0 |  0.000 |            1 |
I statistics.cc:343 |        Identity |     1 |   0.001 |  0.003 | 100.000 |           0 |  0.000 |            1 |
I statistics.cc:343 ----------------------------------------------------------------------------------------------
I statistics.cc:343
I statistics.cc:343 ---------------------------------------------------------
I statistics.cc:343           Stat by MACs(Multiply-Accumulation)
I statistics.cc:343 ---------------------------------------------------------
I statistics.cc:343 |       total | round | first(G/s) | avg(G/s) |     std |
I statistics.cc:343 ---------------------------------------------------------
I statistics.cc:343 | 568,741,376 |   100 |     18.330 |   17.909 | 301.326 |
I statistics.cc:343 ---------------------------------------------------------
I statistics.cc:343 ------------------------------------------------------------------------
I statistics.cc:343                           Summary of Ops' Stat
I statistics.cc:343 ------------------------------------------------------------------------
I statistics.cc:343 | round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |     std |
I statistics.cc:343 ------------------------------------------------------------------------
I statistics.cc:343 |   100 |    31.028 |   32.093 |  31.028 |  32.346 |  31.758 | 301.326 |
I statistics.cc:343 ------------------------------------------------------------------------

Explanation

There are 8 sections of the output information.

  1. Warm Up

This section lists the time information of warm-up run. The detailed explanation is list as below.

Key Explanation
round the number of round has been run.
first the run time of first round. unit is millisecond.
curr the run time of last round. unit is millisecond.
min the minimal run time of all rounds. unit is millisecond.
max the maximal run time of all rounds. unit is millisecond.
avg the average run time of all rounds. unit is millisecond.
std the standard deviation of all rounds.
  1. Run without statistics
This section lists the run time information without statistics code.
the detailed explanation is the same as the section of Warm Up.
  1. Run with statistics
This section lists the run time information with statistics code,
the time maybe longer compared with the second section. the detailed explanation is the same as the section of Warm Up.
  1. Sort by Run Order

This section lists the detailed run information of every operator in your model. The operators is listed based on the run order, Every line is an operator of your model. The detailed explanation is list as below.

Key Explanation
Op Type the type of operator.
Start the start time of the operator. unit is millisecond.
First the run time of first round. unit is millisecond.
Avg the average run time of all rounds. unit is millisecond.
% the percentage of total running time.
cdf% the cumulative percentage of running time.
GMACPS The number of run MACs(multiply-accumulation) per second. the unit is G/s.
Stride the stride parameter of the operator if exist.
Pad the pad parameter of the operator if exist.
Filter Shape the filter shape of the operator if exist.
Output Shape the output shape of the operator.
Dilation the dilation parameter of the operator if exist.
Name the name of the operator.
  1. Sort by Computation time

This section lists the top-10 most time-consuming operators. The operators is listed based on the computation time, the detailed explanation is the same as previous section.

  1. Stat by Op Type

This section stats the run information about operators based on operator type.

Op Type the type of operator.
Count the number of operators with the type.
Avg the average run time of the operator. unit is millisecond.
% the percentage of total running time.
cdf% the cumulative percentage of running time.
MACs The number of MACs(multiply-accumulation).
GMACPS The number of MACs(multiply-accumulation) runs per second. the unit is G/s.
Called times the number of called times in all rounds.
  1. Stat by MACs

This section stats the MACs information of your model.

total the number of MACs of your model.
round the number of round has been run.
First the GMAPS of first round. unit is G/s.
Avg the average GMAPS of all rounds. unit is G/s.
std the standard deviation of all rounds.
  1. Summary of Ops' Stat

This section lists the run time information which is summation of every operator's run time. which may be shorter than the model's run time with statistics. the detailed explanation is the same as the section of Warm Up.