Basic usage for CMake users

First of all, make sure the environment has been set up correctly already (refer to Environment requirement).

Clear Workspace

Before you do anything, clear the workspace used by build and test process.

tools/ [--expunge]

Build Engine

Please make sure you have CMake installed.

RUNTIME=GPU QUANTIZE=OFF bash tools/cmake/

which generate libraries in build/cmake-build/armeabi-v7a, you can use either static libraries or the shared library.

You can also build for other target abis: arm64-v8a, arm-linux-gnueabihf, aarch64-linux-gnu, host; and runtime: GPU, HEXAGON, HTA, APU.

Model Conversion

When you have prepared your model, the first thing to do is write a model config in YAML format.

    platform: tensorflow
    model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
      - input_tensors:
          - input
          - 1,224,224,3
          - MobilenetV1/Predictions/Reshape_1
          - 1,1001
    runtime: gpu

The following steps generate output to build directory which is the default build and test workspace. Suppose you have the model config in ../mace-models/mobilenet-v1/mobilenet-v1.yml. Then run

python tools/python/ --config ../mace-models/mobilenet-v1/mobilenet-v1.yml

which generate 4 files in build/mobilenet_v1/model/

├── mobilenet_v1.pb                (model file)
├──              (param file)
├── mobilenet_v1_index.html        (visualization page, you can open it in browser)
└── mobilenet_v1.pb_txt            (model text file, which can be for debug use)

MACE also supports other platform: caffe, onnx. Beyond GPU, users can specify cpu, dsp to run on other target devices.

Model Test and Benchmark

We provide simple tools to test and benchmark your model.

After model is converted, simply run

python tools/python/ --config ../mace-models/mobilenet-v1/mobilenet-v1.yml --validate

Or benchmark the model

python tools/python/ --config ../mace-models/mobilenet-v1/mobilenet-v1.yml --benchmark

It will test your model on the device configured in the model config (runtime). You can also test on other device by specify --runtime=cpu (dsp/hta/apu) when you run test if you previously build engine for the device. The log will be shown if --vlog_level=2 is specified.

Deploy your model into applications

Please refer to mace/tools/mace_run.ccfor full usage. The following list the key steps.

// Include the headers
#include "mace/public/mace.h"

// 0. Declare the device type (must be same with ``runtime`` in configuration file)
DeviceType device_type = DeviceType::GPU;

// 1. configuration
MaceStatus status;
MaceEngineConfig config;
std::shared_ptr<OpenclContext> opencl_context;
// Set the path to store compiled OpenCL kernel binaries.
// please make sure your application have read/write rights of the directory.
// this is used to reduce the initialization time since the compiling is too slow.
// It's suggested to set this even when pre-compiled OpenCL program file is provided
// because the OpenCL version upgrade may also leads to kernel recompilations.
const std::string storage_path ="path/to/storage";
opencl_context = GPUContextBuilder()

// 2. Define the input and output tensor names.
std::vector<std::string> input_names = {...};
std::vector<std::string> output_names = {...};

// 3. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;

// Create Engine from model file
create_engine_status =
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
  // fall back to other strategy.

// 4. Create Input and Output tensor buffers
std::map<std::string, mace::MaceTensor> inputs;
std::map<std::string, mace::MaceTensor> outputs;
for (size_t i = 0; i < input_count; ++i) {
  // Allocate input and output
  int64_t input_size =
      std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
  auto buffer_in = std::shared_ptr<float>(new float[input_size],
  // Load input here
  // ...

  inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);

for (size_t i = 0; i < output_count; ++i) {
  int64_t output_size =
      std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
  auto buffer_out = std::shared_ptr<float>(new float[output_size],
  outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);

// 5. Run the model
MaceStatus status = engine.Run(inputs, &outputs);

More details are in Advanced usage for CMake users.