How to build¶
Supported Platforms¶
Platform | Explanation
---|---
TensorFlow | >= 1.6.0
Caffe | >= 1.0
Environment Requirement¶
MACE requires the following dependencies:
software | version | install command
---|---|---
bazel | >= 0.13.0 | bazel installation guide
android-ndk | r15c/r16b | NDK installation guide, or refer to the Dockerfile
adb | >= 1.0.32 | apt-get install android-tools-adb
tensorflow | >= 1.6.0 | pip install -I tensorflow==1.6.0 (if you use a TensorFlow model)
numpy | >= 1.14.0 | pip install -I numpy==1.14.0
scipy | >= 1.0.0 | pip install -I scipy==1.0.0
jinja2 | >= 2.10 | pip install -I jinja2==2.10
PyYaml | >= 3.12.0 | pip install -I pyyaml==3.12
sh | >= 1.12.14 | pip install -I sh==1.12.14
filelock | >= 3.0.0 | pip install -I filelock==3.0.0
docker (for caffe) | >= 17.09.0-ce | docker installation guide
Note
Set the environment variable ANDROID_NDK_HOME to point at your NDK installation:
export ANDROID_NDK_HOME=/path/to/ndk
MACE provides a Dockerfile with these dependencies installed; you can build the image from it,
cd docker
docker build -t xiaomimace/mace-dev .
or pull the pre-built image from Docker Hub,
docker pull xiaomimace/mace-dev
and then run the container with the following command.
# Create container
# Set 'host' network to use ADB
docker run -it --rm --privileged -v /dev/bus/usb:/dev/bus/usb --net=host \
-v /local/path:/container/path xiaomimace/mace-dev /bin/bash
Usage¶
1. Pull MACE source code¶
git clone https://github.com/XiaoMi/mace.git
git fetch --all --tags --prune
# Checkout the latest tag (i.e. release version)
tag_name=`git describe --abbrev=0 --tags`
git checkout tags/${tag_name}
Note
It's highly recommended to use a release version instead of the master branch.
2. Model Preprocessing¶
- TensorFlow
TensorFlow provides the Graph Transform Tool to improve inference efficiency by applying optimizations such as operator folding and redundant-node removal. It's strongly recommended to apply these optimizations before the graph conversion step.
The following commands show the suggested graph transformations and optimizations for different runtimes,
# CPU/GPU:
./transform_graph \
--in_graph=tf_model.pb \
--out_graph=tf_model_opt.pb \
--inputs='input' \
--outputs='output' \
--transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
flatten_atrous_conv
fold_batch_norms
fold_old_batch_norms
strip_unused_nodes
sort_by_execution_order'
# DSP:
./transform_graph \
--in_graph=tf_model.pb \
--out_graph=tf_model_opt.pb \
--inputs='input' \
--outputs='output' \
--transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
backport_concatv2
quantize_weights(minimum_size=2)
quantize_nodes
strip_unused_nodes
sort_by_execution_order'
- Caffe
The MACE converter only supports Caffe 1.0+; upgrade your models with Caffe's built-in tools when necessary,
# Upgrade prototxt
$CAFFE_ROOT/build/tools/upgrade_net_proto_text MODEL.prototxt MODEL.new.prototxt
# Upgrade caffemodel
$CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel
4. Deployment¶
The build command generates the static/shared libraries, model files and header files, packaged as build/${library_name}/libmace_${library_name}.tar.gz.
- The generated static libraries are organized as follows,
build/
└── mobilenet-v2-gpu
├── include
│ └── mace
│ └── public
│ ├── mace.h
│ └── mace_runtime.h
├── libmace_mobilenet-v2-gpu.tar.gz
├── lib
│ ├── arm64-v8a
│ │ └── libmace_mobilenet-v2-gpu.MI6.msm8998.a
│ └── armeabi-v7a
│ └── libmace_mobilenet-v2-gpu.MI6.msm8998.a
├── model
│ ├── mobilenet_v2.data
│ └── mobilenet_v2.pb
└── opencl
├── arm64-v8a
│ └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
└── armeabi-v7a
└── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
- The generated shared libraries are organized as follows,
build
└── mobilenet-v2-gpu
├── include
│ └── mace
│ └── public
│ ├── mace.h
│ └── mace_runtime.h
├── lib
│ ├── arm64-v8a
│ │ ├── libgnustl_shared.so
│ │ └── libmace.so
│ └── armeabi-v7a
│ ├── libgnustl_shared.so
│ └── libmace.so
├── model
│ ├── mobilenet_v2.data
│ └── mobilenet_v2.pb
└── opencl
├── arm64-v8a
│ └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
└── armeabi-v7a
└── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
Note
- The DSP runtime depends on libhexagon_controller.so.
- The ${MODEL_TAG}.pb file will be generated only when build_type is proto.
- ${library_name}_compiled_opencl_kernel.${device_name}.${soc}.bin will be generated only when target_socs and the gpu runtime are specified.
- The generated shared library depends on libgnustl_shared.so.
Warning
${library_name}_compiled_opencl_kernel.${device_name}.${soc}.bin depends on the OpenCL version of the device; you should maintain compatibility across OpenCL versions, or configure a compiling cache store with ConfigKVStorageFactory.
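The options referenced in the notes above (library_name, build_type, target_socs, runtime) are set in the deployment YAML file passed to the build command. A minimal sketch follows; the key names and all values here are illustrative assumptions, so check the configuration reference of your MACE release for the exact schema:

```yaml
# Illustrative deployment file (names and values are placeholders).
library_name: mobilenet-v2-gpu
target_abis: [armeabi-v7a, arm64-v8a]
target_socs: [msm8998]            # required for the pre-compiled OpenCL binary
build_type: proto                 # 'proto' keeps the model as a .pb file;
                                  # 'code' compiles it into the library
models:
  mobilenet_v2:
    platform: tensorflow
    model_file_path: path/to/tf_model_opt.pb
    runtime: gpu                  # must match device_type used in the code
    subgraphs:
      - input_tensors: input
        input_shapes: 1,64,64,3
        output_tensors: output
        output_shapes: 1,1001     # illustrative
```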
5. How to use the library in your project¶
Please refer to mace/examples/example.cc
for full usage. The following lists the key steps.
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// If the build_type is code
#include "mace/public/mace_engine_factory.h"
// 0. Set pre-compiled OpenCL binary program file paths when available
if (device_type == DeviceType::GPU) {
mace::SetOpenCLBinaryPaths(opencl_binary_paths);
}
// 1. Set the compiled OpenCL kernel cache. This reduces
// initialization time, since kernel compilation is slow. It's suggested
// to set this even when a pre-compiled OpenCL program file is provided,
// because an OpenCL version upgrade may also lead to kernel
// recompilations.
const std::string file_path = "path/to/opencl_cache_file";
std::shared_ptr<KVStorageFactory> storage_factory(
new FileStorageFactory(file_path));
ConfigKVStorageFactory(storage_factory);
// 2. Declare the device type (must match the runtime in the configuration file)
DeviceType device_type = DeviceType::GPU;
// 3. Define the input and output tensor names.
std::vector<std::string> input_names = {...};
std::vector<std::string> output_names = {...};
// 4. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from compiled code
create_engine_status =
CreateMaceEngineFromCode(model_name.c_str(),
nullptr,
input_names,
output_names,
device_type,
&engine);
// Create Engine from model file
create_engine_status =
CreateMaceEngineFromProto(model_pb_data,
model_data_file.c_str(),
input_names,
output_names,
device_type,
&engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
// Report error
}
// 5. Create Input and Output tensor buffers
std::map<std::string, mace::MaceTensor> inputs;
std::map<std::string, mace::MaceTensor> outputs;
for (size_t i = 0; i < input_count; ++i) {
// Allocate input and output
int64_t input_size =
std::accumulate(input_shapes[i].begin(), input_shapes[i].end(),
static_cast<int64_t>(1), std::multiplies<int64_t>());
auto buffer_in = std::shared_ptr<float>(new float[input_size],
std::default_delete<float[]>());
// Load input here
// ...
inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
}
for (size_t i = 0; i < output_count; ++i) {
int64_t output_size =
std::accumulate(output_shapes[i].begin(), output_shapes[i].end(),
static_cast<int64_t>(1), std::multiplies<int64_t>());
auto buffer_out = std::shared_ptr<float>(new float[output_size],
std::default_delete<float[]>());
outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
}
// 6. Run the model
MaceStatus status = engine->Run(inputs, &outputs);