How to build¶
Supported Platforms¶
| Platform | Explanation |
|---|---|
| Tensorflow | >= 1.6.0. (first choice, convenient for Android NN API in the future) |
| Caffe | >= 1.0. |
Environment Requirement¶
macesupply a docker image which contains all required environment. Dockerfile under the ./docker directory.
the followings are start commands:
sudo docker pull cr.d.xiaomi.net/mace/mace-dev
sudo docker run -it --rm --privileged -v /dev/bus/usb:/dev/bus/usb --net=host -v /local/path:/container/path cr.d.xiaomi.net/mace/mace-dev /bin/bash
if you want to run on your local computer, you have to install the following softwares.
| software | version | install command |
|---|---|---|
| bazel | >= 0.13.0 | bazel installation |
| android-ndk | r15c/r16b | reference the docker file |
| adb | >= 1.0.32 | apt-get install android-tools-adb |
| tensorflow | >= 1.6.0 | pip install -I tensorflow==1.6.0 (if you use tensorflow model) |
| numpy | >= 1.14.0 | pip install -I numpy=1.14.0 |
| scipy | >= 1.0.0 | pip install -I scipy=1.0.0 |
| jinja2 | >= 2.10 | pip install -I jinja2=2.10 |
| PyYaml | >= 3.12.0 | pip install -I pyyaml=3.12 |
| sh | >= 1.12.14 | pip install -I sh=1.12.14 |
| filelock | >= 3.0.0 | pip install -I filelock=3.0.0 |
| docker (for caffe) | >= 17.09.0-ce | install doc |
Docker Images¶
- Login in Xiaomi Docker Registry
docker login cr.d.xiaomi.net
- Build with Dockerfile
docker build -t cr.d.xiaomi.net/mace/mace-dev
- Pull image from docker registry
docker pull cr.d.xiaomi.net/mace/mace-dev
- Create container
# Set 'host' network to use ADB
docker run -it --rm -v /local/path:/container/path --net=host cr.d.xiaomi.net/mace/mace-dev /bin/bash
Usage¶
1. Pull code with latest tag¶
Warning
please do not use master branch for deployment.
git clone git@v9.git.n.xiaomi.com:deep-computing/mace.git
# update
git fetch --all --tags --prune
# get latest tag version
tag_name=`git describe --abbrev=0 --tags`
# checkout to latest tag branch
git checkout -b ${tag_name} tags/${tag_name}
2. Model Optimization¶
- Tensorflow
Tensorflow supply a model optimization tool for speed up inference. The docker image contain the tool, by the way you can download from transform_graph or compile from tensorflow source code.
The following commands are optimization for CPU, GPU and DSP.
# CPU/GPU:
./transform_graph \
--in_graph=tf_model.pb \
--out_graph=tf_model_opt.pb \
--inputs='input' \
--outputs='output' \
--transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
strip_unused_nodes(type=float, shape="1,64,64,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
flatten_atrous_conv
fold_batch_norms
fold_old_batch_norms
strip_unused_nodes
sort_by_execution_order'
# DSP:
./transform_graph \
--in_graph=tf_model.pb \
--out_graph=tf_model_opt.pb \
--inputs='input' \
--outputs='output' \
--transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
strip_unused_nodes(type=float, shape="1,64,64,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
backport_concatv2
quantize_weights(minimum_size=2)
quantize_nodes
strip_unused_nodes
sort_by_execution_order'
- Caffe
Only support versions greater then 1.0, please use the tools caffe supplied to upgrade the models.
# Upgrade prototxt
$CAFFE_ROOT/build/tools/upgrade_net_proto_text MODEL.prototxt MODEL.new.prototxt
# Upgrade caffemodel
$CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel
3. Build static library¶
3.1 Overview¶
Mace only build static library. the followings are two use cases.
build for specified SOC
You must assign
target_socsin yaml configuration file. if you want to use gpu for the soc, mace will tuning the parameters for better performance automatically.Warning
you should plug in a phone with that soc.
build for all SOC
When no
target_socspecified, the library is suitable for all soc.Warning
The performance will be a little poorer than the first case.
We supply a python script tools/converter.py to build the library and run the model on the command line.
Warning
must run the script on the root directory of the mace code.
3.2 tools/converter.py explanation¶
Commands
build
Note
build static library and test tools.
- --config (type=str, default="", required): the path of model yaml configuration file.
- --tuning (default=false, optional): whether tuning the parameters for the GPU of specified SOC.
- --enable_openmp (default=true, optional): whether use openmp.
run
Note
run the models in command line
- --config (type=str, default="", required): the path of model yaml configuration file.
- --round (type=int, default=1, optional): times for run.
- --validate (default=false, optional): whether to verify the results of mace are consistent with the frameworks。
- --caffe_env (type=local/docker, default=docker, optional): you can specific caffe environment for validation. local environment or caffe docker image.
- --restart_round (type=int, default=1, optional): restart round between run.
- --check_gpu_out_of_memory (default=false, optional): whether check out of memory for gpu.
- --vlog_level (type=int[0-5], default=0, optional): verbose log level for debug.
Warning
runrely onbuildcommand, you shouldrunafterbuild.
- benchmark
- --config (type=str, default="", required): the path of model yaml configuration file.
Warning
benchmarkrely onbuildcommand, you shouldbenchmarkafterbuild.common arguments
argument(key) argument(value) default required commands explanation --omp_num_threads int -1 N run/benchmarknumber of threads --cpu_affinity_policy int 1 N run/benchmark0:AFFINITY_NONE/1:AFFINITY_BIG_ONLY/2:AFFINITY_LITTLE_ONLY --gpu_perf_hint int 3 N run/benchmark0:DEFAULT/1:LOW/2:NORMAL/3:HIGH --gpu_perf_hint int 3 N run/benchmark0:DEFAULT/1:LOW/2:NORMAL/3:HIGH --gpu_priority_hint int 3 N run/benchmark0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
3.3 tools/converter.pyusage examples¶
# print help message
python tools/converter.py -h
python tools/converter.py build -h
python tools/converter.py run -h
python tools/converter.py benchmark -h
# Build the static library
python tools/converter.py build --config=models/config.yaml
# Test model run time
python tools/converter.py run --config=models/config.yaml --round=100
# Compare the results of mace and platform. use the **cosine distance** to represent similarity.
python tools/converter.py run --config=models/config.yaml --validate
# Benchmark Model: check the execution time of each Op.
python tools/converter.py benchmark --config=models/config.yaml
# Check the memory usage of the model(**Just keep only one model in configuration file**)
python tools/converter.py run --config=models/config.yaml --round=10000 &
adb shell dumpsys meminfo | grep mace_run
sleep 10
kill %1
4. Deployment¶
build command will generate a package which contains the static library, model files and header files.
the package is at ./build/${library_name}/libmace_${library_name}.tar.gz.
The followings list the details.
- header files
include/mace/public/*.h
- static libraries
library/${target_abi}/*.a
- dynamic libraries
library/libhexagon_controller.so
Note
only use for DSP
- model files
model/${MODEL_TAG}.pbmodel/${MODEL_TAG}.data
Note
.pbfile will be generated only when build_type isproto.- OpenCL compiled kernel binary file
opencl/compiled_kernel.bin
Note
This file will be generated only when specify
target_socand runtime isgpu.Warning
This file rely on the OpenCL driver on the phone, you should update the file when OpenCL driver changed.
5. how to use¶
Please refer to mace/examples/example.ccfor full usage. the following list the key steps.
// include the header files
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
#include "mace/public/mace_engine_factory.h"
// 0. set internal storage factory(**Call once**)
const std::string file_path ="/path/to/store/internel/files";
std::shared_ptr<KVStorageFactory> storage_factory(
new FileStorageFactory(file_path));
ConfigKVStorageFactory(storage_factory);
// 1. set precompiled OpenCL binary file paths if you use gpu of specified SOC,
// Besides the binary rely on the OpenCL driver of the SOC,
// if OpenCL driver changed, you should recompiled the binary file.
if (device_type == DeviceType::GPU) {
mace::SetOpenCLBinaryPaths(opencl_binary_paths);
}
// 2. Declare the device type(must be same with ``runtime`` in configuration file)
DeviceType device_type = DeviceType::GPU;
// 3. Define the input and output tensor names.
std::vector<std::string> input_names = {...};
std::vector<std::string> output_names = {...};
// 4. Create MaceEngine object
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from code
create_engine_status =
CreateMaceEngineFromCode(model_name.c_str(),
nullptr,
input_names,
output_names,
device_type,
&engine);
// Create Engine from proto file
create_engine_status =
CreateMaceEngineFromProto(model_pb_data,
model_data_file.c_str(),
input_names,
output_names,
device_type,
&engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
// do something
}
// 5. Create Input and Output objects
std::map<std::string, mace::MaceTensor> inputs;
std::map<std::string, mace::MaceTensor> outputs;
for (size_t i = 0; i < input_count; ++i) {
// Allocate input and output
int64_t input_size =
std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_in = std::shared_ptr<float>(new float[input_size],
std::default_delete<float[]>());
// load input
...
inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
}
for (size_t i = 0; i < output_count; ++i) {
int64_t output_size =
std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_out = std::shared_ptr<float>(new float[output_size],
std::default_delete<float[]>());
outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
}
// 6. Run the model
MaceStatus status = engine.Run(inputs, &outputs);