
English | 中文

Introduction

ZQCNN is an inference framework that can run on Windows, Linux and ARM-Linux. It also includes some face detection and recognition related demos.

Main Development Environment: VS2015 with Update 3

MKL Download: Download Here

Core Modules Support Linux:

If you cannot compile everything by following build-with-cmake.md, you can still compile just ZQ_GEMM, ZQCNN, and whichever programs you want to test

Core Modules Support ARM-Linux:

If you cannot compile everything by following build-with-cmake.md, you can still compile just ZQ_GEMM, ZQCNN, and whichever programs you want to test

Known bug: cmake .. -DSIMD_ARCH_TYPE=arm64 -DBLAS_TYPE=openblas_zq_gemm

Ideally, the faster of OpenBLAS and ZQ_GEMM is used for each convolution (the branch is chosen by timing on a Cortex-A72). However, this option currently does not achieve the expected effect, and you need to enable both backends manually in ZQ_CNN_CompileConfig.h:

#define ZQ_CNN_USE_ZQ_GEMM 1
#define ZQ_CNN_USE_BLAS_GEMM 1

You can comment out:

line 67: #if defined(ZQ_CNN_USE_BOTH_BLAS_ZQ_GEMM)
line 70: #endif
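After the edit, the relevant region of ZQ_CNN_CompileConfig.h should look roughly like this (the exact line numbers and surrounding code may differ between versions; this is a sketch of the edit, not the verbatim file):

```cpp
/* ZQ_CNN_CompileConfig.h -- sketch of the manual edit described above.
   Keep both GEMM backends enabled: */
#define ZQ_CNN_USE_ZQ_GEMM 1
#define ZQ_CNN_USE_BLAS_GEMM 1

/* ...and comment out the guard around lines 67 and 70, so the code that
   selects between OpenBLAS and ZQ_GEMM is always compiled in: */
//#if defined(ZQ_CNN_USE_BOTH_BLAS_ZQ_GEMM)
/* run-time selection between OpenBLAS and ZQ_GEMM */
//#endif
```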

Training Related

PyTorch Training SSD: https://github.com/zuoqing1988/pytorch-ssd-for-ZQCNN

Training Gender & Age: https://github.com/zuoqing1988/train-GenderAge

Training MTCNN: https://github.com/zuoqing1988/train-mtcnn

Training SSD: https://github.com/zuoqing1988/train-ssd

Training MTCNN for Head Detection: https://github.com/zuoqing1988/train-mtcnn-head

Update Log

Update on Aug 18, 2022: Optimized 106-point facial landmark pipeline in video mode, added a new head pose and gaze estimation model

Demo program is in SampleVideoFaceDetection_Interface.cpp

The original 106-point pb model and head pose gaze pb model are in TensorFlow_to_ZQCNN

Update on Apr 20, 2022: Support SSD models trained with pytorch-ssd-for-ZQCNN

Update on May 8, 2020: Added text recognition sample SampleOCR

No text detection capability yet, input images need to be pre-cropped

Model download address

Link: https://pan.baidu.com/s/1O75LRBjXWwPXqAshLMJV3w Extraction code: f2q8

Update on Mar 22, 2020: Provide MTCNN models that can detect faces with masks

model\det1-dw20-plus.zqparams
model\det1-dw20-plus.nchwbin

model\det2-dw24-p0.zqparams	
model\det2-dw24-p0.nchwbin

model\det3-dw48-p0.zqparams
model\det3-dw48-p0.nchwbin

Update on Jul 8, 2019: Code for converting ZQCNN models to MNN models

Click to read

Update on May 28, 2019: Open source a quasi-commercial 106-point model

ZQCNN format: in model folder det5-dw112

mxnet format: Link: https://pan.baidu.com/s/19DTG3rmkct8AiEu0l3DYjw Code: qjzk

Update on Mar 16, 2019: Reached 800 stars, release more accurate 106-point landmark model

ZQCNN format: det5-dw96-v2s in the model folder (det5-dw96-v2s.zqparams, det5-dw96-v2s.nchwbin)

mxnet format: Lnet106_96_v2s, extraction code: r5h2

Update on Feb 14, 2019: Reached 700 stars, release selected face detection models

ZQCNN format: Selected 6 Pnet, 2 Rnet, 2 Onet, 2 Lnet

| Six Pnet Types | Input Size | Computation (excluding bbox) | Notes |
|---|---|---|---|
| Pnet20_v00 | 320x240 | 8.5 M | Benchmark: libfacedetection |
| Pnet20_v0 | 320x240 | 11.6 M | Benchmark: libfacedetection |
| Pnet20_v1 | 320x240 | 14.6 M | |
| Pnet20_v2 | 320x240 | 18.4 M | Benchmark: original pnet |
| Pnet16_v0 | 256x192 | 7.5 M | stride=4 |
| Pnet16_v1 | 256x192 | 9.8 M | stride=4 |

| Two Rnet Types | Input Size | Computation | Notes |
|---|---|---|---|
| Rnet_v1 | 24x24 | 0.5 M | Benchmark: original Rnet |
| Rnet_v2 | 24x24 | 1.4 M | |

| Two Onet Types | Input Size | Computation | Notes |
|---|---|---|---|
| Onet_v1 | 48x48 | 2.0 M | No landmark |
| Onet_v2 | 48x48 | 3.2 M | No landmark |

| Two Lnet Types | Input Size | Computation | Notes |
|---|---|---|---|
| Lnet_v2 | 48x48 | 3.5 M | lnet_basenum=16 |
| Lnet_v2 | 48x48 | 10.8 M | lnet_basenum=32 |

Update on Jan 31, 2019: Reached 600 stars, release MTCNN head detection model

Trained on hollywoodheads data, the effect is average, barely usable

Head detection mtcnn-head mxnet-v0&zqcnn-v0

Update on Jan 24, 2019: Core modules support Linux

If you cannot compile everything by following build-with-cmake.md, you can still compile just ZQ_GEMM, ZQCNN, and whichever programs you want to test

Update on Jan 17, 2019

Modified ZQ_CNN_MTCNN.h

(1) When thread_num is set to less than 1 during init, the Pnet stage can be forced to run multi-threaded with the image divided into blocks, which prevents memory from exploding on large images with small faces

(2) The input size of rnet/onet/lnet can be something other than 24/48/48, but only equal width and height are supported

(3) rnet/onet/lnet batch processing can reduce memory usage when there are many faces
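The block-splitting idea in (1) can be sketched as a stand-alone helper. This is illustrative code under assumed parameters, not the actual ZQ_CNN_MTCNN.h implementation: blocks overlap so that a face straddling a block border is still seen whole by at least one block, and each Pnet pass only allocates feature maps for one block.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Block { int x, y, w, h; };

// Split a large image into overlapping blocks. 'overlap' should be at
// least the largest face size you want to detect, so border faces are
// fully contained in some block. Illustrative sketch only.
std::vector<Block> SplitIntoBlocks(int width, int height,
                                   int block_size, int overlap)
{
    std::vector<Block> blocks;
    int step = block_size - overlap;
    for (int y = 0; y < height; y += step) {
        for (int x = 0; x < width; x += step) {
            Block b;
            b.x = x;
            b.y = y;
            b.w = std::min(block_size, width - x);
            b.h = std::min(block_size, height - y);
            blocks.push_back(b);
            if (x + block_size >= width) break;  // right edge reached
        }
        if (y + block_size >= height) break;     // bottom edge reached
    }
    return blocks;
}
```

Peak memory then scales with the block size rather than the full image size, at the cost of re-detecting in the overlap strips.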

Update on Jan 15, 2019: Celebrate reaching 500 stars, release 106-point landmark model

mxnet format & zqcnn format

Update on Jan 4, 2019: Celebrate reaching 400 stars, release fast face model

mxnet format

zqcnn format

v3 version is not good enough, v4 version will be released later, roughly as shown in the diagram below

MTCNN-v4 Diagram

Update on Dec 25, 2018: Not open-sourced 106-point landmark

Life is quite tight, making some extra money.

landmark106-normal-1000.jpg is the landmark generated by model\det5-dw48-1000.nchwbin

landmark106-normal.jpg and landmark106-big.jpg are two models I trained that are not open-sourced

The normal model is 2.1M with 11.4M computation and takes 0.6-0.7 ms single-threaded on PC; the big model is 7.56M with 36.4M computation and takes 1.5-1.6 ms single-threaded on PC

Update on Dec 20, 2018: Add MTCNN 106-point landmark model

Try it in SampleMTCNN (the released one is not very good, better ones are waiting to be sold)

SampleLnet106 has timing, single-thread takes about 0.6~0.7ms (E5-1650V4, 3.6GHz)

Update on Dec 3, 2018: Compile models into code

model2code in ZQCNN.sln can compile models into code

model2code.exe param_file model_file code_file prefix

Then add to your project

#include"code_file"

Use the following function to load the model

LoadFromBuffer(prefix_param, prefix_param_len, prefix_model, prefix_model_len)
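Conceptually, the generated code and the in-memory loader work like the following self-contained sketch. The array names (the "mynet_" prefix) are hypothetical, and LoadFromBuffer here is a mock standing in for the real ZQCNN API, which takes the same four arguments:

```cpp
#include <cassert>
#include <string>

// Sketch of model2code-style output: the .zqparams text and the
// .nchwbin bytes become arrays in a generated header, named after the
// prefix passed to model2code.exe (prefix "mynet_" assumed here).
static const char mynet_param[] =
    "Convolution name=conv1 bottom=data top=conv1 num_output=10 "
    "kernel_size=3 stride=1\n";
static const int mynet_param_len = sizeof(mynet_param) - 1;
static const unsigned char mynet_model[] = { 0x00, 0x01, 0x02, 0x03 };
static const int mynet_model_len = sizeof(mynet_model);

// Mock of a LoadFromBuffer-style entry point: parse the network
// definition and weights directly from memory instead of from files.
bool LoadFromBuffer(const char* param, int param_len,
                    const unsigned char* model, int model_len)
{
    if (param == nullptr || model == nullptr) return false;
    std::string def(param, param_len);
    // A real loader would parse layers from 'def' and weights from 'model'.
    return def.find("Convolution") != std::string::npos && model_len > 0;
}
```

The benefit is that the model ships inside the binary: no separate .zqparams/.nchwbin files to deploy or protect.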

Update on Nov 21, 2018

Support for mxnet-ssd trained models; mean_val needs to be set to 127.5 to run correctly in SampleSSD.

However, training with ReLU seems to go wrong. I trained one with PReLU from scratch, reaching only mAP=0.48, barely usable. Click to download.

After changing the model structure, you must first train a classification model on ImageNet and then train SSD to get a better mAP.

Update on Nov 14, 2018

(1) Optimized ZQ_GEMM. On a 3.6GHz machine the MKL peak is about 46 GFLOPS and ZQ_GEMM reaches about 32 GFLOPS; overall face-model time with ZQ_GEMM is about 1.5x that with MKL.

Note: ZQ_GEMM compiled with VS2017 is faster than with VS2015, but the SampleMTCNN multi-threaded run is then wrong (perhaps different OpenMP support rules?).

(2) Very small weights can now be removed when loading a model. When a model runs much slower than expected, it is usually caused by weight values that are too small (denormalized floats are very slow on x86).

Update on Nov 6, 2018

(1) Removed all OpenMP multi-threaded code inside layers; the per-layer computation is too small, so it was slower than single-threaded

(2) cblas_gemm can use MKL, but the MKL in 3rdparty is very slow on my machine, and the DLL is quite large, so I did not put it in 3rdparty\bin; please download it here.

Update 2 on Oct 30, 2018: For finding small faces in large images with MTCNN, applying a Gaussian filter first is recommended

Update on Oct 30, 2018: BatchNorm eps issue

(1) The default eps for both BatchNorm and BatchNormScale is 0

(2) If using mxnet2zqcnn to convert models from mxnet, eps will be added to var as the new var during conversion

(3) If converting models from other platforms, either manually add eps to var, or add eps=? after BatchNorm and BatchNormScale (? is the eps value of this layer on that platform)

Note: To prevent division by zero, when dividing by var it is calculated as sqrt(__max(var+eps,1e-32)), which means if var+eps is less than 1e-32, it will be slightly different from the theoretical value. However, after today's modification, the LFW accuracy of the following face models is exactly the same as minicaffe's results.
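The guard described in the note can be written out as a minimal sketch (the formula from the text, not ZQCNN's actual source):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Per the note above: dividing by var is computed as
// sqrt(max(var + eps, 1e-32)) so a zero variance never divides by zero.
float BatchNormDenominator(float var, float eps)
{
    return std::sqrt(std::max(var + eps, 1e-32f));
}

float BatchNormForward(float x, float mean, float var, float eps)
{
    return (x - mean) / BatchNormDenominator(var, eps);
}
```

Only when var + eps falls below 1e-32 does the clamp kick in and the result deviate slightly from the theoretical value.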

Update on Oct 26, 2018

MTCNN supports multi-threading. For large images with many small faces, 8 threads can achieve more than a 4x speedup over a single thread. Please test with data\test2.jpg

Update on Oct 15, 2018

Improved MTCNN's nms strategy: 1. for each scale's Pnet, a local maximum in nms must cover at least a certain number of non-maxima (the number is set in the parameters); 2. when Pnet's resolution is too large, nms is performed block-wise.
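The first rule, keeping a local maximum only if it suppresses at least a minimum number of non-maxima, can be sketched as a stand-alone function. This is an illustrative version under assumed names, not ZQCNN's actual implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Box { float x1, y1, x2, y2, score; };

float IoU(const Box& a, const Box& b)
{
    float ix = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    float iy = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    float inter = ix * iy;
    float uni = (a.x2 - a.x1) * (a.y2 - a.y1)
              + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// NMS variant: a box is kept only if it suppresses at least min_covered
// other candidates -- an isolated high score with no supporting
// detections around it is treated as noise and dropped.
std::vector<Box> NmsWithMinCoverage(std::vector<Box> boxes,
                                    float iou_thresh, int min_covered)
{
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<bool> removed(boxes.size(), false);
    std::vector<Box> kept;
    for (size_t i = 0; i < boxes.size(); ++i) {
        if (removed[i]) continue;
        int covered = 0;
        for (size_t j = i + 1; j < boxes.size(); ++j) {
            if (!removed[j] && IoU(boxes[i], boxes[j]) > iou_thresh) {
                removed[j] = true;
                ++covered;
            }
        }
        if (covered >= min_covered) kept.push_back(boxes[i]);
    }
    return kept;
}
```

With min_covered = 0 this degenerates to standard greedy NMS; raising it trades a few missed lone detections for fewer Pnet false positives.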

Update on Sep 25, 2018

Support insightface's GNAP, automatic model conversion using mxnet2zqcnn, see mxnet2zqcnn. You can try MobileFaceNet-GNAP

Update on Sep 20, 2018

(1) Updated the test method for TAR-FAR accuracy of face recognition models. You can follow the steps in How-to-evaluate-TAR-FAR-on-your-dataset to construct a test set and test model accuracy.

(2) Based on (1), I cleaned CASIA-Webface, constructed two test sets (webface1000X50 and webface5000X20), and tested the accuracy of several of the main face recognition models I have open-sourced.

Update on Sep 13, 2018

(1) Support loading models from memory

(2) Added the compilation configuration header ZQ_CNN_CompileConfig.h; you can choose whether to use _mm_fmadd_ps and _mm256_fmadd_ps (test the speed yourself to see whether they are faster or slower).

Update on Sep 12, 2018: Steps for training 112*96 (sphereface size) using insightface: InsightFace: how to train 112*96

Update on Aug 15, 2018

(1) Added natural scene text detection, model converted from TextBoxes. Personally, I think the speed is too slow and accuracy is not high.

Note that the PriorBoxLayer used in this project is different from the PriorBoxLayer in SSD. To export ZQCNN format weights, I modified deploy.prototxt and saved it as deploy_tmp.prototxt. Download the model from here.

(2) Added NSFW image detection, model converted from open_nsfw, I haven't tested how accurate it is.

Download the model from here.

Update on Aug 10, 2018

Successfully converted GenderAge-r50 model and Arcface-LResNet100E-IR from mxnet, same steps as converting MobileFaceNet model. See mxnet2zqcnn

The Model Zoo below has models I converted, which should be slightly faster than automatically converted ones.

Open ZQCNN.sln and run SampleGenderAge to see the effect. On my E5-1650V4 CPU the single-thread time fluctuates a lot, averaging about 1900-2000 ms; with four threads it is over 400 ms.

Update on Aug 9, 2018

Added mxnet2zqcnn, successfully converted MobileFaceNet from mxnet to ZQCNN format (cannot guarantee other models can be converted successfully, ZQCNN does not support many layers yet). See mxnet2zqcnn

Update on Aug 7, 2018

Bug fix: previously Convolution, DepthwiseConvolution, InnerProduct, and BatchNormScale/Scale defaulted to with_bias=true; now the default is with_bias=false. This means the old code could not load these layers without a bias.

Example: a layer written as below would previously default to having a bias_term; now it defaults to no bias_term

Convolution name=conv1 bottom=data top=conv1 num_output=10 kernel_size=3 stride=1

Update on Aug 6, 2018

Added face recognition accuracy testing on LFW database. Open ZQlibFaceID.sln to see related projects.

Since the numerical precision of the C++ code differs slightly from matlab, the computed accuracy also differs slightly, but the difference is within 0.1%.

Update on Aug 3, 2018

Support multi-threading (accelerated via OpenMP). Please note that multi-threading is currently slower than single-threading

Update on Jul 26, 2018

Support MobileNet-SSD. For caffemodel conversion, refer to export_mobilenet_SSD_caffemodel_to_nchw_binary.m; you need to compile matcaffe. You can try this caffe version: caffe-ZQ

Update on Jun 5, 2018

Following the trend, the source code is released. I forgot to mention that it depends on OpenBLAS; I used the version from mini-caffe directly, as the one I compiled myself is very slow.

Model Zoo

Face Detection

MTCNN-author-version Converted format from MTCNN

MTCNN-ZQ-version

Face Recognition (unless otherwise specified, models are trained with ms1m-refine-v2)

| Model | LFW Accuracy (ZQCNN) | LFW Accuracy (OpenCV3.4.2) | LFW Accuracy (minicaffe) | Time (ZQCNN) | Notes |
|---|---|---|---|---|---|
| MobileFaceNet-res2-6-10-2-dim128 | 99.67%-99.55% (matlab crop), 99.72-99.60% (C++ crop) | 99.63%-99.65% (matlab crop), 99.68-99.70% (C++ crop) | 99.62%-99.65% (matlab crop), 99.68-99.60% (C++ crop) | Time similar to dim256 | Network structure same as dim256, only output dimension differs |
| MobileFaceNet-res2-6-10-2-dim256 | 99.60%-99.60% (matlab crop), 99.62-99.62% (C++ crop) | 99.73%-99.68% (matlab crop), 99.78-99.68% (C++ crop) | 99.55%-99.63% (matlab crop), 99.60-99.62% (C++ crop) | Single-thread ~21-22ms, four threads ~11-12ms, 3.6GHz | Network structure in download link, trained with faces_emore |
| MobileFaceNet-res2-6-10-2-dim512 | 99.52%-99.60% (matlab crop), 99.63-99.72% (C++ crop) | 99.70%-99.67% (matlab crop), 99.77-99.77% (C++ crop) | 99.55%-99.62% (matlab crop), 99.62-99.68% (C++ crop) | Time similar to dim256 | Network structure same as dim256, only output dimension differs. Thanks to moli for training this model |

| Model | LFW Accuracy (ZQCNN) | LFW Accuracy (OpenCV3.4.2) | LFW Accuracy (minicaffe) | Time (ZQCNN) | Notes |
|---|---|---|---|---|---|
| MobileFaceNet-res4-8-16-4-dim128 | 99.72%-99.72% (matlab crop), 99.72-99.68% (C++ crop) | 99.82%-99.83% (matlab crop), 99.80-99.78% (C++ crop) | 99.72%-99.72% (matlab crop), 99.72-99.68% (C++ crop) | Time similar to dim256 | Network structure same as dim256, only output dimension differs |
| MobileFaceNet-res4-8-16-4-dim256 | 99.78%-99.78% (matlab crop), 99.75-99.75% (C++ crop) | 99.82%-99.82% (matlab crop), 99.80-99.82% (C++ crop) | 99.78%-99.78% (matlab crop), 99.73-99.73% (C++ crop) | Single-thread ~32-33ms, four threads ~16-19ms, 3.6GHz | Network structure in download link, trained with faces_emore |
| MobileFaceNet-res4-8-16-4-dim512 | 99.80%-99.73% (matlab crop), 99.85-99.83% (C++ crop) | 99.83%-99.82% (matlab crop), 99.87-99.83% (C++ crop) | 99.80%-99.73% (matlab crop), 99.85-99.82% (C++ crop) | Time similar to dim256 | Network structure same as dim256, only output dimension differs. Thanks to moli for training this model |

| Model \ Test Set: webface1000X50 | thresh @FAR=1e-7 | TAR @FAR=1e-7 | thresh @FAR=1e-6 | TAR @FAR=1e-6 | thresh @FAR=1e-5 | TAR @FAR=1e-5 |
|---|---|---|---|---|---|---|
| MobileFaceNet-res2-6-10-2-dim128 | 0.78785 | 9.274% | 0.66616 | 40.459% | 0.45855 | 92.716% |
| MobileFaceNet-res2-6-10-2-dim256 | 0.77708 | 7.839% | 0.63872 | 40.934% | 0.43182 | 92.605% |
| MobileFaceNet-res2-6-10-2-dim512 | 0.76699 | 8.197% | 0.63452 | 38.774% | 0.41572 | 93.000% |
| MobileFaceNet-res4-8-16-4-dim128 | 0.79268 | 9.626% | 0.65770 | 48.252% | 0.45431 | 95.576% |
| MobileFaceNet-res4-8-16-4-dim256 | 0.76858 | 9.220% | 0.62852 | 46.195% | 0.40010 | 96.929% |
| MobileFaceNet-res4-8-16-4-dim512 | 0.76287 | 9.296% | 0.62555 | 44.775% | 0.39047 | 97.347% |

| Model \ Test Set: webface5000X20 | thresh @FAR=1e-7 | TAR @FAR=1e-7 | thresh @FAR=1e-6 | TAR @FAR=1e-6 | thresh @FAR=1e-5 | TAR @FAR=1e-5 |
|---|---|---|---|---|---|---|
| MobileFaceNet-res2-6-10-2-dim128 | 0.70933 | 29.558% | 0.51732 | 85.160% | 0.45108 | 94.313% |
| MobileFaceNet-res2-6-10-2-dim256 | 0.68897 | 28.376% | 0.48820 | 85.278% | 0.42386 | 94.244% |
| MobileFaceNet-res2-6-10-2-dim512 | 0.68126 | 27.708% | 0.47260 | 85.840% | 0.40727 | 94.632% |
| MobileFaceNet-res4-8-16-4-dim128 | 0.71238 | 32.153% | 0.51391 | 89.525% | 0.44667 | 96.583% |
| MobileFaceNet-res4-8-16-4-dim256 | 0.68490 | 30.639% | 0.46092 | 91.900% | 0.39198 | 97.696% |
| MobileFaceNet-res4-8-16-4-dim512 | 0.67303 | 32.404% | 0.45216 | 92.453% | 0.38344 | 98.003% |

| Model \ Test Set: TAO (ids: 6606, ims: 87210) | thresh @FAR=1e-7 | TAR @FAR=1e-7 | thresh @FAR=1e-6 | TAR @FAR=1e-6 | thresh @FAR=1e-5 | TAR @FAR=1e-5 |
|---|---|---|---|---|---|---|
| MobileFaceNet-res2-6-10-2-dim128 | 0.92204 | 1.282% | 0.88107 | 6.837% | 0.78302 | 41.740% |
| MobileFaceNet-res2-6-10-2-dim256 | 0.91361 | 1.275% | 0.86750 | 7.081% | 0.76099 | 42.188% |
| MobileFaceNet-res2-6-10-2-dim512 | 0.90657 | 1.448% | 0.86061 | 7.299% | 0.75488 | 41.956% |
| MobileFaceNet-res4-8-16-4-dim128 | 0.92098 | 1.347% | 0.88233 | 6.795% | 0.78711 | 41.856% |
| MobileFaceNet-res4-8-16-4-dim256 | 0.90862 | 1.376% | 0.86397 | 7.083% | 0.75975 | 42.430% |
| MobileFaceNet-res4-8-16-4-dim512 | 0.90710 | 1.353% | 0.86190 | 6.948% | 0.75518 | 42.241% |

| Model \ Test Set: ZQCNN-Face_5000_X_20 | thresh @FAR=1e-8 | TAR @FAR=1e-8 | thresh @FAR=1e-7 | TAR @FAR=1e-7 | thresh @FAR=1e-6 | TAR @FAR=1e-6 |
|---|---|---|---|---|---|---|
| MobileFaceNet-GNAP | 0.73537 | 11.722% | 0.69903 | 20.110% | 0.65734 | 33.189% |
| MobileFaceNet-res2-6-10-2-dim128 | 0.64772 | 40.527% | 0.60485 | 55.345% | 0.55571 | 70.986% |
| MobileFaceNet-res2-6-10-2-dim256 | 0.61647 | 42.046% | 0.57561 | 55.801% | 0.52852 | 70.622% |
| MobileFaceNet-res2-6-10-2-dim512 | 0.59725 | 44.651% | 0.55690 | 58.220% | 0.51134 | 72.294% |
| MobileFaceNet-res4-8-16-4-dim128 | 0.64519 | 47.735% | 0.60247 | 62.882% | 0.55342 | 77.777% |
| MobileFaceNet-res4-8-16-4-dim256 | 0.58229 | 56.977% | 0.54582 | 69.118% | 0.49763 | 82.161% |
| MobileFaceNet-res4-8-16-4-dim512 | 0.58296 | 54.731% | 0.54219 | 68.613% | 0.49174 | 82.812% |
| MobileFaceNet-res8-16-32-8-dim512 | 0.58058 | 61.826% | 0.53841 | 75.281% | 0.49098 | 86.554% |

| Model \ Test Set: ZQCNN-Face_5000_X_20 | thresh @FAR=1e-8 | TAR @FAR=1e-8 | thresh @FAR=1e-7 | TAR @FAR=1e-7 | thresh @FAR=1e-6 | TAR @FAR=1e-6 |
|---|---|---|---|---|---|---|
| ArcFace-r34-v2 (not trained by me) | 0.61953 | 47.103% | 0.57375 | 62.207% | 0.52226 | 76.758% |
| ArcFace-r50 (ms1m-refine-v1, not trained by me) | 0.61299 | 50.594% | 0.56658 | 65.757% | 0.51637 | 79.207% |
| ArcFace-r100 (not trained by me) | 0.57350 | 67.434% | 0.53136 | 79.944% | 0.48164 | 90.147% |

| Model \ Test Set: ZQCNN-Face_12000_X_10-40 | thresh @FAR=1e-8 | TAR @FAR=1e-8 | thresh @FAR=1e-7 | TAR @FAR=1e-7 | thresh @FAR=1e-6 | TAR @FAR=1e-6 |
|---|---|---|---|---|---|---|
| MobileFaceNet-res2-6-10-2-dim128 | 0.64507 | 39.100% | 0.60347 | 53.638% | 0.55492 | 69.516% |
| MobileFaceNet-res2-6-10-2-dim256 | 0.61589 | 39.864% | 0.57402 | 54.179% | 0.52596 | 69.658% |
| MobileFaceNet-res2-6-10-2-dim512 | 0.60030 | 41.309% | 0.55806 | 55.676% | 0.50984 | 70.979% |
| MobileFaceNet-res4-8-16-4-dim128 | 0.64443 | 45.764% | 0.60060 | 61.564% | 0.55168 | 76.776% |
| MobileFaceNet-res4-8-16-4-dim256 | 0.58879 | 52.542% | 0.54497 | 67.597% | 0.49547 | 81.495% |
| MobileFaceNet-res4-8-16-4-dim512 | 0.58492 | 51.752% | 0.54085 | 67.104% | 0.49010 | 81.836% |
| MobileFaceNet-res8-16-32-8-dim512 | 0.58119 | 61.412% | 0.53700 | 75.520% | 0.48997 | 86.647% |

| Model \ Test Set: ZQCNN-Face_12000_X_10-40 | thresh @FAR=1e-8 | TAR @FAR=1e-8 | thresh @FAR=1e-7 | TAR @FAR=1e-7 | thresh @FAR=1e-6 | TAR @FAR=1e-6 |
|---|---|---|---|---|---|---|
| ArcFace-r34-v2 (not trained by me) | 0.61904 | 45.072% | 0.57173 | 60.964% | 0.52062 | 75.789% |
| ArcFace-r50 (ms1m-refine-v1, not trained by me) | 0.61412 | 48.155% | 0.56749 | 63.676% | 0.51537 | 78.138% |
| ArcFace-r100 (not trained by me) | 0.57891 | 63.854% | 0.53337 | 78.129% | 0.48079 | 89.579% |

For more face models, please see Model-Zoo-for-Face-Recognition

Facial Expression Recognition

FacialEmotion Seven types of expressions trained with Fer2013

Gender and Age Recognition

GenderAge-ZQ Model trained using train-GenderAge

Object Detection

MobileNetSSD Converted format from MobileNet-SSD

MobileNetSSD-Mouth Used for SampleDetectMouth

Text Detection

TextBoxes Converted format from TextBoxes

NSFW Image Detection

NSFW Converted format from open_nsfw

Related Articles

(1) How much precision is lost storing face feature vectors as integers?

(2) Accelerating similarity calculation for tens of millions of face feature vectors?

(3) Building a forward library faster than mini-caffe

(4) Precision issues in vector dot product

(5) ZQCNN supports Depthwise Convolution and modified SphereFaceNet-10 with MobileNet

(6) Following the trend, releasing some source code

(7) ZQCNN supports SSD, about 30% faster than mini-caffe

(8) ZQCNN's SSD supports freely changing resolution for the same model

(9) 99.78% accuracy face recognition model in ZQCNN format

(10) ZQCNN adds face recognition testing code on the LFW dataset

(11) Embracing mxnet, starting to write mxnet2zqcnn

(12) Large-scale face test set, and how to build your own face test set

(13) Matrix description of regular convolution, MobileNet convolution, and global average pooling

(14) ZQ_FastFaceDetector: a faster and more accurate face detection library

Android Compilation Instructions

  1. Modify the ndk path and the opencv Android SDK path in build.sh

  2. In CMakeLists.txt, change the original:

    #add_definitions(-march=native)
    add_definitions(-mfpu=neon)
    add_definitions(-mfloat-abi=hard)

    to:

    #add_definitions(-march=native)
    add_definitions(-mfpu=neon)
    add_definitions(-mfloat-abi=softfp)

  3. You should then be able to compile the two libraries ZQ_GEMM and ZQCNN. If you want to compile SampleMTCNN, modify the parts that fail to compile according to the error messages, mainly OpenMP and the timing functions.