Lightweight neural network hand gesture recognition method for embedded platforms
-
摘要: 针对传统基于图像分割和特征提取的手势识别算法在复杂背景下识别准确率低、灵活性差的问题,基于目标检测神经网络的手势识别算法可以有效提高复杂环境下手势识别的准确性。受嵌入式处理器体积和功耗的限制,常用的目标检测神经网络在嵌入式上的识别速度较低,不能满足实时手势识别的要求。在SSD目标检测的基础上对其进行优化,使用MobileNetv3网络实现特征提取,目标检测方面则是使用SSD-lite结构,其使用深度可分离卷积替代普通卷积,实现了轻量化MobileNetv3-SSDLite手势识别算法的设计。针对手势识别的要求,制作了包含不同手势的数据集,利用它在服务器上完成了模型的训练。为了满足嵌入式的算力限制,通过模型的量化压缩将float64的网络参数量化为int8,并压缩网络结构,提高网络在嵌入式上的推理速度,实现基于嵌入式的手势识别。实验结果表明,基于嵌入式的MobileNetv3-SSDLite手势识别算法可以达到平均准确率99.61%,且识别速度达到每秒50帧以上,满足实时手势识别的要求。
-
关键词:
- 手势识别 /
- 深度神经网络 /
- 嵌入式 /
- 轻量化 /
- MobileNev3-SSDLite
Abstract: Compared with the traditional gesture recognition algorithms based on image segmentation and feature extraction in complex backgrounds which have low recognition accuracy and poor flexibility, the gesture recognition algorithm based on target detection neural network can effectively improve the accuracy of gesture recognition in complex environments. Restricted by the size and power consumption of embedded processors, the recognition speed of commonly used target detection neural networks on embedded processors is low and cannot meet the requirements of real-time gesture recognition. In this paper, we optimize the SSD target detection and use MobileNetv3 network to achieve feature extraction and SSD-lite structure for target detection, thus to use depth-separable convolution instead of ordinary convolution to realize the design of lightweight MobileNetv3-SSDLite gesture recognition algorithm. For the requirements of gesture recognition, we make a dataset containing different gestures and complete the training of the model on the server using the dataset. In order to meet the arithmetic limitation of embedded processor, we quantize the float64 network parameters into int8 by quantization compression of the model, and compress the network structure to improve the inference speed of the network on embedded processor to realize the embedded-based gesture recognition. The experimental results show that the embedded-based MobileNetv3-SSDLite gesture recognition algorithm can achieve an average accuracy of 99.61% and a recognition speed of above 50 frame/s, which meets the requirements of real-time gesture recognition.-
Key words:
- hand gesture recognition /
- deep neuron network /
- embedded system /
- lightweight /
- MobileNetv3-SSDLite
-
表 1 MobileNet系列与VGG16的对比
Table 1. MobileNet series comparison to VGG16
network structure params/Mbyte MACs/106 ImageNet
accuracy/%VGG16 13.8 15300 71.5 MobieNetv1 4.2 569 70.6 MobileNetv2 3.4 300 72.0 MobileNetv3 5.4 219 75.2 表 2 用于检测的额外特征图及其尺寸
Table 2. Extra feature map layers for object detection
extra layers shape layer 1 $39 \times 39 \times 512$ layer 2 $19 \times 19 \times 1024$ layer 3 $10 \times 10 \times 512$ layer 4 $5 \times 5 \times 256$ layer 5 $3 \times 3 \times 256$ layer 6 $1 \times 1 \times 256$ 表 3 SSDLite深层检测网络与SSD的对比
Table 3. SSDLite detection head comparison to SSD
network structure params/Mbyte MACs/106 mAP/% SSD 14.8 1250 19.3 SSDLite 2.1 350 22.2 表 4 不同手势的识别结果
Table 4. Recognition results of hand gestures
hand gesture accuracy/% 0 99.64 1 100.00 3 99.51 4 99.22 5 99.69 average 99.61 表 5 不同场景下手势识别结果
Table 5. Recognition results of hand gestureson various scenarios
scenarios average accuracy/% multiple hand gestures 96 complicated background 64 low light intensity 72 表 6 不同手势识别算法的比较
Table 6. Comparison of different hand gesture recognition algorithms.
algorithm params/Mbyte MACs/106 frame rate/(frame/s) mean accuracy/% VGG16-SSD 24.3 30654 2 91.75 MobieNetv1-SSD 7.2 1299 12 93.98 MobileNetv1-SSDLite 4.1 1130 16 93.86 MobileNetv2-SSDLite 3.1 656 36 91.01 MobileNetv3-SSDLite 2.2 526 58 99.61 -
[1] 陈壮炼, 林晓乐, 王家伟, 等. 基于卷积神经网络的手势识别人机交互系统的设计[J]. 现代计算机, 2021(6):57-62. (Chen Zhuanglian, Lin Xiaole, Wang Jiawei, et al. Design of human-computer interaction system for gesture recognition based on convolutional neural network[J]. Modern Computer, 2021(6): 57-62 doi: 10.3969/j.issn.1007-1423.2021.06.011 [2] 袁博, 查晨东. 手势识别技术发展现状与展望[J]. 科学技术创新, 2018(32):95-96. (Yuan Bo, Zha Chendong. Gesture recognition technology development status and outlook[J]. Scientific and Technological Innovation, 2018(32): 95-96 doi: 10.3969/j.issn.1673-1328.2018.32.056 [3] 时梦丽, 张备伟, 刘光徽. 基于深度图像的实时手势识别方法[J]. 计算机工程与设计, 2020, 41(7):2057-2062. (Shi Mengli, Zhang Beiwei, Liu Guanghui. Real-time gesture recognition method based on depth image[J]. Computer Engineering and Design, 2020, 41(7): 2057-2062 [4] 彭理仁, 王进, 林旭军, 等. 一种基于深度图像的静态手势神经网络识别方法[J]. 自动化与仪器仪表, 2020(1):6-9,15. (Peng Liren, Wang Jin, Lin Xujun, et al. A static gesture recognition method based on depth image and neural network[J]. Automation & Instrumentation, 2020(1): 6-9,15 [5] 吴轶凡, 郭剑辉. 一种基于肤色模型的改进型手势分割算法的实现[J]. 电子设计工程, 2020, 28(18):185-188,193. (Wu Yifan, Guo Jianhui. Implementation of an improved gesture segmentation algorithm based on skin color model[J]. Electronic Design Engineering, 2020, 28(18): 185-188,193 [6] Li Hui, Yang Lei, Wu Xiaoyu, et al. Static hand gesture recognition based on HOG with Kinect[C]//Proceedings of the 2012 4th International Conference on Intelligent Human-Machine Systems and Cybernetics. 2012: 271-273. [7] Liua C, Zhou Shuwang, Hu Sheng, et al. Hand gesture recognition based on sEMG signal and improved SVM voting method[C]//Proceedings of the 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE). 2020: 605-608. [8] 石雨鑫, 邓洪敏, 郭伟林. 基于混合卷积神经网络的静态手势识别[J]. 计算机科学, 2019, 46(s1):165-168. (Shi Yuxin, Deng Hongmin, Guo Weilin. Static gesture recognition based on hybrid convolution neural network[J]. Computer Science, 2019, 46(s1): 165-168 [9] Hussain S, Saxena R, Han Xie, et al. Hand gesture recognition using deep learning[C]//Proceedings of the 2017 International SoC Design Conference (ISOCC). 2017: 48-49. [10] 郭紫嫣, 韩慧妍, 何黎刚, 等. 基于改进的YOLOV4的手势识别算法及其应用[J]. 中北大学学报(自然科学版), 2021, 42(3):223-231. (Guo Ziyan, Han Huiyan, He Ligang, et al. Gesture recognition algorithm and application based on improved YOLOV4[J]. Journal of North University of China (Natural Science Edition), 2021, 42(3): 223-231 [11] Chhajed R R, Parmar K P, Pandya M D, et al. Messaging and video calling application for specially abled people using hand gesture recognition[C]//Proceedings of the 2021 6th International Conference for Convergence in Technology (I2CT). 2021: 1-4. [12] Yi Chengming, Zhou Liguang, Wang Zhixiang, et al. Long-range hand gesture recognition with joint SSD network[C]//Proceedings of the 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO). 2018: 1959-1963. [13] 孔维刚, 李文婧, 王秋艳, 等. 基于改进YOLOv4算法的轻量化网络设计与实现[J/OL]. 计算机工程, 1-10(2021-04-30)Kong Weigang, Li Wenjing, Wang Qiuyan, et al. Design and implementation of lightweight network based on YOLOv4 algorithm[J/OL]. Computer Engineering, 1-10(2021-04-30). https://doi.org/10.19678/j.issn.1000-3428.0060948 [14] Liu Wei, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//Proceedings of the 14th European Conference on Computer Vision. 2016: 21-37. [15] Ren Shaoqing, He Kaiming, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. doi: 10.1109/TPAMI.2016.2577031 [16] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016: 779-788. [17] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[DB/OL]. arXiv preprint arXiv: 1409.1556, 2014. [18] Howard A, Sandler M, Chen Bo, et al. Searching for MobileNetV3[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 2019: 1314-1324. [19] Howard A G, Zhu Menglong, Chen Bo, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[DB/OL]. arXiv preprint arXiv: 1704.04861, 2017. [20] Sandler M, Howard A, Zhu Menglong, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 4510-4520. [21] Hu Jie, Shen Li, Albanie S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023. doi: 10.1109/TPAMI.2019.2913372 [22] 杨国威, 许志旺, 房臣, 等. 融合剪枝与量化的目标检测网络压缩方法[J/OL]. 计算机工程与应用, 1-12[2021-12-17]Yang Guowei, Xu Zhiwang, Fang Chen, et al. Object detection network compression method based on pruning and quantization[J/OL]. Computer Engineering and Applications, 1-12[2021-12-17]. http://kns.cnki.net/kcms/detail/11.2127.tp.20210918.1121.008.html