An autonomous vehicle system can be summarized as three parts: perception, planning, and control. Among them, perception is the basis of system safety and intelligence. In recent years, with the development of neural networks, the accuracy of perception tasks has improved dramatically, but the computational complexity and model size of the networks have grown as well. To support a variety of high-precision perception tasks on an in-vehicle system with limited computing power, and to meet the accuracy and real-time requirements of autonomous driving, the team explored CPU/GPU collaborative compilation and parallel optimization technologies at three levels: heterogeneous tasks, homogeneous multi-tasks, and single tasks.
Publications
MPInfer: Multi-Tenant Parallel CNN Inference Framework for Autonomous Driving, Yitong Huang, Yu Zhang, Boyuan Feng, Xing Guo, Yanyong Zhang, Yufei Ding.
MPInfer is a multi-task inference framework designed to support multiple perception tasks in autonomous driving with multiple sensors. Built on TensorRT, MPInfer uses multiple scheduling strategies to optimize CNN inference.
Fast Schedule Tensor Computation with High Data Reuse and Device Utilization. Yuxiang Zhang, Yu Zhang*, 2019 IEEE International Symposium on Parallel and Distributed Processing with Applications Workshops (ISPA).
This work proposes an algorithm that efficiently finds a promising schedule to exploit the parallelism and locality of computation on a GPU. In particular, an empirical model that jointly considers the locality, load balance, and parallelism sufficiency of the computation on the given GPU model is designed to measure the quality of a candidate schedule. Empirical constraints are introduced to significantly reduce the schedule search space to polynomial complexity in terms of the computation dimensions. Compared with the state-of-the-art tool Tensor Comprehensions, the algorithm finds a promising schedule 5-45x faster, and the resulting scheduled code runs 1.5-10x faster.
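To make the idea of scoring candidate schedules concrete, the following is a minimal illustrative sketch and not the model from the paper. It assumes a 2D computation of size m x n tiled into tm x tn thread blocks, and every name, constant, and formula in it (MAX_THREADS, WARP_SIZE, num_sms, the three score terms) is a hypothetical stand-in. Restricting tile sizes to powers of two and to block sizes the GPU can launch plays the role of the empirical constraints that shrink the search space.

```python
from itertools import product
from math import ceil, sqrt

MAX_THREADS = 1024  # assumed per-block thread limit of the target GPU
WARP_SIZE = 32      # assumed smallest block size worth launching


def tile_sizes(extent):
    """Yield power-of-two tile sizes up to min(extent, MAX_THREADS)."""
    size = 1
    while size <= min(extent, MAX_THREADS):
        yield size
        size *= 2


def score(m, n, tm, tn, num_sms=80):
    """Toy quality score in (0, 1]; higher is better (hypothetical model)."""
    blocks = ceil(m / tm) * ceil(n / tn)
    # Locality: work per loaded element, normalized by its maximum
    # (reached by a square tile that uses all MAX_THREADS threads).
    locality = (tm * tn / (tm + tn)) / (sqrt(MAX_THREADS) / 2)
    # Load balance: fraction of launched threads that do useful work.
    balance = (m * n) / (blocks * tm * tn)
    # Parallelism sufficiency: are there enough blocks to occupy every SM?
    parallelism = min(1.0, blocks / num_sms)
    return locality * balance * parallelism


def best_schedule(m, n, num_sms=80):
    """Return ((tm, tn), score) for the best tile, or None if none is legal."""
    candidates = [
        (tm, tn)
        for tm, tn in product(tile_sizes(m), tile_sizes(n))
        if WARP_SIZE <= tm * tn <= MAX_THREADS  # pruning constraint
    ]
    if not candidates:
        return None
    tile = max(candidates, key=lambda t: score(m, n, t[0], t[1], num_sms))
    return tile, score(m, n, tile[0], tile[1], num_sms)


if __name__ == "__main__":
    print(best_schedule(1024, 1024))
    print(best_schedule(64, 64))
```

With power-of-two tiles, each dimension contributes only a logarithmic number of choices, so the candidate set stays small even before the thread-count constraint is applied. In this toy model, a small problem favors smaller tiles because large tiles leave most SMs idle, which is the kind of trade-off between locality and parallelism that the abstract describes.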