Resource-Efficient Deep Models for Embedded Systems
- 2016 — 2020
- FWF DACH Project
- Ruprecht-Karls University of Heidelberg, Holger Fröning
Deep representation learning is one of the main drivers of the recent performance gains in many image, signal, and speech processing problems. This is particularly true when large amounts of data and almost unlimited computing resources are available, as demonstrated in competitions such as ImageNet. In real-world scenarios, however, the computing infrastructure is often restricted and cannot meet these computational requirements. In this research proposal, we suggest several directions for reducing the computational burden, i.e. the number of arithmetic operations, while maintaining the level of recognition performance.
Today, advanced embedded CPUs have reached an architectural feature set that supports native cross-compilation of numerical algorithms. One of the most widely used embedded CPU architectures is the ARM Cortex-A9. The first hybrid devices integrating a CPU and an FPGA in a single package are now available (Xilinx Zynq, Altera Cyclone V SoC) and provide significant computational performance at very low power budgets for certain tasks. It remains unclear, however, how existing CPU software stacks can be extended to exploit this heterogeneity for improved performance and energy efficiency. Supporting this heterogeneity in the compilation process requires new tools.
In combination, these two research directions enable the use of deep models in mobile devices and embedded systems with limited power budgets and computational resources. To achieve this, the focus is four-fold:
(1) Sparse connectivity and activity in the models. We aim to use sparse weight matrices and sparsity-enforcing activation functions to reduce the number of arithmetic operations.
(2) Finite-precision analysis of deep models. In hearing aids, for example, simple yet well-performing classifiers are needed for acoustic scene classification. In particular, we analyze the performance of the classifiers and investigate their learning behavior at reduced precision. Another interesting question is whether the models can be scaled to the integer domain, so that only integer arithmetic is required. Finite-precision analysis determines the optimal bit-width for the arithmetic operations while still maintaining the performance of the model.
(3) Automated, Theano-based code synthesis of the deep models for embedded systems such as hybrid ARM+FPGA architectures. The aim is to exploit sparsity, insights from finite-precision analysis, and asynchronous computations to obtain efficient models for such embedded hardware, and to apply automated partitioning, compilation, and synthesis techniques to such hybrid architectures.
(4) Empirical comparison of the developed methods on benchmark image classification problems and on two speech processing tasks, namely single-channel source separation and artificial bandwidth extension.
The key properties of interest are reduced-precision behavior, the influence of sparsity, power consumption and energy efficiency, and optimized performance on embedded heterogeneous hardware while hiding the heterogeneity from the user.
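The ideas behind sparsity and finite-precision analysis can be illustrated with a small sketch. It is not the project's method: it assumes simple magnitude pruning, a ReLU as the sparsity-enforcing activation, and uniform symmetric fixed-point rounding. All function names (`quantize`, `prune`, `dot_count`) are hypothetical and chosen only for this illustration. The sketch counts how many multiplications a single neuron needs once zero weights and zero activations are skipped, and how the worst-case rounding error of the weights shrinks as the bit-width grows.

```python
import random

def quantize(values, bits, max_abs=1.0):
    """Round values onto a symmetric fixed-point grid with `bits` bits (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1              # largest integer code, e.g. 127 for 8 bits
    scale = max_abs / levels                  # step size between neighbouring grid points
    out = []
    for v in values:
        code = round(v / scale)               # integer representation of v
        code = max(-levels, min(levels, code))  # saturate to the representable range
        out.append(code * scale)
    return out

def prune(values, keep_fraction):
    """Magnitude pruning: keep the largest-magnitude weights, set the rest to zero."""
    k = int(len(values) * keep_fraction)
    if k == 0:
        return [0.0] * len(values)
    threshold = sorted((abs(v) for v in values), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in values]

def relu(x):
    """ReLU produces exact zeros, which makes the activations sparse."""
    return x if x > 0 else 0.0

def dot_count(weights, inputs):
    """Dot product that skips zero operands; returns (result, multiplications used)."""
    total, macs = 0.0, 0
    for w, x in zip(weights, inputs):
        if w != 0.0 and x != 0.0:
            total += w * x
            macs += 1
    return total, macs

rng = random.Random(0)                        # fixed seed for reproducibility
weights = [rng.uniform(-1, 1) for _ in range(1000)]
inputs = [relu(rng.uniform(-1, 1)) for _ in range(1000)]

# Operation count with sparse activations only, then with pruned weights as well.
_, dense_macs = dot_count(weights, inputs)
sparse_w = prune(weights, keep_fraction=0.2)
_, sparse_macs = dot_count(sparse_w, inputs)
print("multiplications:", dense_macs, "->", sparse_macs)

# Worst-case weight rounding error for several bit-widths.
errors = {}
for bits in (2, 4, 8, 16):
    q = quantize(weights, bits)
    errors[bits] = max(abs(a - b) for a, b in zip(weights, q))
    print(bits, "bits: max error", errors[bits])
```

A real finite-precision analysis would track recognition accuracy rather than raw weight error, and would also cover activations and accumulators. The sketch only shows the trade-off that the bit-width search has to navigate.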