R&D Project Database
Adaptive Voice Localization System For Service Robot

Overview

Robust speech recognition is a crucial function of an interactive service robot. The main objective of this project is to develop vision-based beamformer technology that achieves real-time, high-quality voice capture to support natural language interaction. Visual information from a stereo camera is used to build a 3D facial model, from which the position of the user's mouth can be located. The beamformer will be parameterized and optimized based on the solid angle and distance to the mouth location. The research approach consists of visual-audio synchronization and speech processing. By matching features across the stereoscopic images, the image processor can extract a 3D point cloud. A time-of-flight (ToF) depth camera will be added as a complementary sensor to adapt to different interaction scenarios. The beamformer steers the finite impulse response (FIR) filter coefficients so that the array pattern is optimized for the target coordinate. The critical issue of the proposed technology is aligning the beamformer filter coefficients with the image frames, so a compilation algorithm will be developed to achieve visual-audio synchronization. System performance will be demonstrated on a service robot platform.
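
As an illustration of how a mouth position obtained from the camera could parameterize a beamformer, the sketch below computes per-microphone steering delays and applies a plain delay-and-sum beamformer. This is not the project's method: delay-and-sum is a simplified stand-in for the optimized FIR filter design described above. The microphone coordinates, the sample rate, the speed-of-sound constant, and the function names `steering_delays` and `delay_and_sum` are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_delays(mic_positions, mouth_position, sample_rate):
    """Return per-microphone delays (in samples) that time-align a source
    located at mouth_position (same 3D frame and units as mic_positions)."""
    distances = [math.dist(m, mouth_position) for m in mic_positions]
    farthest = max(distances)
    # Closer microphones hear the source earlier, so they are delayed
    # until they line up with the farthest microphone.
    return [(farthest - d) / SPEED_OF_SOUND * sample_rate for d in distances]


def delay_and_sum(channels, delays):
    """Average the channels after shifting each by its delay,
    rounded to the nearest whole sample."""
    shifts = [round(d) for d in delays]
    length = len(channels[0])
    out = [0.0] * length
    for signal, shift in zip(channels, shifts):
        for n in range(shift, length):
            out[n] += signal[n - shift]
    return [value / len(channels) for value in out]


if __name__ == "__main__":
    # Hypothetical two-microphone array (metres) and a mouth position
    # as it might be reported by the stereo camera.
    mics = [(-0.05, 0.0, 0.0), (0.05, 0.0, 0.0)]
    mouth = (0.3, 0.0, 1.0)
    print(steering_delays(mics, mouth, sample_rate=16000))
```

Rounding to whole samples keeps the example short. A fractional-delay FIR filter per channel would be closer to the filter-coefficient steering the overview describes, and the delays would need to be recomputed whenever a new image frame updates the mouth position.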

More information

Project Reference: ITP/054/19LP
Hosting Institution: LSCM R&D Centre (LSCM)
Project Coordinator: Mr Martin Chun-Wai Lai
Approved Funding Amount: HK$2.79M
Project Period: 24 Feb 2020 - 16 May 2021