Adaptive Voice Localization System For Service Robot | R&D Project Database | Logistics and Supply Chain MultiTech R&D Centre


Overview	Robust speech recognition is a crucial function of interactive service robots. The main objective of this project is to develop vision-based beamformer technology that achieves real-time, high-quality voice capture to support natural language interaction. Visual information from a stereo camera is used to build a 3D facial model from which the position of the user's mouth can be located. The beamformer will be parameterized and optimized based on the solid angle and distance to the mouth location. The research approach consists of visual-audio synchronization and speech processing. By matching features across the stereoscopic images, the image processor can extract a 3D point cloud. A time-of-flight (ToF) depth camera will be added as a complementary sensor to adapt to different interaction scenarios. The beamformer adjusts its finite impulse response (FIR) filter coefficients so that the array pattern is optimized for the target coordinate. The critical issue for the proposed technology is aligning the beamformer filter coefficients with the image frames, so a compilation algorithm will be developed to achieve visual-audio synchronization. System performance will be demonstrated on a service robot platform.
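
As a rough illustration of how a 3D mouth position could drive a beamformer, the sketch below computes per-microphone steering delays from a target coordinate and applies a plain delay-and-sum beamformer. This is not the project's method: the project describes optimized FIR filter coefficients, whereas delay-and-sum is the simplest stand-in. The function names, the microphone layout, and the speed-of-sound constant are all assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_positions, target, c=SPEED_OF_SOUND):
    """Per-microphone delays (seconds) that time-align a near-field source.

    mic_positions: (M, 3) microphone coordinates in metres (hypothetical layout).
    target: (3,) source coordinate, e.g. a mouth position from the vision stage.
    """
    mics = np.asarray(mic_positions, dtype=float)
    dists = np.linalg.norm(mics - np.asarray(target, dtype=float), axis=1)
    # Delay the closer microphones so every channel lines up with the farthest one.
    return (dists.max() - dists) / c


def delay_and_sum(signals, delays, fs):
    """Shift each channel by its delay (rounded to whole samples) and average."""
    signals = np.asarray(signals, dtype=float)
    shifts = np.round(np.asarray(delays, dtype=float) * fs).astype(int)
    n = signals.shape[1]
    out = np.zeros(n)
    for channel, shift in zip(signals, shifts):
        # shift >= 0 because steering_delays never returns negative values.
        out[shift:] += channel[: n - shift]
    return out / len(signals)
```

Rounding to whole samples keeps the sketch short; a fractional-delay FIR filter per channel would be closer to the FIR-based steering the overview describes.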


Project Reference	ITP/054/19LP
Hosting Institution	LSCM R&D Centre (LSCM)
Project Coordinator	Mr Martin Chun-Wai Lai
Approved Funding Amount	HK$2.79M
Project Period	24 Feb 2020 - 16 May 2021
