Robust speech recognition is a crucial function of an interactive service robot. The main objective of this project is to develop vision-based beamformer technology that captures high-quality voice in real time to support natural language interaction. Visual information from a stereo camera is used to build a 3D facial model, from which the position of the user's mouth can be located. The beamformer will be parameterized and optimized based on the solid angle and distance to the mouth location. The research approach consists of visual-audio synchronization and speech processing. By matching features across the stereoscopic images, the image processor can extract a 3D point cloud. A time-of-flight (ToF) depth camera will be added as a complementary sensor to adapt to different interaction scenarios. The beamformer steers the array by adjusting its finite impulse response (FIR) filter coefficients so that the array pattern is optimized for the target coordinates. The critical issue of the proposed technology is aligning the beamformer filter coefficients with the image frames, so a compilation algorithm will be developed to achieve visual-audio synchronization. The system performance will be demonstrated on a service robot platform.
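To make the link between a camera-derived mouth position and the FIR coefficients concrete, the sketch below shows one simple way it could work. It is only an illustration under stated assumptions: the project summary does not specify the beamformer design, so a basic near-field delay-and-sum scheme is used as a stand-in, and the function names, the four-microphone linear array, the 16 kHz sample rate, and the 343 m/s speed of sound are all hypothetical choices.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_positions, target, c=SPEED_OF_SOUND):
    """Per-microphone delays (seconds) that time-align a near-field source at `target`.

    `mic_positions` and `target` are (x, y, z) tuples in metres, expressed in the
    same coordinate frame (for example, the camera frame after calibration).
    """
    dists = [math.dist(m, target) for m in mic_positions]
    farthest = max(dists)
    # Closer microphones hear the source earlier, so they receive extra delay
    # until every channel lines up with the farthest microphone.
    return [(farthest - d) / c for d in dists]


def fractional_delay_fir(delay_samples, num_taps=31):
    """Windowed-sinc FIR approximating `delay_samples` plus a bulk delay of (num_taps - 1) / 2."""
    centre = (num_taps - 1) / 2 + delay_samples
    taps = []
    for n in range(num_taps):
        x = n - centre
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        # Hamming window to reduce truncation ripple (an assumed design choice).
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalise to unity gain at DC


def beamformer_filters(mic_positions, target, sample_rate=16000, num_taps=31):
    """One FIR filter per microphone, steered towards the 3D `target` point."""
    delays = steering_delays(mic_positions, target)
    return [fractional_delay_fir(d * sample_rate, num_taps) for d in delays]


# Hypothetical 4-microphone linear array with 5 cm spacing along the x axis.
MICS = [(-0.075, 0.0, 0.0), (-0.025, 0.0, 0.0), (0.025, 0.0, 0.0), (0.075, 0.0, 0.0)]

# Hypothetical mouth position 1 m in front of the array, 0.3 m to the side.
filters = beamformer_filters(MICS, target=(0.3, 0.0, 1.0))
```

Filtering each microphone channel with its own coefficients and summing the results reinforces sound arriving from the target point. In a system like the one described, the filters would be recomputed whenever the vision pipeline reports a new mouth position, which is where the alignment between image frames and filter coefficients becomes important.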
R&D Project Database
Adaptive Voice Localization System For Service Robot
            | Overview | 
| Project Reference | ITP/054/19LP | 
| Hosting Institution | LSCM R&D Centre (LSCM) | 
| Project Coordinator | Mr Martin Chun-Wai Lai | 
| Approved Funding Amount | HK$2.79M | 
| Project Period | 24 Feb 2020 - 16 May 2021 | 






