Multimodal Arabic Speech Recognition for Human-Robot Interaction Applications
|
PP: 2885-2897
|
Author(s) |
|
Alaa Sagheer
|
|
Abstract |
|
Motivated by the long-standing goal of building humanoid robots that care for people in daily life, robotics research has produced several such systems over recent decades. One of the challenges facing humanoid robots is achieving audio-visual speech communication with people, a capability central to human-robot interaction (HRI). In this paper, we propose a novel multimodal speech recognition system that can be used independently or combined with any humanoid robot. The system is multimodal in that it integrates an audio speech module, a visual speech module, face and mouth detection, and user identification in a single framework that runs in real time. In this framework, we use the Self-Organizing Map (SOM) for feature extraction and both the k-Nearest Neighbor classifier and the Hidden Markov Model for recognition. Experiments are conducted on a novel Arabic database, developed by the author, which includes 36 isolated words and 13 casual phrases collected from 50 Arabic speakers. The experimental results show how the acoustic and visual cues complement each other to yield an effective audio-visual speech recognition (AVSR) system. The proposed AVSR system is simple, promising, and compares well with other reported systems.
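The abstract does not give implementation details, but the following Python sketch illustrates the general shape of a SOM-plus-k-NN recognition pipeline of the kind described: a SOM is trained on per-frame features, each utterance is summarized by a histogram of best-matching units, and utterances are classified with k-NN. The SOM implementation, the 13-dimensional MFCC-like features, the map size, and the BMU-histogram descriptor are all assumptions made for illustration, not the author's actual method.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class SOM:
    """Minimal 2-D Self-Organizing Map (illustrative, numpy only)."""
    def __init__(self, rows, cols, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.rows, self.cols = rows, cols
        self.weights = rng.normal(size=(rows * cols, dim))
        # Grid coordinates of each unit, used by the neighborhood function.
        r, c = np.divmod(np.arange(rows * cols), cols)
        self.grid = np.stack([r, c], axis=1).astype(float)

    def bmu(self, x):
        """Index of the best-matching unit for one input vector."""
        return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

    def train(self, data, epochs=5, lr0=0.5, sigma0=None):
        sigma0 = sigma0 or max(self.rows, self.cols) / 2.0
        t, t_max = 0, epochs * len(data)
        for _ in range(epochs):
            for x in data:
                # Linearly decaying learning rate and neighborhood width.
                lr = lr0 * (1.0 - t / t_max)
                sigma = sigma0 * (1.0 - t / t_max) + 1e-3
                b = self.bmu(x)
                # Gaussian neighborhood around the BMU on the 2-D grid.
                d2 = np.sum((self.grid - self.grid[b]) ** 2, axis=1)
                h = np.exp(-d2 / (2.0 * sigma ** 2))
                self.weights += lr * h[:, None] * (x - self.weights)
                t += 1

def utterance_descriptor(som, frames):
    """Histogram of BMU activations over the frames of one utterance."""
    hist = np.zeros(som.rows * som.cols)
    for f in frames:
        hist[som.bmu(f)] += 1
    return hist / max(len(frames), 1)

# Toy usage with hypothetical data: 20 utterances of ~30 frames each,
# 13-dim features per frame (e.g., MFCCs for the audio cue), two classes.
rng = np.random.default_rng(1)
utterances = [rng.normal(loc=y, size=(30, 13)) for y in (0, 1) for _ in range(10)]
labels = [y for y in (0, 1) for _ in range(10)]

som = SOM(rows=8, cols=8, dim=13)
som.train(np.vstack(utterances))

X = np.array([utterance_descriptor(som, u) for u in utterances])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
print("training accuracy:", knn.score(X, labels))
```

In the paper's actual system an HMM would model the temporal structure of each word or phrase rather than collapsing frames into a single histogram; the sketch above only shows the SOM-to-k-NN path.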
|