An Improved Speech Emotion Classification Approach Based on Optimal Voiced Unit

PP: 1001-1011
|
doi:10.18576/isl/110401

Author(s)

Reda Elbarougy, Noha M. El-Badry, Mona Nagy ElBedwehy
|
Abstract
|
Emotional speech recognition (ESR) plays a significant role in human-computer interaction. An ESR pipeline segments the audio to select units of analysis, extracts emotion-relevant features from each unit, and finally performs classification. Previous research assumed that a single utterance was the unit of analysis, on the premise that the emotional state remains constant throughout the utterance; in practice, however, the emotional state can change over time, even within a single utterance, so treating the whole utterance as one unit is ineffective. The goal of this study is to discover a new voiced unit that can be used to improve ESR accuracy. Several voiced units built from voiced segments were investigated; to determine the best voiced unit, each candidate was evaluated with an ESR system based on a support vector machine (SVM) classifier. The proposed method was validated on three datasets: EMO-DB, EMOVO, and SAVEE. Experimental results revealed that a voiced unit consisting of five voiced segments yields the highest recognition rate. The emotional state of the overall utterance is then decided by a majority vote over the emotional states of its units. The proposed method outperforms the traditional utterance-level method: recognition rates on EMO-DB, EMOVO, and SAVEE improve by 12%, 27%, and 23%, respectively.
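The decision scheme described above (group voiced segments into units, classify each unit, then take a majority vote over unit labels) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the unit size of five comes from the abstract, while the grouping helper and the toy stand-in for the SVM unit classifier are assumptions for demonstration.

```python
# Sketch of unit-based utterance classification with majority voting.
# Assumed details: `group_into_units` and the toy classifier are illustrative;
# in the paper, each unit would be labeled by a trained SVM on acoustic features.
from collections import Counter
from typing import Callable, List, Sequence

def group_into_units(voiced_segments: Sequence, unit_size: int = 5) -> List[list]:
    """Group consecutive voiced segments into units of `unit_size` segments."""
    return [list(voiced_segments[i:i + unit_size])
            for i in range(0, len(voiced_segments), unit_size)]

def classify_utterance(voiced_segments: Sequence,
                       unit_classifier: Callable[[list], str],
                       unit_size: int = 5) -> str:
    """Label each unit, then decide the utterance label by majority vote."""
    units = group_into_units(voiced_segments, unit_size)
    unit_labels = [unit_classifier(u) for u in units]
    return Counter(unit_labels).most_common(1)[0][0]

# Toy stand-in for the SVM: here each "segment" is already a label guess,
# and a unit's label is simply its own internal majority.
toy_unit_clf = lambda unit: Counter(unit).most_common(1)[0][0]

# 15 voiced segments -> 3 units; unit labels: anger, anger, neutral.
segments = ["anger"] * 8 + ["neutral"] * 7
print(classify_utterance(segments, toy_unit_clf))  # -> anger
```

The majority vote makes the utterance-level decision robust to a minority of misclassified units, which is the mechanism the abstract credits for the improved recognition rates.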