Exploring Deep Learning Methods for Audio Speech Emotion Detection: An Ensemble MFCCs, CNNs and LSTM

PP: 75-85

doi:10.18576/amis/190107

Author(s)

Shaik Abdul Khalandar Basha,
P. M. Durai Raj Vincent,
Suleiman Ibrahim Mohammad,
Asokan Vasudevan,
Eddie Eu Hui Soon,
Qusai Shambour,
Muhammad Turki Alshurideh

Abstract
|
Our world relies heavily on the gadgets we use every day, making it deeply materialized. Currently available human-machine interactions are supported only under line-of-sight (LOS) conditions. The proposed emotional communication is based on non-line-of-sight (NLOS) interaction, breaking away from conventional human-machine interaction. This emotional communication is interactive, similar to the everyday video and voice media we use, and the information is likewise transmitted over long distances. We propose the EAS framework, a new ensemble technique implementing an emotional communication protocol for real-time communication requirements; the framework supports the communication of emotional realization. We develop CNN-LSTM architectures for feature extraction, implement an attention mechanism for selecting relevant features, and apply performance evaluation metrics to CNN-LSTM networks with and without attention mechanisms in real-time scenarios. DCCA feature extraction is used to extract attributes and find correlations among the different labels in the dataset, and to analyze the real-time performance of emotional communication over long distances. The proposed CNN-LSTM model achieves the highest accuracy at 87.08%, while existing models, such as the CNN baseline and LSTM, achieve 81.11% and 84.01%, respectively. Our approach shows improved accuracy compared with existing works, especially for real-time applications.
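The abstract describes an attention mechanism that selects relevant features from LSTM outputs before classification. As a minimal illustrative sketch (not the authors' implementation), the standard additive-style attention pooling over LSTM time-step outputs can be written in plain NumPy; the shapes (50 MFCC frames, 128 LSTM units) and the function names are assumptions for illustration only:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(h, w):
    """Pool LSTM outputs h (T x D) into one context vector (D,).

    A learned query vector w scores each time step; the scores are
    normalized into attention weights, and the context vector is the
    weight-averaged sum of the time-step outputs.
    """
    scores = h @ w            # (T,) relevance score per time step
    alpha = softmax(scores)   # attention weights, non-negative, sum to 1
    return alpha @ h, alpha   # weighted sum over time, plus the weights

# Hypothetical example: 50 MFCC frames encoded into 128-dim LSTM states.
rng = np.random.default_rng(0)
T, D = 50, 128
h = rng.standard_normal((T, D))   # stand-in for CNN-LSTM outputs
w = rng.standard_normal(D)        # stand-in for the learned query vector
context, alpha = attention_pool(h, w)
print(context.shape)              # the fixed-size vector fed to the classifier
```

The context vector replaces simple last-step or mean pooling, which is what allows the "with attention" variant evaluated in the paper to emphasize emotionally salient frames.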