Publication Date: 2022/10/17
Abstract: The goal of this project is to detect a speaker's emotions from speech. Speech produced under fear, rage, or delight, for example, tends to be loud and fast, with a wider and more varied pitch range, whereas speech produced in grief or tiredness tends to be slow and low-pitched. Voice and speech patterns can therefore be used to detect human emotions, which can help improve human-machine interaction. We present speech emotion classifiers based on a deep convolutional neural network (CNN), a support vector machine (SVM), and a multilayer perceptron (MLP), trained on acoustic features such as Mel-frequency cepstral coefficients (MFCCs). The models are trained on eight emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise). Using the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) and TESS (Toronto Emotional Speech Set) datasets, the proposed approach achieves accuracies of 86 percent, 84 percent, and 82 percent for the CNN, MLP, and SVM classifiers, respectively.
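The abstract does not describe the preprocessing, so the following is only a minimal sketch of two building blocks such a pipeline would likely need: reading the emotion label from a RAVDESS file name, and placing filter centers on the mel scale that underlies MFCC features. The function names, the choice of 26 filters, and the 0 to 8000 Hz range are illustrative assumptions, not the authors' settings. The mel formula is the common 2595 * log10(1 + f/700) variant.

```python
import math
from pathlib import Path

# RAVDESS encodes the emotion as the third hyphen-separated field of the
# file name. Labels follow the wording used in the abstract.
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprise",
}


def ravdess_emotion(path):
    """Return the emotion label encoded in a RAVDESS-style file name."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS-style file name: {path}")
    return RAVDESS_EMOTIONS[fields[2]]


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale.

    n_filters, f_min, and f_max are hypothetical parameters chosen by the
    caller; the paper does not state the values it used.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    print(ravdess_emotion("03-01-06-01-02-01-12.wav"))
    print(mel_center_frequencies(26, 0.0, 8000.0)[:3])
```

For example, `ravdess_emotion("03-01-06-01-02-01-12.wav")` returns `"fearful"`. Even spacing in mels puts the filters closer together at low frequencies, roughly matching how pitch differences are perceived, which is why MFCCs are a common choice for emotion cues carried by pitch and energy.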
DOI: https://doi.org/10.5281/zenodo.7215574
PDF: https://ijirst.demo4.arinfotech.co/assets/upload/files/IJISRT22SEP761_(1).pdf