Abstract:
Recently, speech recognition has become one of the most important domains in machine learning and deep learning. People find it more comfortable to give commands to a machine by voice, and the field has emerged to meet this need. In this paper, we take a feature engineering-based approach to detecting emotion in speech. Handcrafted features extracted from multiple audio files are fed into learning models. Features extracted from the text domain are also included to resolve ambiguity. We follow both machine learning and deep learning-based approaches: the extracted features are fed into six machine learning classifiers, including Random Forest, Multinomial Naïve Bayes, Support Vector Machine, Logistic Regression and Gradient Boosting, whereas the deep learning approach consists of a feed-forward neural network and LSTM-based classifiers for feature extraction. For both domains, eight hand-crafted features are extracted. Finally, we compare the machine learning models with the deep learning models using accuracy, precision, recall and F-measure, and observe that the shallow machine learning models achieve higher performance than the well-known deep learning models at recognizing emotion.
Description:
This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering of East West University, Dhaka, Bangladesh.