Abstract:
The incidence of thyroid disease has increased globally, and it now affects a large share of the human population. Thyroid problems are chronic, but patients with thyroid abnormalities can lead stable, normal lives if their conditions are adequately managed. The thyroid gland is a small organ at the front of the neck that wraps around the windpipe (trachea), and it is one of the body's most vital organs. The thyroid hormones it secretes regulate the rate of the body's metabolism, so when the thyroid is not functioning properly, the entire body can be affected. Hyperthyroidism and hypothyroidism are two common chronic conditions of the thyroid. Hyperthyroidism is the medical term for a condition in which the body produces an excessive amount of thyroid hormone; hypothyroidism develops when the body produces too little.
In this thesis, we address thyroid disease prediction and compare our analysis and predictions with those of similar research in this area. Thyroid disease prediction was implemented using two approaches: Machine Learning and Deep Learning. From the variety of machine learning methodologies, five algorithms were compared in terms of accuracy: logistic regression, decision tree, support vector machine (linear kernel), support vector machine (RBF kernel), and Random Forest. Recurrent neural networks (RNN) are one of the most effective Deep Learning algorithms for learning complex structured data. This study shows how to implement the five machine learning classifiers in order to predict thyroid disease. A thyroid data set drawn from machine learning and deep learning repositories was used for this purpose. Among the Machine Learning algorithms, decision tree and Random Forest achieved the highest accuracy, 98.16%, which compares favorably with the other algorithms. Using the same performance metrics, the Deep Learning algorithm RNN was compared with the Machine Learning algorithms. RNN outperformed Logistic Regression, Support Vector Machine (linear kernel), and Support Vector Machine (RBF kernel), with a prediction accuracy of 97%.
Many researchers have developed approaches for diagnosing the disease, as well as a number of disease prediction models. As in various other fields, machine learning plays a crucial part in disease prediction, where the goal is an accuracy as close to 100% as possible. As a result, interest in applying machine learning techniques to the modeling of health care issues has risen. Artificial intelligence (AI) approaches are also a powerful resource that may help raise awareness of thyroid disorders, although their diagnostic accuracy remains questionable.
A confusion matrix is a summary of a classifier's prediction outcomes that helps identify the types of error the model makes. The performance of thyroid disease prediction was measured using several metrics: Accuracy, MSE, MAE, RMSE, and time. All algorithms applied in this thesis are compared using the following metrics:
1. Accuracy: The proportion of predictions that match the actual values. The confusion matrix is used to determine the accuracy in each class based on the distribution of results.
2. Mean Square Error (MSE): This is a non-negative metric of prediction quality. A value of zero means the predictions are perfect. In practice this is rarely attainable, so values closer to zero are considered better.
3. Root Mean Square Error (RMSE): It is the standard deviation of the prediction errors. A prediction error is the distance between the regression line and a data point.
4. Time: The amount of time in seconds it takes to achieve the desired level of accuracy for the given dataset.
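As a minimal sketch of how the metrics listed above can be computed, the following Python example assumes binary labels encoded as 0 (no thyroid disease) and 1 (thyroid disease). The label encoding, the function names, and the toy values are assumptions made for illustration and are not taken from the thesis experiments.

```python
import math
import time

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def mse(y_true, y_pred):
    """Mean of the squared prediction errors (always non-negative)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean of the absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the mean square error."""
    return math.sqrt(mse(y_true, y_pred))

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed time in seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical labels: 1 = thyroid disease, 0 = no thyroid disease.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy:", accuracy(y_true, y_pred))
print("MSE:", mse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

With 0/1 labels, every error has magnitude 1, so MSE and MAE both equal one minus the accuracy. In a real comparison, `timed` would wrap the training and prediction step of each classifier rather than a metric function.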
We applied the confusion matrix alongside classifier models such as logistic regression, decision tree, support vector machine (linear kernel), support vector machine (RBF kernel), and Random Forest.
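To illustrate how a confusion matrix exposes the types of error a classifier makes, the sketch below builds one from actual and predicted labels and derives the overall and per-class accuracy. The three class names and the toy predictions are hypothetical and do not come from the thesis data set.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]

def overall_accuracy(matrix):
    """Sum of the diagonal divided by the total number of samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def per_class_accuracy(matrix):
    """For each actual class, the fraction of its samples predicted correctly."""
    result = []
    for i, row in enumerate(matrix):
        row_total = sum(row)
        result.append(row[i] / row_total if row_total else 0.0)
    return result

# Hypothetical class names and predictions, for illustration only.
labels = ["negative", "hyperthyroid", "hypothyroid"]
y_true = ["negative", "negative", "negative", "hyperthyroid",
          "hyperthyroid", "hypothyroid", "hypothyroid", "hypothyroid"]
y_pred = ["negative", "negative", "hyperthyroid", "hyperthyroid",
          "hyperthyroid", "hypothyroid", "negative", "hypothyroid"]

matrix = confusion_matrix(y_true, y_pred, labels)
for name, row in zip(labels, matrix):
    print(name, row)
print("Overall accuracy:", overall_accuracy(matrix))
print("Per-class accuracy:", per_class_accuracy(matrix))
```

The off-diagonal cells show which classes are confused with each other; in this toy example one negative case is predicted as hyperthyroid and one hypothyroid case is predicted as negative.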
In this thesis, we also provide a critical discussion of the limitations, open challenges, and future scope related to this work, and we aim to give a clear understanding of thyroid disease and its detection using machine learning, based on our investigation. We also analyze the different methods proposed in order to draw some conclusions.