A Transformer on Tabular Data: Comparative Analysis with Linear and Tree-Based Machine Learning Algorithms on a Diabetic Dataset

Kamin Gorettie Precody; Komiwe Faith Phiri; Dr. Ashish Kumar Chakraverti


Publication Date: 2023/06/01

Abstract: Lifestyle diseases, with a reported rating of 80%, are among the top causes of death. They claim over 41 million lives, more than 70% of all deaths worldwide, and roughly 15 million of these deaths occur among people aged 30 to 69. Lifestyle diseases originate primarily in an individual's day-to-day habits: habits that discourage physical activity and push people toward a sedentary routine can cause numerous health problems, some of which develop into potentially life-threatening diseases. Two common complex diseases in this group are heart disease and diabetes. Researchers have described diabetes as a silent but deadly disease, and many use machine learning methods to help medical professionals diagnose lifestyle diseases. This paper reviews the literature on predicting and diagnosing lifestyle diseases with transformers and machine learning techniques, and applies these techniques to diabetic patient data. It highlights the importance of transformers and machine learning for analyzing large patient datasets to predict the different types of diabetes, and discusses how diabetes can be treated and prevented. We apply a transformer on tabular data (TabPFN), Random Forest, Decision Tree, Support Vector Machine, K-Nearest Neighbors, Gradient Boosting, Histogram Gradient Boosting, and Adaptive Boosting to predict whether a patient has diabetes. A stratified holdout split was used to divide the dataset randomly into 90% training and 10% test sets. The results were compared with existing approaches and indicate that the transformer on tabular data (TabPFN) outperforms the existing state-of-the-art approach.
Based on F1-score, the TabPFN transformer on tabular data performed best among the adapted models; the reported F1-scores are 98.46%, 98.0694%, 91.736%, and 91.541%.
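The evaluation protocol named in the abstract (a stratified 90%/10% holdout split, with models compared by F1-score) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `stratified_holdout` and `f1_score` are hypothetical, the labels are a toy binary example rather than the diabetic dataset, and the positive class is assumed to be label `1`. In practice, scikit-learn's `train_test_split(X, y, test_size=0.1, stratify=y)` and `sklearn.metrics.f1_score` would typically be used instead.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_frac=0.1, seed=0):
    """Split sample indices into train/test sets, keeping each class's
    proportion roughly equal in both (a stratified holdout split)."""
    rng = random.Random(seed)
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # random order within each class
        n_test = round(len(idxs) * test_frac)  # e.g. 10% of this class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score: harmonic mean of precision and recall for the
    positive class (assumed here to be label 1, e.g. 'diabetic')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels standing in for the diabetic dataset (hypothetical data).
labels = [0] * 70 + [1] * 30
train_idx, test_idx = stratified_holdout(labels, test_frac=0.1, seed=42)
print(len(train_idx), len(test_idx))  # 90 10
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

Stratifying per class keeps the 70/30 class ratio in both splits (7 negatives and 3 positives in the test set here), which matters when comparing F1-scores across models on an imbalanced medical dataset.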

Keywords: Transformer, Lifestyle Diseases, Machine Learning Techniques, Prediction.

DOI: https://doi.org/10.5281/zenodo.7994977

PDF: https://ijirst.demo4.arinfotech.co/assets/upload/files/IJISRT23MAY1179.pdf
