Publication Date: 2023/05/27
Abstract: Exploratory data analysis (EDA), which combines descriptive and inferential analysis, plays a crucial role in uncovering the information hidden in data. Data mining methods are used to identify the topics in the text corpus. This study analyzes the Yelp datasets, which contain information about businesses, users, ratings, and signups. In addition to the timing of check-ins at business locations, the study examines business performance, regional distribution, reviewer ratings, and other factors. We found that Yelp check-ins, tips, and elite users have all declined over time. Our analysis also showed that star ratings and sentiment ratings are more reliable for Canadians than for Americans. To build on this work, we propose a follow-up project that comprises gathering a dataset, cleaning the data by removing null values, applying a machine learning algorithm with Ada Boosting (AdaBoost), and predicting the accuracy score with a multilayer perceptron (MLP). The proposed approach to EDA and data mining on Yelp restaurant reviews has several potential limitations. Because the data were selected to suit the needs of the research, they may not be representative of all restaurants on Yelp, which could skew the findings. Pre-processing steps such as data cleaning and sampling may remove vital information or introduce noise into the dataset. Hold-out and cross-validation procedures may not adequately assess the model's performance and generalizability.
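The pipeline named in the abstract (remove null values, fit a boosted classifier, report an accuracy score on held-out data) can be illustrated with a minimal sketch. This is not the authors' implementation: the rows are synthetic rather than Yelp data, the field names (`stars`, `length`, `label`) and all function names are hypothetical, the weak learner is assumed to be a one-feature decision stump, and the MLP step is omitted.

```python
import math

def drop_nulls(rows):
    """Keep only rows in which no field is None (the null-removal step)."""
    return [r for r in rows if all(v is not None for v in r)]

def stump_predict(stump, x):
    """Predict +1 or -1 from a (feature, threshold, polarity) stump."""
    j, t, pol = stump
    return pol if x[j] >= t else -pol

def train_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best_err, best = None, None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for pol in (1, -1):
                cand = (j, t, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(cand, xi) != yi)
                if best_err is None or err < best_err:
                    best_err, best = err, cand
    return best_err, best

def adaboost_fit(X, y, rounds=5):
    """Discrete AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, x):
    """Weighted majority vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in model)
    return 1 if score >= 0 else -1

# Synthetic stand-in for review records: (stars, length, label).
# Every ninth row has a missing length to exercise the null-removal step.
raw = []
for i in range(40):
    stars = i % 5 + 1
    length = None if i % 9 == 0 else 20 + (7 * i) % 50
    label = 1 if stars >= 4 else -1   # assumed rule: 4+ stars counts as positive
    raw.append((stars, length, label))

clean = drop_nulls(raw)

# Hold-out split: first 80% for training, remaining 20% for evaluation.
split = int(0.8 * len(clean))
train, test = clean[:split], clean[split:]
X_train, y_train = [r[:2] for r in train], [r[2] for r in train]
X_test, y_test = [r[:2] for r in test], [r[2] for r in test]

model = adaboost_fit(X_train, y_train)
acc = sum(adaboost_predict(model, x) == y
          for x, y in zip(X_test, y_test)) / len(y_test)
print(f"rows kept: {len(clean)}, hold-out accuracy: {acc:.2f}")
```

Because the synthetic label is a simple threshold on `stars`, a single stump separates the classes, so the reported accuracy says nothing about real review data. As the abstract notes, a single hold-out split may not adequately measure generalizability.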
Keywords: Exploratory Data Analysis (EDA), Descriptive Analysis, Inferential Analysis, Data Mining, Yelp, Datasets, User Information, Ratings, Performance, Regional Distribution, Star Ratings, Sentiment Ratings, Machine Learning Algorithm, Ada Boosting, MLP, Accuracy Score, Data Cleaning
DOI: https://doi.org/10.5281/zenodo.7976330
PDF: https://ijirst.demo4.arinfotech.co/assets/upload/files/IJISRT23MAY672.pdf
REFERENCES