A Web Crawling and NLP-Powered Model for Filtering Inappropriate Content for Primary School Learners' Online Research

A. Chiwanza; F. D. Mukoko; B. Mupini


Publication Date: 2024/11/13

Abstract: Much of today's learning has gone digital as online methods are adopted. A great deal of activity takes place on the internet, where people post material that is both good and bad, and students studying online may end up accessing harmful websites such as pornographic sites and other inappropriate content. Ensuring a safe online environment is vital so that students are neither distracted nor victimised on the internet. Parents and teachers have tried to monitor their children's activities but are often circumvented, and while some researchers suggest blocking unsafe sites, students frequently bypass such blocks. This research proposed a web crawling and Natural Language Processing (NLP) powered model for filtering inappropriate content for primary school online learners. The results indicated that inappropriate content was blacklisted, filtered successfully, and could not be accessed by students. The model therefore worked as designed and met the intended research goal.
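The paper does not publish its implementation, so the sketch below only illustrates one plausible shape of the crawl-then-filter pipeline the abstract describes: fetch a page, apply a simple text check in place of the NLP model, and blacklist the URL if it is flagged. The term list, threshold, and blacklist file name are assumptions, not the authors' method.

```python
"""Illustrative sketch only: a minimal crawl-then-filter loop.

Assumptions (not from the paper): a keyword list stands in for the
NLP classifier, a hit-count threshold decides blacklisting, and the
blacklist is a plain text file.
"""
import re
import requests
from bs4 import BeautifulSoup

FLAGGED_TERMS = {"porn", "xxx", "gambling", "violence"}  # hypothetical term list
THRESHOLD = 2                  # assumed: >= 2 hits means the page is blacklisted
BLACKLIST_FILE = "blacklist.txt"


def crawl_page(url: str) -> str:
    """Fetch a page and return its visible text."""
    html = requests.get(url, timeout=10).text
    return BeautifulSoup(html, "html.parser").get_text(separator=" ")


def score_text(text: str) -> int:
    """Count flagged terms in the text (a stand-in for an NLP model's score)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for tok in tokens if tok in FLAGGED_TERMS)


def filter_url(url: str) -> bool:
    """Return True if the URL is allowed; otherwise record it in the blacklist."""
    if score_text(crawl_page(url)) >= THRESHOLD:
        with open(BLACKLIST_FILE, "a", encoding="utf-8") as fh:
            fh.write(url + "\n")
        return False
    return True


if __name__ == "__main__":
    print(filter_url("https://example.com"))  # True if the page passes the filter
```

In a deployed filter, the keyword check would be replaced by the trained NLP classifier and the blacklist would be enforced at the proxy or browser level so that students cannot reach flagged pages.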

Keywords: Online Learning; Website Content; Internet; Blacklisted; Filtering; Web Crawling.

DOI: https://doi.org/10.38124/ijisrt/IJISRT24JUL1083

PDF: https://ijirst.demo4.arinfotech.co/assets/upload/files/IJISRT24JUL1083.pdf
