Species Name Identification for Essential Oils from Biomedical Abstracts Using Text Mining and Natural Language Processing

Chou-Cheng Chen

Publication Date: 2024/05/13

Abstract: In recent years, the number of scientific papers on essential oils has grown exponentially owing to the popularity of aromatherapy, yet we found no studies that apply natural language processing and text mining to extract information from this literature. To the best of our knowledge, this study is the first to use these methods to identify species names in abstracts related to essential oils. We retrieved 34,637 abstracts by querying PubMed with the keyword “essential oil” on 2024/03/15, and obtained 1,081,005 plant and fungal species names from the NCBI Taxonomy FTP site on the same day. Candidate terms came from two sources: nouns in article titles, identified by part-of-speech (POS) tagging, and text enclosed in italic tags in titles and abstracts. These terms were matched against the downloaded names, which identified 10,445 plant and fungal species names appearing in the essential-oil abstracts; those names were then used to detect which abstracts mention each species. The result was 156,371 records linking a PubMed ID (PMID) to a Taxonomy ID. These results indicate that the method can efficiently identify species names in abstracts related to essential oils.
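
As an illustration only, the matching step summarized in the abstract might be sketched in Python as below. This is a minimal sketch under stated assumptions, not the study's implementation: italic markup is assumed to appear as `<i>...</i>` tags, the part-of-speech step is omitted, and the function names, PMIDs, and Taxonomy IDs are hypothetical placeholders.

```python
import re
from typing import Dict, List, Set, Tuple

# Assumption: italic markup appears as <i>...</i> tags in titles and abstracts.
ITALIC_RE = re.compile(r"<i>(.*?)</i>", re.IGNORECASE | re.DOTALL)


def italic_terms(text: str) -> Set[str]:
    """Return the whitespace-normalized strings found inside <i> tags."""
    return {" ".join(m.split()) for m in ITALIC_RE.findall(text) if m.strip()}


def link_abstracts(
    abstracts: Dict[str, str],
    species_to_taxid: Dict[str, int],
) -> List[Tuple[str, int]]:
    """Return sorted (PMID, Taxonomy ID) pairs for species named in each abstract."""
    lookup = {name.lower(): taxid for name, taxid in species_to_taxid.items()}

    # Step 1: collect candidate terms from italic markup across the corpus
    # and keep only those that are known species names.
    candidates: Set[str] = set()
    for text in abstracts.values():
        candidates.update(term.lower() for term in italic_terms(text))
    active = {name: lookup[name] for name in candidates if name in lookup}

    # Step 2: scan every abstract (italic tags stripped) for the retained names,
    # so that non-italicized mentions are linked as well.
    links: Set[Tuple[str, int]] = set()
    for pmid, text in abstracts.items():
        plain = re.sub(r"</?i>", "", text, flags=re.IGNORECASE).lower()
        for name, taxid in active.items():
            if re.search(r"\b" + re.escape(name) + r"\b", plain):
                links.add((pmid, taxid))
    return sorted(links)


# Toy data: the PMIDs and Taxonomy IDs below are made-up placeholders.
toy_abstracts = {
    "PMID_A": "Essential oil of <i>Lavandula angustifolia</i> was analysed.",
    "PMID_B": "Lavandula angustifolia and <i>Nigella sativa</i> oils were compared.",
    "PMID_C": "No species is named here.",
}
toy_species = {
    "Lavandula angustifolia": 1001,
    "Nigella sativa": 1002,
    "Rosa damascena": 1003,
}

for pmid, taxid in link_abstracts(toy_abstracts, toy_species):
    print(pmid, taxid)
```

In this toy run, the species name that is never italicized anywhere in the corpus produces no links, while a name italicized in one abstract is also found where it appears in plain text in another.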

Keywords: Text Mining; POS; Essential Oil; Species Name.

DOI: https://doi.org/10.38124/ijisrt/IJISRT24APR2134

PDF: https://ijirst.demo4.arinfotech.co/assets/upload/files/IJISRT24APR2134.pdf
