Image Captionbot for Assistive Technology

Arnold Abraham; Aby Alias; Vishnumaya


Publication Date: 2022/03/09

Abstract: Automatically generating short descriptions of images is difficult: an image can carry many meanings and contains many different kinds of information, so extracting its context and turning that context into sentences is a hard problem. A system that can do so allows blind people to explore their surroundings independently. Deep learning, a recent trend in machine learning, can be used to build such a system. This project uses VGG16, a well-established CNN architecture, for image classification and feature extraction, and an LSTM with an embedding layer for generating the text description. These two networks are combined to form an image caption generation network, which is then trained on the Flickr8k dataset. The model's output is converted to audio for the benefit of those who are visually impaired.

Keywords: Deep Learning; Recurrent neural network; Convolutional neural network; VGG16; LSTM.

DOI: https://doi.org/10.5281/zenodo.6341477

PDF: https://ijirst.demo4.arinfotech.co/assets/upload/files/IJISRT22FEB655_(1).pdf
