Publication Date: 2023/05/12
Abstract: This study proposes latent diffusion as a novel method for text-to-image synthesis, the difficult task of generating accurate images from textual descriptions. The proposed method relies on a generative adversarial network (GAN) with a stability criterion that improves the stability and convergence of the training process. The criterion is based on the Lipschitz constant and the Jacobian norm, which measure the smoothness and robustness of the generator network. The results show that the proposed method outperforms existing state-of-the-art techniques in image quality and stability. Potential applications include computer vision, image editing, and artistic creation. The work emphasises the importance of stability in GAN training, adds to the growing body of work on text-to-image synthesis, and offers suggestions for further research in this area.
Keywords: CNN, RNN, GANs, VAEs, GDM, LDM, MIDAS
DOI: https://doi.org/10.5281/zenodo.7927460
PDF: https://ijirst.demo4.arinfotech.co/assets/upload/files/IJISRT23APR1877_(1).pdf