
Text-to-Image Generation Model Using Generative Adversarial Networks to Enhance Location Services

With the advent of generative adversarial networks, advances in machine learning algorithms have made it possible to translate textual descriptions into visual content. Image generation from natural language has become one of the primary applications of modern conditional generative models: it is a flexible and intuitive way to produce conditional images, and recent years have brought significant progress in realism, visual diversity, and semantic alignment. However, the field still faces challenges that require further research, such as generating high-resolution images containing multiple objects and developing reliable evaluation metrics that correlate with human judgment. A text-to-image (T2I) generation model aims to produce photo-realistic images that are semantically consistent with their text descriptions. Generative adversarial networks (GANs), a class of unsupervised generative models, can be used to create such images from textual descriptions, and existing T2I models built on recent GAN advances have made great progress. Nevertheless, they still have limitations, and the main goal of this project is to address these limitations in order to enhance text-to-image generation models for location services.

This study proposes text-to-image synthesis using the Attentional Generative Adversarial Network (AttnGAN). AttnGAN generates high-quality images through a multi-stage refinement process, and its generator is trained with a fine-grained image-text matching loss computed by an attentional multimodal similarity model. Because the PatternNet dataset consists entirely of images, we added a textual description to each image to obtain a paired image-text dataset. On this dataset, our AttnGAN model achieves an inception score of 4.81 and an R-precision of 70.61 percent. Extensive experiments show that the attention mechanisms proposed in AttnGAN, which are critical for text-to-image generation in complex scenarios, are effective.
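To illustrate the attention mechanism the abstract refers to, the sketch below shows (in PyTorch, as an assumption; this is not the project's actual code) the word-level attention applied at each AttnGAN refinement stage: every image sub-region attends over the word embeddings of the caption and receives a word-context vector that guides the next-stage generator. All dimensions, class names, and tensor shapes here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordAttention(nn.Module):
    """Minimal word-level attention sketch in the spirit of AttnGAN's attention module."""
    def __init__(self, word_dim: int, hidden_dim: int):
        super().__init__()
        # project word embeddings into the image-feature space
        self.project = nn.Linear(word_dim, hidden_dim)

    def forward(self, words, regions):
        # words:   (batch, seq_len, word_dim)       word embeddings from the text encoder
        # regions: (batch, num_regions, hidden_dim) sub-region features from the previous stage
        words = self.project(words)                          # (batch, seq_len, hidden_dim)
        scores = torch.bmm(regions, words.transpose(1, 2))   # (batch, num_regions, seq_len)
        attn = F.softmax(scores, dim=-1)                     # each region attends over all words
        context = torch.bmm(attn, words)                     # (batch, num_regions, hidden_dim)
        return context, attn

# Usage sketch: the word-context vectors are combined with the region features and fed
# to the next-stage generator to refine fine-grained visual details.
if __name__ == "__main__":
    attn = WordAttention(word_dim=256, hidden_dim=128)
    words = torch.randn(2, 12, 256)     # 12-word caption, batch of 2
    regions = torch.randn(2, 64, 128)   # 8x8 = 64 image sub-regions
    context, weights = attn(words, regions)
    print(context.shape, weights.shape)  # (2, 64, 128) and (2, 64, 12)
```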

Information

  • Students: Arwa Alsuhaibani - Reem Alnafisah - Yousra Alrashidi
  • Supervisor: Dr. Dina Mahmoud Ibrahim
  • Research Specialization: Classification methods
  • Upload Date: 30/05/2022