Long-form text-to-images generation using a pipeline of deep generative models (GPT-3 and Stable Diffusion)
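
The description above names the two models but not how they are chained, so the following is only a minimal sketch under stated assumptions: the long text is split into short passages, a language model (GPT-3 in this project) turns each passage into an image prompt, and an image model (Stable Diffusion) renders one image per prompt. The names `split_into_passages`, `long_text_to_images`, `describe`, `render`, and `max_words` are hypothetical and chosen for illustration; the two models are passed in as plain callables, so no real API is called here.

```python
from typing import Any, Callable, List, Tuple


def split_into_passages(text: str, max_words: int = 60) -> List[str]:
    """Greedily group sentences into passages of roughly max_words words.

    Assumption: a naive split on '.' is good enough for a sketch.
    A single sentence longer than max_words is kept whole.
    """
    sentences = [s.strip() for s in text.replace("\n", " ").split(".") if s.strip()]
    passages: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current passage if adding this sentence would overflow it.
        if current and count + n > max_words:
            passages.append(". ".join(current) + ".")
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        passages.append(". ".join(current) + ".")
    return passages


def long_text_to_images(
    text: str,
    describe: Callable[[str], str],  # hypothetical wrapper around a text model
    render: Callable[[str], Any],    # hypothetical wrapper around an image model
    max_words: int = 60,
) -> List[Tuple[str, Any]]:
    """Return one (prompt, image) pair per passage of the input text."""
    results = []
    for passage in split_into_passages(text, max_words):
        prompt = describe(passage)   # text model: passage -> image prompt
        results.append((prompt, render(prompt)))  # image model: prompt -> image
    return results
```

To try the control flow without either model, pass stand-ins such as `describe=lambda p: "illustration of: " + p` and `render=lambda prompt: len(prompt)`. Real wrappers around GPT-3 and Stable Diffusion would replace these two callables and are not shown here.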