I’m sure you remember the beloved science fiction movie The Matrix (1999), in which the reality we live in is a programmatically generated world. Well, yesterday’s science fiction is today’s reality. Just as the operators in The Matrix could conjure up a taekwondo dojo or a busy New York street by typing a few lines of code, you too can conjure up just about anything these days by combining the power of your imagination, language and Artificial Intelligence. But at exactly which point did such science fiction become reality?
Introducing Aditya Ramesh, a homegrown tech wizard, creator of DALL·E and co-creator of DALL·E 2. The name DALL·E is a blend of the Spanish surrealist artist Salvador Dalí and WALL·E, the famous robot from the Pixar sci-fi movie of the same name. The name evokes the merging of art and technology. Ramesh introduced DALL·E to the world in January 2021 with OpenAI, the AI research and deployment company. The technology builds on the GPT-3 large language model, using deep learning to comprehend user prompts expressed in natural language and produce novel images.
DALL·E represents a progression from a notion OpenAI first introduced in June 2020 as Image GPT, an early endeavour to show that a neural network could generate high-quality images. With DALL·E, OpenAI expanded upon that foundation, allowing users to generate fresh images from textual prompts, much as GPT-3 generates new text in response to natural language inputs.
DALL·E 2 was released in April 2022 as an advancement over the original DALL·E. The original DALL·E generated images with a dVAE (discrete variational autoencoder); DALL·E 2 instead pairs CLIP embeddings with a diffusion model, which is capable of generating images of even higher quality. Diffusion models were a game-changer for DALL·E 2, as well as for contemporaries such as Midjourney and the open-source Stable Diffusion. According to OpenAI, DALL·E 2 images have four times the resolution of those created with DALL·E. Moreover, DALL·E 2 is notably faster and supports larger image sizes than its predecessor.
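To see what a diffusion model does at its core, here is a toy sketch (not OpenAI's actual code) of the "forward" process: an image is gradually mixed with Gaussian noise over many steps, and generation works by learning to reverse that corruption. The step count and noise schedule below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000                              # number of diffusion steps (toy value)
betas = np.linspace(1e-4, 0.02, T)    # toy linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)  # cumulative fraction of signal kept

def noise_image(x0, t):
    """Sample a noised version of x0 at step t, in closed form."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = rng.standard_normal((8, 8))      # stand-in for an image
x_early = noise_image(x0, 10)         # still mostly the original signal
x_late = noise_image(x0, T - 1)       # almost pure noise
```

By the final step nearly all of the signal is gone; the generative model is trained to run this process in reverse, denoising step by step until an image emerges.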
CLIP consists of two neural networks: a text encoder and an image encoder. Through training on a vast collection of image-text pairs, these encoders map inputs to embeddings in a shared "concept space." During training, CLIP receives image-caption pairs and forms matching pairs (image with corresponding caption) and mismatching pairs (image with any other caption). The objective is to train the encoders to map matching pairs close together and mismatching pairs far apart. This contrastive training encourages CLIP to learn various image features, such as objects, aesthetics, colors, and materials. However, CLIP may struggle to differentiate between images with swapped object positions, as it focuses on matching captions rather than preserving positional information.
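The matching-pairs idea above can be sketched in a few lines. This is a toy illustration with random vectors standing in for encoder outputs, not CLIP's real encoders; the temperature value is an assumption borrowed from the published model. Matching image-caption pairs sit on the diagonal of the similarity matrix, and the contrastive objective pushes those entries to dominate their rows.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    """Project embeddings onto the unit sphere, as CLIP does."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Pretend each row is the embedding of one image and of its caption.
# In real CLIP these come from trained image and text encoders.
image_emb = normalize(rng.standard_normal((4, 16)))
text_emb = normalize(image_emb + 0.1 * rng.standard_normal((4, 16)))

# Cosine similarity between every image and every caption:
# entry (i, j) scores image i against caption j.
sims = image_emb @ text_emb.T

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Softmax over each row: training pushes most of the probability mass
# onto the diagonal, i.e. onto each image's matching caption.
probs = softmax(sims / 0.07)  # 0.07 is roughly CLIP's learned temperature
```

Because the toy "captions" here are slight perturbations of their images, each row's highest score lands on the diagonal, which is exactly the behaviour contrastive training rewards.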
Unlike its predecessor, DALL·E 2 also offers additional capabilities:
Inpainting: It performs edits to an image using language.
Variations: It generates new images that share the same essence as a given reference image, but differ in how the details are put together.
Text diffs: It transforms any aspect of an image using language.
Aditya Ramesh, in an interview with VentureBeat
The foundational idea of DALL·E is to help artists. Just as Codex is a constant companion for programmers, Ramesh describes DALL·E as a “creative co-pilot” for artists. Like it or not, this is the future of creative work. While Ramesh’s research and execution have undoubtedly been ground-breaking, larger questions remain: Will all existing artists be able to adapt or fuse AI into their creative practice? Will traditional or purist art practices decline alongside the rise of AI art? While AI certainly makes art more accessible, will it devalue the “skill” required to draw or paint?
Find out more about DALL·E here.
If you enjoyed reading this, here's more from Homegrown: