OpenAI takes on Midjourney with Dall-E 3 image generation enhanced by ChatGPT

As AI images become more lifelike, OpenAI claims it is working to curb misuse

21 September 2023

OpenAI has announced an updated version of its AI image generation model Dall-E, which aims to compete with some of the most detailed image models available and will be built into ChatGPT.

With Dall-E 3, OpenAI has promised its most capable image generation model yet, with more detailed and photorealistic image outputs compared to its predecessor Dall-E 2.

The improvements put the model head to head with Midjourney 5 and Firefly, Adobe’s image generation model for enterprise.

The model will become available directly within ChatGPT, with the chatbot able to generate detailed prompts for the image model in order to more accurately produce the results that users seek.

Generative AI text-to-image models generally produce more impressive results when fed long inputs of text descriptors, produced through a careful trial-and-error process called prompt engineering.

The ability to generate these strings directly within ChatGPT could empower less-skilled users to generate complex images easily, help businesses keep images within their style guides, and prevent the model from producing confusing or unwanted content.

“Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering,” wrote OpenAI.

“Dall-E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide.”

The combination of the well-known ChatGPT interface and Dall-E 3 could also make the model easier to use without user training. OpenAI hopes that this, combined with the new more powerful model, could give Dall-E 3 an edge over competitors including Bing Image Creator.

Dall-E 3 will be made available to ChatGPT Plus and Enterprise customers in October, and for ChatGPT API users later in the fall.

The first-generation Dall-E model was released in January 2021, and was capable of limited image generation that could be readily identified as computer-generated.

In the nearly three years since, the quality of AI images has dramatically improved and tools for generating AI images have become easier to access and more widely used by businesses.

At Appian World 2023, the firm used Midjourney for every image in its opening keynote. Even with the strides that made this possible, Malcolm Ross, director and SVP of product strategy, told ITPro that the final images seen by the audience had come from a lengthy back-and-forth of prompt engineering – the kind of step OpenAI is attempting to eliminate.

Advancements have not been without controversy. Many artists are concerned about the impact of AI on their work, particularly those with a trademark style that can be replicated by AI models upon user request.

To combat this, OpenAI stated that Dall-E 3 will reject prompts that specifically ask for an image in the style of an artist who is currently alive, and the firm provided an opt-out form for artists who don’t want their work to be used in future image generation models.

It is not clear whether redress methods will be implemented in the future for those whose art has already been used to train models such as Dall-E 3. AI developers can struggle to remove the influence of training data from a model, even in cases where specific model weights can be identified.

Dall-E 3 output (left) and an Adobe Firefly output (right) for the prompt “A modern architectural building with large glass windows, situated on a cliff overlooking a serene ocean at sunset”. (Image: OpenAI/Future)

In theory, this could lead to workarounds in which users effectively generate images in the style of any artist whose work is already in the training data, using carefully chosen terms that evoke that style without naming the artist.

The EU’s AI Act contains provisions that would compel developers to reveal copyrighted content they used to train generative AI models, and OpenAI itself is currently being sued by a number of authors who allege that ChatGPT was trained on their works without consent.

AI developers will be subject to a complex legal regime in the coming months and years, and text-to-image models such as Dall-E 3 will be subjected to scrutiny.

OpenAI said that it is working on a “provenance classifier” so that images made with Dall-E 3 can be easily identified as artificial, and that it has put in place mitigations to prevent users generating images of public figures.

Developers across the industry are working on tools that can accurately detect AI-generated content such as text, images, audio, and video, out of fear that such content could otherwise be misused.

Mandiant researchers recently warned that generative AI could fuel a new wave of malicious information campaigns, and AI images of public figures such as political leaders have already been used for small-scale disinformation campaigns on social media.
