AI Image Generator, developed by OpenAI, can create detailed images from a few words

Artificial intelligence has often been pitted against people in creative contests. It can beat grandmasters at chess, compose symphonies, write poetry and now create detailed works of art from a short written prompt. The OpenAI team has recently developed powerful software capable of creating a wide range of images in seconds, just from a string of words given to it. This program is known as DALL-E 2 and is designed to revolutionize the way we use AI with images.

What does DALL-E 2 do?

In 2021, the AI research and development organization OpenAI created a program called DALL-E, a blend of the names Salvador Dalí and WALL-E. This software was able to take a typed prompt and create a completely unique AI-generated image. For example, *a fox in a tree* will bring up a picture of a fox sitting in a tree, and the prompt *astronaut with a bagel in hand* will produce... well, you can see where this is going.

Although certainly impressive, the images were often blurry, not entirely sharp and took some time to create. Now OpenAI has made huge improvements to the software, creating DALL-E 2, a powerful new iteration that works at a much higher level.

In addition to some other new features, the main differences in this second model are a huge improvement in image resolution, lower latency (the time it takes to create an image) and a smarter algorithm for creating images.

The software does not just create an image in a single style: you can request different artistic techniques, such as a drawing style, an oil painting, a plasticine model, wool knitting, a painting on a cave wall or even a 1960s movie poster.

“DALL-E is a very useful assistant that amplifies what a person can normally do, but it really depends on the creativity of the person using it. An artist or someone more creative can create something really interesting,” said Aditya Ramesh, one of the lead engineers on DALL-E 2.

A jack-of-all-trades

In addition to creating images from text prompts alone, DALL-E 2 has two other clever techniques: inpainting (the name given to the technique of reconstructing damaged images or filling in the missing part of an image) and variation. These two features work the same way as the rest of DALL-E, only with a twist.

With inpainting, you can take an existing image and add new elements to it or change some part of it. If you have a picture of a living room, you can add a new carpet, put a dog on the sofa, change the picture on the wall, or even put an elephant in the room, since someone was bound to think of it.
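
The idea behind inpainting can be sketched in a few lines of Python. This is only an illustration, not OpenAI's implementation: it assumes an image is a small grid of numbers, that a mask marks the region to repaint, and that a generator proposes new pixel values. The function `fake_generator` is a hypothetical stand-in for the real model. Only the masked pixels are replaced; everything else is kept from the original.

```python
from typing import Callable, List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def fake_generator(row: int, col: int) -> int:
    """Hypothetical stand-in for a generative model: returns a fixed value."""
    return 9


def inpaint(original: Image, mask: Image,
            generate: Callable[[int, int], int]) -> Image:
    """Replace pixels where mask == 1 with generated values; keep the rest."""
    result = []
    for r, (img_row, mask_row) in enumerate(zip(original, mask)):
        new_row = []
        for c, (pixel, masked) in enumerate(zip(img_row, mask_row)):
            new_row.append(generate(r, c) if masked else pixel)
        result.append(new_row)
    return result


living_room = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
# 1 marks the area to repaint (for example, where a new carpet should go).
carpet_mask = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]

edited = inpaint(living_room, carpet_mask, fake_generator)
print(edited)
```

In a real system the generator would be conditioned on the text prompt and on the surrounding pixels, so that the new carpet matches the lighting and perspective of the room.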


Left: the original image; right: the DALL-E 2 version. The numbers 1, 2 and 3 mark example positions where an element (in this case a flamingo) can be placed. You can change its position.

Variation is another feature that requires an existing image. Insert a photo, illustration or any other type of image, and DALL-E's variation tool will create hundreds of its own versions. You can give it a picture of a Teletubby, and it will produce similar versions of it. An old painting of a samurai will yield similar images, and you can also take a photo of some graffiti you see and get similar results.

You can also use this tool to combine two images. Create a blend of a dragon and a corgi, or of a rainbow and a pot.


Left: the original image; right: the DALL-E 2 variation tool applied to that image.

DALL-E 2 is built on CLIP, a computer vision system that OpenAI announced last year. “DALL-E 1 simply took our GPT-3 approach from language and applied it to produce an image: we compressed images into a series of words and we just learned to predict what comes next,” said Prafulla Dhariwal, a researcher at OpenAI, referring to the GPT model used by many text-based AI applications.

But word matching did not necessarily capture the qualities that humans find most important, and the predictive process limited the realism of the images. CLIP was designed to look at images and summarize their content the way a human would. OpenAI iterated on this process to create “unCLIP”, an inverted version that starts with a description and works its way toward an image. DALL-E 2 creates images using a process called “diffusion”, which Dhariwal describes as starting with a “bag of dots” and then filling in a pattern with greater and greater detail.
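
The “bag of dots” idea, starting from noise and refining it step by step, can be illustrated with a toy example. This is a sketch under strong assumptions and not how DALL-E 2 works internally: in a real diffusion model the denoiser is a trained neural network guided by the text description, whereas here the hypothetical `denoise_step` simply knows the target pattern in advance.

```python
import random
from typing import List


def denoise_step(current: List[float], target: List[float],
                 strength: float) -> List[float]:
    """Move each value a fraction of the way toward the target pattern."""
    return [c + strength * (t - c) for c, t in zip(current, target)]


def generate(target: List[float], steps: int = 20,
             strength: float = 0.3, seed: int = 0) -> List[List[float]]:
    """Start from random noise and refine it over a fixed number of steps."""
    rng = random.Random(seed)
    # Pure noise: the "bag of dots".
    current = [rng.uniform(-1.0, 1.0) for _ in target]
    history = [current]
    for _ in range(steps):
        current = denoise_step(current, target, strength)
        history.append(current)
    return history


def error(values: List[float], target: List[float]) -> float:
    """Largest distance between any value and its target."""
    return max(abs(v - t) for v, t in zip(values, target))


target_pattern = [0.0, 0.5, 1.0, 0.5, 0.0]
history = generate(target_pattern)
print("error at start:", round(error(history[0], target_pattern), 4))
print("error at end:  ", round(error(history[-1], target_pattern), 4))
```

Each pass removes a little more noise, so the pattern emerges gradually rather than being drawn in a single step.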

The software can help people edit their photos, create artwork or generate countless stock images. “DALL-E 2 is a research project that we do not currently make available in our API,” says OpenAI. “As part of our effort to develop and deploy AI responsibly, we are studying DALL-E's limitations and capabilities with a select group of users.”

The limits of DALL-E 2

While there is no doubt about how great this technology is, it is not without its limitations.

One problem you may run into is confusion over specific words or expressions. For example, the researchers noted that when they entered the prompt “a black hole inside a box”, DALL-E 2 returned a black-colored hole inside a box, instead of the cosmic body they were looking for.

This can often happen when a word has more than one meaning, when a phrase can be misunderstood or when colloquialisms are used. This is to be expected from an artificial intelligence that takes your words literally.

Another thing to get used to with the system is how prompts and art styles work. When you type something, the initial image may miss the mark: although it technically matches your request, it does not fully match the feeling or idea you had in mind. It may take some time to get used to, and some minor tweaking may be required, Ramesh says.

Another area where DALL-E can get confused is variable binding, that is, attaching the right attribute to the right object. “If you ask the model to draw a red cube on top of a blue cube, it sometimes gets confused and does the opposite. We can solve this problem very easily in future iterations of the system, I think,” explained Ramesh.

Fighting stereotypes and human contributions

As with all good things on the Internet, it doesn't take long for a major question to arise: *how can this technology be used unethically?* Not to mention the additional problem of AI's history of picking up rude behavior from internet users.

When it comes to the technology surrounding AI image creation, it seems obvious that it can be manipulated in a variety of ways.

To address this, the OpenAI team behind DALL-E has implemented a security policy, operating in three stages, for all images on the platform. The first stage filters out data involving a major violation. This includes violence, sexual content and images that the team would deem inappropriate.

The second stage is a filter that looks for subtler points that are harder to detect. This could be political content or propaganda of one kind or another. Finally, in its current form, every image produced by DALL-E is reviewed by a human being, but this is not a viable long-term measure as the product grows.
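
The three stages can be pictured as a simple pipeline. The sketch below is purely illustrative: the article does not say how OpenAI's filters are implemented, so plain keyword matching on the prompt is assumed here, and the word lists and the `review` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical word lists, invented for this sketch.
BLOCKED_TERMS = {"violence", "gore"}
SENSITIVE_TERMS = {"election", "propaganda"}


@dataclass
class ReviewResult:
    prompt: str
    status: str  # "rejected" or "pending_human_review"
    reasons: List[str] = field(default_factory=list)


def review(prompt: str) -> ReviewResult:
    """Run a prompt through three illustrative moderation stages."""
    words = set(prompt.lower().split())

    # Stage 1: reject clear violations outright.
    blocked = sorted(words & BLOCKED_TERMS)
    if blocked:
        return ReviewResult(prompt, "rejected", blocked)

    # Stage 2: note subtler content, such as political material.
    sensitive = sorted(words & SENSITIVE_TERMS)

    # Stage 3: everything that survives still waits for a human reviewer,
    # who sees any notes left by stage 2.
    return ReviewResult(prompt, "pending_human_review", sensitive)


for text in ["a scene of violence", "an election poster", "a fox in a tree"]:
    result = review(text)
    print(result.status, result.reasons)
```

Because the last stage sends every request to a person, the cost of this design grows with the number of users, which is why it cannot scale indefinitely.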

Despite having this policy in place, the team is clearly aware of the challenges ahead for this product. They have listed DALL-E's risks and limitations, detailing the many problems they might face.

This covers a wide range of issues. For example, images can often show bias or stereotypes, such as the term “wedding” returning mostly Western weddings, or a search for “lawyer” showing mostly older white men, with “nurse” doing the same for women.

These are not new issues by any means, and they are something Google has been tackling for years. Often, image generation can follow the biases observed in society.

There are also ways to trick DALL-E into creating content that the word filters are meant to block. Since “blood” would trigger the violence filter, a user might type “ketchup pool” or something similar in an attempt to bypass it.

In addition to the team’s security policy, they have a clear content policy that users must adhere to.

The future of DALL-E

So the technology is there and is definitely working well, but what's next for the DALL-E 2 team? Currently, the software is being rolled out gradually through a waiting list, with no plans yet to open it to the public.

By releasing its product gradually, the OpenAI team can monitor its growth, develop its security mechanisms and prepare the product for the millions of people who will soon be entering their prompts.

“We want to put this research into people's hands, but at the moment we are interested in getting feedback on how people are using the platform. We are certainly interested in deploying this technology more widely, but we do not currently have any commercialization plans,” said Ramesh.

In the meantime, can you tell images made by machines from those made by humans? If so, take the test below and share your score in the comments. What is your strategy for telling them apart?

This image does not exist

Source: OpenAI

And you?

What do you think of DALL-E 2?
Do you see potential for abuse? What kind?
What do you think of the OpenAI researchers’ approach to reducing potential abuse, which includes:

  • Limiting DALL-E 2's ability to create violent, hateful or adult images;
  • Removing the most explicit content from the training data and reducing DALL-E's exposure to these concepts;
  • Using advanced techniques to prevent the photorealistic generation of real human faces, including those of public figures.

