The time has finally come, and our AI overlords have arrived. Between creating wild new pieces of art, writing entire essays from simple prompts, and even beginning to generate their own code, generative AIs have taken over the discussion of future technologies in recent times. Most people view the AI takeover with either futuristic wonder or justified concern for what these technologies might do to our very idea of work itself. Few, however, understand how to take the reins of these AI tools and use them to lighten their workload, create incredible works of imagination (both human and artificial), and so much more. For the uninitiated, here are some simple steps on how to get started with image-generating AI tools:
- First, you’ll want to figure out which image AI you’d like to use. There are paid options such as DALL-E 2 or Midjourney, which can be used for a time before paying for generation credits, and open-source programs such as Stable Diffusion, which require a bit of setup but are free to use. For the former, joining the waitlist for access (DALL-E 2 or Midjourney) is the first step. If selected, you’ll gain access to the program and a tutorial on how to operate it. For the latter, an easy-to-use Stable Diffusion interface is available online at the AUTOMATIC1111 GitHub repository. The site contains information on how to install it and get set up.
- Once you’ve been selected for access or have installed the necessary programs, you’re ready to get started! All image AIs begin with a fundamental input: the prompt. Prompt engineering is a field unto itself, and there are many excellent guides available on the internet, but here are some easy pointers to get you started:
- Basic formatting: Terms in a prompt are divided by commas and can have different meanings if combined or separated.
- Ex.) ‘A cyberpunk portrait of a neon robot dog’ vs ‘A portrait of a dog, cyberpunk, neon color scheme’
- Base: Defining what medium your image will attempt to imitate.
- Ex.) ‘A Portrait of . . .’, ‘A 3D-rendering of . . .’
- Descriptors: A few words that guide the style of the piece.
- Ex.) ‘Cyberpunk’, ‘Brutalist’
- Things: The nouns, or main objects of focus
- Ex.) ‘A skyscraper’, ‘Two dogs’
- Environment: The physical surroundings or lighting conditions
- Ex.) ‘. . . in a vast open field’, ‘. . . at dusk’
- Influences: The name of a style or artist that the piece can imitate
- Ex.) ‘. . . by Rembrandt’, ‘. . . in the style of Picasso’
- Additional prompts: Some keywords can aid in improving the quality of generations without describing the piece itself.
- Ex.) ‘artstation’ draws on the vast library of artwork available on that platform to add coherency, ‘in focus’ or ‘centered’ can help draw the focus of the image to your subject, and ‘highly detailed’ will reduce blurry areas.
- Negative Prompts: Some programs such as Stable Diffusion allow for the input of negative prompts as well to help guide your generation, such as inputting ‘blurry’ to help increase the fidelity of the output, but this effect can also be achieved by using a negative sign in the positive prompt, such as ‘-blurry.’
- Emphasis: Keywords in the prompt can be emphasized to give them more focus in the generation. Use () or :X to increase the emphasis, or [] or :-X to reduce it.
- Ex.) ‘A (cubist) portrait of a dog, detailed:2, blurry:-1, [cat]’
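The pointers above can be combined in a small helper. This is only a sketch: `build_prompt`, `emphasize`, and `de_emphasize` are hypothetical names invented for illustration, not part of any image AI, and the () and [] syntax follows the emphasis convention described above, which may not be supported by every program.

```python
def emphasize(term):
    """Wrap a term in () to increase its emphasis (syntax as described above)."""
    return f"({term})"


def de_emphasize(term):
    """Wrap a term in [] to reduce its emphasis (syntax as described above)."""
    return f"[{term}]"


def build_prompt(base, things, descriptors=(), environment="", influences="", extras=()):
    """Join the prompt building blocks into one comma-separated string.

    Hypothetical helper: the argument names simply mirror the list above.
    """
    # Base, things, environment and influences read as a single phrase.
    phrase = " ".join(
        piece for piece in (base, things, environment, influences) if piece
    )
    # Descriptors and extra quality keywords follow as separate terms.
    terms = [phrase, *descriptors, *extras]
    return ", ".join(term for term in terms if term)


prompt = build_prompt(
    base="A portrait of",
    things="a dog",
    descriptors=["cyberpunk", "neon color scheme"],
    environment="at dusk",
    influences="in the style of Picasso",
    extras=["highly detailed", "centered"],
)
print(prompt)
```

Keeping each building block as its own argument makes it easy to swap out a single part, such as the influence or the environment, while leaving the rest of the prompt unchanged between generations.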
- Next, adjust the output resolution (if available) to your desired size. Some image AIs can stretch or duplicate the subject of your prompt if the image is very wide or very tall, so it’s a good idea to adjust your prompt to match, such as using ‘a panorama of a mountain range’ for a very wide image instead of just ‘a picture of a mountain.’
- Decide on the number of images you want to generate at once. When starting a new project, it can be a good idea to set your batch number lower so that you can alter and play with your prompt and settings, before increasing it to get a range of good final images to choose from.
- Change the sampler / model. Some image AIs allow you to download different models from model-sharing sites; these models are trained and specialized to create different styles or subjects. The sampler, on the other hand, is the algorithm with which the image AI creates the output. Samplers can vary in level of detail or prompt recognition, though the most advanced samplers are usually selected by default.
- Adjust your sampling steps based on the desired level of detail. The higher the sampling steps, the more times the image AI will go over the base noise that it starts with to add detail to the image. A good standard range to work in is 20-50, though increasing this number will also increase the time to generate a result. Going too high or too low will produce images that are overdetailed or underdetailed, respectively.
- Finally, alter the CFG scale, if available. This is a measure of how closely the image AI will stick to your prompt. Reduce the number and you’ll get more creative (and potentially nonsensical) results. Increase it and you might get exactly what you asked for, but sometimes the result is a bit too literal. An average number of 7-8 should allow for a good blend of prompting and AI creativity.
- Hit generate and check out your result! Few attempts create the perfect image on the first or even tenth try, as a fair amount of effort is needed to refine your prompt, change the settings, and maybe touch up the final result in drawing software. This process can often be enlightening, however, as you discover new and interesting interpretations of your prompt, potentially creating something even beyond your imagination.
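The settings walked through above can be gathered in one place and sanity-checked before generating. The sketch below is illustrative only: `GenerationSettings` is a hypothetical class, the field names are modeled on the kind of request the AUTOMATIC1111 web UI accepts but should be treated as assumptions, and the defaults (512 pixels, the ‘Euler a’ sampler) are example values rather than recommendations from this post. Nothing is sent to any program here.

```python
from dataclasses import dataclass, asdict


@dataclass
class GenerationSettings:
    """Hypothetical bundle of the settings discussed in the steps above."""
    prompt: str
    negative_prompt: str = ""
    width: int = 512               # example value, an assumption
    height: int = 512              # example value, an assumption
    batch_size: int = 1            # keep low while experimenting
    sampler_name: str = "Euler a"  # example sampler name, an assumption
    steps: int = 30                # the post suggests roughly 20-50
    cfg_scale: float = 7.5         # the post suggests roughly 7-8

    def warnings(self):
        """Return notes for values outside the ranges suggested above."""
        notes = []
        if not 20 <= self.steps <= 50:
            notes.append("steps outside the suggested 20-50 range")
        if not 7 <= self.cfg_scale <= 8:
            notes.append("CFG scale outside the suggested 7-8 range")
        if max(self.width, self.height) >= 2 * min(self.width, self.height):
            notes.append("extreme aspect ratio: consider wording the prompt to match")
        return notes


draft = GenerationSettings(
    prompt="a panorama of a mountain range",
    negative_prompt="blurry",
    width=1024,
    height=512,
)
payload = asdict(draft)  # plain dict, ready to inspect or save
print(payload)
print(draft.warnings())
```

The warnings are only reminders of the rules of thumb above. Values outside those ranges are still allowed, since experimenting with them is part of the process.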
This is by no means all that image AIs can do. Many have the ability to inpaint and outpaint, allowing the user to ‘expand’ an existing image by painting beyond its boundaries, or to erase sections of an image and generate entirely new details, such as erasing an outfit and generating a new one. Another powerful function is img2img, which uses an existing image as a base from which to generate a new one. This function includes a ‘denoising strength’ slider, which affects how much the original image will be altered. For example, using a base image of a landscape and prompting ‘cartoon style’ can make the resulting image more or less comical depending on the denoising value, anywhere from completely identical to radically different. Finally, image upscalers can be used to increase the size of your outputs, which is especially useful because generating a high-resolution image can take significant time and processing power. Upscalers use AI to ‘fill in’ an image as it gets larger so that no quality is lost and no pixels are stretched.
New AI tools are being developed at a rapid pace, and many of the tips in this post may soon become out-of-date. The world of AI is wonderful and rapidly evolving, but it can also be a lot of fun, and it allows anyone to create works of creativity. Give it a try and see what you can create!
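To get a feel for the denoising strength slider, it can help to see how the value might translate into work done on the base image. The sketch below assumes the convention used by some implementations (for example, the Hugging Face diffusers img2img pipeline), where the strength scales how many of the sampling steps are actually run on the base image. Other programs may treat the slider differently, and `img2img_steps` is a hypothetical helper written for illustration.

```python
def img2img_steps(sampling_steps, denoising_strength):
    """Estimate how many sampling steps are applied to the base image.

    Assumes steps scale linearly with strength, which is one common
    convention and not a rule every image AI follows.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return min(int(sampling_steps * denoising_strength), sampling_steps)


# Sweep the slider from 'keep the original' to 'replace it entirely'.
sweep = {s: img2img_steps(40, s) for s in (0.0, 0.25, 0.5, 0.75, 1.0)}
for strength, steps in sweep.items():
    print(f"denoising {strength:.2f} -> {steps} of 40 steps")
```

Under this assumption, a strength of 0 leaves the base image untouched, while a strength of 1 runs every step and leaves little of the original behind, which matches the range from identical to radically different described above.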
Design Interface Inc. can show you what is possible. Our forward-thinking solutions for product design, package design, medical device design, graphic design and photography unlock the value of your ideas as we communicate your message and goals. See more here: https://designinterface.com/