AI-Generated Media


Overview

As part of my AI-Unity integration project, I’ve explored running an AI image generator locally on my computer, with the intention of eventually using player input to generate in-game images.
This series of images presents the results of my experiments with Stable Diffusion, including prompt structuring and generation settings, across a wide variety of subject matter grouped by category.
By the end of this sub-project, my goal is to understand how best to format image prompt data for use in an interactive environment.

Process

As with the AI-Unity project, I prefer a free and open-source model that can run locally on my own computer. For this, I set up a Stable Diffusion build called AUTOMATIC1111, which includes a browser-based UI well regarded for its user-friendliness. Lastly, I acquired a model from CivitAI called Realistic Vision, trained to create images with lifelike photorealism. With the model and the Stable Diffusion build in place, launching the software brings me to the following interface:


Stable Diffusion Web UI
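Beyond the browser UI, the Web UI can also expose a local REST API when launched with its `--api` flag, which is the more likely route for the Unity integration. The sketch below assumes the default address `http://127.0.0.1:7860` and the `/sdapi/v1/sd-models` endpoint (confirm both on the server's `/docs` page for your version); `list_models` and `find_model` are hypothetical helper names. It checks that the Realistic Vision checkpoint is installed.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: the Web UI's default local address; it must be started with --api.
BASE_URL = "http://127.0.0.1:7860"


def list_models(base_url: str = BASE_URL) -> list[dict]:
    """Fetch the installed checkpoints from the local Web UI API."""
    with urllib.request.urlopen(f"{base_url}/sdapi/v1/sd-models", timeout=10) as response:
        return json.load(response)


def find_model(models: list[dict], name_fragment: str) -> dict | None:
    """Return the first checkpoint whose title contains name_fragment (case-insensitive)."""
    fragment = name_fragment.lower()
    for model in models:
        if fragment in model.get("title", "").lower():
            return model
    return None
```

With the server running, `find_model(list_models(), "realisticVision")` should return the matching entry, or `None` if the checkpoint is missing. Matching on a substring of `title` avoids depending on the exact file name.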

I then typically set up a list of common prompt and negative prompt keywords to improve the quality of the generation. This includes keywords like “8k resolution, masterpiece, highly detailed”, or negative keywords for things I don’t want, like “bad anatomy, low quality, blurry”. Both only help guide the generation results and often need some tuning based on the subject matter.
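The shared keyword lists can be kept in one place and combined with each new subject. A minimal sketch; `QUALITY_KEYWORDS`, `NEGATIVE_KEYWORDS`, and `build_prompts` are hypothetical names, and the lists hold only the example keywords quoted in the paragraph above:

```python
from __future__ import annotations

# Hypothetical shared lists, seeded with the example keywords from the text.
QUALITY_KEYWORDS = ["8k resolution", "masterpiece", "highly detailed"]
NEGATIVE_KEYWORDS = ["bad anatomy", "low quality", "blurry"]


def build_prompts(subject: str,
                  extra_positive: list[str] | None = None,
                  extra_negative: list[str] | None = None) -> tuple[str, str]:
    """Combine a subject with the shared keywords into (prompt, negative_prompt).

    Repeated keywords are dropped while keeping their first-seen order.
    """
    positive = [subject, *(extra_positive or []), *QUALITY_KEYWORDS]
    negative = [*NEGATIVE_KEYWORDS, *(extra_negative or [])]
    # dict.fromkeys preserves insertion order and removes duplicates
    return ", ".join(dict.fromkeys(positive)), ", ".join(dict.fromkeys(negative))
```

Per-image additions, such as “steering wheel” for the sci-fi cockpit below, go in `extra_negative` without editing the shared lists.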

The challenge with AI image generation is putting the results you want into words in a way the AI will correctly interpret, while being unsure yourself what will happen when you press Generate. I often start with a simple one-sentence prompt (e.g. “futuristic spaceship in colourful galaxy”) and then add further detail as I see the results and begin to understand what I’m hoping to see. At that point, settings also come into play: I often increase or decrease the number of sampling steps, adjust the weighting of different keywords, and change the seed used to generate the image. On average, it takes me 10-20 generations to reach an outcome I like.
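Keyword weighting in the AUTOMATIC1111 Web UI uses the `(keyword:weight)` attention syntax inside the prompt text, where values above 1 emphasize a keyword and values below 1 de-emphasize it. A small sketch for building weighted prompts programmatically; `weight` and `weighted_prompt` are hypothetical helper names:

```python
from __future__ import annotations


def weight(keyword: str, value: float) -> str:
    """Wrap a keyword in the Web UI's (keyword:weight) attention syntax.

    A weight of 1.0 is the default, so the keyword is returned unchanged.
    """
    if value == 1.0:
        return keyword
    return f"({keyword}:{value:g})"  # :g drops trailing zeros, e.g. 1.30 -> 1.3


def weighted_prompt(terms: list[tuple[str, float]]) -> str:
    """Join (keyword, weight) pairs into a comma-separated prompt."""
    return ", ".join(weight(keyword, value) for keyword, value in terms)
```

Because the weights end up as plain text in the prompt, they can be tuned between generations just like the keywords themselves.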

This poses an interesting dilemma for using AI image generation in interactive media, as the viewer is unlikely to be satisfied with a single generation. Bulk image generation might let users pick out their preferred outputs, but it would also increase the generation time considerably. On-demand image generation appears to be the more balanced approach, letting the user decide whether they’d like to regenerate the image. With my current settings, an image can be generated in roughly 30 seconds; lower-resolution images for interactive projects could likely be generated even faster.
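The on-demand approach can be sketched as a session that keeps the prompt fixed and draws a new seed each time the user asks to regenerate, recording every seed so an earlier result can be re-rendered instead of stored. This is a minimal sketch, not a finished implementation: `RegenerationSession` and the `ImageBackend` callable are hypothetical, and in practice the backend might wrap a request to the local Web UI API. It assumes the same prompt, seed, and settings reproduce the same image.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical backend: takes (prompt, seed) and returns encoded image bytes.
# In practice this might wrap a request to the local Web UI API.
ImageBackend = Callable[[str, int], bytes]


@dataclass
class RegenerationSession:
    """On-demand generation: one image per request, with a fresh seed each time."""

    prompt: str
    generate: ImageBackend
    rng: random.Random = field(default_factory=random.Random)
    history: list[int] = field(default_factory=list)  # seeds used so far, in order

    def regenerate(self) -> tuple[int, bytes]:
        """Generate a new image; the seed is chosen here so it is always known."""
        seed = self.rng.randrange(2**32)  # 32-bit range, like the seeds listed below
        self.history.append(seed)
        return seed, self.generate(self.prompt, seed)

    def revisit(self, index: int) -> bytes:
        """Re-render an earlier result from its recorded seed instead of storing the image."""
        return self.generate(self.prompt, self.history[index])
```

Choosing the seed on the client side, rather than letting the server pick a random one, means every image the player sees can be reproduced later from a single integer.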

In the following sections, I’ll display some of my results, including the seed and settings used to make them (feel free to recreate them if you’d like).

Concept Art

A series of attempts to convey ideas for media works, mostly focused on fictional environments across different genres and aesthetics.

Concept Art 1: Sci-Fi


Prompt: “Futuristic silver spaceship cockpit interior, flashing control panel, central viewport revealing a vast colourful spiral galaxy.”

Notes: 20 attempts before settling on this version. Be sure to add “steering wheel” to the negative prompt, or you'll get one in every generation.

Prompt: futuristic silver spaceship cockpit interior, flashing control panel, central viewport revealing a vast colourful spiral galaxy, front view, highly detailed, intricate, atmospheric lighting, ultra-realistic textures, cinematic composition, concept art, 8k resolution, masterpiece

Negative Prompt: blurry, lowres, low quality, deformed, distorted, extra limbs, bad anatomy, grainy, out of frame, cropped, watermarks, text, signature, jpeg artifacts, oversaturated, underexposed, steering wheel

Generation Settings:
Steps: 20
Sampler: DPM++ 2M
Schedule type: Karras
CFG scale: 10
Seed: 2053538866
Size: 576x384
Model hash: 15012c538f
Model: realisticVisionV60B1_v51VAE
Denoising strength: 0.7
Hires upscale: 2
Hires steps: 10
Hires upscaler: Latent
Version: v1.10.
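To recreate these images outside the browser, the listed settings can be mapped onto the Web UI's `txt2img` API. A minimal sketch, assuming the server was started with `--api` and that the payload field names (`sampler_name`, `scheduler`, `hr_scale`, `hr_second_pass_steps` and the rest) match the v1.10 API schema; confirm them on the server's `/docs` page. `parse_settings`, `to_txt2img_payload`, and `txt2img` are hypothetical helper names.

```python
from __future__ import annotations

import base64
import json
import urllib.request

# The settings listed above for Concept Art 1 (model lines omitted).
SETTINGS_TEXT = """\
Steps: 20
Sampler: DPM++ 2M
Schedule type: Karras
CFG scale: 10
Seed: 2053538866
Size: 576x384
Denoising strength: 0.7
Hires upscale: 2
Hires steps: 10
Hires upscaler: Latent"""


def parse_settings(text: str) -> dict[str, str]:
    """Turn 'Key: value' lines into a dict, splitting on the first colon only."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip()
    return settings


def to_txt2img_payload(prompt: str, negative: str, s: dict[str, str]) -> dict:
    """Map the listed settings onto assumed txt2img API field names."""
    width, height = (int(v) for v in s["Size"].split("x"))
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "steps": int(s["Steps"]),
        "sampler_name": s["Sampler"],
        "scheduler": s["Schedule type"],
        "cfg_scale": float(s["CFG scale"]),
        "seed": int(s["Seed"]),
        "width": width,
        "height": height,
        # The "Hires" entries indicate the hires. fix second pass was enabled.
        "enable_hr": True,
        "hr_scale": float(s["Hires upscale"]),
        "hr_second_pass_steps": int(s["Hires steps"]),
        "hr_upscaler": s["Hires upscaler"],
        "denoising_strength": float(s["Denoising strength"]),
    }


def txt2img(payload: dict, base_url: str = "http://127.0.0.1:7860") -> bytes:
    """POST the payload to the local Web UI and decode the first returned image."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        result = json.load(response)
    # Images come back as base64-encoded strings in the "images" list.
    return base64.b64decode(result["images"][0])
```

With the full prompt and negative prompt above, `txt2img(to_txt2img_payload(prompt, negative, parse_settings(SETTINGS_TEXT)))` should return the encoded image bytes (PNG with default Web UI settings). With `Hires upscale: 2`, the final image is 1152x768 rather than the 576x384 base size. The `Model` and `Model hash` lines are not mapped here; the sketch assumes Realistic Vision is already the selected checkpoint in the Web UI.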

Concept Art 2: Environmental/Botanical


Prompt: “Colossal monstrous venus flytrap inspired by Little Shop of Horrors”

Notes: Nearly 30 attempts before settling on this one. The AI didn't get what I was going for until I gave it something more specific to reference (the musical).

Prompt: colossal monstrous venus flytrap inspired by Little Shop of Horrors, gaping crimson maw lined with razor-sharp cilia, surrounded by jungle trees, partially obscured by foliage, perspective from tiny human’s viewpoint for scale, cinematic lighting with dramatic mist, ultra-realistic botanical detail, predatory stance, 8k, concept art, masterpiece

Negative Prompt: blurry, lowres, low quality, deformed, distorted, extra limbs, bad anatomy, grainy, out of frame, cropped, watermarks, text, signature, jpeg artifacts, oversaturated, underexposed

Generation Settings:
Steps: 20
Sampler: DPM++ 2M
Schedule type: Karras
CFG scale: 7
Seed: 1896508403
Size: 576x384
Model hash: 15012c538f
Model: realisticVisionV60B1_v51VAE
Denoising strength: 0.7
Hires upscale: 2
Hires steps: 10
Hires upscaler: Latent
Version: v1.10.

Concept Art 3: Gothic Horror


Prompt: “A lone necromancer in flowing black robes with a staff, raising the dead among broken tombstones in a moonlit graveyard”

Notes: 15th iteration of the prompt. I started out too specific and removed details to simplify the prompt; striking a balance between simple and detailed prompts seems important.

Prompt: overhead view of a lone necromancer in flowing black robes with a staff, raising the dead among broken tombstones in a moonlit graveyard, eerie green fog under moonlight, dramatic shadows, ultra-detailed, atmospheric, dark fantasy, concept art, cinematic lighting, 8k resolution

Negative Prompt: blurry, lowres, low quality, deformed, distorted, extra limbs, bad anatomy, grainy, out of frame, cropped, watermarks, text, signature, jpeg artifacts, oversaturated, underexposed

Generation Settings:
Steps: 20
Sampler: DPM++ 2M
Schedule type: Karras
CFG scale: 7
Seed: 3079054624
Size: 576x384
Model hash: 15012c538f
Model: realisticVisionV60B1_v51VAE
Denoising strength: 0.7
Hires upscale: 2
Hires steps: 10
Hires upscaler: Latent
Version: v1.10.

Concept Art 4: Steampunk


Prompt: “Steampunk inventor workshop table, clockwork companion robot”

Notes: 5th iteration of the prompt. It wasn't exactly what I was expecting, but I found the design oddly charming.

Prompt: steampunk inventor workshop table, clockwork companion robot, small and cute, ornate bronze clockwork detailing, concept art, 8k resolution

Negative Prompt: blurry, lowres, low quality, deformed, distorted, extra limbs, bad anatomy, grainy, out of frame, cropped, watermarks, text, signature, jpeg artifacts, oversaturated, underexposed

Generation Settings:
Steps: 20
Sampler: DPM++ 2M
Schedule type: Karras
CFG scale: 7
Seed: 3088515835
Size: 576x384
Model hash: 15012c538f
Model: realisticVisionV60B1_v51VAE
Denoising strength: 0.7
Hires upscale: 2
Hires steps: 10
Hires upscaler: Latent
Version: v1.10.

Concept Art 5: Fantasy


Prompt: “Bioluminescent squid-headed humanoid warrior underwater”

Notes: 20th generation to refine the prompt. I had to be specific in the negative prompt about excluding extra fingers and floating limbs.

Prompt: Bioluminescent squid-headed humanoid warrior underwater, purple skin with tentacles, coral sword in hand, deep sea, dramatic overhead lighting, detailed water and tentacle textures, dynamic cinematic composition, concept art, 8k resolution, masterpiece

Negative Prompt: blurry, lowres, low quality, deformed, distorted, extra limbs, extra fingers, floating objects, bad anatomy, grainy, out of frame, cropped, watermarks, text, signature, jpeg artifacts, oversaturated, underexposed

Generation Settings:
Steps: 20
Sampler: DPM++ 2M
Schedule type: Karras
CFG scale: 7
Seed: 3703217172
Size: 576x384
Model hash: 15012c538f
Model: realisticVisionV60B1_v51VAE
Denoising strength: 0.7
Hires upscale: 2
Hires steps: 10
Hires upscaler: Latent
Version: v1.10.