SpongeBob Balenciaga and a Review of Adobe Firefly (Beta)
The latest AI image generation application, some thoughts about future AI game generation, and the dawn of SpongeBob by Balenciaga
After the onslaught of AI-driven engines in early 2023, I was instantly intrigued by the capabilities of generative image models. While ChatGPT is a monumental milestone in AI progress, it does not carry the same spectacle as algorithmically generated images.
Many of us have seen the incredible art built with a few simple prompts. Works that could be the highlight of an artist's portfolio yet were brought into existence in mere seconds. Futuristic space scenes at the cutting edge of what we deem as reality. Fantastical portraits of medieval warriors with armor as detailed as something you would see in a museum.
Yet much of the foundation of how these images are generated results from a vast vacuuming of original artwork across the internet. The AI algorithms are not bringing these beautiful images out of the void of nothingness. It is mass plagiarism. The source is the work of large swaths of artists, which is rehashed and presented as the model's own.
A specific example is what happened to artist Hollie Mengert's work in late 2022. Someone trained Stable Diffusion to recreate her style, which effectively "democratized" all her work without her permission. The initial defense of the method was that the output from the AI-trained model was new, separate art. This goes into a gray area of copyright law and morality.
This kind of morally questionable foundation kept me away from trying image generators out. I didn't want to contribute to a vast operation of repackaging artists' work. It is within this situation that Adobe announced the Firefly beta program and I excitedly signed up.
Since Adobe is a company built on creating tools for artists, I anticipated that it would approach the intellectual property issue with more care than others in the field. And true enough, I was presented with the notice below after signing up:
We train Firefly using Adobe Stock and other diverse image datasets which have been carefully curated to mitigate against harmful or biased content while also respecting artist’s ownership and intellectual property rights.
And so off I went into the beta. As of early April 2023 there are only two services available within Firefly: an AI image generator (text to image) and text effects, which applies a described texture to the letters you type. Other aspects of the service are still being developed:
Inpainting
Use a brush to add, remove, or replace objects in an image. Generate the new fill with a text prompt.
Personalized results
Generate images based on your own object or style.
Text to vector
Generate editable vectors from a detailed text description.
Extend image
Change the aspect ratio of your image with a single click.
3D to image
Generate images from the interactive positioning of 3D elements.
Text to pattern
Generate seamlessly tiling patterns from a detailed text description.
Text to brush
Generate brushes for Photoshop and Fresco from a detailed text description.
Sketch to image
Turn simple drawings into full-color images.
Text to template
Generate editable templates from a detailed text description.
There is a wall of examples for both working services, complete with the text prompts that generated them. This is incredibly useful for a newbie like myself learning how to write effective text prompts. The learning curve is present but nowhere near the challenge of becoming proficient in Photoshop or Premiere Pro.
My initial generated images were just a stretch of the imagination. To see what was actually possible with the impossible. My wife certainly loved it since she loves dinosaurs.
As with other AI image generation software suites, it really struggles to accurately depict appendages. The same goes for the anatomy of a human's facial structure. While the AI model seems to keep it together with a few general prompts, it falls apart once you get past a certain number of text specifications. Maybe it is trying too hard to satisfy every text parameter and failing to maintain what a human would deem a cohesive face. Scaling back the number of text parameters resolves this issue and returns a bit of creative autonomy to the AI model.
As a gamer, I first turned to the Sci-Fi and Fantasy categories to really drill down into what was possible with a very specific prompt. While AI generation is currently going poorly for games, it has the potential to become the future model of creating exactly what you want in the next decade. Even in these initial stages it would be useful to have some sort of dynamic character creation. This could promote a much higher replayability factor. Guard rails could be put in place by the developers to guide the story along a specific trajectory. Or we could embrace the limitless and see where a truly dynamic, sandbox-like adventure could take the audience.
The results could be incredible. Or horrific. Generative AI models restricted purely to building games, or entertainment in general, could have greater moral flexibility compared to real-world AI instances, as long as there was some sort of disclaimer at the beginning saying such. This work is a piece of fiction...blah blah blah. With entertainment as a get-out-of-jail-free card we could be in for a wild ride in future games. Popular titles such as RimWorld already have players notching off which Geneva Convention articles they have violated.
More traditional art is also possible. Some of the generated lithographs and line etchings appeared as if they were out of a very old historical book. A few of the prompts produced some odd ambience in the line etchings, which made it look like a middle school kid just learned Photoshop image effects. But this could be easily toned down with the correct curation of text parameters.
After sharing a few of these results with some family members I received the challenge to create SpongeBob Balenciaga. As of April 2023, the current cultural meme is restyling various fictional universes as Balenciaga fashion campaigns. While Harry Potter was first, Star Wars and Lord of the Rings closely followed. It seems to be a mad rush to Balenciaga-ify everything else before the cultural moment passes.
The initial prompt was a classic human and AI misunderstanding. With no cultural knowledge about the cartoon character SpongeBob SquarePants, the AI model drew something very literally. The result was a Tim Burton mixed with Salad Fingers horror creature.
Once I realized I had to guide the AI model literally, things turned in the right direction. Rather than relying on a cultural phrase that an AI model couldn't generally understand, I gave it a literal outline of what I wanted.
The second result is rather entertaining in two ways: it drew exactly what I asked for, and it also put together a Hitler Squidhands from Grease. But overall the new direction was promising because the foundational parameters were showing up in the output. The next steps were to refine the output in an artistic direction closer to how I expected a SpongeBob Balenciaga would appear. The resulting prompts didn't vary a terrible amount but made a significant difference in the output.
The departure from using 'gothic' and the specification of a 'yellow' sponge started to really make a difference. During this time I was also playing with various styles like pixel art, photo realism, and others to see a good path forward.
The next round of images started to get very close to my original goal. The background of sand dunes was a workaround born of frustration on my part. I was not able to specify 'underwater' or 'seafloor' in the parameters without some odd output. Additionally, 'beach' or similar words skewed the results in a way I didn't want to go. One thing I do want to note is the hyper-realistic sponge for the female model. I struggled to reproduce such a sponge later on in the process. Per the video from Adobe Firefly, keeping a specific element across generations might be possible in the future.
The man's head looks like a cheese cube at first glance. Since there is currently no way to modify a specific element of the image output, I was left with regenerating or changing the text. As I mentioned before, if you start adding too many text parameters to the input, the output begins to break. I presume this is the AI model attempting to fulfill every request in the output without realizing it is breaking reality in the process.
The last few prompts changed only in minor ways but were repeatedly regenerated to see what the AI model could come up with next. Since there were no substantive changes to the text prompt, I am not sure the process is easily repeatable. You just had to keep regenerating and hope for a desirable output. And yet here are my two candidates for SpongeBob SquarePants Balenciaga from Adobe Firefly (Beta):
For a service in beta and an untrained AI shaper on his lunch break, I think the output is very powerful. With additional training time for the AI model and an experienced human shaper, the results could be indistinguishable from the work of real illustrators and possibly photographers. The transformative implications of this capability are still in their early stages across all industries. I am equally horrified and extremely excited to see where it goes.