In this article, you will learn:
- What is a token?
- What happens when you use more than 75 tokens in Stable Diffusion
- Smart ways to save tokens and stay under the 75-token limit.
Note: I focus on Stable Diffusion in this article, but the way tokens work is essentially the same across all AI Art generation programs. Even LLM Chatbots break prompts into tokens – although those can have prompt token limits in the millions.
What Is a Token?
Simply put, a token is a word or part of a word in a prompt. As of version 1.5, Stable Diffusion had a list of approximately 30,000 words that it knew. (Nobody has seen an updated list since, but supposedly that list remained the same at least through Stable Diffusion 2 and SDXL.) If your prompt contains a word that is on that list, that word will use one token. When you use a word that is not on that list, Stable Diffusion breaks the word up into chunks: either words it knows, like “Art” and “Station” for “Artstation”, or short word parts called morphemes, like “rut,” “kow,” and “ski” for prompter favorite “Greg Rutkowski.” Often, it will use a mix of words and morphemes, as in “hippo,” “potom,” “onst,” “roses,” “quip,” “pedal,” “io,” and “phobia” for hippopotomonstrosesquippedaliophobia, the fear of long words. You can find the list of words, morphemes, and other things Stable Diffusion counts toward your token quota here.
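To make the splitting idea concrete, here is a toy sketch in Python. It is not Stable Diffusion's actual tokenizer: the vocabulary is a hypothetical handful of entries taken from the examples above, and the function simply grabs the longest known piece it can find, whereas the real tokenizer uses learned byte-pair merges.

```python
# Toy illustration only. TOY_VOCAB is a hypothetical six-entry word list,
# not the real vocabulary, and longest-match splitting is an assumption
# made for clarity (the real tokenizer uses byte-pair encoding).
TOY_VOCAB = {"greg", "art", "station", "rut", "kow", "ski"}


def split_word(word, vocab=TOY_VOCAB):
    """Split one word into known pieces, taking the longest match first."""
    word = word.lower()
    if word in vocab:
        return [word]  # a known word costs a single token
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining chunk first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Nothing known starts here, so fall back to one character.
            pieces.append(word[start])
            start += 1
    return pieces


def count_tokens(prompt, vocab=TOY_VOCAB):
    """Count tokens in a prompt made of space-separated words."""
    return sum(len(split_word(word, vocab)) for word in prompt.split())


print(split_word("Artstation"))
print(count_tokens("Greg Rutkowski"))
```

With this toy vocabulary, “Artstation” splits into two pieces and “Greg Rutkowski” costs four tokens, matching the examples above.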
Names and Tokens
Full names usually take at least two tokens, and most get broken into more. For instance, prompter favorite Greg Rutkowski requires four tokens – Stable Diffusion knows “Greg,” but not his last name. “Rutkowski” gets broken into three word parts: “Rut,” “kow,” and “ski.” But depending on their fame, some artists’ names only use one token: Michelangelo, Leonardo, and Raphael, for instance. (Sadly, Donatello was left out of the one-token club.) Even VincentVanGogh uses just one token, as long as you type it as one word like that. At the other extreme, Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso takes thirty tokens. (Although “Picasso” by itself only takes one.)
A word’s absence from the list of one-token words doesn’t automatically mean that Stable Diffusion doesn’t know that word. It recognizes more than a few celebrities, from one-token superstars like Zendaya or ElonMusk to two-token B-listers like Ryan Reynolds. If you want a picture of “Saoirse Ronan and Djimon Hounsou, wearing kaftans, in the style of Hieronymus Bosch,” Stable Diffusion can make one, even though none of those names are in its official list of tokens.
Numbers, Punctuation, and Symbols
Numbers use one token per digit. An easy way to save tokens is to write numbers as words – “ten billion” only uses two tokens, but 10000000000 uses eleven. And “ten billion dollars” uses three tokens, but “$10,000,000,000” uses fifteen, because punctuation marks and symbols like “$” and “@” use a token each as well. This is particularly important for prompt engineering, because many people use commas to separate phrases in their prompts.
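The counting rules in this section can be sketched as a rough estimator in Python. The function name estimate_tokens is made up for this example, and the sketch assumes every word is on the one-token list, so it will undercount prompts containing words that get split into parts.

```python
import re


def estimate_tokens(prompt):
    """Rough estimate: one token per word, per digit, and per symbol.

    Assumption: every word is a known one-token word. Words that the
    real tokenizer splits into parts will be undercounted.
    """
    pieces = re.findall(
        r"[^\W\d_]+"   # a run of letters counts as one word
        r"|\d"         # each digit counts separately
        r"|[^\w\s]|_", # each punctuation mark or symbol counts separately
        prompt,
    )
    return len(pieces)


for text in ["ten billion", "10000000000", "ten billion dollars", "$10,000,000,000"]:
    print(text, estimate_tokens(text))
```

Run on the four examples above, the estimator gives 2, 11, 3, and 15.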
Here is an example: “a painting of a beautiful woman, long brown hair, blue eyes, wearing a sundress, smiling and laughing, in the Neoclassical, Academic style of John William Waterhouse, John William Godward, and William-Adolphe Bouguereau.” This prompt is forty-eight tokens long, but it uses more than twenty percent of those tokens on punctuation – eight tokens for commas and one each for the dash in “William-Adolphe” and the period at the end. Most of the time, you can skip these. Stable Diffusion can understand “a painting of a beautiful woman long brown hair blue eyes wearing a sundress smiling and laughing in the Neoclassical Academic style of John William Waterhouse John William Godward and William Adolphe Bouguereau” just as well.
Still, that prompt is difficult for humans to read. Here is another option:
Painting of beautiful woman
long brown hair
blue eyes
sundress
smiling laughing
by John William Waterhouse John William Godward William Adolphe Bouguereau
Anytime you would put a comma or other punctuation
Just hit return instead
It keeps prompts organized
And Stable Diffusion sees returns as spaces
So they don’t use tokens
Be careful, though – if the AI gets confused and starts giving you images of long brown women with blue hair wearing sundresses with pictures of eyes and smiles, it’s time to add some commas back in.
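If you already have a pile of comma-separated prompts, the swap can be automated. The snippet below is a minimal sketch in Python; the helper name commas_to_newlines is made up for this example, and it only handles commas and a trailing period, not dashes or other symbols.

```python
def commas_to_newlines(prompt):
    """Put each comma-separated phrase on its own line.

    Only commas and a trailing period are handled; any other
    punctuation is left exactly where it is.
    """
    phrases = [phrase.strip() for phrase in prompt.rstrip(". ").split(",")]
    # Skip empty phrases left behind by doubled or trailing commas.
    return "\n".join(phrase for phrase in phrases if phrase)


print(commas_to_newlines("long brown hair, blue eyes, wearing a sundress."))
```

The printed result has one phrase per line and no commas or period left to spend tokens on.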
You probably also noticed that several of the words in the original prompt are missing. Stable Diffusion doesn’t need articles like “a” or “the,” or conjunctions like “and.” It is also smart enough to figure out that “sundress” is something you wear, and that all these painters use a Neoclassical Academic art style. Using “by” instead of “in the style of” saves another three tokens.
To save still more tokens, you could remove “painting of,” because Stable Diffusion knows you want a painting when you ask for an image in the style of painters; “long brown hair” because nearly all of the women in Neoclassical paintings have long brown hair; and “smiling,” because you can’t laugh without smiling. You can even take out “woman” – Stable Diffusion can figure out that you want an image of a woman just from “beautiful blue eyes sundress.”
Removing all unnecessary words and punctuation, the prompt
Beautiful
Blue eyes
Sundress
Laughing
By Waterhouse Godward Bouguereau
uses only twenty tokens. Quite a drop from the original forty-eight!
Why Are Short Prompts Important?
Why go to all this effort to make the shortest prompt possible? In the most general sense, every token in a prompt is something Stable Diffusion has to “think” about. Even if it doesn’t do anything with commas or conjunctions, it still has to take the time to “look at them” and decide what to do about them, if anything, resulting in longer wait times. Also, the fewer details in a prompt, the more the AI can focus on each one. “Blue eyes” is not easy to get in a Neoclassical painting. Ask for blue eyes somewhere in the middle of a forty-eight-token prompt, and you’ll be disappointed. As the second and third tokens in a twenty-token prompt, though, those peepers will get all the attention they need.
The most important reason to keep prompts short is Stable Diffusion’s 77-token limit. It takes one token to start a prompt and one to end it, so the maximum length for a Stable Diffusion prompt is 75 tokens, whether those tokens are words, parts of words, numbers, or punctuation. Writing a prompt with more than 75 tokens can lead to various issues. The most likely outcome is that Stable Diffusion will just ignore everything else in the prompt after it reaches 75 tokens. Occasionally, though, it will cut earlier details out to add new ones in. In extreme cases, too many details can confuse Stable Diffusion entirely and lead to wildly unpredictable outcomes. Balancing token usage is critical to getting good images.
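The arithmetic behind the limit, and the most common outcome of going over it, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular program implements it: the “<start>” and “<end>” strings are placeholder names, and the sketch models only the simple case where everything past the 75th token is ignored.

```python
MAX_TOKENS = 77                  # total limit, including the two markers
USABLE_TOKENS = MAX_TOKENS - 2   # one token starts the prompt, one ends it


def fit_to_limit(tokens):
    """Keep the first 75 prompt tokens and report what gets dropped.

    "<start>" and "<end>" are placeholder names for the two marker
    tokens, chosen for this example.
    """
    kept = tokens[:USABLE_TOKENS]
    dropped = tokens[USABLE_TOKENS:]
    framed = ["<start>"] + kept + ["<end>"]
    return framed, dropped


# An 80-token prompt: the last five tokens never reach the model.
prompt_tokens = [f"token{i}" for i in range(80)]
framed, dropped = fit_to_limit(prompt_tokens)
print(len(framed), len(dropped))
```

Anything you care about should therefore sit well inside the first 75 tokens.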
Conclusion
To understand your prompts, the text-to-image AI models behind AI art generators like DALL-E, Midjourney, and Stable Diffusion break them into tokens, which can be words, parts of words, numbers, or symbols. Understanding what counts as a token, managing prompt length by removing unnecessary words, and exploring alternative formatting techniques can make prompts more effective and avoid the risks inherent in going over Stable Diffusion’s 77-token limit.