Basics of Prompt Tokenization in the Generation of Art

Apr 11, 2023

A prompt is a sentence or instruction that tells the AI what to produce. In Midjourney, the result is an image. For LLMs in general, neural networks interpret text through a process called tokenization, which, for a layman like me, basically means breaking the text into smaller parts. Any language can be broken down into pieces, and so can a Midjourney prompt. Different LLMs tokenize at different levels: character level, word level, subword (parts of words) level, or sentence level. From what I understand, Midjourney takes the prompt and tokenizes it: the sentence is divided into words, parts of words and commands, and each piece becomes a token that is given a weighting. As far as I know (as of Feb 2023), Midjourney assigns a maximum of 75 tokens to a prompt. For an example of token counting, you can head to:

https://platform.openai.com/tokenizer

This is a token counter for GPT-3 prompts, but it shows how a model processes text as tokens:

Here is an actual prompt I used in Midjourney to create a manga image. In the GPT-3 tokenizer linked above, this prompt counts as 64 tokens.
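To make the levels of tokenization mentioned earlier a bit more concrete, here is a toy Python sketch of character-level, word-level and subword-level splitting. The tiny vocabulary and the greedy longest-match rule are my own illustrative assumptions: real tokenizers learn their vocabularies from data (GPT-3, for example, uses byte-pair encoding), and this is not how Midjourney itself tokenizes.

```python
import re

def char_tokens(text: str) -> list[str]:
    """Character level: every character, including spaces, is a token."""
    return list(text)

def word_tokens(text: str) -> list[str]:
    """Word level: words, with punctuation split off as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def subword_tokens(text: str, vocab: set[str]) -> list[str]:
    """Toy subword level: greedy longest match against a tiny vocabulary,
    falling back to single characters for pieces it does not know."""
    tokens = []
    for word in word_tokens(text):
        i = 0
        while i < len(word):
            # Try the longest piece starting at i first; a single character always matches.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab or j == i + 1:
                    tokens.append(word[i:j])
                    i = j
                    break
    return tokens

VOCAB = {"cyborg", "is", "stand", "ing"}  # hypothetical vocabulary, for illustration only
prompt = "cyborg is standing"
print(len(char_tokens(prompt)))       # 18
print(word_tokens(prompt))            # ['cyborg', 'is', 'standing']
print(subword_tokens(prompt, VOCAB))  # ['cyborg', 'is', 'stand', 'ing']
```

The same short prompt gives 18 character tokens, 3 word tokens or 4 subword tokens, which is why token counts depend on the tokenizer and not just on the number of words.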

Regarding the weighting, straight from Mendenhall on the Midjourney Prompt FAQs forum: as of V4, words 1-20 have high influence over the overall image, words 21-40 are strongly in play, and words 41+ are less cooperative but still more likely to be in play than in any previous version of Midjourney. In other words, words at the front of the prompt carry more weight than words at the end. So keep the prompt clear and concise. Some tutorials use prompts that are paragraphs long, but the words towards the end count for very little: they carry little weight and are probably not even assigned tokens, since each prompt has a maximum of 75 tokens.
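The positional bands above can be sketched as a small Python function that labels each word of a prompt by its position. This is a simplified model, not Midjourney's actual weighting: it assumes one word equals one token (real tokenizers do not guarantee this), and the band names "high", "strong", "weaker" and "dropped" are my own hypothetical labels for Mendenhall's description.

```python
from collections import Counter

def influence_band(position: int) -> str:
    """Map a 1-based word position to the rough V4 bands described above."""
    if position <= 20:
        return "high"    # words 1-20: high influence
    if position <= 40:
        return "strong"  # words 21-40: strongly in play
    return "weaker"      # words 41+: less cooperative

def annotate_prompt(prompt: str, max_tokens: int = 75) -> list[tuple[str, str]]:
    """Label every word with its band; words past max_tokens are marked 'dropped'.
    Simplification: assumes one word == one token."""
    labelled = []
    for pos, word in enumerate(prompt.split(), start=1):
        band = "dropped" if pos > max_tokens else influence_band(pos)
        labelled.append((word, band))
    return labelled

# Usage: an 80-word dummy prompt, w1 ... w80
demo = " ".join(f"w{i}" for i in range(1, 81))
print(dict(Counter(band for _, band in annotate_prompt(demo))))
# {'high': 20, 'strong': 20, 'weaker': 35, 'dropped': 5}
```

Under these assumptions, the last five words of an 80-word prompt would never reach the model, and 35 more would sit in the weakest band.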

Let's move on to basic prompt engineering for AI art.

The simplest framework for a Midjourney prompt would be: /imagine: text prompt

Here the text prompt is simply a combination of words, such as ‘Cyborg is standing’. Here’s the result:

A more advanced type of framework would be:

/imagine: image prompts, text prompts, parameters

Here, reference images (as links) are pasted into the prompt first, followed by the text prompt, and then by more technical settings such as image weights, text weights, aspect ratio, style weight and preferred engine. Let’s do a simple one here:

1. Drag and drop your reference image into Discord while holding Shift

(The image on the left is my reference image.)

2. Right-click the uploaded image and copy the image link

3. Paste the link into the prompt after the /imagine command

4. Continue by writing your text prompt

5. Finish with parameters. Here my parameters are aspect ratio and engine: --ar 16:9 sets the aspect ratio and --v 5 selects the engine (version 5).

So my example here would be (the link for the reference image is shown in blue):

And here's the result:
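As a rough illustration, the framework and the five steps above can be sketched in Python as a string builder, plus a parser that splits a finished prompt back into image links, text and parameters. This is only string handling, not a Midjourney API: the function names and the example link are hypothetical, and the sketch assumes image links come first and start with http:// or https://, and that every parameter is written as --name value. The "/imagine:" prefix mirrors the notation used in this post; in Discord, the text after /imagine goes into the command's prompt field.

```python
import re

URL_PREFIXES = ("http://", "https://")

def build_prompt(image_links: list[str], text: str,
                 aspect_ratio: str = "16:9", version: str = "5") -> str:
    """Steps 3-5: image links first, then the text prompt, then parameters."""
    if not re.fullmatch(r"\d+:\d+", aspect_ratio):
        raise ValueError(f"aspect ratio should look like 16:9, got {aspect_ratio!r}")
    parts = ["/imagine:", *image_links, text.strip(),
             f"--ar {aspect_ratio}", f"--v {version}"]
    return " ".join(p for p in parts if p)  # skip empty pieces such as blank text

def split_prompt(prompt: str) -> dict:
    """Split a prompt back into image links, text prompt and parameters."""
    words = prompt.removeprefix("/imagine").lstrip(": ").split()
    images, text, params = [], [], {}
    i = 0
    # 1) Leading http(s) links are treated as image prompts.
    while i < len(words) and words[i].startswith(URL_PREFIXES):
        images.append(words[i])
        i += 1
    # 2) Words up to the first '--' flag form the text prompt.
    while i < len(words) and not words[i].startswith("--"):
        text.append(words[i])
        i += 1
    # 3) The rest is '--name value' pairs; a flag with no value maps to "".
    key = None
    for w in words[i:]:
        if w.startswith("--"):
            key = w[2:]
            params[key] = ""
        elif key is not None:
            params[key] = (params[key] + " " + w).strip()
    return {"images": images, "text": " ".join(text), "params": params}

# Usage with a placeholder link (not a real reference image)
p = build_prompt(["https://example.com/ref.png"], "manga girl, city at night")
print(p)  # /imagine: https://example.com/ref.png manga girl, city at night --ar 16:9 --v 5
print(split_prompt(p)["params"])  # {'ar': '16:9', 'v': '5'}
```

Keeping the parts in this fixed order also keeps the text prompt close to the front, which matters given the positional weighting described earlier.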

These are the bare basics of AI image generation. In other posts, I will share some secrets on how to produce manga-level art and keep it consistent.

Charles Man.
