The Art of Prompt Design: Prompt Boundaries and Token Healing
<p>This (written jointly with Marco Tulio Ribeiro) is part 2 of a series on <strong>the art of prompt design</strong> (part 1 here), where we talk about controlling large language models (LLMs) with <code>guidance</code>.</p>
<p>In this post, we’ll discuss how the greedy tokenization methods used by language models can introduce a subtle and powerful bias into your prompts, leading to puzzling generations.</p>
<p>Language models are not trained on raw text, but rather on tokens, which are chunks of text that often occur together, similar to words. This impacts how language models ‘see’ text, including prompts (since prompts are just sets of tokens). GPT-style models utilize tokenization methods like Byte Pair Encoding (BPE), which map all input bytes to token ids in a greedy manner. This is fine for training, but it can lead to subtle issues during inference, as shown in the example below.</p>
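<p>The greedy mapping is easy to see with a toy longest-match tokenizer. The vocabulary below is hypothetical (real BPE vocabularies contain tens of thousands of tokens), but the greedy behavior at the prompt boundary is the same in spirit:</p>
<pre>
# Toy greedy (longest-match) tokenizer over a hypothetical mini-vocabulary.
# Real BPE encoders merge byte pairs, but at inference they likewise map
# the input to tokens greedily, which is what matters for prompt boundaries.
VOCAB = {"http", ":", "//", "://"}

def greedy_tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        # take the longest vocabulary entry matching at position i
        match = max((t for t in VOCAB if text.startswith(t, i)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"no token matches at position {i}")
        tokens.append(match)
        i += len(match)
    return tokens

print(greedy_tokenize("http://"))  # -> ['http', '://']
print(greedy_tokenize("http:"))    # -> ['http', ':']</pre>
<p>Note that <code>://</code> is a single token, so a prompt that stops right after the colon ends on the standalone <code>:</code> token — a boundary the model rarely saw during training.</p>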
<h1><strong>An example of a prompt boundary problem</strong></h1>
<p>Consider the following example, where we are trying to generate an HTTP URL string:</p>
<pre>
import guidance
# we use StableLM as an example, but these issues impact all models to varying degrees
guidance.llm = guidance.llms.Transformers("stabilityai/stablelm-base-alpha-3b", device=0)
# we turn token healing off so that guidance acts like a normal prompting library
program = guidance('The link is <a href="http:{{gen max_tokens=10 token_healing=False}}')
program()</pre>
<p>Notebook output.</p>
<p>Note that the output generated by the LLM does not complete the URL with the obvious next characters (two forward slashes). It instead creates an invalid URL string with a space in the middle. This is surprising, because the <code>//</code> completion is extremely obvious after <code>http:</code>. To understand why this happens, let’s change our prompt boundary so that our prompt does not include the colon character:</p>
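<p>Moving the boundary is also the core of the “token healing” fix that <code>guidance</code> enables by default. The sketch below is a toy illustration of that idea — not <code>guidance</code>’s actual implementation — using a hypothetical mini-vocabulary: back up over the last prompt token, then constrain the first generated token to those whose text starts with the removed suffix, so the single <code>://</code> token is back in play:</p>
<pre>
# Toy illustration of the token-healing idea (not guidance's real code).
VOCAB = ["http", ":", "//", "://"]

def greedy_tokenize(text):
    """Greedy longest-match tokenization, as BPE-style encoders behave at inference."""
    tokens, i = [], 0
    while i < len(text):
        match = max((t for t in VOCAB if text.startswith(t, i)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"no token matches at position {i}")
        tokens.append(match)
        i += len(match)
    return tokens

def heal_prompt(prompt):
    """Drop the last prompt token; return the trimmed prompt plus the
    vocabulary tokens the model is allowed to emit first."""
    last = greedy_tokenize(prompt)[-1]
    trimmed = prompt[: len(prompt) - len(last)]
    allowed = [t for t in VOCAB if t.startswith(last)]
    return trimmed, allowed

trimmed, allowed = heal_prompt("http:")
print(trimmed)   # -> 'http'
print(allowed)   # -> [':', '://']  (the model may now pick '://')</pre>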
<p><a href="https://towardsdatascience.com/the-art-of-prompt-design-prompt-boundaries-and-token-healing-3b2448b0be38">Read More</a></p>