July 29th, 2024


Conveying your intention to a Large Language Model (LLM) is not easy. It requires a combination of techniques, each demanding considerable thinking from the practitioner and compute from the GPUs. The most wasteful of these techniques is prompt engineering, which requires days or weeks of trial and error. The idea is to use the system prompt of an LLM to set up guardrails, give few-shot examples of inputs and outputs, and sometimes even plead with the LLM. For example, the following phrases have produced a measurable increase in output quality: “I will tip you $100 for a perfect answer” or “Please, my career depends on this output.”

Prompt engineering is good for prototyping LLM-based applications, but system prompts are too sensitive to minor wording changes to carry the weight of a production system’s correctness. Prompt engineering is a stopgap measure for these early days of LLM-based applications and is unsustainable in the long term. The idea that a set of tokens specified at the beginning of a user’s session will perfectly convey the developer’s intention is wishful thinking. Thankfully, modern techniques are making prompt engineering obsolete.

The most robust alternative to prompt engineering is soft prompting. The basic idea is to fine-tune the input layer of the LLM based on application-specific datasets. In other words, rather than relying entirely on the embeddings of the tokens in the system prompt, a trainable prompt is added to the LLM’s input. This approach effectively relieves the practitioner from the pressure of perfectly conveying the intention in words. The model learns at least part of the system prompt by itself, based on the application-specific fine-tuning dataset on which the soft prompt is trained. The encoded soft prompt may end up being nonsense if interpreted into real tokens, but it efficiently captures the nuances of the task being trained on.
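To make the idea concrete, here is a minimal PyTorch sketch of a soft prompt: a small trainable matrix prepended to the frozen model’s input embeddings. The class and parameter names (`SoftPrompt`, `num_virtual_tokens`, `embed_dim`) are illustrative, not from any particular library, and the dimensions would need to match your base model.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """A trainable block of "virtual token" embeddings prepended to the input."""

    def __init__(self, num_virtual_tokens: int = 20, embed_dim: int = 768):
        super().__init__()
        # The soft prompt is just a learned matrix; it never corresponds to real
        # vocabulary tokens, which is why decoding it often yields nonsense.
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) token embeddings from the frozen LLM
        batch_size = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # Prepend the trainable prompt to every sequence in the batch.
        return torch.cat([prompt, input_embeds], dim=1)
```

During fine-tuning, only `self.prompt` receives gradients while the base model’s weights stay frozen, which is what lets the task-specific behavior be learned from data rather than written out in words.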

Within soft prompting, researchers have developed several approaches. Prompt tuning trains a small set of application-specific virtual token embeddings that are prepended to the input before it reaches the LLM, leaving the rest of the model untouched. Prefix tuning goes further: rather than adding trainable vectors only at the input, it injects them into every layer of the LLM to specialize it on a given task during fine-tuning. These techniques each have upsides and downsides, including inference latency, specificity to the task at hand, and the computational efficiency of training.
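As a rough illustration of how the two variants differ in practice, the sketch below configures each with the Hugging Face PEFT library; the base model choice and `num_virtual_tokens=20` are arbitrary assumptions, and exact argument names can vary between library versions.

```python
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PrefixTuningConfig, TaskType, get_peft_model

# Prompt tuning: trainable virtual tokens are added only at the input layer.
prompt_cfg = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)
prompt_model = get_peft_model(AutoModelForCausalLM.from_pretrained("gpt2"), prompt_cfg)

# Prefix tuning: trainable prefix vectors are injected into every layer of the model.
prefix_cfg = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)
prefix_model = get_peft_model(AutoModelForCausalLM.from_pretrained("gpt2"), prefix_cfg)

# In both cases only a tiny fraction of parameters is trainable.
prompt_model.print_trainable_parameters()
prefix_model.print_trainable_parameters()
```

The trade-off is visible even in this sketch: prefix tuning trains more parameters and touches every layer, while prompt tuning keeps the footprint to a handful of input embeddings.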

The LLM subfield of soft prompting is constantly evolving, with new techniques emerging month after month. If you’re a practitioner, you must choose a specific soft prompting technique based on your unique use case. Soft prompting is most effective when used in conjunction with supervised and unsupervised fine-tuning, ideally performed continuously to prevent drift.

If you’d like to improve the quality of your LLM applications with soft prompting and continuous fine-tuning, visit our website: www.plumdefense.com