Overview

In a recent case study, Plum demonstrated a significant improvement in the quality of an LLM application’s output, from 66% to 82%. By evaluating outputs, augmenting training data, and fine-tuning the model, Plum Defense improved the LLM application using an initial dataset of only 15 examples. The key to Plum Defense’s effectiveness is identifying the right metrics for the business case and using them to drive a continuous fine-tuning process.

In this case study, we apply Plum’s system to an LLM designed to answer product questions about a company using information from its public-facing website. The website text is low in information density and hard to understand. The LLM’s answers should keep the product details and omit extraneous language; as we’ll see, however, it underperforms in specific ways.

Here’s an example of content from the website:

With [redacted], clients gain 
the ability to scale with a highly automated exception-based process, 
including comprehensive reconciliation and extensive quality control.
Significant experience supporting custom and complex client requirements,
leveraging experience and knowledge throughout the [redacted] organization.

LLM Use Case

The LLM is tasked with rephrasing the information in the product descriptions and extracting short and to-the-point descriptions of the products.

The dataset consists of 20 webpages, averaging around 300 words each. We hold out 25% (5) of these webpages as our test dataset, which we’ll use later, and use the other 75% (15) as our training dataset.
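
As a rough illustration of the 75/25 hold-out described above, here is a minimal sketch in Python. The function name `split_pages`, the fixed seed, and the placeholder page identifiers are assumptions made for this example; the case study does not say how the 5 test pages were actually chosen.

```python
import random

def split_pages(pages, test_fraction=0.25, seed=0):
    """Shuffle pages reproducibly and hold out a fraction for testing.

    Hypothetical helper: the selection method used in the case study
    is not documented, so a seeded random shuffle is assumed here.
    """
    shuffled = list(pages)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # First n_test shuffled pages become the test set, the rest train.
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder identifiers standing in for the 20 webpages.
pages = [f"page_{i:02d}" for i in range(20)]
train_pages, test_pages = split_pages(pages)
print(len(train_pages), len(test_pages))  # 15 5
```

Fixing the seed keeps the held-out pages stable across runs, so later evaluation numbers stay comparable.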

The system prompt given to ChatGPT:

Streamline the given information section by section,
ensuring that the overall logic, content, and technical terms are retained.

Instructions:
1. Simplify redundant language into concise and dense phrases.
2. Retain key elements: Ensure that important terminology is 
   retained without omitting critical information.

Do not summarize the text or merge various points or paragraphs, 
but instead, shorten them individually. Use bullet points.
Sentences that are too long should be broken down into shorter ones.

Output of the LLM application on the above example text:

[Redacted] features automated processes,
comprehensive reconciliation, customer support, and scalability.

Note that the output over-summarizes the source: the details have been left out, and the sentence is too long.
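
Observations like these can be made a little more concrete with simple surface checks. The sketch below is not Plum’s evaluation method; it is a hypothetical helper (`check_output`) that reports whether an output uses bullet points, how long its longest sentence is, and how much shorter it is than the source. The sentence splitter is deliberately crude, and any pass/fail thresholds are left to the reader.

```python
import re

def check_output(source, output):
    """Report heuristic format statistics for one LLM output.

    Hypothetical helper for illustration only; it measures surface
    properties and does not judge whether details were preserved.
    """
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    # The system prompt asks for bullet points, so look for bullet markers.
    uses_bullets = any(ln.startswith(("-", "*", "•")) for ln in lines)
    # Crude sentence split on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", " ".join(lines)) if s]
    longest = max((len(s.split()) for s in sentences), default=0)
    source_words = len(source.split())
    ratio = len(output.split()) / source_words if source_words else 0.0
    return {
        "uses_bullets": uses_bullets,
        "longest_sentence_words": longest,
        "length_ratio": ratio,
    }

source_text = (
    "With [redacted], clients gain the ability to scale with a highly "
    "automated exception-based process, including comprehensive "
    "reconciliation and extensive quality control. Significant experience "
    "supporting custom and complex client requirements, leveraging "
    "experience and knowledge throughout the [redacted] organization."
)
output_text = (
    "[Redacted] features automated processes,\n"
    "comprehensive reconciliation, customer support, and scalability."
)
report = check_output(source_text, output_text)
print(report)
```

On the example above, the report shows no bullet points and an output roughly a quarter the length of the source, which is consistent with the over-summarization noted here.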

Evaluation