LLM Adaptation Methods: Prompt Engineering and RAG
Introduction/overview
Key ideas
- Prompt engineering is the skillful art of getting the most performance out of LLMs for your use case without spending a fortune on fine-tuning them. Regardless of your application, finely tuned prompts can significantly improve the relevance and diversity of your AI’s responses, creating more satisfying user interactions.
- Examples of prompt engineering techniques include zero-shot prompting (querying the LLM without giving an example), few-shot prompting (querying the LLM with a few examples it can learn from), and chain-of-thought prompting (instructing the LLM to think step by step through a complex task).
- Retrieval Augmented Generation (RAG) is an LLM adaptation method that improves responses by giving the model access to an external database. RAG is ideal for scenarios where you don’t want to fine-tune the model and prompt engineering alone can’t supply the knowledge the model needs.
Say you are one of many who find fine-tuning Large Language Models (LLMs) prohibitively expensive due to the need for large datasets and significant computational resources. If that’s the case, what if you could tweak your prompts to generate good responses for your use case? Enter prompt engineering.
Prompt engineering large language models
Prompt engineering is the skillful crafting of text to elicit the desired responses from an LLM. It’s about providing the right context and instructions to guide the AI towards generating the desired responses at inference time. Think of it as customizing a musical instrument to suit your musical style: you tailor the prompts to direct the LLM’s “performance.”
Elements of an effective prompt:
- Context: Sets the stage for the task, helping the model understand the desired outcome.
- Input data: The actual information you're asking the AI to process.
- Instructions: Clear directions on what the model is supposed to do.
- Examples: Demonstrations of the desired output to model the correct behavior.
- Constraints: Boundaries that limit the model’s responses to meet specific requirements.
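To make these elements concrete, here is a minimal Python sketch that stitches them together into a single prompt string. The wording and variable names are illustrative assumptions, not a prescribed format.

```python
# Hypothetical prompt built from the five elements above (illustrative wording only).
context = "You are a travel guide assistant who creates personalized itineraries."
input_data = "User profile: prefers cultural experiences, 5 days, $1,000 budget, vegetarian."
instructions = "Recommend a 5-day cultural itinerary for Europe in the spring."
examples = "Example format:\nDay 1: Arrive, visit a museum, dinner at a vegetarian bistro."
constraints = "Prioritize historical sites and museums; keep the daily schedule light."

# Joining the pieces with blank lines keeps each element easy for the model to parse.
prompt = "\n\n".join([context, input_data, instructions, examples, constraints])
print(prompt)
```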
Crafting your prompt: A step-by-step guide
Prompt engineering is both an art and a science, requiring a balance of creativity and technical know-how. It thrives on experimentation, best practices, and guidelines.
- Set the context: Start by establishing the scenario for the model. For example, "You are a travel guide assistant tasked with creating personalized itineraries based on user preferences."
- Provide input data: Next, input the details informing the itinerary. This could be the user profile:
- Preference: Cultural experiences over beach vacations.
- Cultural Interests: Renaissance art, medieval architecture, and contemporary European cuisine.
- Number of Days: 5
- Budget Range: $1,000
- Physical Activity Level: Light walking tours. No hiking or biking.
- Language Preferences: English and French
- Travel Experience: North America and Asia.
- Special Requests: Vegetarian-friendly restaurants.
- Avoidances: Touristy spots.
- Provide an example output template if you need it:
- “Day 1-3: Paris, France
- Day 1: Arrival and visit the Louvre to experience Renaissance art. Dinner at a local vegetarian-friendly bistro.”
- …
- Detail the instructions: Ask the model exactly what you need in clear and specific terms. For instance, “Recommend a 5-day itinerary for a cultural trip to Europe in the spring. Please include at least three countries.”
- Incorporate examples and constraints: To ensure the model’s output meets your needs, you might add, "The itinerary should prioritize historical sites and museums. Exclude any recommendations for Spain and Italy, as the user has visited these countries. Ensure the daily schedule is not too packed, allowing for some leisure time."
Let’s test this out in OpenAI’s ChatGPT (GPT-3.5). Paste the assembled prompt into the chat as your input and review the itinerary the model returns.
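If you’d rather call the model programmatically than paste the prompt into the chat interface, here is a minimal sketch using the OpenAI Python SDK. The model name and the condensed prompt text are illustrative assumptions, not requirements of the technique.

```python
# Minimal sketch: send the assembled itinerary prompt to a chat model.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

prompt = (
    "You are a travel guide assistant who creates personalized itineraries.\n"
    "User profile: prefers cultural experiences, 5 days, $1,000 budget, vegetarian-friendly, "
    "light walking only, avoids touristy spots.\n"
    "Recommend a 5-day itinerary for a cultural trip to Europe in the spring, covering at "
    "least three countries and excluding Spain and Italy."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # any chat-capable model works for this exercise
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```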
Like what you see? That’s a simple form of instruction prompting. Now let’s look at techniques that can step up your prompt engineering skills.
Zero-shot prompting:
This approach involves presenting the LLM with a question or statement without any accompanying examples. If you have ever prompted an LLM, you’ve likely used this technique. It’s akin to testing the LLM’s ability to generate responses based purely on its pre-existing knowledge.
Here’s an example of a zero-shot prompt:
Recommend three vacation locations to visit in Europe this spring for someone interested in cultural arts.
Few-shot prompting:
Few-shot prompting provides examples or a template to nudge the model towards the desired response. Its power lies in guiding the AI to replicate a specific pattern, whether that’s continuing a list in a designated format, adhering to a particular language style, or structuring content in a desired manner.
This technique helps the model understand the context better without overwhelming it with too many examples.
Try this example with OpenAI’s ChatGPT:
Recommend three vacation locations to visit in Europe this spring for someone interested in cultural arts.
Favourite countries are: Spain, France, Belgium, Wales, and Poland.
Output the recommendations in JSON format using these elements:
- Location
- Budget
- Description
- Rating
- Review
Here is the output:
{
  "destinations": [
    {
      "location": "Barcelona, Spain",
      "budget": "Moderate",
      "description": "Explore the vibrant cultural scene of Barcelona, known for its unique blend of modernist architecture, art museums, and lively street performances. Visit the iconic Sagrada Familia, Picasso Museum, and stroll along the famous La Rambla.",
      "rating": 4.5,
      "review": "Barcelona offers a rich cultural experience with its architectural wonders and world-class museums. A must-visit for art enthusiasts."
    },
    {
      "location": "Paris, France",
      "budget": "High",
      "description": "Immerse yourself in the artistic charm of Paris, the 'City of Lights.' Visit the Louvre to see masterpieces like the Mona Lisa, explore the Montmartre art district, and enjoy performances at the historic Moulin Rouge.",
      "rating": 5,
      "review": "Paris is a cultural paradise with its iconic landmarks, museums, and the enchanting atmosphere. A dream destination for anyone passionate about the arts."
    },
    {
      "location": "Bruges, Belgium",
      "budget": "Moderate",
      "description": "Discover the medieval charm of Bruges, a UNESCO World Heritage city. Admire the well-preserved architecture, visit the Groeningemuseum for Flemish art, and take a boat tour along the picturesque canals.",
      "rating": 4,
      "review": "Bruges is a hidden gem with its romantic canals and medieval architecture. A perfect destination for those seeking cultural enrichment in a serene setting."
    }
  ]
}
This prompt generates better and more comprehensive results, but the trade-off is that longer prompts consume more tokens, which adds cost when you make API calls to LLMs.
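When you call the model through an API, a common way to apply few-shot prompting is to include example input/output pairs as prior turns in the conversation. The sketch below is illustrative only; the example pair and model name are assumptions, not part of the original exercise.

```python
# Few-shot prompting sketch: prior user/assistant turns act as the "shots".
from openai import OpenAI

client = OpenAI()

messages = [
    {
        "role": "system",
        "content": (
            "You recommend European destinations and answer in JSON with the keys "
            "location, budget, description, rating, and review."
        ),
    },
    # One worked example (a "shot") that demonstrates the exact format we expect back.
    {"role": "user", "content": "Recommend one destination for a food lover."},
    {
        "role": "assistant",
        "content": (
            '{"destinations": [{"location": "Lyon, France", "budget": "Moderate", '
            '"description": "Renowned for its bouchons and food markets.", '
            '"rating": 4.5, "review": "A classic culinary city."}]}'
        ),
    },
    # The real request; the model should follow the demonstrated pattern.
    {"role": "user", "content": "Recommend three spring destinations for cultural arts."},
]

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)
```

Each additional example usually improves format adherence, but it also adds tokens, which circles back to the cost trade-off mentioned above.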
Chain-of-thought prompting:
Chain-of-thought prompting tackles complex reasoning tasks by guiding the model through logical, intermediate steps. This way, you can decompose multifaceted problems to pave the way for more nuanced and detailed responses. When combined with few-shot prompting, this technique becomes even more potent for tasks requiring deep reasoning.
Here’s an example to guide the LLM with reasoning through a logical sequence:
You have a recipe that calls for 300 grams of flour, but your kitchen scale is only marked in ounces. Convert 300 grams of flour to ounces and determine how many 2-ounce portions you can get from this amount. Use the conversion rate of 1 ounce equals 28.35 grams.
Step 1: Determine the conversion factor needed to convert grams to ounces. The conversion factor is that 1 ounce equals 28.35 grams.
Step 2: Convert 300 grams of flour to ounces using the conversion factor. To do this, divide 300 grams by 28.35 grams per ounce.
Step 3: Calculate how many 2-ounce portions you can get from the total ounces of flour. Divide the total ounces of flour by 2 ounces per portion.
Run this prompt in ChatGPT and the model walks through each step before giving its final answer. Used this way, chain-of-thought prompting can significantly improve the LLM’s performance on complex tasks, making AI interactions more accurate, detailed, and understandable.
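As a quick sanity check on the reasoning the prompt asks for, here is the arithmetic in a few lines of Python; the numbers come straight from the flour-conversion prompt above.

```python
# Verify the chain-of-thought steps from the flour-conversion example.
grams = 300
grams_per_ounce = 28.35  # Step 1: the conversion factor

ounces = grams / grams_per_ounce   # Step 2: ~10.58 ounces
full_portions = int(ounces // 2)   # Step 3: number of complete 2-ounce portions

print(f"{grams} g is about {ounces:.2f} oz, giving {full_portions} full 2-ounce portions")
# Expected output: 300 g is about 10.58 oz, giving 5 full 2-ounce portions
```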
Prompt tuning:
We won’t cover prompt tuning in depth here, but it involves refining your prompts to produce more accurate and diverse outcomes. Unlike prompt engineering, which crafts a new prompt from the ground up, prompt tuning tweaks existing prompts for better performance. This can be as simple as changing a few words or as involved as applying algorithms to improve prompts systematically. Learn more about prompt tuning in this ‘Learn Prompting’ guide.
Retrieval Augmented Generation (RAG)
If you’ve tried prompt engineering with no luck and fine-tuning is not on the table, consider RAG. Retrieval Augmented Generation is an alternative path for model adaptation that is more accessible.
How RAG works:
- Integrating domain data: RAG allows you to integrate specific domain data into an LLM without altering the model's core behavior.
- Retrieval process: Here, you convert a user query (e.g., a question about tennis rackets) into a vector embedding. This embedding is then used to search a product database for content relevant to the query.
- Guiding LLM responses: The retrieved content is incorporated into a prompt, conditioning the LLM to provide a more precise and relevant answer, for instance, “The FX 500 Tour 27.5-inch tennis racket from Dunlop stores is sold at $199.” See the sketch after this list for the flow in code.
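Here is a minimal, self-contained sketch of that retrieve-then-generate flow, assuming the OpenAI embeddings and chat APIs and a tiny in-memory “product database”. The product descriptions, model names, and helper function are illustrative assumptions; a real application would typically use a vector store (for example, via LangChain or LlamaIndex) instead of a Python list.

```python
# Minimal RAG sketch: embed the query, retrieve the closest product entry,
# and fold that entry into the prompt the chat model sees.
import numpy as np
from openai import OpenAI

client = OpenAI()

# Toy "product database" (assumed content, for illustration only).
documents = [
    "FX 500 Tour 27.5-inch tennis racket, sold at Dunlop stores for $199.",
    "Graphite beginner tennis racket, lightweight frame, $59.",
    "Clay-court tennis shoes with reinforced soles, $120.",
]

def embed(texts):
    """Return one embedding vector per input text."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

doc_vectors = embed(documents)

query = "How much does the FX 500 Tour racket cost?"
query_vector = embed([query])[0]

# Cosine similarity between the query and every document.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
best_doc = documents[int(np.argmax(scores))]

# Condition the chat model on the retrieved content.
answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"Answer using only this context: {best_doc}"},
        {"role": "user", "content": query},
    ],
)
print(answer.choices[0].message.content)
```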
Why RAG is beneficial:
- Suitability for Q&A: RAG is particularly effective for question-answer scenarios, where precision and relevance are key.
- Resource efficiency: It offers a way to tap into the power of LLMs without the extensive resource requirements of model fine-tuning.
- Accessibility: Tools like LangChain and LlamaIndex make developing RAG applications more accessible, even for businesses with limited technical resources.
RAG provides a practical and resource-efficient method to adapt LLMs to your use case, particularly in customer service and product query scenarios.
Over the past two lessons, we have seen that the choice between fine-tuning, prompt engineering, or RAG depends on the application's scope and requirements. Each method offers unique benefits for different scenarios—hands-on experimentation will reveal the best fit for your project.
In the next lesson, we’ll learn how to properly evaluate your LLMs so you know what’s working and what isn’t. That’s how you become a good LLM scientist 😉.