Large Language Model (LLM) Agents
Overview/introduction
Key ideas
- Implementing LLM agents involves choosing frameworks, integrating tools, and iterative development with thorough testing. Their evaluation focuses on query translation accuracy, tool selection, context relevance, response groundedness, and helpfulness.
- LLM agents extend beyond basic LLM capabilities. They integrate with external data sources and handle complex, multi-step tasks with reduced coding requirements. This makes them accessible for diverse applications.
- Understanding various agent components (LLMs, prompts, memory, knowledge, planning, and tools) is key to designing effective agents.
What are LLM agents?
Sometimes large language models (LLMs) alone aren’t enough. When we need to coordinate multiple queries to the LLM or complete tasks that go beyond text (such as operating on databases or sending emails), we rely on an LLM agent. An LLM agent is software that can plan, coordinate, and execute other programs. It operates by interpreting user inputs and determining an appropriate sequence of actions, a sequence that is often non-deterministic: responses can vary even for similar inputs. This flexibility makes agents particularly effective in dynamic environments that benefit from human-like interaction.
In practical applications, LLM agents are not limited to customer interaction roles. They also play a significant part in data analytics tasks such as data cleaning, feature engineering, and even model training. However, these functions are more data-driven and less reliant on language processing capabilities.
In terms of usage, LLM agents range from specialized, single-purpose assistants to general-purpose utility agents.
Benefits of LLM agents
Extended capabilities beyond basic LLMs
Agents go beyond the limits of LLMs by integrating them with outside sources of data and computation. For instance, they can access and use information from platforms like Google Search, Wikipedia, or a company's internal wiki via APIs. This integration allows them to carry out more advanced tasks, such as retrieving precise information from databases or calling other LLMs for code execution.
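To make this concrete, here is a minimal sketch (not production code) of wrapping an external data source as a Python function that an agent can expose as a tool. It uses Wikipedia's public search API; the function name, parameters, and return format are choices invented for this example.

```python
import requests

def search_wikipedia(query: str, max_results: int = 3) -> list[str]:
    """Query Wikipedia's public search API and return result snippets.

    An agent can expose a function like this as a "tool" so the LLM
    can pull in fresh, external knowledge at runtime.
    """
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "list": "search",
            "srsearch": query,
            "srlimit": max_results,
            "format": "json",
        },
        timeout=10,
    )
    response.raise_for_status()
    results = response.json()["query"]["search"]
    return [f"{r['title']}: {r['snippet']}" for r in results]
```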
Error recovery and handling complex tasks
LLM agents excel at error recovery and multi-step tasks. This is particularly useful in scenarios like database querying. For example, an LLM agent can translate a natural language request into an SQL query, execute it, interpret the results, and respond in natural language. The agent can also find and fix errors in its own SQL queries and chain multiple queries together, improving task accuracy.
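As an illustration, here is a hedged sketch of that loop using OpenAI's Python SDK and SQLite. The model name, prompt wording, retry policy, and `db_path` parameter are assumptions made for the example, not a prescribed implementation.

```python
import sqlite3
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_with_sql(question: str, db_path: str, max_retries: int = 2) -> str:
    """Translate a question to SQL, run it, and recover from SQL errors."""
    conn = sqlite3.connect(db_path)
    prompt = f"Translate this question into a single SQLite query:\n{question}"
    for _ in range(max_retries + 1):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any capable chat model works
            messages=[{"role": "user", "content": prompt}],
        )
        # Naive cleanup; a robust version would parse code fences properly.
        sql = reply.choices[0].message.content.strip().strip("`")
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Error recovery: feed the failure back so the model can fix it.
            prompt = (f"This query:\n{sql}\nfailed with: {exc}\n"
                      "Return a corrected SQLite query.")
            continue
        # Interpret the raw rows back into natural language for the user.
        summary = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                       f"Question: {question}\nSQL result rows: {rows}\n"
                       "Answer the question in one sentence."}],
        )
        return summary.choices[0].message.content
    return "Could not produce a working query within the retry limit."
```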
Simplified coding requirements
Unlike rule-based agents, LLM agents require less coding to deploy. The underlying LLM handles most of the computational work, including complex reasoning and language processing. As a result, you can interact with these agents using simple prompts, which makes them accessible for a wide range of applications, even for those with limited programming expertise.
Components of LLM agents
An LLM agent typically combines an LLM, prompts, memory, knowledge sources, planning, and tools. The agent types below assemble these components in different ways.
Types of LLM agents
- ReAct (or "reasoning and acting"):
- Integrates LLMs with external environments for dynamic reasoning.
- Alternates between reasoning traces (like breaking down tasks into steps) and specific actions (such as "search" in question answering).
- Improves complex task handling by merging internal reasoning and external data.
- Self-ask with search:
- Uses a search engine and interacts with users for clarifications.
- Effective for ambiguous queries.
- Capable of answering questions, generating content, translating language, and more.
- OpenAI function calling:
- Enables calling external functions via the OpenAI API.
- Useful for solving math problems, generating text, and translating language.
- The model determines when and which functions to call (see the sketch after this list).
- ReAct document store:
- Stores and retrieves information, especially when unavailable online.
- Supports creation, reading, updating, and deletion of documents.
- Handles multiple document formats, including text, JSON, and XML.
- Camel agents:
- Specialize in creating simulation environments.
- Ideal for complex interactions in gaming and entertainment.
- Generative agents:
- Generate and manage simulation environments.
- Incorporate a reflective step for action and state updates.
- Memory handling is time-weighted, importance-weighted, and relevancy-weighted.
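To illustrate the function-calling pattern named above, here is a minimal sketch using OpenAI's Python SDK. The `add` function and its schema are invented for this example, and the model name is an assumption; the key point is that the model returns the name and JSON arguments of the function it wants called, while your code executes it.

```python
import json
from openai import OpenAI

client = OpenAI()

# Describe an available function so the model can decide when to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two numbers and return the sum.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "number"},
                "b": {"type": "number"},
            },
            "required": ["a", "b"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any tool-capable chat model
    messages=[{"role": "user", "content": "What is 2.5 plus 4.25?"}],
    tools=tools,
)

# Assumes the model chose to call the tool; a robust version would
# check whether tool_calls is None and handle a plain text reply.
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
print(call.function.name, args)  # e.g. add {'a': 2.5, 'b': 4.25}
print(args["a"] + args["b"])     # your code runs the function locally
```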
Examples of LLM agents
Example: BabyAGI
BabyAGI uses OpenAI's API to create, prioritize, and execute tasks, and vector databases such as Chroma for long-term memory. The system develops each new task based on the result of the previous task and the main goal or objective, separating the planning and execution steps. See the recommended resources section to learn more about the project and how it runs.
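The following is a stripped-down sketch of that planning/execution separation, not the actual BabyAGI code. Here `llm` stands in for any prompt-to-text callable; the real project calls OpenAI's API and keeps results in a vector store as long-term memory.

```python
from collections import deque

def babyagi_loop(objective: str, first_task: str, llm, max_steps: int = 5):
    """Toy BabyAGI loop: execute the top task, then create and
    prioritize new tasks from the result and the main objective."""
    tasks = deque([first_task])
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()
        # Execution step: complete the current task in light of the objective.
        result = llm(f"Objective: {objective}\nComplete this task: {task}")
        # Task-creation step: derive follow-up tasks from the last result.
        new_tasks = llm(
            f"Objective: {objective}\nLast result: {result}\n"
            "List any new tasks, one per line."
        ).splitlines()
        tasks.extend(t for t in new_tasks if t.strip())
        # Prioritization step: reorder the queue against the objective.
        reordered = llm(
            f"Objective: {objective}\nReorder these tasks by priority, "
            "one per line:\n" + "\n".join(tasks)
        ).splitlines()
        tasks = deque(t for t in reordered if t.strip())
```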
Example: Voyager agent
Voyager uses GPT-4 as its reasoning engine, enabling it to make intelligent decisions based on the situations it encounters. Its operation involves:
- Exploring the environment and reasoning about new situations.
- Building or using tools based on its reasoning.
- Saving successfully executed tasks in a skill library for future use (sketched after this list).
- Utilizing its skill library in similar future situations, thus avoiding redundant learning processes.
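Here is a toy sketch of the skill-library idea. The real Voyager stores executable programs and retrieves them by embedding similarity; this example substitutes naive keyword overlap so it stays self-contained.

```python
class SkillLibrary:
    """Store successfully executed programs under a description and
    reuse them when a similar task recurs."""

    def __init__(self):
        self.skills: dict[str, str] = {}  # description -> program source

    def add(self, description: str, program: str) -> None:
        self.skills[description] = program

    def retrieve(self, task: str) -> str | None:
        # Naive stand-in for embedding similarity: keyword overlap.
        task_words = set(task.lower().split())
        best, best_overlap = None, 0
        for description, program in self.skills.items():
            overlap = len(task_words & set(description.lower().split()))
            if overlap > best_overlap:
                best, best_overlap = program, overlap
        return best

library = SkillLibrary()
library.add("craft a wooden pickaxe", "def craft_pickaxe(bot): ...")
print(library.retrieve("craft pickaxe from wood"))  # reuses the stored skill
```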
Workflow of an LLM agent
Here’s a typical high-level implementation of how an LLM agent operates (a minimal executor loop is sketched after this list):
- Input processing: The user’s query is received. This is the starting point where the user's request or question is defined.
- Query translation: The agent translates the query into a format optimized for interaction with tool APIs. This involves extracting key information and keywords relevant to the task at hand.
- Tool selection: The agent selects the most appropriate tool or API to address the user's request based on the translated query.
- Memory utilization: The agent's memory stores a list of previous actions and interactions. This historical context aids in making informed decisions and maintaining continuity in ongoing tasks.
- Action observation and execution: The agent observes the output of the chosen tool. If that output completes the task, no further action is required. Otherwise, the agent decides on the next action, invokes the necessary tools, and observes the outcome.
- Iterative processing and stopping condition: The process repeats until a stopping condition is met. These conditions can be determined by the LLM itself (signaling that the problem is resolved) or by predefined hardcoded rules. The agent executor is crucial in orchestrating this process, managing the flow from input to action execution and completion.
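The loop below is a minimal sketch of this workflow. The `FINAL:`/`ACTION:` reply markers, the `llm` callable, and the `tools` mapping are conventions invented for the example; real frameworks use structured prompts and output parsers instead.

```python
def run_agent(query: str, llm, tools: dict, max_steps: int = 10) -> str:
    """Minimal agent-executor loop: ask the LLM for the next step,
    run the chosen tool, record the observation, and repeat."""
    memory: list[str] = []  # previous actions and observations
    for _ in range(max_steps):  # hardcoded stopping rule as a safety net
        prompt = (
            f"Query: {query}\n"
            f"Available tools: {list(tools)}\n"
            "History:\n" + "\n".join(memory) + "\n"
            "Reply 'FINAL: <answer>' or 'ACTION: <tool> | <input>'."
        )
        reply = llm(prompt)
        if reply.startswith("FINAL:"):  # LLM-determined stopping condition
            return reply.removeprefix("FINAL:").strip()
        _, _, rest = reply.partition("ACTION:")
        tool_name, _, tool_input = rest.partition("|")
        observation = tools[tool_name.strip()](tool_input.strip())
        memory.append(f"{tool_name.strip()}({tool_input.strip()}) -> {observation}")
    return "Stopped: step limit reached."
```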
Implementing an LLM agent
Looking to implement an LLM agent? Here are the typical steps (a minimal framework-based sketch follows this list):
- Select the framework:
- Choose frameworks like LlamaIndex and LangChain that are tailored to the specific requirements of your LLM-powered application.
- Evaluate each framework's compatibility with your project goals, focusing on features that support your application's unique needs.
- Choose an LLM for the reasoning loop:
- Carefully select it based on its capabilities and alignment with your project's objectives.
- Consider factors like inference speed, language understanding, and integration ease with the framework. The LLM's quality directly influences the agent's performance.
- Select an agent type: Decide on the agent type, such as ReAct or OpenAI’s function calling, based on the complexity and nature of tasks. Reference LangChain’s documentation for implementing the different agent types. Align the agent type with the application's purpose to ensure optimal performance.
- Integrate with external tools: Integrate the agent with external tools like Google Search APIs, vector databases, and calculators to enhance functionality. Ensure these integrations are secure and compliant with data privacy standards.
- Refine the agent’s output: Adopt an iterative approach to continually refine the agent's capabilities. This might include adjusting prompt designs, integrating feedback, and improving tool functionality. Implement agile development methodologies for effective iteration and adaptation.
- Evaluation and deployment: Before deploying the agent, verify that it translates queries correctly, selects the appropriate tool, produces observations that fit the context, and returns correct output.
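As one concrete (if simplified) starting point, here is a sketch using LangChain's classic `initialize_agent` API. Newer LangChain releases replace this with `create_react_agent` and `AgentExecutor`, so treat the snippet as illustrative and consult the current documentation.

```python
# LangChain's classic agent API (~0.0.x releases); deprecated in newer
# versions in favor of create_react_agent plus AgentExecutor.
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)  # LLM for the reasoning loop

# load_tools wires in prebuilt integrations: "llm-math" wraps a calculator,
# "serpapi" wraps Google Search (requires a SERPAPI_API_KEY).
tools = load_tools(["llm-math", "serpapi"], llm=llm)

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # a ReAct-style agent
    verbose=True,  # print the reasoning trace while you iterate on prompts
)

agent.run("What is the population of France divided by 2?")
```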
Monitoring and evaluating LLM agents
Evaluation of LLMs, as you learned in Module 1, can be very difficult. The same is true for LLM agents, but there are best practices you can apply (a small programmatic check is sketched after this list):
- Evaluate query translation accuracy:
- Method: Implement a system to compare original user queries with translated versions for tool API interaction.
- Metrics: Focus on consistency in meaning, relevance, and keyword usage. Use conventional LLM metrics to assess accuracy.
- Appropriateness of tool selection:
- Assessment strategy: Develop criteria or benchmarks to evaluate the suitability of selected tools for different queries.
- Evaluation: Cross-reference the functionalities of the chosen tools with the requirements of the query to ensure optimal tool selection.
- Context relevance of tool's response:
- Checking mechanism: Analyze the tool’s response in relation to the user's query, focusing on how well it addresses the query's context.
- Validation methods: Use predefined response quality metrics to gauge context relevance and compare the output with expected answers.
- Groundedness of the final response:
- Evaluation technique: Confirm that the final response is based on and supported by the retrieval context.
- Fact-checking process: Employ cross-referencing with source data and external validators to detect and prevent hallucinations by the agent.
- Relevance and helpfulness of the answer:
- User-centric evaluation: Utilize user feedback, automated scoring systems, or comparison with model answers to evaluate the answer's relevance and usefulness.
- Feedback loop: Incorporate continuous user feedback into the evaluation process for real-world effectiveness and practicality.
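As a small example of automating one of these checks, here is a hedged LLM-as-judge sketch for the groundedness criterion. The prompt wording and the YES/NO protocol are assumptions made for the example; `judge_llm` is any prompt-to-text callable.

```python
def judge_groundedness(response: str, retrieved_context: str, judge_llm) -> bool:
    """Ask a second model whether every claim in the agent's final
    response is supported by the retrieval context."""
    verdict = judge_llm(
        "You are a strict fact checker.\n"
        f"Context:\n{retrieved_context}\n\n"
        f"Response:\n{response}\n\n"
        "Is every claim in the response supported by the context? "
        "Answer YES or NO."
    )
    return verdict.strip().upper().startswith("YES")
```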
In the next lesson, you’ll learn how LLMs have effectively solved content summarization.