LLM Deployment: Using API Providers
Introduction/overview
Key ideas
- Use LLM API Providers such as OpenAI, Cohere, Anthropic, and so forth to deploy state-of-the-art LLMs for your applications quickly. However, these providers can incur usage costs.
- Understand the configuration options that LLM API providers expose so you can choose the most suitable provider for your projects.
- API Gateways are an essential component of how most API Providers work. You should look for platforms with user-friendly API management features.
As you learned in the last lesson, there are high-level prerequisites for deploying LLMs. In this one, you will learn about LLM APIs and AI Gateways, and a hands-on demo will show you how to build an LLM application with an API.
You will also deploy a pre-trained LLM with one of the tools discussed in the last lesson and compare, at a high level, how using APIs fares against that deployment.
Considerations for deploying Large Language Models (LLMs)
Before deploying an LLM, there are some things to consider. Let’s group them into technical and non-technical considerations—some of which we covered in the last lesson.
Technical considerations
- Data: LLMs rely heavily on the data on which they are trained and fine-tuned. Ensure your data is high-quality, unbiased, and relevant to your use case.
- Costs: Running LLMs can require significant computing power. Ensure you have the infrastructure to handle the workload and factor in ongoing costs.
- Testing and evaluation: Rigorously test your LLM before deployment to identify weaknesses and ensure it performs as expected. This includes evaluating for accuracy, bias, and unexpected outputs.
- Security: Consider potential security risks associated with data privacy and unintended manipulation of the LLM. Implement safeguards to protect sensitive information and prevent misuse.
Non-technical considerations
- Business case: Clearly define the problem your LLM will solve and how it benefits your organization. Measure its success based on specific goals.
- Ethical implications: Be aware of potential biases in your LLM and how they might generate unfair or misleading outputs. Implement safeguards to promote ethical use.
- Monitoring and maintenance: Plan for ongoing monitoring of your LLM's performance after deployment. This allows you to identify and address issues as they arise.
You can deploy an LLM using an API provider or a pre-trained model. Let’s start with using LLM API providers.
Using an LLM API provider
An LLM API provider gives you access to LLMs through an Application Programming Interface (API). This lets you quickly add LLM features like code completion, translation, and text generation to your application without building complex AI models from scratch.
How an LLM API provider works
The diagram below shows how an LLM API provider works under the hood; a simplified code sketch follows the component list:
- API gateway: Serves as the entry point for external requests. It manages API traffic, enforces security policies, and directs requests to the appropriate services.
- Authentication and authorization: This verifies the identity and permissions of users or applications, ensuring that only authorized entities can access the API and safeguarding against unauthorized usage.
- Request processing: This handles incoming language-related requests by analyzing and preprocessing user queries, extracting relevant information, and preparing data for model input.
- Language model service: This executes language-related tasks using pre-trained language models to comprehend and generate human-like text based on the processed input.
- Response processing: Formats and processes model outputs, takes the model's response, post-processes it, and prepares it for delivery back to the user.
- Rate limiting: This controls the rate of incoming requests by enforcing usage limits to prevent abuse and ensure fair and efficient use.
- Logging and monitoring: This component captures and monitors system activities to keep track of API usage, performance metrics, and potential issues for troubleshooting, analysis, and optimization.
- Billing and usage analytics: This manages usage-based billing and analytics. It also tracks API usage for billing purposes and gives users insights into their consumption patterns.
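To make these components concrete, here is a minimal, hypothetical sketch in Python of how a single request might flow through them. Every name in it (api_gateway, authenticate, language_model_service, and so on) is an illustrative assumption, not any real provider's implementation.

# Hypothetical sketch of the request path inside an LLM API provider.
# Component names and logic are illustrative only, not any provider's actual code.
import time

API_KEYS = {"sk-demo-key": {"owner": "acme", "requests_per_minute": 60}}
request_log = []

def authenticate(api_key):
    # Authentication and authorization: reject unknown keys
    account = API_KEYS.get(api_key)
    if account is None:
        raise PermissionError("Invalid API key")
    return account

def check_rate_limit(account):
    # Rate limiting: count this account's requests in the last 60 seconds
    window_start = time.time() - 60
    recent = [r for r in request_log
              if r["owner"] == account["owner"] and r["time"] > window_start]
    if len(recent) >= account["requests_per_minute"]:
        raise RuntimeError("Rate limit exceeded")

def language_model_service(prompt, max_tokens):
    # Placeholder for the actual pre-trained model call
    return f"(model output for: {prompt[:40]}...)"

def api_gateway(api_key, prompt, max_tokens=100):
    # API gateway: single entry point that wires the components together
    account = authenticate(api_key)
    check_rate_limit(account)
    cleaned_prompt = prompt.strip()                      # request processing
    raw_output = language_model_service(cleaned_prompt, max_tokens)
    response = {"text": raw_output.strip()}              # response processing
    request_log.append({"owner": account["owner"], "time": time.time()})  # logging and billing
    return response

print(api_gateway("sk-demo-key", "Summarize the following email: ..."))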
Using LLM APIs can benefit users of all skill levels, but understanding the configuration parameters can be challenging, and knowing how to set them is the key to getting the best results for your use case.
To use an LLM API provider, you have to consider configurations such as the model you call, the temperature, and the maximum number of output tokens.
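For example, with the `openai` 0.28 SDK used later in this lesson, these configurations are passed as keyword arguments to the request. The values below are illustrative only; tune them for your own use case.

import openai

# Illustrative configuration values only; adjust them for your use case
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # which model to call
    messages=[{"role": "user", "content": "Translate 'hello' to French."}],
    max_tokens=50,    # upper bound on the length of the reply
    temperature=0.7,  # higher = more creative, lower = more deterministic
    top_p=1.0,        # nucleus sampling; another way to control randomness
)
print(response.choices[0].message["content"])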
Pros of using LLM API providers
- Faster time to market: Pre-built APIs and readily available models allow quicker application integration.
- Ease of use: User-friendly interfaces and minimal technical expertise are required for deployment.
- Access to latest models: Leverage cutting-edge LLMs developed by the provider.
- Security: Providers handle security measures and may offer additional features.
Cons of using LLM API Providers
- Cost: Licensing fees, subscriptions, or usage-based pricing can be expensive.
- Limited customization: Restricted control over the model's behavior and data usage.
- Data privacy: Reliance on the provider's data security practices, potentially raising privacy concerns for sensitive data.
LLM API example walkthrough: smart email summarizer
The Smart Email Summarizer will take an email as input and provide a summarized version, capturing the key points and intentions. This project will use OpenAI's GPT-3.5 Turbo model to generate coherent and contextually relevant summaries.
Step 1: Installation
Install the version of the `openai` package that your application needs. This walkthrough uses `openai==0.28`.
!pip install openai==0.28
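Note that this pins the legacy 0.x SDK, which exposes the `openai.ChatCompletion` interface used below. If you instead install `openai>=1.0`, the client interface is different; a rough equivalent of the same call with the newer SDK looks like this:

# Rough equivalent with the openai>=1.0 SDK (interface differs from 0.28)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment by default
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize the following email: ..."}],
    max_tokens=100,
    temperature=0.7,
)
print(response.choices[0].message.content)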
Step 2: Set up OpenAI
Import the `openai` library to interact with the OpenAI model. The following assumes your OpenAI API key is set in an environment variable on your machine, but you can alternatively assign it to `openai.api_key` as a string. The OpenAI documentation explains how to get an API key if needed.
import openai
import os
openai.api_key = os.environ['OPENAI_API_KEY']
If you are in a Colab notebook, you can instead add your OpenAI key to the Secrets tab and enable the notebook's access to it.
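One way to read that secret in code is shown below; the secret name `OPENAI_API_KEY` is an assumption and should match whatever name you gave it in the Secrets tab.

from google.colab import userdata
import openai

# The secret name below is assumed; use the name you chose in Colab's Secrets tab
openai.api_key = userdata.get('OPENAI_API_KEY')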
Step 3: Define the application function
Define a function `summarize_email()` that takes `email_content` as a parameter and uses the OpenAI GPT-3.5 Turbo model to generate a summary of the given email.
The `messages` parameter contains the model's context, with the role specified as "user". Adjust parameters like `max_tokens` and `temperature` based on your preferences.
def summarize_email(email_content):
    # Context for the model
    model_context = "Summarize the following email:\n" + email_content

    # Generate summary using the OpenAI GPT model
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # Use the appropriate OpenAI model
        messages=[
            {
                "role": "user",
                "content": model_context,
            }
        ],
        max_tokens=100,  # Adjust as needed for desired summary length
        temperature=0.7,  # Adjust for creativity vs accuracy trade-off
    )

    # Extract and return the generated summary
    summary = response.choices[0].message["content"]
    return summary
Step 4: Usage
In the `__main__` block, provide sample email content, generate a summary with the `summarize_email()` function, and print both the original and summarized emails.
if __name__ == "__main__":
    # Sample email content
    sample_email = """
Dear [Recipient],
I trust this email finds you in good health. I am writing to provide you with a comprehensive update on the ongoing project, which has reached a significant milestone.
Over the past few weeks, our dedicated team has been diligently working on various aspects of the project. I am pleased to report that we have successfully completed the initial phase, encompassing detailed planning, resource allocation, and defining key project deliverables. This foundational work has laid a robust framework that ensures the project's alignment with our strategic objectives.
The core team, comprised of skilled professionals from diverse backgrounds, has been collaborating seamlessly to address both technical and logistical challenges. Notably, the development team has made remarkable progress in implementing the required features and functionalities outlined in the project scope. They have successfully navigated through complexities, ensuring a scalable and efficient system architecture.
Moreover, our quality assurance team has been rigorously testing each component to guarantee the reliability and security of the system. Preliminary tests have yielded positive results, instilling confidence in the overall robustness of our solution. We are committed to maintaining the highest standards in quality throughout the development lifecycle.
As we transition into the next phase, which involves user acceptance testing and feedback integration, we remain focused on adhering to the project timeline. Our goal is to seamlessly incorporate user insights, ensuring the final product exceeds expectations. Simultaneously, our project management team continues to monitor key performance indicators and milestones, facilitating transparent communication and accountability.
In conclusion, I am thrilled to share the progress we have achieved thus far. The dedication and expertise exhibited by our team are emblematic of our commitment to delivering a superior product. I look forward to your valuable feedback and collaboration as we embark on the next stages of this exciting venture.
Thank you for your continued support.
...
Best regards,
[Your Name]
"""
    # Generate summary
    summarized_email = summarize_email(sample_email)

    # Display the results
    print("Original Email:\n", sample_email)
    print("\nSummarized Email:\n", summarized_email)
Here’s a sample output from running this script. Note that the summary is cut off mid-sentence because it hit the `max_tokens` limit of 100; increase that value if you need longer summaries:
Summarized Email:
The email provides an update on a project that has reached a significant milestone. The initial phase has been successfully completed, with detailed planning and key project deliverables defined. The core team has been collaborating effectively to address challenges, with the development team making progress on implementing required features. Quality assurance testing has been rigorous, yielding positive results. The next phase involves user acceptance testing and feedback integration. The project remains focused on adhering to the timeline and maintaining high-quality standards. The email concludes with a request for
Nice! In the next lesson, you will learn how to deploy open-source pre-trained LLMs locally to an API endpoint. Jump right into it! 👉