Understanding and Monitoring Embeddings in Amazon SageMaker with WhyLabs
Sep 11, 2023
This is a summary of ‘Understanding and Monitoring Embeddings in Amazon SageMaker with WhyLabs AI Observatory Platform’, a blog post written in collaboration with AWS. To read the full article, visit the AWS Partner Network (APN) blog.
This article explores the ways embeddings are used in machine learning (ML) and the problems that can arise with them. It also explains how to use WhyLabs to identify these issues and set up monitoring so they are caught quickly in the future.
What are embeddings?
Embeddings represent complex data types as numerical vectors that preserve context and relationships. They can be sparse or dense depending on the data they encode, and they are used heavily in machine learning as inputs, intermediate products, and outputs across a wide variety of data types and tasks.
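As a tiny illustration (not from the original post), the same documents can be encoded as sparse bag-of-words vectors or as dense vectors; here TruncatedSVD stands in for a learned embedding model:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the bike is for sale",
    "the rocket reached orbit",
    "the car needs new tires",
]

# Sparse representation: one dimension per vocabulary term, mostly zeros.
sparse = CountVectorizer().fit_transform(docs)
print(sparse.shape, sparse.nnz)   # (3, vocab_size), only a few non-zero entries per row

# Dense representation: every dimension carries signal. TruncatedSVD is only a
# stand-in here for a real embedding model such as a sentence transformer.
dense = TruncatedSVD(n_components=2).fit_transform(sparse)
print(dense.shape)                # (3, 2)
```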
How are embeddings used?
Although there are numerous approaches to creating embeddings, we won't go over them in this post. Instead, we'll look at how to measure meaningful drift in the transformed inputs by identifying clusters in the embedding space and tracking the distances between individual points and cluster centroids.
Typically, for debugging, data scientists use lower-dimensional projections such as UMAP or t-SNE. These are helpful for visually identifying clusters, but they aren't a scalable way to understand your embeddings over time in production.
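As an illustration of that ad-hoc debugging workflow (not code from the original post), a t-SNE projection with scikit-learn looks roughly like this; `embeddings` and `labels` are hypothetical variables:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# `embeddings` is an (n_samples, n_dims) array and `labels` holds each document's topic
# (hypothetical variables for illustration). This gives a one-off visual check of
# clusters, but it doesn't scale to continuous monitoring in production.
projected = TSNE(n_components=2, random_state=42).fit_transform(embeddings)

plt.scatter(projected[:, 0], projected[:, 1], c=labels, s=4, cmap="tab20")
plt.title("t-SNE projection of document embeddings")
plt.show()
```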
To handle this in a scalable way, whylogs, the open-source library for logging any kind of data, creates a lightweight statistical profile of your data that can be used to extract meaningful insights and characteristics, letting you measure quality and drift over time.
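As a minimal sketch of the profiling step (not specific to embeddings), logging a batch of data with whylogs looks roughly like this; the column names are hypothetical:

```python
import pandas as pd
import whylogs as why

# Any batch of data; in our case this could hold per-document features.
df = pd.DataFrame(
    {
        "doc_length": [120, 342, 87],
        "predicted_topic": ["sci.space", "misc.forsale", "rec.autos"],
    }
)

# Create a lightweight statistical profile of the batch.
results = why.log(df)
profile_view = results.view()

# Inspect the profile as a DataFrame of per-column metrics.
print(profile_view.to_pandas())
```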
Using whylogs, customers can identify centroids in their embeddings and measure distances within and between clusters. This is useful for understanding how your embeddings change gradually over time or shift suddenly in response to an upstream data change. Read more in this blog post.
Train and deploy a classification model
Amazon SageMaker lets you build, train, and deploy ML models using fully managed infrastructure, tools, and workflows. Here, we set up and train a simple classification model in SageMaker.
Check out the full article for detailed steps on the following (a rough sketch of the train-and-deploy step follows the list):
- Using the newsgroups dataset to create vectors and train our model on those vectors.
- Creating an entrypoint script that defines how to load our model and make predictions.
- Deploying our model to an endpoint so we can make batched predictions and compare results.
- Using a pretrained model on this same dataset to streamline defining the endpoint in SageMaker.
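As a hedged sketch of what the train-and-deploy step can look like with the SageMaker Python SDK (the entry-point script name, role ARN, S3 path, and instance types below are placeholders, not values from the original post):

```python
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

# Train the classifier with a custom entry-point script (hypothetical file name).
estimator = SKLearn(
    entry_point="train_and_serve.py",
    framework_version="1.2-1",
    instance_type="ml.m5.large",
    role=role,
    sagemaker_session=session,
)
estimator.fit({"train": "s3://my-bucket/newsgroups/train"})  # placeholder S3 path

# Deploy to a real-time endpoint and run a batched prediction.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
predictions = predictor.predict(["For sale: mountain bike, lightly used."])
```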
Measuring embedding distances with whylogs
Once we have our model trained and the entrypoint defined, we'll capture a set of reference points that identify the centroids of the embeddings our model was trained on. During inference, this lets us compare each document's distance to those centroids, which we will revisit a bit later in this post.
To capture our reference points, we have a few options in whylogs: we can define the reference points manually, or let whylogs identify centroids automatically, either from corresponding labels or with an unsupervised clustering approach.
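As a sketch of the label-based option, whylogs' experimental embeddings utilities can derive reference centroids from a labeled training set and track distances to them. The module paths and class names below follow the whylogs embeddings example at the time of writing and may change between versions; `X_train`, `y_train`, and `X_test` are hypothetical variables:

```python
import whylogs as why
from whylogs.core.resolvers import MetricSpec, ResolverSpec
from whylogs.core.schema import DeclarativeSchema
from whylogs.experimental.extras.embedding_metric import (
    DistanceFunction,
    EmbeddingConfig,
    EmbeddingMetric,
)
from whylogs.experimental.preprocess.embeddings.selectors import PCACentroidsSelector

# Derive one reference centroid per topic from the labeled training embeddings.
references, labels = PCACentroidsSelector(n_components=20).calculate_centroids(X_train, y_train)

# Configure an embedding metric that tracks each document's distance to those centroids.
config = EmbeddingConfig(
    references=references,
    labels=labels,
    distance_fn=DistanceFunction.euclidean,
)
schema = DeclarativeSchema(
    [ResolverSpec(column_name="news_centroids", metrics=[MetricSpec(EmbeddingMetric, config)])]
)

# Profile a batch of inference-time embeddings against the reference centroids.
results = why.log(row={"news_centroids": X_test}, schema=schema)
```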
To follow along with the example, visit the full blog post.
Monitoring embedding drift with WhyLabs Observatory
If you've followed the original article, at this point we have a trained model, reference embeddings, and a whylogs resolver defined to extract the information we want from our embeddings. To see the power of measuring embedding distances, we'll create a scenario in which our classifier predicts the class of documents drawn from the same topics it learned during training, and then we'll inject perturbations into the later batches to introduce drift.
The full article includes a high-level architecture diagram of the pipeline we've built.
When we open our project in WhyLabs, we see that our profiles were successfully generated for each batch and submitted to the platform. We won’t cover every feature and output created by our resolver but will highlight three of them below.
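As a rough sketch of how each day's profile gets sent to the platform, whylogs ships a WhyLabs writer configured through environment variables; the org ID, dataset ID, API key, and `daily_batch_df` below are placeholders:

```python
import os
import whylogs as why

# Credentials for the WhyLabs writer (placeholder values).
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "org-0000"
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-1"
os.environ["WHYLABS_API_KEY"] = "<your-api-key>"

# Profile one day's batch and upload it to WhyLabs.
results = why.log(daily_batch_df)   # daily_batch_df is a hypothetical DataFrame
results.writer("whylabs").write()
```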
Observe introduced drift in WhyLabs
Now you’ll have access to a number of different features in your dashboard that represent the different aspects of the pipeline we monitored:
- news_centroids: Relative distance of each document to the centroids of each reference topic cluster, and frequent items for the closest centroid for each document.
- document_tokens: Distribution of tokens (term length, document length and frequent items) in each document.
- output_prediction and output_target: The classifier's predictions and targets, which are also used to compute the metrics on the “Performance” tab (see the sketch after this list).
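As a sketch of how predictions and targets can be logged so they populate the “Performance” tab, whylogs provides a classification-metrics helper; the DataFrame and column names here are hypothetical:

```python
import whylogs as why

# `df` holds one batch of inference results with hypothetical column names.
results = why.log_classification_metrics(
    df,
    target_column="output_target",
    prediction_column="output_prediction",
)
results.writer("whylabs").write()
```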
With the monitored information, we should be able to correlate the anomalies and reach a conclusion about what happened.
news_centroids.closest
In the chart below, we can see the distribution of the closest centroid for each document. For the first four days, the distributions are similar to one another. The language perturbations injected in the last three days skew the distribution toward the “forsale” topic.
document_tokens.frequent_terms
Because we removed the English stopwords in our tokenization process but didn't remove the Spanish stopwords, most of the frequent terms in the selected period are Spanish stopwords, and none of them appear in the first four days.
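As an illustration of why this happens (not code from the original post), a vectorizer configured with only English stopwords passes Spanish function words straight through:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Only English stopwords are filtered, so Spanish stopwords such as "el", "que",
# and "es" survive tokenization and show up as frequent terms.
vectorizer = CountVectorizer(stop_words="english")
tokens = vectorizer.build_analyzer()("el coche que está en venta es muy barato")
print(tokens)
```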
Performance.F1
In the “Performance” tab, plenty of information tells us our performance is degrading. For example, the F1 chart below shows the model getting steadily worse starting from the fifth day.
For now, we'll focus on how to use WhyLabs to monitor these signals and be notified in the future when our dataset changes and impacts our model's performance. We cover the steps below in more detail in this blog post.
- Navigate to the Monitor Manager and select the “Presets” tab.
- Next, create a drift monitor on discrete inputs using the “Configure” option on the “Data drift in model inputs” for “All discrete inputs.” Click through to modify the drift distance threshold under section 2 and leave everything else the same.
- Lastly, use the save button at the bottom to complete creating our monitor.
Now, we’ll test our monitor on the “news_centroids.closest” feature to show the drift in categorical distribution when we changed our language to Spanish, causing the “forsale” cluster to become the closest centroid cluster more consistently.
We can see that WhyLabs identified the drift in closest clusters, which would have triggered an alert to our downstream notification endpoint. This helps us catch a sudden change like this quickly in the future.
Start your WhyLabs and Amazon SageMaker journey
Embarking on your journey with WhyLabs and Amazon SageMaker is simple. Take a look at the sample notebook this post's example is built from, then make your way over to WhyLabs Observatory to create a free account and begin monitoring your SageMaker models.
You can also learn more about the WhyLabs AI Observability Platform in AWS Marketplace.