Understanding and Monitoring Embeddings in Amazon SageMaker with WhyLabs
Sep 11, 2023
This is a summary of ‘Understanding and Monitoring Embeddings in Amazon SageMaker with WhyLabs AI Observatory Platform’, a blog post written in collaboration with AWS. To read the full article, visit the AWS Partner Network (APN) blog.
This article explores the ways embeddings are used in machine learning (ML) and the problems that can arise with them. It also explains how to use WhyLabs to identify these issues and set up monitoring so they are caught quickly in the future.
What are embeddings?
Embeddings represent complex data types as numerical vectors that preserve context and relationships. They can be sparse or dense depending on the data they encode, and they are used heavily in machine learning as inputs, intermediate products, and outputs across a wide variety of data types and tasks.
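As a tiny illustration (not from the original post), the same documents can be encoded as sparse bag-of-words vectors or as dense vectors; here TruncatedSVD stands in for a learned embedding model:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the bike is for sale",
    "the rocket reached orbit",
    "the car needs new tires",
]

# Sparse representation: one dimension per vocabulary term, mostly zeros.
sparse = CountVectorizer().fit_transform(docs)
print(sparse.shape, sparse.nnz)   # (3, vocab_size), only a few non-zero entries per row

# Dense representation: every dimension carries signal. TruncatedSVD is only a
# stand-in here for a real embedding model such as a sentence transformer.
dense = TruncatedSVD(n_components=2).fit_transform(sparse)
print(dense.shape)                # (3, 2)
```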
How are embeddings used?
Although there are numerous approaches to creating embeddings, we won't go over them in this post. Instead, we'll look at how to measure meaningful drift in the transformed inputs by identifying clusters in the embedding space and tracking the distances between individual points and cluster centroids.
Typically, for debugging, data scientists use lower-dimensional projections such as UMAP or t-SNE. These are helpful for visually identifying clusters, but they aren't a scalable way to understand your embeddings over time in production.
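As an illustration of that ad-hoc debugging workflow (not code from the original post), a t-SNE projection with scikit-learn looks roughly like this; `embeddings` and `labels` are hypothetical variables:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# `embeddings` is an (n_samples, n_dims) array and `labels` holds each document's topic
# (hypothetical variables for illustration). This gives a one-off visual check of
# clusters, but it doesn't scale to continuous monitoring in production.
projected = TSNE(n_components=2, random_state=42).fit_transform(embeddings)

plt.scatter(projected[:, 0], projected[:, 1], c=labels, s=4, cmap="tab20")
plt.title("t-SNE projection of document embeddings")
plt.show()
```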
To handle this in a scalable way, whylogs, the open-source library for logging any kind of data, creates a lightweight statistical profile of your data that can be used to extract meaningful insights and characteristics, letting you measure quality and drift over time.
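As a minimal sketch of the profiling step (not specific to embeddings), logging a batch of data with whylogs looks roughly like this; the column names are hypothetical:

```python
import pandas as pd
import whylogs as why

# Any batch of data; in our case this could hold per-document features.
df = pd.DataFrame(
    {
        "doc_length": [120, 342, 87],
        "predicted_topic": ["sci.space", "misc.forsale", "rec.autos"],
    }
)

# Create a lightweight statistical profile of the batch.
results = why.log(df)
profile_view = results.view()

# Inspect the profile as a DataFrame of per-column metrics.
print(profile_view.to_pandas())
```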
Using whylogs, customers can identify centroids in their embeddings and measure distances within and between clusters. This is useful for understanding how your embeddings change gradually over time or shift suddenly in response to an upstream data change. Read more in this blog post.
Train and deploy a classification model
Amazon SageMaker lets you build, train, and deploy ML models using fully managed infrastructure, tools, and workflows. Here, we set up and train a simple classification model in SageMaker.
Check out the full article for detailed steps on the following (a rough sketch of the train-and-deploy step follows the list):
- Using the newsgroups dataset to create vectors and train our model on those vectors.
- Creating an entrypoint script that defines how to load our model and make predictions.
- Deploying our model to an endpoint so we can make batched predictions and compare results.
- Using a pretrained model on this same dataset to streamline defining the endpoint in SageMaker.
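As a hedged sketch of what the train-and-deploy step can look like with the SageMaker Python SDK (the entry-point script name, role ARN, S3 path, and instance types below are placeholders, not values from the original post):

```python
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

# Train the classifier with a custom entry-point script (hypothetical file name).
estimator = SKLearn(
    entry_point="train_and_serve.py",
    framework_version="1.2-1",
    instance_type="ml.m5.large",
    role=role,
    sagemaker_session=session,
)
estimator.fit({"train": "s3://my-bucket/newsgroups/train"})  # placeholder S3 path

# Deploy to a real-time endpoint and run a batched prediction.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
predictions = predictor.predict(["For sale: mountain bike, lightly used."])
```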
Measuring embedding distances with whylogs
Once we have our model trained and the entrypoint defined, we'll capture a set of reference points that identify the centroids of the embeddings our model was trained on. During inference, this lets us compare each document's distance to those centroids, which we will revisit a bit later in this post.
To capture our reference points, we have a few options in whylogs: we can define the reference points manually, or let whylogs identify centroids automatically, either from corresponding labels or with an unsupervised clustering approach.
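As a sketch of the label-based option, whylogs' experimental embeddings utilities can derive reference centroids from a labeled training set and track distances to them. The module paths and class names below follow the whylogs embeddings example at the time of writing and may change between versions; `X_train`, `y_train`, and `X_test` are hypothetical variables:

```python
import whylogs as why
from whylogs.core.resolvers import MetricSpec, ResolverSpec
from whylogs.core.schema import DeclarativeSchema
from whylogs.experimental.extras.embedding_metric import (
    DistanceFunction,
    EmbeddingConfig,
    EmbeddingMetric,
)
from whylogs.experimental.preprocess.embeddings.selectors import PCACentroidsSelector

# Derive one reference centroid per topic from the labeled training embeddings.
references, labels = PCACentroidsSelector(n_components=20).calculate_centroids(X_train, y_train)

# Configure an embedding metric that tracks each document's distance to those centroids.
config = EmbeddingConfig(
    references=references,
    labels=labels,
    distance_fn=DistanceFunction.euclidean,
)
schema = DeclarativeSchema(
    [ResolverSpec(column_name="news_centroids", metrics=[MetricSpec(EmbeddingMetric, config)])]
)

# Profile a batch of inference-time embeddings against the reference centroids.
results = why.log(row={"news_centroids": X_test}, schema=schema)
```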
To follow along with the example, visit the full blog post.
Monitoring embedding drift with WhyLabs Observatory
If you've followed the original article, at this point we have a trained model, reference embeddings, and a whylogs resolver defined to extract the information we want from our embeddings. To see the power of measuring embedding distances, we'll create a scenario in which our classifier predicts the class of documents drawn from the same topics it learned during training, and then we'll inject perturbations into the later batches to introduce drift.
The full article includes a high-level architecture diagram of the pipeline we've built.
When we open our project in WhyLabs, we see that our profiles were successfully generated for each batch and submitted to the platform. We won’t cover every feature and output created by our resolver but will highlight three of them below.
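As a rough sketch of how each day's profile gets sent to the platform, whylogs ships a WhyLabs writer configured through environment variables; the org ID, dataset ID, API key, and `daily_batch_df` below are placeholders:

```python
import os
import whylogs as why

# Credentials for the WhyLabs writer (placeholder values).
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "org-0000"
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-1"
os.environ["WHYLABS_API_KEY"] = "<your-api-key>"

# Profile one day's batch and upload it to WhyLabs.
results = why.log(daily_batch_df)   # daily_batch_df is a hypothetical DataFrame
results.writer("whylabs").write()
```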
Observe introduced drift in WhyLabs
Now you’ll have access to a number of different features in your dashboard that represent the different aspects of the pipeline we monitored:
- news_centroids: Relative distance of each document to the centroids of each reference topic cluster, and frequent items for the closest centroid for each document.
- document_tokens: Distribution of tokens (term length, document length and frequent items) in each document.
- output_prediction and output_target: The classifier's predictions and targets, which are also used to compute the metrics on the “Performance” tab (see the sketch after this list).
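As a sketch of how predictions and targets can be logged so they populate the “Performance” tab, whylogs provides a classification-metrics helper; the DataFrame and column names here are hypothetical:

```python
import whylogs as why

# `df` holds one batch of inference results with hypothetical column names.
results = why.log_classification_metrics(
    df,
    target_column="output_target",
    prediction_column="output_prediction",
)
results.writer("whylabs").write()
```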
With the monitored information, we should be able to correlate the anomalies and reach a conclusion about what happened.
news_centroids.closest
In the chart below, we can see the distribution of the closest centroid for each document. For the first four days, the distributions are similar to one another. The language perturbations injected in the last three days skew the distribution toward the “forsale” topic.
document_tokens.frequent_terms
Because we removed the English stopwords in our tokenization process but didn't remove the Spanish stopwords, most of the frequent terms in the selected period are Spanish stopwords, and none of them appear in the first four days.
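As an illustration of why this happens (not code from the original post), a vectorizer configured with only English stopwords passes Spanish function words straight through:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Only English stopwords are filtered, so Spanish stopwords such as "el", "que",
# and "es" survive tokenization and show up as frequent terms.
vectorizer = CountVectorizer(stop_words="english")
tokens = vectorizer.build_analyzer()("el coche que está en venta es muy barato")
print(tokens)
```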
Performance.F1
In the “Performance” tab, plenty of information tells us our performance is degrading. For example, the F1 chart below shows the model getting steadily worse starting from the fifth day.
For now, we'll focus on how to use WhyLabs to monitor these signals and be notified in the future when our dataset changes and impacts our model's performance. We cover the steps below in more detail in this blog post.
- Navigate to the Monitor Manager and select the “Presets” tab.
- Next, create a drift monitor on discrete inputs using the “Configure” option on the “Data drift in model inputs” for “All discrete inputs.” Click through to modify the drift distance threshold under section 2 and leave everything else the same.
- Lastly, use the save button at the bottom to complete creating our monitor.
Now, we’ll test our monitor on the “news_centroids.closest” feature to show the drift in categorical distribution when we changed our language to Spanish, causing the “forsale” cluster to become the closest centroid cluster more consistently.
We can see that WhyLabs identified the drift in closest clusters, which would have triggered an alert to our downstream notification endpoint. This helps us catch a sudden change like this quickly in the future.
Start your WhyLabs and Amazon SageMaker journey
Embarking on your journey with WhyLabs and Amazon SageMaker is simple. Take a look at the sample notebook this post's example is built from, then make your way over to WhyLabs Observatory to create a free account and begin monitoring your SageMaker models.
You can also learn more about the WhyLabs AI Observability Platform in AWS Marketplace.