Troubleshooting User Behavior Changes and Performance Drift in LLMs
Hands-on workshop on diagnosing issues in LLM-powered applications using text quality metrics
BERNEASE HERMAN
Senior Data Scientist
WhyLabs
Large language models (LLMs) rarely produce consistent responses to the same prompts over time. This inconsistency can stem from changes in the model's performance, but it can also result from changes in user behavior. Text quality metrics, used in combination, can help pinpoint and mitigate these issues without expensive ground truth labeling.
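As a taste of the label-free approach, here is a minimal sketch of computing simple text quality metrics per prompt or response. The metric choices (VADER sentiment, length, type-token ratio) are illustrative assumptions; the workshop's own tooling may differ.

```python
# Minimal sketch of label-free text quality metrics (assumption: VADER for
# sentiment; lengths and type-token ratio as crude vocabulary signals).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

_analyzer = SentimentIntensityAnalyzer()

def text_quality_metrics(text: str) -> dict:
    """Compute simple, label-free quality metrics for one prompt or response."""
    tokens = text.split()
    return {
        "sentiment": _analyzer.polarity_scores(text)["compound"],  # range [-1, 1]
        "char_length": len(text),
        "word_count": len(tokens),
        # Type-token ratio as a rough vocabulary-diversity signal.
        "type_token_ratio": len({t.lower() for t in tokens}) / max(len(tokens), 1),
    }

print(text_quality_metrics("The assistant's answer was clear and genuinely helpful."))
```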
This workshop covers:
- Different types of data drift common to LLM applications
- Sentiment, toxicity, and vocabulary metrics useful for text applications
- Combining text quality metrics to measure changes in user behavior and model performance (see the sketch after this list)
- Translating changes in text quality into actionable mitigation techniques
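To illustrate the combining step, the sketch below flags drift by comparing a metric's distribution between a baseline window and the current window with a two-sample KS test. The test choice and threshold are assumptions for illustration, not values prescribed by the workshop.

```python
# Hedged sketch: detect drift in a text quality metric by comparing a baseline
# window to the current window with a two-sample Kolmogorov-Smirnov test.
from scipy.stats import ks_2samp

def detect_drift(baseline: list[dict], current: list[dict],
                 metric: str, p_threshold: float = 0.01) -> bool:
    """Return True if `metric` appears to have drifted between the two windows."""
    base_vals = [m[metric] for m in baseline]
    curr_vals = [m[metric] for m in current]
    _stat, p_value = ks_2samp(base_vals, curr_vals)
    return p_value < p_threshold

# Usage idea: compute text_quality_metrics() (sketched above) for each prompt
# (user behavior) and each response (model behavior) in both windows, then test
# each metric. Drift in prompt metrics suggests changing user behavior; drift in
# response metrics while prompts stay stable suggests model performance drift.
```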