Case study
"WhyLabs provides a safety net for us that we didn’t have before. As a result, we are able to iterate on new experiments and prompts faster and ship new AI features quickly. We can do so with high confidence, knowing we have quantitative metrics to back up our decisions."
Varun Puri, CEO, Yoodli
Introduction
Yoodli is a technology platform providing AI-powered, private speech coaching that helps people improve their communication skills without the pressure of an audience. Yoodli provides users with personalized and real-time feedback on their filler words, body language, and more to help them ace their upcoming speech, interview, or presentation.
Offering an innovative solution that combines AI with human expertise, Yoodli leverages leading-edge techniques like Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) workflows for speech analysis, conversational response simulation, and feedback generation. Maintaining the quality, relevance, and consistency of LLM responses is essential to providing a positive user experience while keeping control of the AI systems Yoodli relies on.
With the LLM landscape changing rapidly, user expectations rising, and new security threats emerging, the Yoodli team was eager to find a solution that gave them control over the data and model assets foundational to their LLM-based application. As a start-up, they needed a solution that was easy to implement while still providing robust monitoring of key metrics over time, anomaly detection, and the ability to compare the output quality of their LLMs when testing various configurations (e.g. A/B testing). The ability to leverage real-time monitoring to enable actionable, customizable guardrails was also important for continuous safeguarding that scales with their product.
The Challenge: Measuring the quality, relevancy, and consistency of LLM responses
Yoodli’s AI-driven approach to speech coaching relies on LLMs for two important tasks: understanding the practice content, or prompts, spoken by users and generating personalized feedback, and simulating responses and questions from an audience. After WhyLabs spoke to the Yoodli team, it was clear they needed monitoring that delivered actionable insights about their LLM responses and tracked key performance metrics like quality, relevance, and consistency.
The lack of insight into vital metrics specific to their use case was creating a suboptimal user experience. For instance, they needed to know how many seconds it takes a user to read the feedback provided by the LLM. However, gathering this data from the various layers of their system made it challenging to measure both typical and application-specific metrics for their LLMs.
Additionally, Yoodli relies on a third-party LLM over which they have little control. They needed a way to ensure that the LLM continues to generate expected results and to be alerted to anything unexpected. They also wanted to measure the effects of prompt updates using field data. Yoodli often experiments with various prompt versions, so it’s essential to evaluate and compare the quality and consistency of responses between versions to improve the customer experience. With the WhyLabs AI Control Platform, in addition to accessing the industry’s best practices around observability, they can also leverage new security features and enforce customized guardrails across common dimensions of LLM vulnerability: bad actors, misuse, poor customer experience, misinformation/hallucinations, and cost.
Lastly, Yoodli required a solution that was simple to implement and manage. As a startup, they needed to focus their resources on projects and features that drive customer value. Yoodli needed robust monitoring, alerting, and guardrails, which is where WhyLabs comes into the picture.
Solution
Using WhyLabs, the Yoodli team found immediate value in out-of-the-box dashboards for visually comparing quality metrics between segments or models, which helped improve the performance of their LLM application. By leveraging segments, Yoodli can analyze LLM metrics across different data cohorts, comparing performance between experimental groups (e.g. prompt versions, model versions, A/B tests) or across use case categories (e.g. assistant styles, personas, and speaking topics), as in the sketch below.
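To give a flavor of what segmented logging looks like, here is a minimal sketch using whylogs’ segmentation API; the data and the `prompt_version` segmentation column are hypothetical, not Yoodli’s actual schema:

```python
import pandas as pd
import whylogs as why
from whylogs.core.schema import DatasetSchema
from whylogs.core.segmentation_partition import segment_on_column

# Hypothetical batch of LLM interactions, tagged with the experiment
# group each row belongs to (e.g. which prompt version produced it).
df = pd.DataFrame({
    "prompt": ["Give me feedback on my intro", "How was my pacing?"],
    "response": ["Your intro was concise...", "Your pacing was steady..."],
    "prompt_version": ["v1", "v2"],  # assumed segmentation column
})

# Partition the profile by prompt_version so each cohort can be
# compared side by side once uploaded to the platform.
schema = DatasetSchema(segments=segment_on_column("prompt_version"))
results = why.log(df, schema=schema)
```

Each segment produces its own statistical profile, which is what makes cohort-level comparisons such as A/B tests possible without logging raw user data.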
Yoodli leverages WhyLabs monitors for automated anomaly detection and alerting when LLM prompts and responses exhibit abnormal toxicity, sentiment, or text quality markers such as lexicon count and readability.
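Toxicity, sentiment, and text quality are among the metrics LangKit computes out of the box. A minimal sketch of generating them for a prompt/response pair (an illustrative example under the standard `langkit` and `whylogs` packages, not Yoodli’s production code):

```python
import whylogs as why
from langkit import llm_metrics  # registers toxicity, sentiment, and text quality metrics

# llm_metrics.init() returns a whylogs schema whose UDFs compute
# toxicity, sentiment, and text quality (readability, lexicon count, ...)
# for columns named "prompt" and "response".
schema = llm_metrics.init()

profile = why.log(
    {"prompt": "How can I sound more confident?",
     "response": "Slow down and pause between key points."},
    schema=schema,
).profile()

# The profile holds distribution summaries for each metric, which
# WhyLabs monitors can baseline and alert on.
print(profile.view().to_pandas().index.tolist())
```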
LangKit, WhyLabs’ open-source library, also allowed Yoodli to easily add their own custom metrics. They were able to combine LLM behavior metrics and application-specific metrics into a single profile, making everything easier to visualize in one place. This gives Yoodli the flexibility to generate new custom metrics as their application and monitoring requirements scale and change over time.
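As an illustration, a custom metric can be registered alongside LangKit’s built-ins through a whylogs UDF. The `estimated_read_time_s` metric below (echoing the read-time question mentioned earlier) is a hypothetical example, not Yoodli’s actual implementation:

```python
import whylogs as why
from langkit import textstat  # importing registers LangKit's text quality UDFs
from whylogs.experimental.core.udf_schema import register_dataset_udf, udf_schema

# Hypothetical application-specific metric: estimated seconds a user
# needs to read the LLM's feedback, assuming ~200 words per minute.
@register_dataset_udf(["response"], "response.estimated_read_time_s")
def estimated_read_time_s(data):
    return [len(text.split()) / (200 / 60) for text in data["response"]]

# udf_schema() collects LangKit's registered metrics and the custom
# UDF above, so both land in the same whylogs profile.
schema = udf_schema()

profile = why.log(
    {"prompt": "Critique my closing statement.",
     "response": "Strong finish, but trim the last sentence."},
    schema=schema,
).profile()
```

Because the custom metric lives in the same profile as the built-in ones, it inherits the same monitoring, alerting, and segmentation machinery.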
“WhyLabs provides a safety net for us that we didn’t have before. As a result, we can iterate on new experiments and prompts faster and ship new AI features quickly. We can do so confidently, knowing we have quantitative metrics to back up our decisions. Most importantly, this enables Yoodli to stay at the forefront of pushing AI innovation and deliver exceptional value to our customers,” said Varun Puri, CEO of Yoodli.
Why they chose to partner with WhyLabs
What impressed Yoodli the most was how easily WhyLabs integrated with their existing systems and how simple and quick the integration process was to follow. They were able to generate metrics and upload data to the WhyLabs Platform using sample code that required zero customization or complex integration steps. “WhyLabs stood out as an accessible and easily adaptable solution, even for companies with limited engineering resources,” said Varun. They implemented the solution themselves in just a few hours with the help of well-structured, easy-to-understand documentation, and immediately found value in the out-of-the-box dashboards, monitors, and alerts.
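That integration path follows whylogs’ documented pattern of profiling a batch locally and writing the profile to WhyLabs; a minimal sketch, with placeholder credentials and data, looks roughly like this:

```python
import os
import pandas as pd
import whylogs as why

# Placeholder credentials; real values come from the WhyLabs account.
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "org-XXXX"
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-1"
os.environ["WHYLABS_API_KEY"] = "<api-key>"

df = pd.DataFrame({
    "prompt": ["Summarize my talk"],
    "response": ["You covered three main points..."],
})

# Profile the batch and upload it; the dashboards, monitors, and
# alerts in WhyLabs operate on the uploaded profiles.
results = why.log(df)
results.writer("whylabs").write()
```

Only statistical profiles are uploaded, not the raw prompts and responses, which is part of what makes the integration lightweight.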
Another key factor in Yoodli's decision to choose WhyLabs was the platform's robust monitoring and anomaly detection capabilities. Working with managed third-party LLM services can be challenging due to limited visibility into and control over these black-box systems. “WhyLabs provides consistent monitoring of LLM behavior across many dimensions in a quantifiable way. This gives us visibility and immediate notification if an LLM’s behavior has changed in any way, enabling us to implement reinforced guardrails and continuous improvements,” said Varun.
LLM-backed applications are also vulnerable to a host of threats, such as bad actors who can gain access to Personally Identifiable Information (PII) or rewire model behavior with prompt injections. The WhyLabs AI Control Platform's LLM Security offerings enable teams to protect LLM applications against malicious prompts and apply customized guardrails to responses.
Outcome
Using the WhyLabs AI Control Platform, Yoodli can effectively perform end-to-end monitoring of their LLM-based application, identify unintended side effects through anomaly detection, and ship higher-quality feedback and responses to their customers.