AI Observability is Dead, Long Live AI Observability! Introducing WhyLabs AI Control Center for Generative and Predictive AI
- WhyLabs
- News
- Generative AI
Apr 24, 2024
Today, we are launching the new iteration of WhyLabs. This iteration is based on a unique insight: AI observability platforms are insufficient in the world of Generative AI. We came to this realization thanks to our customers, who highlighted that passive observability capabilities are not enough. You cannot afford a 5-minute delay in learning that a jailbreak has impacted your LLM application. You need the ability to intervene in real time and steer the application away from the security threat.
After months in private beta, we are releasing the new WhyLabs AI Control Platform to general availability. The Platform makes real-time security threat prevention possible for any Generative AI (GenAI) application. This capability is already making a huge difference in helping teams take their GenAI projects from prototype to production. One of the success stories is Yoodli, an incredible product that helps anyone become a great public speaker using GenAI technology. Check out our case study to see how Yoodli accelerated the continuous improvement of their LLM-based application using WhyLabs, resulting in significantly faster experimentation, impactful new features, and higher-quality responses delivered to their end users.
"Yoodli is building for a future where everyone can be a better communicator through our state-of-the-art Generative AI speech coaching application,” explained Varun Puri, CEO of Yoodli. “WhyLabs AI Control Platform provides us with an accessible and easily adaptable solution that we can trust. We are really excited about the new capabilities that enable us to execute AI control across five critical dimensions for our communication application: protection against bad actors, misuse, bad customer experience, hallucinations, and costs."
What is unique about GenAI that makes AI observability obsolete?
The emergence of foundation AI models has revolutionized the accessibility and efficiency of Artificial Intelligence (AI). Because these models are pre-trained on vast amounts of data, they possess a deep understanding of various domains and are broadly applicable across multi-modal, complex tasks. Their versatility allows developers to fine-tune or adapt foundation models for specific tasks quickly, bypassing the long cycle of training a model from scratch.
This is both the blessing and the curse of foundation models. The models are so capable that they can power any GenAI application, from acting as your customer service agent to giving your customers legal advice to sharing your customer information with anyone. This power introduces entirely new categories of risk: adversarial attacks, data privacy breaches, harmful topics, and factually incorrect claims.
These risks pose serious challenges to moving GenAI applications from prototype to production. To tackle these, AI teams require tools that let them control and steer GenAI applications in real-time. But even that is insufficient. GenAI applications evolve rapidly, with prompt engineering, RLHF, and RAG. In addition to real-time guardrails, teams need a way to understand behavior changes over time. But even that is insufficient. While RAG architectures help tackle hallucinations, they add to the complexity of troubleshooting issues arising with GenAI applications. As such, teams need a way to analyze complex traces over time in order to debug issues. These are pain points we focused on solving with the WhyLabs AI Control Center.
How does the WhyLabs AI Control Center help tackle these risks?
WhyLabs AI Control Center unifies three pillars of AI operations in one platform. We combine the most advanced and up-to-date techniques for AI application security, observability, and optimization.
By bringing together all of these key components, we enable our customers to harness the power of GenAI applications with precision and control. Today, no other platform connects all of these capabilities and makes them computationally efficient enough to support high-throughput production workloads.
As we continue building our platform to serve the world's best AI teams and enterprises, a few design principles continue to guide us:
- Privacy comes first: WhyLabs is architected with the highest degree of privacy and data security. For all capabilities available on the AI Control Center, no raw data ever leaves the customer VPC. In fact, raw data doesn’t even need to move to another environment for telemetry extraction. We extract all telemetry in the same environment, without data duplication.
- Massive enterprise scale: Every aspect of the WhyLabs Platform, from integrations to telemetry calculation algorithms to monitoring configurations to the UI, is architected to handle massive inference scale. The WhyLabs Platform can handle hundreds of millions of inferences per day (thousands of application TPS) in real time, even for models with thousands of features.
- Frictionless onboarding and ownership: As the only SaaS on the market approved for highly regulated industries, we focus on making onboarding simple enough that an undergraduate intern can complete it in an hour. With SaaS deployment, you never need to worry about infrastructure or maintenance.
Let’s explore each of the three pillars of the AI Control Platform!
WhyLabs: Observe
All of our customers' favorite capabilities of the WhyLabs Observability Platform are now available in WhyLabs: Observe, and all existing customers have access to the expanded capabilities. In addition, these capabilities now extend to LLM applications. Whether you are deploying batch inference pipelines or a live prediction service, WhyLabs provides infrastructure-agnostic, real-time anomaly detection and monitoring of drift, data quality issues, and performance degradation. With WhyLabs, teams have decreased the time to resolution of AI issues by 10x.
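To make the LLM monitoring flow concrete, here is a minimal sketch using whylogs and LangKit, WhyLabs' open-source telemetry libraries. The prompt/response pair is a placeholder; a production setup would log every interaction.

```python
# A minimal sketch: extract LLM telemetry with LangKit so that drift in
# prompt/response metrics can be monitored over time.
# Install with: pip install "langkit[all]" whylogs
import whylogs as why
from langkit import llm_metrics

# Build a whylogs schema that adds LLM metrics (sentiment, toxicity,
# text quality, and more) on top of standard data profiling.
schema = llm_metrics.init()

# Log a (placeholder) prompt/response pair against that schema.
results = why.log(
    {
        "prompt": "Summarize our refund policy.",
        "response": "Refunds are available within 30 days of purchase.",
    },
    schema=schema,
)

# Inspect the aggregated telemetry locally; only these statistics, never
# the raw text, need to leave your environment.
print(results.view().to_pandas())
```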
Organizations leveraging WhyLabs: Observe for AI applications see the following outcomes:
- Transparency and observability at any scale:
- Cost-effective solution for any scale of AI footprint, with support for models with thousands of input features and applications with over 1,000 transactions per second.
- Reduction in the duration and severity of AI failures, due to:
- Feature and concept drift detection
- Data quality issue detection
- Bias detection
- GenAI security metrics drift
- GenAI performance metrics drift
- Model performance degradation detection
- Outlier detection, in features and predictions
- Reduction in the time to resolution of AI failures and degradations, due to:
- Purpose-built visualizations and workflows designed for resolving drift, data quality, and performance degradations
- Explainability-powered root cause analysis
- Automation of re-training pipelines
- Circuit breakers in the pipeline when issues are detected
- Minimization of manual operations, due to:
- Automatic model onboarding
- Smart, zero-config monitoring setup
- Templatable configurations to drive standards across teams
- Automated alerting and notifications directly into the team’s workflow
- Customization of dashboards and reports, for technical and non-technical audiences
The WhyLabs AI Control Platform offers a unique architecture that allows customers to switch on observability for 100% of their data, never sampling, because sampling distorts the distributions and causes a high rate of false alarms. Customers configure monitoring by connecting WhyLabs directly to training and inference data, in batch or in real time. This integration is cost-effective and privacy-preserving, making it the best choice for high-inference-volume AI applications in industries such as healthcare and fintech.
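As a concrete example of that integration, here is a minimal sketch that profiles a batch of inference data with the open-source whylogs library and ships the profile to WhyLabs; the dataframe, org ID, dataset ID, and API key are placeholders.

```python
# A minimal sketch: profile a batch of inference data with whylogs and send
# the profile (aggregate statistics only, never raw rows) to WhyLabs.
# Install with: pip install "whylogs[whylabs]"
import os

import pandas as pd
import whylogs as why

# Placeholder credentials; substitute your own org, dataset, and API key.
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "org-0"
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-1"
os.environ["WHYLABS_API_KEY"] = "<your-api-key>"

# Placeholder batch of model inputs and predictions.
df = pd.DataFrame(
    {
        "age": [34, 52, 29],
        "income": [58_000, 91_000, 43_000],
        "prediction": [0, 1, 0],
    }
)

# Profile 100% of the batch locally, then upload the resulting profile.
results = why.log(df)
results.writer("whylabs").write()
```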
WhyLabs: Secure
After speaking with hundreds of security teams across enterprises and late-stage startups, we identified one clear need for LLM applications: the ability to intervene in real time and steer the application away from undesirable behavior. WhyLabs: Secure focuses on making it easy for teams to switch on the guardrail and create the necessary feedback loops for continuous model improvement.
Organizations leveraging WhyLabs: Secure for AI applications see the following outcomes:
- Absolute data privacy due to:
- All processing of raw data happens locally. The solution does not rely on third-party providers and does not require sending raw data outside of the immediate infrastructure.
- Reduction in the duration and severity of GenAI security events, due to real-time:
- Prevention of bad actors by detecting prompt injections, jailbreaks, refusals, or sensitive information disclosure.
- Prevention of misuse by detecting inappropriate topics, costly prompts, or sensitive information leakage.
- Detection and prevention of poor customer experience by detecting toxic, low sentiment, or inadequate responses.
- Detection and prevention of untruthful or false responses by detecting hallucinations, overreliance, and out-of-context responses.
- Reduction in the time to resolution of the GenAI security events, due to:
- Powerful analytics of full AI application traces.
- Dedicated dashboards for tracking security and performance trends.
- Drill-down analysis from trends to application traces.
- Compliance and governance approvals necessary to move from prototype to production, due to:
- WhyLabs: Secure meets the OWASP Top 10 for LLMs and the MITRE ATLAS™ standard for LLM security.
- The security team has full control of the GenAI application through control of the guardrail sensitivities, system messages, and notification workflows.
If you or your team are currently evaluating open-source projects that offer guardrails, tracing, and RAG debugging capabilities for LLM applications, take a look at WhyLabs: Secure. With one simple integration (sketched after the list below), our customers enable:
- A real-time guardrail to detect harmful prompts and responses in under 300ms across all critical security rules.
- A response router to steer the application towards safe behavior with customizable callbacks and system messages.
- An AI-specialized logger that captures rich metrics, traces, and metadata (with no-code integrations for OpenAI, Bedrock, and more).
- An AI-specialized data capture to organize raw prompt and response data in a manner that maximizes privacy and allows teams to use this data in future optimization.
- A powerful monitoring engine that aggregates metrics to surface behavior changes over time and alert users about them.
- A rich analytics platform that enables users to visualize guardrail decisions and trends in security and performance, and to debug events down to individual traces.
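Here is a minimal sketch of that guardrail-plus-router pattern. The endpoint URL, payload shape, and `action` field are illustrative assumptions for this post, not the documented WhyLabs: Secure API; consult the integration docs for the real interface.

```python
# A minimal sketch of the guardrail pattern: score the prompt before the LLM
# call and the response after it, steering away from unsafe behavior.
# The endpoint and payload below are hypothetical placeholders.
import requests

GUARDRAIL_URL = "https://guardrail.internal/evaluate"  # hypothetical endpoint
SAFE_FALLBACK = "Sorry, I can't help with that request."

def guarded_completion(prompt: str, llm_call) -> str:
    # 1. Evaluate the incoming prompt against the configured security rules,
    #    with a timeout matching the ~300ms guardrail budget.
    verdict = requests.post(
        GUARDRAIL_URL, json={"prompt": prompt}, timeout=0.3
    ).json()
    if verdict.get("action") == "block":
        return SAFE_FALLBACK  # response router: the prompt never reaches the LLM

    # 2. Only safe prompts reach the model.
    response = llm_call(prompt)

    # 3. Evaluate the response before it reaches the end user.
    verdict = requests.post(
        GUARDRAIL_URL, json={"prompt": prompt, "response": response}, timeout=0.3
    ).json()
    return SAFE_FALLBACK if verdict.get("action") == "block" else response
```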
Experience the demo today. Schedule time with our engineering team to onboard your GenAI application for a 14-day free trial.
WhyLabs: Optimize
The WhyLabs AI Control Platform provides feedback tools for improving any aspect of your AI system. Observability and security generate insights necessary to improve the application experience. WhyLabs takes this a step further. Using WhyLabs: Optimize, you can create datasets for re-training predictive models or for setting up Reinforcement Learning from Human Feedback (RLHF) in generative models.
By using insights from WhyLabs: Observe and WhyLabs: Secure, our customers enable the following optimizations across:
Optimized performance of WhyLabs: Secure capabilities:
- Customization of the rule sets in the guardrail with examples and proprietary Red Team datasets
- Continuous tuning of the rule sets using datasets created from blocking and flagging events
Optimized performance of predictive models with:
- Automated retraining when model accuracy drops below a desired level, using customizable webhooks (see the sketch after this list)
- Automated circuit breakers in batch pipelines when data quality issues are detected, preventing the release of bad model predictions
- Improvements to input features via analysis of feature quality volatility and information entropy of features
Optimized customer experience in GenAI applications with:
- Analytics of RAG traces captured from application interactions that were marked with a thumbs-down or otherwise received negative feedback from the end user.
- Analytics of the traces that flagged model refusals to ensure the best customer experience.
- Customization of system messages and callbacks to steer the GenAI application toward appropriate behaviors.
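As an illustration of the retraining webhook mentioned in the list above, here is a minimal sketch using Flask. The route, payload fields, and `trigger_retraining` helper are hypothetical; adapt them to the notification payload you configure in WhyLabs and to your own training pipeline.

```python
# A minimal sketch: receive a (hypothetical) WhyLabs monitor notification via
# webhook and kick off retraining when accuracy degrades.
from flask import Flask, request

app = Flask(__name__)

def trigger_retraining(dataset_id: str) -> None:
    # Placeholder: launch your training job here (Airflow DAG, SageMaker
    # training job, CI pipeline, etc.).
    print(f"Retraining triggered for {dataset_id}")

@app.post("/whylabs-alert")
def handle_alert():
    event = request.get_json()
    # Hypothetical payload fields; match them to your monitor configuration.
    if event.get("metric") == "accuracy" and event.get("severity") == "high":
        trigger_retraining(event["dataset_id"])
    return {"status": "ok"}
```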
By systematically tracking model performance, data quality, traces, security events, and metadata, you build feedback loops for continuous improvement across your AI model lifecycle. Get started with WhyLabs: Observe or WhyLabs: Secure to begin collecting the most valuable data about your AI applications.
How do I get started with WhyLabs AI Control Center?
The WhyLabs AI Control Center is available now; you can get started by signing up for a free account. All of the capabilities can be experienced today in our Demo Organization on the platform.
With our free plan, you can start using all WhyLabs: Observe capabilities on your own predictive and generative applications today.
For WhyLabs: Secure, you can request a 14-day free trial here.
If you are looking for specific WhyLabs: Optimize capabilities, schedule a call with our team. We would love to show you what the future looks like!
As always, we are a team of AI practitioners developing for AI practitioners. We love feedback and feature requests. Jump on our community Slack channel to chat with our engineering team directly.