Production-Ready Models with Databricks and WhyLabs
- WhyLabs
- Integrations
- AI Observability
- Whylogs
- News
Jun 23, 2023
Every AI practitioner turns to Databricks when they need to build massively scalable AI and data pipelines. But once in production, these AI pipelines are prone to silent failures caused by missing values, changing schemas, and broken feature transformations. Troubleshooting these failures in terabyte-sized pipelines is like looking for a needle in a needlestack: it takes weeks and often causes significant financial losses due to degraded customer experience. That’s where WhyLabs comes in, enabling AI and data monitoring that works seamlessly in the Databricks environment. The WhyLabs AI Observatory is built to monitor data at any scale, in a distributed environment, without the need to sample or move data around.
Unlocking capabilities that are almost too good to be true
WhyLabs has partnered with Databricks to enable a unique integration that makes it possible to compute all of the key telemetry necessary for AI monitoring directly in Apache Spark. The telemetry is collected in parallel with the AI pipeline using the WhyLabs-MLflow integration. With this integration, every data science and machine learning team can monitor data quality at a rate of 1M rows per second, while the computation remains fully distributed.
With WhyLabs, teams can answer questions about data of any size:
- How do I know if the data my model depends on is always high quality? A spike in null values or a sudden change in schema can cause your model to produce unexpected results, so continually monitoring your data is essential for a production-ready model (see the sketch after this list).
- How can I identify data drift before it causes my model to degrade? Data drift can be slow or abrupt, and it can cause your model to produce less accurate results. Without a way to continually compare datasets over time, identifying drift is a very manual process.
- How can I gauge the performance of my models over time and know if it has gotten worse? Understanding whether a model is healthy requires more visibility than application or infrastructure tools provide. You need visibility into predictions as well as ground truth to understand how your model is performing over time.
- How can I identify bias in my predictions and correlate it to specific segments of my dataset? Bias in your predictions is challenging to identify unless you monitor predictions at scale and look for segments that are performing poorly.
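To make the first question concrete, here is a minimal sketch of a data quality check using whylogs constraints. It assumes the constraint factories available in recent whylogs releases, and the dataset and column names are placeholders (they match the wine quality example used later in this post):
import pandas as pd
import whylogs as why
from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import no_missing_values, greater_than_number

# Placeholder data; in practice this would be a batch from your pipeline.
df = pd.read_csv("winequality-red.csv", delimiter=";")

# Profile the batch, then assert data quality constraints against the profile.
profile_view = why.log(df).view()
builder = ConstraintsBuilder(dataset_profile_view=profile_view)
builder.add_constraint(no_missing_values(column_name="alcohol"))
builder.add_constraint(greater_than_number(column_name="alcohol", number=0))
constraints = builder.build()

# Produces a per-constraint pass/fail report that can feed alerting.
print(constraints.generate_constraints_report())
The same kinds of checks can also be configured as monitors in the WhyLabs platform, so they run automatically on every profile you upload.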
One solution for observability across the entire Databricks ecosystem
WhyLabs AI Observatory enables out-of-the-box monitoring for your data pipelines and models running in Databricks across all data types: structured, semi-structured, unstructured, and streaming. The Lakehouse architecture is incredibly flexible, covering use cases across business intelligence, data streaming, machine learning, and generative AI. With WhyLabs, there is an integration that supports every one of these use cases. We will briefly introduce each of these use cases, but if you are looking for advice on the best place to plug in WhyLabs for your unique setup, ping us on the community Slack channel.
Big data and streaming data pipelines with Spark and Delta Live Tables
Data is the lifeblood of any AI and BI application, so ensuring that this data is high quality and observable is critical for the health of these applications. Across our customers, the most common performance degradation in AI models is caused by data quality bugs: missing values, changes in distribution, or the introduction of new categories. WhyLabs provides an easy-to-integrate and cost-effective solution for monitoring all key data quality metrics, alerting team members about issues, and helping with root cause analysis. WhyLabs integrates with Delta Live Tables and Spark, making it easy to set up observability for any data pipeline.
from pyspark.sql import SparkSession
from pyspark import SparkFiles
from whylogs.api.pyspark.experimental import collect_dataset_profile_view
from whylogs.api.writer.whylabs import WhyLabsWriter

spark = SparkSession.builder.appName('whylogs-testing').getOrCreate()

# Arrow speeds up the conversion between Spark and pandas during profiling.
arrow_config_key = "spark.sql.execution.arrow.pyspark.enabled"
spark.conf.set(arrow_config_key, "true")

# Load an example dataset from the UCI repository into a Spark dataframe.
data_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
spark.sparkContext.addFile(data_url)
spark_dataframe = spark.read.option("delimiter", ";") \
    .option("inferSchema", "true") \
    .csv(SparkFiles.get("winequality-red.csv"), header=True)

# Compute a whylogs profile across the dataframe in a distributed way.
dataset_profile_view = collect_dataset_profile_view(input_df=spark_dataframe)

# Send the profile to the WhyLabs platform for monitoring.
writer = WhyLabsWriter()
writer.write(file=dataset_profile_view)
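One note on configuration: the WhyLabsWriter reads its credentials from environment variables, so before calling write() you would typically set something like the following (the values are placeholders for your own organization ID, dataset ID, and API key):
import os

# Placeholder credentials; use the values from your WhyLabs account.
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "<your-org-id>"
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "<your-dataset-id>"
os.environ["WHYLABS_API_KEY"] = "<your-api-key>"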
ML model experimentation and serving with MLflow
ML models power the most important applications across enterprises, from marketing to core product experiences. The MLflow toolchain makes it simple to track experiments during model development and to serve models for inference in production. Once a model is in production, monitoring its performance and the health of its inputs is crucial to ensure ROI; without monitoring, ML models fail silently because of data drift, bias, and data quality issues. WhyLabs integrates with MLflow during experimentation to build a training data baseline and assess training data for quality issues. Once the model is in production, WhyLabs integrates with MLflow serving to continuously monitor model performance and ensure that model inputs are not drifting and causing training-serving skew.
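As a rough illustration of the experimentation side, a training run can profile its training data with whylogs and attach the resulting profile to the MLflow run as an artifact, giving you a baseline to compare production batches against. This is a minimal sketch rather than a prescribed integration path; the dataset and file names are placeholders:
import mlflow
import pandas as pd
import whylogs as why

# Placeholder training data; substitute the features your model trains on.
train_df = pd.read_csv("winequality-red.csv", delimiter=";")

with mlflow.start_run():
    # Profile the training data to capture a baseline of its distributions.
    profile_view = why.log(train_df).view()
    with open("training_profile.bin", "wb") as f:
        f.write(profile_view.serialize())

    # Attach the whylogs profile to the MLflow run alongside the model.
    mlflow.log_artifact("training_profile.bin")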
AI applications built on Dolly
Databricks’ Dolly is an instruction-following large language model trained on the Databricks machine learning platform. Large language models (LLMs) like Dolly are transforming the landscape of AI applications, from genuinely helpful chatbots to nearly autonomous code generation tools. But this incredible technology doesn’t come without deployment challenges: these models are prone to hallucinations and biases, as well as privacy and security loopholes. WhyLabs makes it easy to monitor and safeguard LLMs hosted on Databricks using our industry standard for LLM monitoring, LangKit. LangKit detects and prevents malicious prompts, toxicity, hallucinations, and jailbreak attempts. Here is an example of our seamless integration of LangKit with LangChain for building AI applications.
from langkit import llm_metrics
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline
from langchain.callbacks.whylabs_callback import WhyLabsCallbackHandler
from training.generate import InstructionTextGenerationPipeline, load_model_tokenizer_for_generate

# Load Dolly and its tokenizer from the Databricks Hugging Face repository.
model, tokenizer = load_model_tokenizer_for_generate("databricks/dolly-v2-3b")

prompt = PromptTemplate(input_variables=["instruction"], template="{instruction}")

# The WhyLabs callback streams prompt/response telemetry to your WhyLabs project.
whylabs = WhyLabsCallbackHandler.from_params(org_id="<your-org>", api_key="<your-key>", dataset_id="<your-dataset>")

hf_pipeline = HuggingFacePipeline(
    pipeline=InstructionTextGenerationPipeline(
        model=model, tokenizer=tokenizer, return_full_text=True, task="text-generation"),
    callbacks=[whylabs]
)

llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)
response = llm_chain.run(instruction="Explain to me the difference between nuclear fission and fusion.")
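If you are not using LangChain, prompt/response pairs can also be profiled directly with LangKit and whylogs. The following is a minimal sketch assuming LangKit's llm_metrics module; the prompt and response strings are placeholders:
import whylogs as why
from langkit import llm_metrics

# Build a whylogs schema enriched with LangKit's LLM metrics
# (for example, sentiment and toxicity scores).
schema = llm_metrics.init()

# Profile a single prompt/response pair; in production this runs per request.
record = {
    "prompt": "Explain to me the difference between nuclear fission and fusion.",
    "response": "Fission splits heavy nuclei apart, while fusion combines light nuclei to release energy.",
}
profile = why.log(record, schema=schema).profile()
The resulting profile can then be sent to WhyLabs with the same WhyLabsWriter shown earlier.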
Privacy-preserving and cost-effective integration
WhyLabs integrates in a unique privacy-preserving way. The integration does not require users to move data outside of their existing environment: all data processing happens within the Databricks Lakehouse and involves no data duplication or sampling. This approach significantly reduces security risks and ensures that data is handled with the highest level of privacy and confidentiality. Processing is done by highly optimized telemetry agents (whylogs), which collect telemetry in a fully distributed manner. The telemetry agents are very cost-effective, adding minimal overhead to existing pipelines (see benchmarks). The WhyLabs AI telemetry agents have been battle-tested at organizations operating at massive data scale, including Lyft, StitchFix, Square, and Yahoo Japan.
Take the guesswork out of AI/ML and data health today
Enabling WhyLabs on any Databricks pipeline is quick and simple. Get started with this easy example and see the power of WhyLabs observability in a matter of minutes. If you’d like to talk about your specific use case, reach out over Slack or schedule time with our solution architects - we’d be happy to help!