blog bg left
Back to Blog

How Observability Uncovers the Effects of ML Technical Debt

One of the most alarming aspects of machine learning is that many teams don’t yet have tools and processes to measure the negative effects of technical debt in their production systems. Many teams test their machine learning models offline but conduct little to no online evaluation after initial deployment. These teams are flying blind—running production systems with no insight into their ongoing performance. Observability is the first step in measuring how this technical debt has come to bear on your business. Otherwise, it is your customers that are first to notify you of your system’s faults.

Technical debt and performance degradation can be silent and insidious. Unlike in traditional software, small and valid changes in the data still can cause catastrophic failures for ML models. Changes in the distribution, or relative proportion, of data values can not only cripple your model but do so without triggering any of the typical DevOps alarms on service and data availability. This is in addition to the deployment and software issues that plague all production software, but is made worse because machine learning code is data-dependent and particularly hard to debug. Visibility into data and ML-specific technical debt requires approaches that are purpose built for datasets, their distributions and dependencies, as well as the unique characteristics of the machine learning models that rely on them.

“Visibility into data and ML-specific technical debt requires approaches that are purpose built for datasets, their distributions and dependencies, as well as the unique characteristics of the machine learning models that rely on them.”

With observability and monitoring, you can proactively detect and respond to changes in the model performance before your customers or stakeholders even notice an issue. While the accumulation of hidden technical debt like brittle data pipelines and inadequate data collection practices can cripple ML systems, observability into the dynamics of your data and models allows you to nurse your production systems back to health faster.

AI Observability with whylogs and WhyLabs Observatory

The WhyLabs team has collectively spent decades putting machine learning into production and battled massive amounts of ML technical debt. In that time, we repeatedly needed tools that can provide observability into the inner workings of our AI systems that are purpose-built for large datasets. Over the past two years, we have built an observability platform and open-source data logging library, whylogs. The logging library gathers metrics about the dataset’s distribution, missing values, distinct values, data type schema, and additional information that are contained in a dataset profile. These profiles can then be passed to WhyLabs Observatory to enable powerful monitoring and drift detection that allows insight into the growing effects of machine learning technical debt.

Profiling your data using whylogs for Python is as simple as installing with `pip install whylogs` and running the following script:

import pandas as pd
from whylogs import get_or_create_session

df = pd.read_csv("YOUR-DATASET.csv")	# Load a dataset in one of many ways

session = get_or_create_session()		# Get a whylogs session

with session.logger() as ylog:			# Run session in a context manager
	ylog.log_dataframe(df)		# Profile and log your data

Uncovering the sources of ML technical debt effects

Some of the most frequent model failures are caused not by natural data drift, but programmatic data transformations and changes in usage by the organization for data features. These cause undetected surges in missing values, schema changes, and data distribution changes downstream. Monitoring the inputs means monitoring statistical properties of each feature individually and monitoring the collective distribution of features.

As an example, imagine your team has deployed a machine learning model that makes predictions using user information added manually to your website. Zip code is a common and problematic feature in systems for its complexity and structure. Consider an update that collects a shortened five-digit US zip code for new customer accounts instead of the previously required nine-digit code. Not only would your model see entirely new values for this feature that the model was not trained on, your data pipeline may interpret the new values as five-digit integers instead of the longer strings. This will cause poor performance of the model for all new customers—something difficult to catch using performance alone. By monitoring the zip code feature for missing values and schema changes, we’re able to catch changes and save our model as soon as the changes are deployed.

Negative consequences of data and machine learning technical debt can arise from more than just changes in the input data. Changes in code, model parameters, random seeds, infrastructure, inputs, outputs, or performance metrics can all have deleterious effects on your system. This is why it’s important to use a solution that can be instrumented at multiple points in your team’s data and model pipeline and built specifically for machine learning and large datasets.

“It’s important to use a solution that can be instrumented at multiple points in your team’s data and model pipeline and built specifically for machine learning and large datasets.”

While monitoring model inputs and outputs can help to detect the effects of technical debt in real time, full observability of your data and model pipeline are needed to uncover the true root cause of an issue. The high dependency across machine learning pipelines also means that a single source can cause multiple issues. For example, a system that only monitors model outputs or performance may also detect the zip code issue we described above. But observability into the input data before and after feature transformation could help us to locate where in the pipeline the problem originates. Full AI observability might reveal the high proportion of unseen zip code values at inference time, schema change from string to integer, and importantly, the root cause: a change in character length in the raw input for zip code. Without the true root cause, we may retrain the model to temporarily fix the poor performance but the technical debt from the website change would have persisted.

Let us know how you uncover the effects of ML technical debt and how the WhyLabs team can help. The whylogs library is available on Github for both Python and Java with added wrappers for Spark. And for observability and monitoring of production systems, check out our always-free, fully self-serve Starter edition of WhyLabs AI Observatory.

How Observability Uncovers the Effects of ML Technical Debt” was originally published on the DCAI Resource Hub, the premier location for learning how to leverage DCAI best practices for improving ML model performance.``

Other posts

Choosing the Right Data Quality Monitoring Solution

In the second article in this series, we break down what to look for in a data quality monitoring solution, open source and Saas tools available, and how to decide on the best one for your organization.

Deploying and Monitoring Made Easy with TeachableHub and WhyLabs

Deploying a model into production and maintaining its performance can be harrowing for many Data Scientists, especially without specialized expertise and equipment. Fortunately, TeachableHub and WhyLabs make it easy to get models out of the sandbox and into a production-ready environment.

A Comprehensive Overview Of Data Quality Monitoring

In the first article in this series, we provide a detailed overview of why data quality monitoring is crucial for building successful data and machine learning systems and how to approach it.

WhyLabs Now Available in AWS Marketplace

AWS customers worldwide can now quickly deploy the WhyLabs AI Observatory to monitor, understand, and debug their machine learning models deployed in AWS.

Deploy your ML model with UbiOps and monitor it with WhyLabs

Machine learning models can only provide value for a business when they are brought out of the sandbox and into the real world... Fortunately, UbiOps and WhyLabs have partnered together to make deploying and monitoring machine learning models easy.

AI Observability for All

We’re excited to announce our new Starter edition: a free tier of our model monitoring solution that allows users to access all of the features of the WhyLabs AI observability platform. It is entirely self-service, meaning that users can sign up for an account and get started right away.
pre footer decoration
pre footer decoration
pre footer decoration

Run AI With Certainty

Get started for free