blog bg left
Back to Blog

WhyLabs' Data Geeks Unleashed

Hello WhyLabs followers! This month three members of the WhyLabs team are speaking at the Data and AI Summit. In this post we've included descriptions and links to all their talks.

Data and AI Summit 2021 (Virtual)


Alessya Visnjic, will be presenting on The Critical Missing Component in the Production ML Stack on May 26, 2021 03:50 pm PT:

Description:

The day the ML application is deployed to production and begins facing the real world is the best and the worst day in the life of the model builder. The joy of seeing accurate predictions is quickly overshadowed by a myriad of operational challenges. Debugging, troubleshooting & monitoring takes over the majority of their day, leaving little time for model building. In DevOps, software operations are taken to a level of an art. Sophisticated tools enable engineers to quickly identify and resolve issues, continuously improving software stability and robustness. In the ML world, operations are still largely a manual process that involves Jupyter notebooks and shell scripts. One of the cornerstones of the DevOps toolchain is logging. Traces and metrics are built on top of logs enabling monitoring and feedback loops. What does logging look like in an ML system?
In this talk we will demonstrate how to enable data logging for an AI application using MLflow in a matter of minutes. We will discuss how something so simple enables testing, monitoring and debugging in an AI application that handles TBs of data and runs in real-time. Attendees will leave the talk equipped with tools and best practices to supercharge MLOps in their team.

Leandro Almeida, will be presenting on Semantic Image Logging Using Approximate Statistics & MLflow on May 27, 2021 11:00 am PT:

Description:

As organizations launch complex multi-modal models into human-facing applications, data governance becomes both increasingly important, and difficult. Specifically, monitoring the underlying ML models for accuracy and reliability becomes a critical component of any data governance system. When complex data, such as image, text and video, is involved, monitoring model performance is particularly problematic given the lack of semantic information. In industries such as health care and automotive, fail-safes are needed for compliant performance and safety but access to validation data is in short supply, or in some cases, completely absent. However, to date, there have been no widely accessible approaches for monitoring semantic information in a performant manner.
In this talk, we will provide an overview of approximate statistical methods, how they can be used for monitoring, along with debugging data pipelines for detecting concept drift and out-of-distribution data in semantic-full data, such as images. We will walk through an open source library, whylogs, which combines Apache Spark and novel approaches to semantic data sketching. We will conclude with practical examples equipping ML practitioners with monitoring tools for computer vision, and semantic-full models.

Andy Dang, will be presenting on Re-imagine Data Monitoring with whylogs and Spark on May 27, 2021 05:00 pm PT:

Description:

In the era of microservices, decentralized ML architectures and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly, and cumbersome while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language and platform agnostic approach to data quality and data monitoring. It can work with different modes of data operations, including streaming, batch and IoT data.
In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.

Past talks

You might be interested to checkout a couple of other talks that the WhyLabs team have given recently:

Bernease Herman was at csv,conf presenting on Static datasets aren't enough: where deployed systems differ from research on May 5, 2021 12:00pm PT. The video and slides are online.

Description:

We focus on static datasets in machine learning and AI training fails to translate to how these systems are being deployed in industry. As a result, data scientists and engineers aren't considering how these systems perform in changing, real world environments nor the feedback mechanisms and societal implications that these systems can cause. In the session, we will highlight existing tools that work with dynamic (and perhaps streaming) data. We will suggest some preliminary studies of activities and lessons that may bridge the gap in data science training for realistic data.

Alessya Visnjic, was at the Databricks Community Lightning Talks presenting on The Critical Missing Component in the Production ML Stack on May 20, 2021. The video starts at minute 16.


Other posts

Re-imagine Data Monitoring with whylogs and Apache Spark

An overview of how the whylogs integration with Apache Spark achieves large scale data profiling, and how users can apply this integration into existing data and ML pipelines.

ML Monitoring in Under 5 Minutes

A quick guide to using whylogs and WhyLabs to monitor common issues with your ML models to surface data drift, concept drift, data quality, and performance issues.

AIShield and WhyLabs: Threat Detection and Monitoring for AI

The seamless integration of AIShield’s security insights on WhyLabs AI observability platform delivers comprehensive insights into ML workloads and brings security hardening to AI-powered enterprises.

Large Scale Data Profiling with whylogs and Fugue on Spark, Ray or Dask

Profiling large-scale data for use cases such as anomaly detection, drift detection, and data validation with Fugue on Spark, Ray or Dask.

Monitoring Image Data with whylogs v1

When operating computer vision systems, data quality and data drift issues always pose the risk of model performance degradation. Whylabs provides a simple yet highly customizable solution for maintaining observability into data to detect issues and take action sooner.

WhyLabs Private Beta: Real-time, No-code, Cloud Storage Data Profiling

We’re excited to announce our Private Beta release for a no-code integration option for WhyLabs, allowing users to bypass the need to integrate whylogs into their data pipeline.

Data and ML Monitoring is Easier with whylogs v1.1

The release of whylogs v1.1 brings many features to the whylogs data logging API, making it even easier to monitor your data and ML models!

Model Monitoring for Financial Fraud Classification

Model monitoring is helping the financial services industry avoid huge losses caused by performance degradation in their fraud transaction models.

Robust & Responsible AI Newsletter - Issue #3

Every quarter we send out a roundup of the hottest MLOps and Data-Centric AI news including industry highlights, what’s brewing at WhyLabs, and more.
pre footer decoration
pre footer decoration
pre footer decoration

Run AI With Certainty

Book a demo
loading...