
WhyLabs' Data Geeks Unleashed

Hello WhyLabs followers! This month three members of the WhyLabs team are speaking at the Data and AI Summit. In this post we've included descriptions and links to all their talks.

Data and AI Summit 2021 (Virtual)


Alessya Visnjic will be presenting on The Critical Missing Component in the Production ML Stack on May 26, 2021 at 3:50 pm PT:

Description:

The day the ML application is deployed to production and begins facing the real world is the best and the worst day in the life of the model builder. The joy of seeing accurate predictions is quickly overshadowed by a myriad of operational challenges. Debugging, troubleshooting & monitoring take over the majority of their day, leaving little time for model building. In DevOps, software operations have been elevated to an art. Sophisticated tools enable engineers to quickly identify and resolve issues, continuously improving software stability and robustness. In the ML world, operations are still largely a manual process that involves Jupyter notebooks and shell scripts. One of the cornerstones of the DevOps toolchain is logging. Traces and metrics are built on top of logs, enabling monitoring and feedback loops. What does logging look like in an ML system?
In this talk we will demonstrate how to enable data logging for an AI application using MLflow in a matter of minutes. We will discuss how something so simple enables testing, monitoring and debugging in an AI application that handles TBs of data and runs in real-time. Attendees will leave the talk equipped with tools and best practices to supercharge MLOps in their team.
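To make the idea of "data logging" concrete, here is a minimal illustrative sketch in pure Python. It is not the actual whylogs or MLflow API; it only shows the underlying pattern the talk describes: instead of persisting raw rows, each batch of inference data is reduced to a small statistical profile (counts, missing values, min/max, mean) that can then be logged as an artifact. The column names and batch data are invented for the example.

```python
import json
import math
from typing import Any, Dict, List

def profile_batch(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Collect lightweight per-column statistics from a batch of records.

    This mimics statistical data logging: rather than storing raw rows,
    keep a fixed-size summary per column that can be emitted alongside
    each inference batch and monitored over time.
    """
    stats: Dict[str, Dict[str, float]] = {}
    for row in rows:
        for col, value in row.items():
            s = stats.setdefault(col, {"count": 0, "missing": 0,
                                       "min": math.inf, "max": -math.inf,
                                       "sum": 0.0})
            if value is None:
                s["missing"] += 1
                continue
            s["count"] += 1
            if isinstance(value, (int, float)):
                s["min"] = min(s["min"], value)
                s["max"] = max(s["max"], value)
                s["sum"] += value
    for s in stats.values():
        s["mean"] = s["sum"] / s["count"] if s["count"] else float("nan")
    return stats

# Hypothetical inference batch: one missing latency value.
batch = [
    {"latency_ms": 12.0, "status": "ok"},
    {"latency_ms": 47.5, "status": "ok"},
    {"latency_ms": None, "status": "error"},
]
profile = profile_batch(batch)
print(json.dumps(profile["latency_ms"], indent=2))
```

In a real pipeline, a serialized profile like this would be written per batch and attached to the run (for example, as an MLflow artifact), giving monitoring and debugging a compact record of what the model actually saw.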

Leandro Almeida will be presenting on Semantic Image Logging Using Approximate Statistics & MLflow on May 27, 2021 at 11:00 am PT:

Description:

As organizations launch complex multi-modal models into human-facing applications, data governance becomes both increasingly important and increasingly difficult. Specifically, monitoring the underlying ML models for accuracy and reliability becomes a critical component of any data governance system. When complex data, such as images, text and video, is involved, monitoring model performance is particularly problematic given the lack of semantic information. In industries such as health care and automotive, fail-safes are needed for compliant performance and safety, but access to validation data is in short supply or, in some cases, completely absent. However, to date, there have been no widely accessible approaches for monitoring semantic information in a performant manner.
In this talk, we will provide an overview of approximate statistical methods and how they can be used for monitoring and for debugging data pipelines, detecting concept drift and out-of-distribution data in semantic-full data such as images. We will walk through an open source library, whylogs, which combines Apache Spark and novel approaches to semantic data sketching. We will conclude with practical examples, equipping ML practitioners with monitoring tools for computer vision and other semantic-full models.
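For a flavor of what "approximate statistics" means here, below is one classic fixed-memory sketch, Misra-Gries frequent items. This is only an illustration of the general sketching idea, not the specific algorithms whylogs uses: with capacity k, the sketch tracks at most k counters over a stream of n items, and each reported count undercounts the true count by at most n/(k+1).

```python
from typing import Dict, Hashable

class FrequentItemsSketch:
    """Misra-Gries sketch: approximate item frequencies in fixed memory.

    With capacity k over a stream of n items, every reported count is a
    lower bound on the true count, off by at most n/(k+1).
    """
    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self.counters: Dict[Hashable, int] = {}

    def update(self, item: Hashable) -> None:
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.capacity:
            self.counters[item] = 1
        else:
            # Sketch is full: decrement every counter and drop zeros.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def estimate(self, item: Hashable) -> int:
        return self.counters.get(item, 0)

# Hypothetical stream of 100 predicted class labels.
sketch = FrequentItemsSketch(capacity=4)
stream = ["cat"] * 60 + ["dog"] * 25 + ["bird", "fish", "frog"] * 5
for label in stream:
    sketch.update(label)
print(sketch.estimate("cat"), sketch.estimate("dog"))
```

The appeal for monitoring is that memory stays bounded no matter how large the stream grows, which is what makes this kind of summary viable inside a high-throughput inference pipeline.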

Andy Dang will be presenting on Re-imagine Data Monitoring with whylogs and Spark on May 27, 2021 at 5:00 pm PT:

Description:

In the era of microservices, decentralized ML architectures and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly and cumbersome, while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language- and platform-agnostic approach to data quality and data monitoring. It can work with different modes of data operations, including streaming, batch and IoT data.
In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.

Past talks

You might be interested in checking out a couple of other talks that the WhyLabs team has given recently:

Bernease Herman was at csv,conf presenting on Static datasets aren't enough: where deployed systems differ from research on May 5, 2021 at 12:00 pm PT. The video and slides are online.

Description:

The focus on static datasets in machine learning and AI training fails to translate to how these systems are being deployed in industry. As a result, data scientists and engineers aren't considering how these systems perform in changing, real-world environments, nor the feedback mechanisms and societal implications that these systems can cause. In this session, we will highlight existing tools that work with dynamic (and perhaps streaming) data, and we will suggest some preliminary activities and lessons that may bridge the gap in data science training for realistic data.

Alessya Visnjic was at the Databricks Community Lightning Talks presenting on The Critical Missing Component in the Production ML Stack on May 20, 2021. The video starts at minute 16.


Other posts

AI Observability for All

We’re excited to announce our new Starter edition: a free tier of our model monitoring solution that allows users to access all of the features of the WhyLabs AI observability platform. It is entirely self-service, meaning that users can sign up for an account and get started right away.

Deploy your ML model with UbiOps and monitor it with WhyLabs

Machine learning models can only provide value for a business when they are brought out of the sandbox and into the real world... Fortunately, UbiOps and WhyLabs have partnered together to make deploying and monitoring machine learning models easy.

Observability in Production: Monitoring Data Drift with WhyLabs and Valohai

What works today might not work tomorrow. And when a model is in real-world use, serving faulty predictions can lead to catastrophic consequences...

Why You Need ML Monitoring

Machine learning models are increasingly becoming key to businesses of all shapes and sizes, performing myriad functions... If a machine learning model is providing value to a business, it’s essential that the model remains performant.

Data Labeling Meets Data Monitoring with Superb AI and WhyLabs

Data quality is the key to a performant machine learning model. That’s why WhyLabs and Superb AI are on a mission to ensure that data scientists and machine learning engineers have access to tools designed specifically for their needs and workflows.

Running and Monitoring Distributed ML with Ray and whylogs

Running and monitoring distributed ML systems can be challenging. Fortunately, Ray makes parallelizing Python processes easy, and the open source whylogs enables users to monitor ML models in production, even if those models are running in a distributed environment.