Anthony Naddeo

Jul 19, 2022

Small Changes for Big SQLite Performance Increases

Whylogs
WhyLabs
ML Monitoring

A behind-the-scenes look at how the WhyLabs engineering team improved SQLite performance to make monitoring data and machine learning models faster and easier for whylogs users.

Auto Commits, Long Lived Connections, and Vacuuming.

I recently had to dive deep into SQLite to tune the performance of a container that we maintain at WhyLabs. The container makes it easy for users to generate whylogs profiles on their data without having to consume whylogs directly in their data pipelines. I was able to raise transactions per second (TPS) and cut p95 latency several times over (roughly 5-6x in the benchmarks below) through some small, obscure changes to how we use SQLite, and they will probably be relevant to other workloads and use cases as well.

First, some context: WhyLabs built whylogs, an open source library for profiling data. Given something like a Pandas dataframe, we generate what we call Dataset Profiles, which contain aggregated information such as counts, sums, and distribution statistics. These profiles can be merged after they are generated, and they can be used to monitor any of the properties that they contain, so you can get alerted to important changes in your data, especially in real-time streaming environments.

One of our goals is making it easy for people to run whylogs on their data, regardless of the details of their specific tech stack. One of the most general solutions we offer is a container-based REST service with an API that takes in data and runs whylogs for you, periodically sending the profiles that it manages to us for monitoring. The container is written in Kotlin (JVM), mostly because coroutines are great.

In addition to taking in data and sending profiles to the WhyLabs SaaS Platform, the container is responsible for the lifecycle management of profiles. As data comes in, either the current time or a data point's timestamp is used to determine which profile to merge the new data point into, which allows you to generate profiles for any time granularity in the data. While the container is building up data there is a risk of data loss if something goes wrong before we can upload it.
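As a rough illustration of the bucketing described above, here is a minimal sketch, assuming hourly granularity and UTC bucket boundaries. `ProfileSketch` and `BucketedProfiles` are hypothetical names invented for this example. They are not whylogs or container APIs, and a real profile holds far more than a row count.

```kotlin
import java.time.Instant
import java.time.temporal.ChronoUnit

/** Hypothetical stand-in for a profile: it only counts the rows merged into it. */
data class ProfileSketch(var rowCount: Long = 0)

/**
 * Hypothetical in-memory store keyed by the start of each time bucket.
 * `granularity` must be DAYS or smaller, which is what Instant.truncatedTo supports.
 */
class BucketedProfiles(private val granularity: ChronoUnit = ChronoUnit.HOURS) {
    private val profiles = mutableMapOf<Instant, ProfileSketch>()

    /** Start of the bucket that `timestamp` falls into. */
    fun bucketFor(timestamp: Instant): Instant = timestamp.truncatedTo(granularity)

    /**
     * Merge one data point into its bucket, using the data point's own timestamp
     * when it has one and the current time otherwise. Returns the bucket start.
     */
    fun log(timestamp: Instant? = null, now: Instant = Instant.now()): Instant {
        val bucket = bucketFor(timestamp ?: now)
        profiles.getOrPut(bucket) { ProfileSketch() }.rowCount += 1
        return bucket
    }

    /** Number of data points merged into the given bucket so far. */
    fun rowCount(bucket: Instant): Long = profiles[bucket]?.rowCount ?: 0
}
```

With hourly granularity, data points stamped 10:15 and 10:59 land in the same 10:00 bucket, while one stamped 11:00 starts a new bucket. Everything in this map lives only in memory, which is exactly the data that is at risk until it is uploaded.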

This is where SQLite finally comes in. We use SQLite to stage the profiles on disk in the container so that crashes don't result in data loss. The first naive version of this worked fine, but it was incredibly slow. Our SQLite operations below are abstracted behind a queue interface, which was convenient for what we were doing, but the important part is the SQLite code. Don't copy and paste this snippet; this is the bad one.

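import java.sql.Connection
import java.sql.DriverManager
import org.slf4j.LoggerFactory

// Excerpt: the queue interface this class implements and the `serializer` it uses are omitted.
class SqliteQueueWriteLayer<T>() {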
    private val logger = LoggerFactory.getLogger(javaClass)

    init {
        val createTable = "CREATE TABLE IF NOT EXISTS items ( value BLOB );"
        db { prepareStatement("vacuum;").execute() }
        db { prepareStatement("PRAGMA journal_mode=WAL;").execute() }
        db {
            logger.debug("Created sqlite db")
            prepareStatement(createTable).execute()
        }
    }

    private fun db(block: Connection.() -> Unit) {
        val url = "jdbc:sqlite:/tmp/foo-queue-v2.sqlite"
        DriverManager.getConnection(url).use {
            block(it)
        }
    }

    override suspend fun push(t: List<T>) {
        val values = t.map { serializer.serialize(it) }
        val insert = "INSERT INTO items (value) VALUES (?)"

        db {
            values.forEach { value ->
                prepareStatement(insert).apply {
                    setBytes(1, value)
                    executeUpdate()
                }
            }
        }
    }

    override suspend fun peek(n: Int): List<T> {
        val items = mutableListOf<T>()
        val query = "SELECT value FROM items ORDER BY ROWID ASC LIMIT $n;"

        db {
            val results = prepareStatement(query).executeQuery()
            while (results.next()) {
                val serializedItem = results.getBytes(1)
                val item = serializer.deserialize(serializedItem)
                items.add(item)
            }
        }

        return items
    }

    override suspend fun pop(n: Int) {
        // This query is nicer and probably performs better, but it requires
        // sqlite to be built with the SQLITE_ENABLE_UPDATE_DELETE_LIMIT flag.
        // val query = "DELETE FROM items ORDER BY ROWID ASC LIMIT $n;"
        val query = "DELETE FROM items WHERE ROWID IN ( SELECT ROWID FROM items ORDER BY ROWID ASC LIMIT $n);"
        db { prepareStatement(query).executeUpdate() }
        db { prepareStatement("vacuum;").execute() }
    }

    override suspend fun size(): Int {
        var size: Int? = null
        val query = "SELECT count(1) from items;"
        db {
            val results = prepareStatement(query).executeQuery()

            while (results.next()) {
                size = results.getInt(1)
            }
        }

        return size ?: throw IllegalStateException("Couldn't get the size")
    }
}

This created a new SQLite connection for every database operation. We performed several operations per request to the container: one to look up the current profile if it exists, and another to write the new state of the profile after adding the current request's data to it. We picked some arbitrary hardware to get a sense of relative performance differences. Benchmarking the SQLite implementation against an in-memory version showed that we went from thousands of TPS to maybe 100, which wasn't great. But we did get the resilience we were after.

Problems

It turns out that our naive SQLite abstraction had a few small issues that caused a lot of performance problems.

Vacuuming

I had originally added a vacuum statement alongside the query that deletes items. I thought it would be the safe thing to do since, without it, the db file would never shrink. But VACUUM rebuilds the entire database file, so running it after every pop is expensive, and I hadn't observed anything that made me think the file would grow to unreasonable sizes. This was probably a premature optimization.

Creating new database connections

This is a lot more expensive than I thought. The first version would establish a connection and throw it away for every transaction. The container would end up spending most of its time creating connections when tons of small requests would come in, which is the primary use case – people sending real time streaming data to the container for logging.

Auto committing

Auto commits were slowing down my batch writes, in part because I wasn't using the actual batch API. With auto commit on, each execute was committed as its own transaction, so a single push would trigger a lot of little writes.
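To make the auto-commit problem concrete, here is a minimal sketch of a batched insert that commits once for the whole list. It uses only standard JDBC calls (`setAutoCommit`, `addBatch`, `executeBatch`, `commit`, `rollback`) and assumes the `items (value BLOB)` table from the snippet above. The function name `insertBatch` is hypothetical and is not part of the container's code.

```kotlin
import java.sql.Connection

/**
 * Insert all values inside a single transaction using the JDBC batch API.
 * Assumes a table created as `CREATE TABLE items ( value BLOB )`.
 * Returns the number of values handed to the batch.
 */
fun insertBatch(connection: Connection, values: List<ByteArray>): Int {
    val previousAutoCommit = connection.autoCommit
    // With auto commit off, nothing is committed until commit() is called below.
    connection.autoCommit = false
    try {
        connection.prepareStatement("INSERT INTO items (value) VALUES (?)").use { statement ->
            for (value in values) {
                statement.setBytes(1, value)
                statement.addBatch() // queue the row instead of executing it right away
            }
            statement.executeBatch()
        }
        connection.commit() // one commit for the whole list
        return values.size
    } catch (t: Throwable) {
        connection.rollback()
        throw t
    } finally {
        // Restore whatever mode the caller had configured.
        connection.autoCommit = previousAutoCommit
    }
}
```

The statement is prepared once and reused for every row, unlike the loop in the original `push`, which prepared and executed a new statement per value.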

Solutions

I opted to omit vacuums entirely. This means that the db file will never shrink: SQLite reuses pages freed by deletes for later inserts, but it doesn't return them to the filesystem. Given that this is a container with a dedicated workload, it seems silly to expect that the growth (if it exists) would hurt anything. This wasn't a huge performance gain in tests, but the system is simpler to reason about, and now I'll never have to debug vacuuming if I get reports about intermittent lag from users.

The most significant benefit came from reusing a single database connection for the life of the container. This could be a weird thing to do with database systems that use remote connections, but the risk seemed low here: the SQLite database file is on the same machine as the container process, and concurrency issues are addressed mostly outside of SQLite by serializing operations (with Kotlin channels/actors). It's unclear whether the database connection is more vulnerable in a modern container environment where the storage might be network mounted, so I just threw in a connection check that creates a new connection if something has gone wrong.
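The idea of serializing access to one shared connection can be sketched without any coroutine dependencies. The container itself uses Kotlin channels/actors. This example swaps in a single-threaded executor from the JDK as a stand-in, and `SerializedAccess` is a hypothetical name rather than code from the container.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

/**
 * Hypothetical helper that funnels every operation through one worker thread,
 * so a shared resource (for example a single long-lived connection) is never
 * used by two callers at the same time.
 */
class SerializedAccess<R : Any>(private val resource: R) : AutoCloseable {
    private val worker: ExecutorService = Executors.newSingleThreadExecutor()

    /**
     * Run `block` against the resource on the worker thread and wait for the result.
     * An exception thrown by `block` surfaces wrapped in an ExecutionException.
     */
    fun <T> withResource(block: (R) -> T): T =
        worker.submit(Callable { block(resource) }).get()

    /** Stop accepting new work; tasks already submitted still run. */
    override fun close() {
        worker.shutdown()
    }
}
```

Callers on any thread can invoke `withResource`, and the operations still execute one at a time in submission order. That ordering is what makes it reasonable to hold a single connection open instead of creating one per operation.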

The updated abstraction also handles disabling autocommits and committing at the end of blocks so that bulk transactions don't mistakenly end up triggering multiple commits. I extend from this in the classes that do SQLite stuff now.

import java.sql.Connection
import java.sql.DriverManager

abstract class SqliteManager : AutoCloseable {
    /**
     * The connection that is used for all of the database operations. Using a single
     * connection is by far the most performant for local sqlite databases but the rest
     * of the application has to be architected in such a way that you don't run into
     * any concurrency issues (which this one is intended to do).
     */
    private var connection: Connection? = null

    /**
     * JDBC string to pass into [DriverManager.getConnection].
     */
    abstract val databaseUrl: String

    init {
        Runtime.getRuntime().addShutdownHook(Thread { connection?.close() })
    }

    override fun close() {
        connection?.close()
    }

    /**
     * Get a hold of a [Connection], initializing [connection] if it's uninitialized
     * or closed. It should ideally never close, but the check is there as a precaution
     * against some weird issue that results in it closing.
     */
    private fun db(block: Connection.() -> Unit) {
        if (connection?.isClosed == true || connection == null) {
            connection = DriverManager.getConnection(databaseUrl)
        }

        // We'll throw an NPE here so we know if something went wrong resetting the connection.
        // Should only happen because of a bad refactor or something.
        connection!!.let {
            it.autoCommit = false
            block(it)
        }
    }

    /**
     * Get a hold of a [Connection] to execute queries. If you're going to
     * do any write operations then you should use [tx] instead.
     */
    fun query(block: Connection.() -> Unit) {
        db {
            block(this)
        }
    }

    /**
     * Util function for enabling WAL mode. Make sure to create the database first
     * by executing some SQL statement before this.
     */
    fun enableWAL() {
        DriverManager.getConnection(databaseUrl).use { con ->
            con.createStatement().use { it.executeUpdate("PRAGMA journal_mode=WAL;") }
        }
    }

    fun vacuum() {
        DriverManager.getConnection(databaseUrl).use { con ->
            con.createStatement().use { it.executeUpdate("vacuum;") }
        }
    }

    /**
     * Get a [Connection] within a transaction. This will just commit for you and
     * rollback in a catch block.
     */
    fun tx(block: Connection.() -> Unit) {
        db {
            try {
                block(this)
                commit()
            } catch (t: Throwable) {
                rollback()
                throw t
            }
        }
    }
}

There are a lot of options for benchmarking. Originally I used ab for everything but it tends to fall over when you're doing tests with concurrent users. I prefer to use hey for quick tests now and k6 for things that require anything more than a canned request and curl. The exact values here aren't important since I didn't use commodity hardware for the benchmarks. It's really the relative values between the new and old code that matter. The benchmark sends requests as fast as possible to the container over a 2 minute period.

With the old code, the p95 request time, which k6 gives you after a test, would worsen as you added more concurrent users to the container, going over 100ms at 8 users on this hardware. The newer code also performs worse with more concurrent users, of course, but not to the same extent. It's roughly 5 times faster to use the batch API, without auto commits and without recreating connections. One cool thing about these numbers is that the single-user case has a 3ms p95. That means it's fast enough for anyone who wants to use the whylogs container to call it synchronously from their data pipeline, without having to write a ton of async code to make sure they don't delay whatever else they're doing.

The TPS of the old code was pretty consistently bad, never really getting much above 150 TPS. The newer code performs similarly across various counts of concurrent users, with ~6x the TPS.
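For readers less familiar with the two metrics, here is a small sketch of how they can be computed from raw samples. It uses the nearest-rank definition of a percentile, which is an assumption made for this example. k6 may interpolate differently, so its reported p95 can differ slightly. Both function names are hypothetical.

```kotlin
import kotlin.math.ceil

/**
 * Nearest-rank percentile: the smallest sample such that at least `p` percent
 * of all samples are less than or equal to it. `p` must be in (0, 100].
 */
fun percentile(samples: List<Double>, p: Double): Double {
    require(samples.isNotEmpty()) { "need at least one sample" }
    require(p > 0.0 && p <= 100.0) { "p must be in (0, 100]" }
    val sorted = samples.sorted()
    // 1-based rank into the sorted samples; multiply before dividing to limit rounding error.
    val rank = ceil(p * sorted.size / 100.0).toInt()
    return sorted[rank - 1]
}

/** Completed requests per second over a test window. */
fun transactionsPerSecond(completedRequests: Long, windowSeconds: Double): Double =
    completedRequests / windowSeconds
```

Under this definition, a p95 of 3ms means that 95% of requests finished in 3ms or less. Over a 2 minute window, 150 TPS corresponds to 18,000 completed requests.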

I also tried to change the concurrency mode of SQLite to see how it affected the performance but I either didn't actually get it to change or it didn't have a large impact. It's actually still unclear to me how people are supposed to change the concurrency mode at all when using JDBC. Some guides say it's a compile time flag for SQLite and some snippets claim to be changing it when creating JDBC connections. Either way, I couldn't see a difference. It could be because requests are already serialized anyway.

Remaining issues

There is still one problem that I haven't been able to figure out, and I'm not completely sure it's related to SQLite.

The disk utilization skyrockets under heavy sustained loads. The burst throughput isn't affected in the same way. If anyone knows what might be going on there, definitely reach out to me. I have this issue up on Stack Overflow as well.

Data logging made easy

At WhyLabs we’re committed to making performance improvements that make it easy and fast for users to monitor their data and machine learning models. Our open-source library, whylogs, can detect data drift, perform data validation, and quickly visualize key summary statistics. Check out our documentation to learn more.
