Small Changes for Big SQLite Performance Increases
- Whylogs
- WhyLabs
- ML Monitoring
Jul 19, 2022
A behind-the-scenes look at how the WhyLabs engineering team improved SQLite performance to make monitoring data and machine learning models faster and easier for whylogs users.
Auto Commits, Long Lived Connections, and Vacuuming.
I recently had to dive deep into SQLite to tune performance on a container that we maintain at WhyLabs to make it easy for users to generate whylogs profiles on their data without having to directly consume whylogs in their data pipelines. I was able to increase transactions per second (TPS) and p95 latency by orders of magnitude through some small, obscure changes to the SQLite code that will probably be relevant to other workloads and use cases as well.
First, some context: WhyLabs built whylogs, an open source library, whose purpose is to profile data. Given something like a Pandas dataframe, we generate what we call Dataset Profiles, which contain various aggregated info such as count, sum, distribution stats, etc. These profiles can be merged after they are generated and they can be used to monitor any of the properties that they contain, so you can get alerted to important changes in your data, especially in real time streaming environments.One of the guarantees is making it easy for people to run whylogs on their data, regardless of the details around their specific tech stack. One of the most general solutions to this that we offer is a container-based REST service that has an API that takes in data and runs whylogs for you, periodically sending the profiles that it manages to us for monitoring. The container is written in Kotlin (JVM) mostly because coroutines are great.
In addition to taking in data and sending profiles to the WhyLabs SaaS Platform, the container is responsible for the lifecycle management of profiles. As data comes in, either the current time or a data point's timestamp is used to determine which profile to merge the new data point into, which allows you to generate profiles for any time granularity in the data. While the container is building up data there is a risk of data loss if something goes wrong before we can upload it.
This is where SQLite finally comes in. We use SQLite to store the profiles staged on disk in the container so that crashes don't result in data loss. The first naive version of this worked fine, but it was incredibly slow. Our SQLite operations below are abstracted behind a queue interface, which was nice for what we were doing. The important part is the SQLite code. Don't copy and paste this snippet, this is the bad one.
class SqliteQueueWriteLayer<T>() {
private val logger = LoggerFactory.getLogger(javaClass)
init {
val createTable = "CREATE TABLE IF NOT EXISTS items ( value BLOB );"
db { prepareStatement("vacuum;").execute() }
db { prepareStatement("PRAGMA journal_mode=WAL;").execute() }
db {
logger.debug("Created sqlite db")
prepareStatement(createTable).execute()
}
}
private fun db(block: Connection.() -> Unit) {
val url = "jdbc:sqlite:/tmp/foo-queue-v2.sqlite"
DriverManager.getConnection(url).use {
block(it)
}
}
override suspend fun push(t: List<T>) {
val values = t.map { serializer.serialize(it) }
val insert = "INSERT INTO items (value) VALUES (?)"
db {
values.forEach { value ->
prepareStatement(insert).apply {
setBytes(1, value)
executeUpdate()
}
}
}
}
override suspend fun peek(n: Int): List<T> {
val items = mutableListOf<T>()
val query = "SELECT value FROM items ORDER BY ROWID ASC LIMIT $n;"
db {
val results = prepareStatement(query).executeQuery()
while (results.next()) {
val serializedItem = results.getBytes(1)
val item = serializer.deserialize(serializedItem)
items.add(item)
}
}
return items
}
override suspend fun pop(n: Int) {
// This query is nicer and probably performs better but it requires
// sqlite to be built with SQLITE_ENABLE_UPDATE_DELETE_LIMIT flag.
// val query = "DELETE FROM items ORDER BY ROWID ASC LIMIT $n;"
val query = "DELETE FROM items WHERE ROWID IN ( SELECT ROWID FROM items ORDER BY ROWID ASC LIMIT $n);"
db { prepareStatement(query).executeUpdate() }
db { prepareStatement("vacuum;").execute() }
}
override suspend fun size(): Int {
var size: Int? = null
val query = "SELECT count(1) from items;"
db {
val results = prepareStatement(query).executeQuery()
while (results.next()) {
size = results.getInt(1)
}
}
return size ?: throw IllegalStateException("Couldn't get the size")
}
}
This basically created a SQLite connection for every database operation performed. We performed multiple per request to the container: one to look up the current profile if it exists, and another to write the new state of the profile after we added the current request's data to it. We just picked some arbitrary hardware to get a sense of relative performance differences. Benchmarking the SQLite implementation against an in memory version showed that we went from thousands of TPS to maybe 100, which wasn't great. But we did get the resilience we were after.
Problems
It turns out that our naive SQLite abstraction had a few small issues that caused a lot of performance problems.
Vacuuming
I had originally added a vacuum statement alongside the query that would delete items. I thought it would be the safe thing to do since, without this, the db file would only ever grow. I hadn't observed anything that made me think it would grow to unreasonable sizes so this was probably a premature optimization.
Creating new database connections
This is a lot more expensive than I thought. The first version would establish a connection and throw it away for every transaction. The container would end up spending most of its time creating connections when tons of small requests would come in, which is the primary use case – people sending real time streaming data to the container for logging.
Auto committing
Auto commits were slowing down my batch writes. In part because I wasn't using the actual batch API of course, so executes would trigger a lot of little writes.
Solutions
I opted to omit vacuums entirely. This means that db file sizes will only increase, but given that this is a container with a dedicated workload it seems silly to expect that the growth (if it exists) would hurt anything. This wasn't a huge performance gain in tests but the system is simpler to reason about and now I'll never have to debug vacuuming if I get reports about intermittent lag from users.
The most significant benefit came from reusing a single database connection for the life of the container. This could be a weird thing to do for other database systems that have remote connections but the risk seemed low in the container given that the SQLite database file is on the same machine as the container process and concurrency issues are being addressed mostly outside of SQLite by serializing operations (with Kotlin channels/actors). It's unclear if the database connection is more vulnerable in a modern container environment where the storage might be network mounted so I just threw in a connection check that creates a new one if something has gone wrong.
The updated abstraction also handles disabling autocommits and committing at the end of blocks so that bulk transactions don't mistakenly end up triggering multiple commits. I extend from this in the classes that do SQLite stuff now.
abstract class SqliteManager : AutoCloseable {
/**
* The connection that is used for all of the database operations. Using a single
* connection is by far the most performant for local sqlite databases but the rest
* of the application has to be architected in such a way that you don't run into
* any concurrency issues (which this one is intended to do).
*/
private var connection: Connection? = null
/**
* JDBC string to pass into [DriverManager.getConnection].
*/
abstract val databaseUrl: String
init {
Runtime.getRuntime().addShutdownHook(Thread { connection?.close() })
}
override fun close() {
connection?.close()
}
/**
* Get a hold of [Connection] and initialize the [connection] if its uninitialized
* or its closed. It should ideally never close but the check is there as a precaution
* for some weird issue that results in it closing.
*/
private fun db(block: Connection.() -> Unit) {
if (connection?.isClosed == true || connection == null) {
connection = DriverManager.getConnection(databaseUrl)
}
// We'll throw NPE here so we know if something went wrong resetting the connectin.
// Should only happen because of a bad refactor or something.
connection!!.let {
it.autoCommit = false
block(it)
}
}
/**
* Get a hold of a [Connection] to execute queries. If you're going to
* do any write operations then you should use [tx] instead.
*/
fun query(block: Connection.() -> Unit) {
db {
block(this)
}
}
/**
* Util function for enabling WAL mode. Make sure to create the database first
* by executing some SQL statement before this.
*/
fun enableWAL() {
DriverManager.getConnection(databaseUrl).use { con ->
con.createStatement().use { it.executeUpdate("PRAGMA journal_mode=WAL;") }
}
}
fun vacuum() {
DriverManager.getConnection(databaseUrl).use { con ->
con.createStatement().use { it.executeUpdate("vacuum;") }
}
}
/**
* Get a [Connection] within a transaction. This will just commit for you and
* rollback in a catch block.
*/
fun tx(block: Connection.() -> Unit) {
db {
try {
block(this)
commit()
} catch (t: Throwable) {
rollback()
throw t
}
}
}
}
There are a lot of options for benchmarking. Originally I used ab for everything but it tends to fall over when you're doing tests with concurrent users. I prefer to use hey for quick tests now and k6 for things that require anything more than a canned request and curl. The exact values here aren't important since I didn't use commodity hardware for the benchmarks. It's really the relative values between the new and old code that matter. The benchmark sends requests as fast as possible to the container over a 2 minute period.
The p95 request time, which k6 gives you after a test, would worsen as you added more concurrent users to the container, going over 100ms at 8 users on this hardware. The newer code also performs worse with more concurrent users, of course, but not to the same extent. Its roughly 5 times faster to use the batch API, without auto commits, without recreating connections. One cool thing about these numbers is that the single user use case has a 3ms p95. That means that its fast enough for anyone that wants to use the whylogs container to profile their data to synchronously call the container from their data pipeline without having to do a ton of async code to make sure they don't delay whatever else they're doing.
The TPS of the old code was pretty consistently bad, never really getting much above 150tps. The newer code performs similarly for various counts of concurrent users with ~6x the TPS.
I also tried to change the concurrency mode of SQLite to see how it affected the performance but I either didn't actually get it to change or it didn't have a large impact. It's actually still unclear to me how people are supposed to change the concurrency mode at all when using JDBC. Some guides say it's a compile time flag for SQLite and some snippets claim to be changing it when creating JDBC connections. Either way, I couldn't see a difference. It could be because requests are already serialized anyway.
Remaining issues
There is still one problem that I haven't been able to figure out, that I'm not completely sure is related to SQLite.
The disk utilization skyrockets under heavy sustained loads. The burst throughput isn't affected in the same way. If anyone knows what might be going on there then definitely reach out to me. I have this issue up on Stackoverflow as well.
Data logging made easy
At WhyLabs we’re committed to making performance improvements that make it easy and fast for users to monitor their data & machine learning models. Our open-source library, whylogs can detect data drift, perform data validation, and quickly visualize key summary statistics. Check out our documentation to learn more.
Other posts
Best Practicies for Monitoring and Securing RAG Systems in Production
Oct 8, 2024
- Retrival-Augmented Generation (RAG)
- LLM Security
- Generative AI
- ML Monitoring
- LangKit
How to Evaluate and Improve RAG Applications for Safe Production Deployment
Jul 17, 2024
- AI Observability
- LLMs
- LLM Security
- LangKit
- RAG
- Open Source
WhyLabs Integrates with NVIDIA NIM to Deliver GenAI Applications with Security and Control
Jun 2, 2024
- AI Observability
- Generative AI
- Integrations
- LLM Security
- LLMs
- Partnerships
OWASP Top 10 Essential Tips for Securing LLMs: Guide to Improved LLM Safety
May 21, 2024
- LLMs
- LLM Security
- Generative AI
7 Ways to Evaluate and Monitor LLMs
May 13, 2024
- LLMs
- Generative AI
How to Distinguish User Behavior and Data Drift in LLMs
May 7, 2024
- LLMs
- Generative AI