Press "Enter" to skip to content

Bluebox: An Evolving Sample Database for PostgreSQL

Ryan Booz has a sample database:

Sure, there are datasets everywhere. Kaggle currently lists over 600,000 public datasets, but most of them are static CSV files that you load once and never touch again. Great for a one-time analysis, not so great for learning how a real database behaves over time. The Postgres Wiki lists a few dozen sample databases, too. And shoot, your shiny new AI coding buddy can help you create one if you want to put the time in.

The problem with most of these datasets is that they’re primarily static. If you’re lucky, some of the datasets might produce new data dumps once a month to keep things “current”. But the problem is that you can’t really practice query tuning if your data never changes. You can’t explore vacuum behavior when there are no updates. You can’t test monitoring tools when nothing is happening.

Click through for more information on Bluebox, as well as a Docker container containing several helpful tools and processes to make this data evolve over time.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.