Partitioning – Curated SQL

An Overview of Data Partitioning Strategies

Data partitioning is the magic wand that divides your massive dataset into smaller, organized subsets called partitions. These partitions are based on specific criteria, like date ranges, customer segments, or product categories.
It’s like organizing your overflowing closet by color, season, or type of clothing. Each section becomes easier to browse and manage, making life (and data analysis) much easier.
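
Here is a minimal Python sketch of two of the criteria mentioned above. It uses range partitioning on a date, with one partition per month, and list partitioning on product category. The sample rows, the partition names, and the category-to-partition mapping are all hypothetical. A real database performs this routing inside the engine, not in application code, so treat this as an illustration of the idea only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (order_date, product_category, amount)
orders = [
    (date(2023, 1, 15), "books", 12.50),
    (date(2023, 1, 28), "games", 59.99),
    (date(2023, 2, 3), "books", 8.75),
    (date(2023, 3, 9), "music", 15.00),
]

def range_partition_key(order_date: date) -> str:
    """Range partitioning: one partition per calendar month."""
    return f"{order_date.year}-{order_date.month:02d}"

# List partitioning: each known category maps to a named (hypothetical)
# partition; anything unlisted falls into a default partition.
CATEGORY_PARTITIONS = {"books": "p_media", "music": "p_media", "games": "p_games"}

def list_partition_key(category: str) -> str:
    return CATEGORY_PARTITIONS.get(category, "p_default")

def partition(rows, key_fn):
    """Group rows into partitions using the supplied key function."""
    parts = defaultdict(list)
    for row in rows:
        parts[key_fn(row)].append(row)
    return dict(parts)

by_month = partition(orders, lambda r: range_partition_key(r[0]))
by_category = partition(orders, lambda r: list_partition_key(r[1]))

print(sorted(by_month))     # ['2023-01', '2023-02', '2023-03']
print(sorted(by_category))  # ['p_games', 'p_media']
```

A query that filters on a single month or category only needs to look at the matching partition. That is the "easier to browse" part of the closet analogy.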

Read on for a few varieties of partitioning and how they could improve your data estate. There’s no guarantee that partitioning will improve performance (in SQL Server’s case, the partitioning feature often does not improve performance at all, because that isn’t its intent), but this is a good read to get an idea of what strategies are available.

Primer on Indexing and Partitioning in Postgres

Published 2023-11-20 by Kevin Feasel

Salman Ahmed gives us a 10,000-foot view of two topics:

When it comes to managing large and complex databases in PostgreSQL, an important decision you’ll face is how to optimize your data storage and retrieval strategies. Two common techniques for improving database performance and manageability are indexing and partitioning in PostgreSQL.

Read on for a quick overview of each topic, including the variety of index types and partitioning strategies available.

Finding Partitioned Tables in SQL Server

Published 2023-09-28 by Kevin Feasel

Andrea Allred has a script for us:

I recently needed to know which tables in my database were partitioned. I tried a bunch of queries and some got incredibly complex. I finally found one that I like:

Click through for the script and for the assumption Andrea makes (which is a reasonable one).

ALTER TABLE SWITCH and Errors 4907, 4908, and 4912

Published 2023-08-18 by Kevin Feasel

Eitan Blumin works out some problems:

When it comes to managing tables and indexes in SQL Server, the ALTER TABLE SWITCH statement is a powerful tool for “moving” data swiftly between tables. However, this convenience can sometimes run into frustrating roadblocks, such as errors 4907 and 4908.
The underlying cause of these errors can be confusing, particularly when the source and target tables have identical partitions, including on their non-clustered indexes.

Read on to see what these error messages mean and how you can correct them.

Thoughts on Partitioning in Postgres

Published 2023-08-07 by Kevin Feasel

Ryan Booz splits things out:

Throughout 20+ years of database and application development, time-series data has always been at the heart of the products I work with. Inevitably, as the quantity of data grew over time, management became more difficult and query performance suffered. Over the years, the primary method for managing this data growth effectively has been to partition it. The problem is that, until recently, partitioning wasn’t easy to set up in most OLTP databases like PostgreSQL or SQL Server.
Fortunately, PostgreSQL has significantly improved its ability to partition large data tables over the last 6 years, starting with PostgreSQL 10.
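
Declarative partitioning in PostgreSQL works like this: you declare a parent table with PARTITION BY RANGE, then attach child tables using PARTITION OF ... FOR VALUES FROM (...) TO (...). Here is a rough Python sketch that generates monthly child-table DDL for time-series data. The table name readings, its columns, and the readings_yYYYYmMM naming pattern are hypothetical assumptions for illustration. They are not Ryan’s recommendations.

```python
from datetime import date

# Hypothetical parent table; run this once in PostgreSQL before the partitions.
PARENT_DDL = (
    "CREATE TABLE readings (device_id int, recorded_at timestamptz, value double precision) "
    "PARTITION BY RANGE (recorded_at);"
)

def next_month(d: date) -> date:
    """Return the first day of the month after d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)

def monthly_partition_ddl(parent: str, start: date, months: int) -> list[str]:
    """Build one CREATE TABLE ... PARTITION OF statement per month.

    Range bounds are inclusive at FROM and exclusive at TO, so each partition
    runs from the first of one month up to (not including) the first of the next.
    """
    statements = []
    lower = date(start.year, start.month, 1)
    for _ in range(months):
        upper = next_month(lower)
        name = f"{parent}_y{lower.year}m{lower.month:02d}"
        statements.append(
            f"CREATE TABLE {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lower.isoformat()}') TO ('{upper.isoformat()}');"
        )
        lower = upper
    return statements

if __name__ == "__main__":
    print(PARENT_DDL)
    for stmt in monthly_partition_ddl("readings", date(2023, 11, 1), 3):
        print(stmt)
```

Because the TO bound is exclusive, each partition’s upper bound can simply be the next partition’s lower bound without leaving gaps or overlaps, including across the year boundary.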

Read on for Ryan’s recommendations around partitioning and a few thoughts on sharding.

An Overview of Partitioning and Sharding in Postgres

Published 2023-08-07 by Kevin Feasel

Michael Christofides defines terms:

It has been possible to do partitioning in PostgreSQL for quite a while: splitting what is logically one large table into smaller physical tables. Version 10 made a huge leap with the introduction of declarative partitioning, and more improvements have come every year since.
Sharding is a different story: splitting what is logically one large database into smaller physical databases. The primary tool for this in the PostgreSQL ecosystem is the Citus extension. But you can also handle the sharding logic at the application level, as recent posts from the likes of Notion and Figma have described. Somewhat confusingly, some forms of sharding are also referred to as vertical partitioning, including by the team at Figma.
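
With application-level sharding, the application itself decides which database holds a given row. One common approach is to hash a sharding key, such as a tenant ID, and take the result modulo the shard count. This technique is an assumption for illustration; it is not taken from the Notion or Figma posts. The connection strings and tenant IDs below are hypothetical.

```python
import hashlib

# Hypothetical connection strings: one PostgreSQL database per shard.
SHARDS = [
    "postgresql://shard0.example.internal/app",
    "postgresql://shard1.example.internal/app",
    "postgresql://shard2.example.internal/app",
    "postgresql://shard3.example.internal/app",
]

def shard_for(tenant_id: str, shard_count: int = len(SHARDS)) -> int:
    """Map a tenant to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    SHA-256 digest is used instead to keep routing consistent across restarts.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def dsn_for(tenant_id: str) -> str:
    """Return the connection string of the shard that owns this tenant."""
    return SHARDS[shard_for(tenant_id)]

if __name__ == "__main__":
    for tenant in ["acme", "globex", "initech"]:
        print(tenant, "->", dsn_for(tenant))
```

Plain modulo routing keeps the sketch short. However, changing the number of shards remaps most keys, which is one reason real deployments often route through a lookup table or a fixed set of logical shards that can be moved between physical servers.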

Read on for a few thoughts on when to perform each action and what the costs and benefits are.

Automatic Partition Maintenance in Power BI

Published 2022-11-29 by Kevin Feasel

Shabnam Watson answers an attendee question:

During one of my presentations on Incremental Refresh (IR) in Power BI, someone asked what happens during a Power BI automatic partition maintenance window, when Power BI has an opportunity to merge smaller partitions into larger ones. Does Power BI take the data already imported for the smaller partitions and combine it into a bigger one, or does it re-read the data for those smaller partitions? For example, suppose a dataset has an IR policy to refresh the last day, and it has loaded every day of the previous month into its own daily partition. When the new month arrives, Power BI has an opportunity to merge those daily partitions into a single partition for the previous month. Does it re-read the previous month’s data from the source, or does it use what it already has in memory?
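
The following hypothetical Python sketch shows only the bookkeeping side of that scenario. It decides which daily partitions from completed months would be candidates to roll up into one month partition. It deliberately does not model whether Power BI re-reads the source, which is the actual question Shabnam answers. The function name and data shapes are assumptions for illustration and are not part of any Power BI API.

```python
from collections import defaultdict
from datetime import date

def merge_plan(day_partitions: list[date], today: date) -> dict[str, list[date]]:
    """Group daily partitions from completed months by 'YYYY-MM'.

    Days in the current month are left out because they stay as daily
    partitions; every earlier day is listed under its month's key.
    """
    plan: dict[str, list[date]] = defaultdict(list)
    for day in sorted(day_partitions):
        if (day.year, day.month) < (today.year, today.month):
            plan[f"{day.year}-{day.month:02d}"].append(day)
    return dict(plan)

if __name__ == "__main__":
    # Hypothetical state: every day of October 2022 loaded, plus November 1.
    loaded = [date(2022, 10, d) for d in range(1, 32)] + [date(2022, 11, 1)]
    plan = merge_plan(loaded, today=date(2022, 11, 1))
    for month, days in plan.items():
        print(f"{month}: {len(days)} daily partitions -> 1 month partition")
```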

Read on for the answer.

Partitioning Data in Power BI

Published 2022-11-28 by Kevin Feasel

Paul Turley continues a series on working with large amounts of data in Power BI:

You don’t have to have massive tables to benefit from partitioning. Even tables with a few hundred thousand records can benefit, both to improve data refresh performance and to detect source data changes. There is little maintenance overhead, so the benefits usually outweigh the cost in effort and management.

Click through for Paul’s thoughts on the topic.

Replicated Tables in Dedicated SQL Pools

Published 2022-10-21 by Kevin Feasel

Pedro Martinez explains the idea behind replicated tables in Azure Synapse Analytics dedicated SQL pools:

If you have ever used an Azure Synapse Analytics dedicated SQL pool, you know there are multiple table types to choose from for your workload. You might ask yourself, “When can I use the replicated table type, and how can I use it efficiently?”
This post covers the replicated table type in detail: when to use it and what the best practices are for using it. But before that, let’s start by understanding the different table types:

I’ve seen replicated tables get overused, so check out Pedro’s advice on how not to get burned with them.

Auto Partitioning Recommendations for Oracle

Published 2022-10-18 by Kevin Feasel

Brendan Tierney checks out some recommendations:

In a previous blog post, I gave an overview of the DBMS_AUTO_PARTITION package in Oracle Autonomous Database. That post looked at how to get started, set up Auto Partitioning, and allow it to implement partitioning automatically.
For many reasons, DBAs might not want this to happen. An alternative is to use DBMS_AUTO_PARTITION to make recommendations for tables where partitioning would improve performance. The DBA can inspect these recommendations and decide which of them to implement.

Read on to see how you can run the recommender, as well as what a recommendation looks like.
