Semab Tariq deletes a significant amount of data:
In today’s blog post, we will discuss another crucial aspect of time-series data management: massive delete operations.
As your data grows over time, older records often lose their relevance but continue to occupy valuable disk space, increasing storage costs and potentially degrading performance if not managed well.
Let’s walk through some strategies to clean up or downsample aged data in TimescaleDB, helping you maintain a lean, efficient, and cost-effective database.
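To make the “clean up” half concrete: rather than running massive DELETEs, TimescaleDB lets you drop whole chunks, either on demand with drop_chunks() or automatically via a retention policy. A minimal sketch, assuming a hypothetical hypertable named conditions:

```sql
-- 'conditions' is a placeholder hypertable name.
-- One-off cleanup: drop every chunk whose data is entirely older than 4 weeks.
SELECT drop_chunks('conditions', older_than => INTERVAL '4 weeks');

-- Or let TimescaleDB do it on a schedule: a retention policy that
-- automatically drops chunks once they age past 4 weeks.
SELECT add_retention_policy('conditions', drop_after => INTERVAL '4 weeks');
```

The appeal over a plain DELETE is that whole chunk files are removed at once, so there is no row-by-row deletion, no table bloat, and no large vacuum to follow.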
The “or downsample” is huge, by the way. As a simple example, suppose you collect one record every millisecond, or 1,000 per second, and each record is a timestamp plus a few floating-point numbers adding up to 40 bytes. A year of data at that grain is 40 bytes/record * 1000 records/second * 3600 seconds/hour * 24 hours/day * 365.25 days/year, or 1,262,304,000,000 bytes/year. That’s ~1.15 terabytes of data per year, assuming no compression (which there actually is, but whatever). By contrast, if you keep millisecond-level data for a week, second-level data for 3 weeks, and minute-level data for the remaining year, you have:
40 bytes/record * 1000 records/second * 3600 seconds/hour * 24 hours/day * 7 days/week * 1 week = 22.53 gigabytes
40 bytes/record * 1 record/second * 3600 seconds/hour * 24 hours/day * 7 days/week * 3 weeks = 69 megabytes
40 bytes/record * 1 record/minute * 60 minutes/hour * 24 hours/day * 337.25 days (the rest of the year after those first 4 weeks) = 18.5 megabytes
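If you would rather sanity-check those figures from psql than a calculator, a single SELECT reproduces them. Note that pg_size_pretty rounds to whole binary units, so the full-year figure shows up as roughly 1176 GB rather than 1.15 TB; the 40-byte record size and the retention windows are taken straight from the example above.

```sql
-- Back-of-the-envelope storage math from the example above (binary units, no compression).
SELECT
    pg_size_pretty(40::bigint * 1000 * 3600 * 24 * 7)        AS ms_grain_one_week,
    pg_size_pretty(40::bigint * 3600 * 24 * 7 * 3)           AS sec_grain_three_weeks,
    pg_size_pretty(40::numeric * 60 * 24 * 337.25)           AS min_grain_rest_of_year,
    pg_size_pretty(40::numeric * 1000 * 3600 * 24 * 365.25)  AS ms_grain_full_year;
```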
And in most cases, we only need the finest granularity for a relatively short window. After that, we typically care more about how current data compares to older data, for the purposes of trending.
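For the downsampling half, the usual TimescaleDB pattern is a continuous aggregate that rolls the raw data up into coarser buckets, plus retention policies that let each tier expire on its own schedule. Here is a rough sketch of the minute-level tier from the example above, again assuming the hypothetical conditions hypertable with (time, device_id, value) columns; a second-level tier would follow the same pattern with a 1-second bucket.

```sql
-- Minute-level rollup of the hypothetical raw hypertable 'conditions'.
CREATE MATERIALIZED VIEW conditions_1m
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 minute', time) AS bucket,
       device_id,
       avg(value) AS avg_value,
       min(value) AS min_value,
       max(value) AS max_value
FROM conditions
GROUP BY bucket, device_id
WITH NO DATA;

-- Refresh the rollup hourly, over buckets between 2 days and 1 hour old.
-- Keeping start_offset inside the raw retention window below means the policy
-- never re-aggregates regions whose raw chunks have already been dropped.
SELECT add_continuous_aggregate_policy('conditions_1m',
    start_offset      => INTERVAL '2 days',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

-- Tiered retention: raw millisecond data lives for a week,
-- the minute-level rollup for a year.
SELECT add_retention_policy('conditions',    drop_after => INTERVAL '1 week');
SELECT add_retention_policy('conditions_1m', drop_after => INTERVAL '1 year');
```

With that in place, the raw table stays small enough to query and back up comfortably, while the rollup preserves a year of trend data at a tiny fraction of the footprint.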