I recently ran into a big challenge with one of my customers: because of the sheer volume of data and the network connectivity speed, I was hitting the 5-hour limit for processing data into my Premium Per User dataset.
My solution was to change the partitions from monthly to daily and then, once all the daily partitions had been processed, merge them back into monthly partitions.
That left me with a new challenge: I now had to process daily partitions from 2019-01-01 to 2021-11-30. This was a LOT of partitions, and I had to find a way to automate processing them.
Not only that, but I had to ensure that I did not overload the source system too!
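To give a feel for the scale, here is a minimal Python sketch (not the approach used in this post) that lists one partition per day for that date range and groups them into small batches, so that only a few partitions would hit the source system at a time. The ISO-date partition names and the batch size of 5 are assumptions made purely for illustration.

```python
from datetime import date, timedelta
from itertools import islice


def daily_partitions(start, end):
    """Yield one partition name per calendar day, both ends inclusive.

    Naming partitions by ISO date is a hypothetical convention.
    """
    day = start
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)


def batches(items, size):
    """Split an iterable into consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


partitions = list(daily_partitions(date(2019, 1, 1), date(2021, 11, 30)))

# Batch size of 5 is an assumed throttle, chosen only for this example.
partition_batches = list(batches(partitions, 5))

print(f"{len(partitions)} daily partitions in {len(partition_batches)} batches")
```

Processing one batch at a time, and waiting for it to finish before starting the next, is one simple way to cap how many queries reach the source system at once.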
Read on to see what Gilbert did to solve this problem.