Now let’s say you also filter the sales record by sku (stock-keeping unit aka. barcode) in addition to sale_date and country. Creating a partition on sku will result in many partitions which is not ideal as it might result in uneven and smaller partitions.
Hadoop is not efficient in processing small volumes of data. There is a better way.
Read on to understand when each technique makes sense.