Jordan Witcombe provides an explanation:
The VertiPaq engine cleverly uses columnar storage for efficient querying and processing. It employs multiple compression techniques, including Run-Length Encoding (RLE) and Dictionary Encoding, to minimise storage space. Through finding optimal sort orders and value encoding, it achieves maximum space efficiency and performance. VertiPaq also utilises ‘In-Memory Column Store’ for fast query performance, ‘Predicate Pushdown’ to eliminate unnecessary data at query time, and ‘Block Decompression’ to only decompress relevant data blocks, making it a powerhouse for data management and retrieval.
Now, because of these ingenious tricks, we wave goodbye to traditional file formats like JSON or CSV. Instead, all data stored within the managed area of Fabric and OneLake uses either Parquet or Delta. It’s time to embrace these efficient, high-performing formats that bring the best out of VertiPaq’s compressive power. Let’s explore these further in the next section.
Read on for some comparisons in file size between Fabric and Databricks, as well as how they perform in Power BI.
Comments closed