Where Columnar Databases Struggle

Teo Lachev makes a good point regarding columnar databases:

A large company uses the SAP HANA ERP system. Users requires real-time access to transactional data. To avoid performance degradation, SLT replication (trigger-based change data capture) replicates data to another SAP HANA system that is used solely for reporting. The problem is that the more detailed the report gets and the more columns it has, the slower it gets and SAP HANA throws out of memory exceptions.

SAP HANA is an in-memory columnar database like Tabular. So, it stores data in columns, not rows. Columnar databases are primarily designed for analytical reports which typically have a few columns (sales by customer, product, date), but can potentially aggregate large datasets. As the reporting grain lowers and more columns are added (order number, order line item, customer name, phone number, etc.), a columnar database has to cross-join more and more columns. This is not efficient and performance quickly degrades irrespective that storage is fast. SSAS Tabular and Power BI are no different. SAP HANA complicates the issue further by preventing direct access to tables and requiring “analytical” views that join tables and potentially nest other views.

Read the whole thing.  Teo has a great point:  there are trade-offs between different data platform technologies, and choosing the right one is important.

Related Posts

Errors Updating Stats on Columnstore Indexes

Max Vernon walks us through some problems trying to update statistics on columnstore indexes: The first error above would be seen if you have a SQL Server Agent job that updates statistics. The second error is how it looks in an SSMS Query window. The error message claims that UPDATE STATISTICS can only be used on a […]

Read More

Clustered Columnstore Index Memory Timeouts

Joe Obbish takes a deep look at a clustered columnstore index insertion scenario: Why should we care about memory grant timeouts for CCI insert queries? Simply put, lots of bad things can happen when those queries can time out, both for serial and for parallel inserts. For serial insert queries, I’ve observed deadlocks, extremely poor […]

Read More

Categories

July 2018
MTWTFSS
« Jun Aug »
 1
2345678
9101112131415
16171819202122
23242526272829
3031