Press "Enter" to skip to content

May 23, 2022

The Move to General Database Platforms

Steve Jones muses on specialization in data platforms:

It’s been a decade-plus of the Not-Only-SQL (NoSQL) movement where a large variety of specialized database platforms have been developed and sold. It seems that there are so many different platforms for data stores that you can find one for whatever specialized type of data you are working with. However, is that what people are doing to store data in their applications?

I saw this piece on the return to the general-purpose database, postulating that a lot of the NoSQL database platforms have added capabilities that make them less specialized and more generalized. I’ve seen some of this, just as many relational platforms have added features that compete with one of the NoSQL classes of databases. The NoSQL datastores might be adding SQL-like features because some of these platforms are too specialized, and the vendors have decided they need to cover a slightly wider set of use cases.
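To make the relational side of that concrete with an illustration of my own (not from Steve's piece or the linked article): SQL Server, about as relational a platform as they come, can store and shred document-style JSON natively on SQL Server 2016 and later, which chips away at the document database niche:

    -- Shredding a JSON document with purely relational tools.
    DECLARE @doc nvarchar(max) =
        N'{"customer":"Contoso","orders":[{"id":1,"total":42.50}]}';

    SELECT
        JSON_VALUE(@doc, '$.customer') AS customer_name,
        o.id AS order_id,
        o.total AS order_total
    FROM OPENJSON(@doc, '$.orders')
        WITH (id int '$.id', total decimal(10, 2) '$.total') AS o;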

I see three overlapping forces in play here. First, you have vendors looking at the Total Addressable Market for their specific technology (document, key-value, graph, whatever) versus the size of the relational database market, and they salivate over those general-purpose fat stacks of cash. That’s what Steve is getting at in the graf above.

I think the second force is that specialization is ultimately a sucker’s game when it comes to databases. By specializing in one area, you ultimately sacrifice others. A tool like Elasticsearch is outstanding as a document search engine but miserable as an aggregation engine (ask me how I know: every six months, another product team decides that this time, they’ll get stats aggregation in Elasticsearch to work well…and six months after people actually start to use the thing, they move all the data to someplace else that is adequately queryable). Similarly, document databases are excellent for populating details in an application but not at all good at aggregation or arbitrary queries connecting data together. Specialization seems like a great idea until new requirements come in that demand advanced reporting.

The third force is that these systems are independent, and getting them to talk to each other typically involves writing a lot of ETL/ELT code or using additional third-party tools. To the extent that there are data virtualization platforms, they’re either excruciatingly slow (e.g., PolyBase) or expensive and out of date because they cache the data periodically. A corollary of the third force is that different platforms tend to use different languages, and remembering which of the three or four languages you need to access data in a given case can be a bit painful. This is part of the reason Feasel’s Law exists.

The net result of all of this is that you end up with the same piece of information in several separate places and build complicated systems to keep those systems aligned. Each system is (theoretically) optimized for a given use case, but you end up with more and more people spending their time gluing together data from disparate systems, ensuring that the data matches up between them, or moving data from one to another. If you need to do all of this, then sure, do it. But if there’s a single general-purpose platform which does all of this stuff 90% as well, a large number of companies and use cases will do just fine with the single tool. And that’s why general-purpose database platforms are still so popular and why I believe they will remain popular indefinitely.

The biggest exception I see is caching, but that’s because a cache is more of a “fire-and-forget” data storage system. If you do it right, you don’t have any ETL/ELT to or from the cache, and if the cache dies, your system continues to work (albeit more slowly than with the cache). It’s also tied to a specific application and only holds data temporarily, so data mismatches are (hopefully) transitory enough not to matter.


Object-Level Security in Power BI

Chris Webb checks out Object-Level Security:

If you have sensitive data in your Power BI dataset you may need to stop some users seeing the data in certain columns or measures. There is only one way to achieve this: you have to use Object-Level Security (OLS) in your dataset. It’s not enough to exclude those measures or columns from your reports or to hide them, because there will always be ways for enterprising users to see data they shouldn’t be allowed to see. However, the problem with OLS up to now is that it didn’t play nicely with Power BI reports, and so you had to create multiple versions of the same report for different security roles. The good news is that there’s now a way to create one report connected to a dataset with OLS and have it display different columns and measures to users with different permissions.

And then watch as Chris combines Row-Level Security with Object-Level Security to make it nicer for users but probably a mess for maintainers.


Finding Articles in a SQL Server Publication

Kenneth Fisher disturbs the slumber of the forces of replication:

The other day I was asked to supply a list of all of the tables being replicated into a given database. Now, for those of you who aren’t aware, if I replicate a group of tables from database SourceDB into DestDB, I can still have additional tables in DestDB that have nothing to do with the replication. So this wasn’t just a matter of getting a list of tables from the database.

Click through for queries which work for transactional replication as well as merge replication.
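As a complementary sketch (my illustration rather than Kenneth's queries, which look at the destination side): on the publisher, the published database carries its own metadata tables for transactional replication, so listing the published tables can look something like this:

    -- List tables published by transactional publications.
    -- Run in the publisher database.
    SELECT
        p.name AS publication_name,
        a.name AS article_name,
        OBJECT_SCHEMA_NAME(a.objid) AS schema_name,
        OBJECT_NAME(a.objid) AS table_name
    FROM dbo.syspublications AS p
        INNER JOIN dbo.sysarticles AS a
            ON a.pubid = p.pubid
    ORDER BY publication_name, article_name;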


Forcing Color Scheme by Data Element

Reza Rad forces a particular color scheme in Power BI:

You can set the color in every visual in Power BI easily. You can also set the color of multiple visuals at the same time using Themes. However, what if you want to set the same color for the same data point? For example, you want the gender Female to always be colored orange in all the charts and visuals. In Power BI, as of now, you cannot set a data point color. However, there is an easy solution for that, which I explain in this article and video.

Click through for the answer. Generally I’d say something along the lines of “instead of doing this, just have one color and take advantage of cross-filtering to highlight the element people care about.” But if you do have a multi-measure categorical set with a small number of categories, color can be a differentiator and at least this helps you keep consistent colors across visuals.


Savepoints in Transactions

Kevin Wilkie continues a series on transactions in SQL Server:

All right, now that everyone’s back with us, we’ll talk more about everyone’s favorite – transactions. When they deal with transactions, most people only know how to begin one, then either commit it or roll it back. But there’s so much more you can do with a transaction!

This time I want to focus on savepoints for transactions. Yes, the same term you’ve been using in games for years can be used in the workplace!

I think I have actually made use of savepoints in production code…maybe twice? It always seems that whenever I might actually make use of one (rather than simply rolling it all back and starting over), there’s some limitation which makes them not useful.
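For anyone who hasn't tried them, here's a minimal sketch of the mechanics (the table names are hypothetical, purely for illustration):

    -- Start a transaction and do some work.
    BEGIN TRANSACTION;

    INSERT INTO dbo.Orders (OrderID, CustomerName)  -- hypothetical table
    VALUES (1, N'Contoso');

    -- Mark a point we can roll back to without undoing the insert above.
    SAVE TRANSACTION AfterOrderInsert;

    INSERT INTO dbo.OrderLines (OrderID, LineNumber)  -- hypothetical table
    VALUES (1, 1);

    -- Undo everything after the savepoint; the Orders insert survives.
    ROLLBACK TRANSACTION AfterOrderInsert;

    -- The outer transaction is still open, so commit what remains.
    COMMIT TRANSACTION;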


Memory Fractions in SQL Server

Hugo Kornelis explains the notion of memory fractions:

Some time ago a reader reached out to me with a request for help. He showed me a query and accompanying execution plan, and asked if I could help reduce (or, better yet, eliminate) the many hash spills that were killing his performance.

While helping him work through the plan, I was once more reminded of one of my pet peeves with execution plans: we get to see the requested memory for the plan (the Memory Grant and MemoryGrantInfo properties), which is of course based on the estimated total memory usage of operators that are active at the same time. We also get to see the actual memory used by each individual operator (in the Memory Usage property). But there is no way to see how much memory the optimizer estimates for each individual operator.

Read on for a detailed explanation.
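Hugo's point stands: the plan won't show per-operator estimates. As a query-level consolation prize (my aside, not Hugo's technique), you can at least compare requested, granted, and used memory for currently executing queries via a DMV:

    -- Query-level (not operator-level) memory grant inspection.
    SELECT
        mg.session_id,
        mg.requested_memory_kb,
        mg.granted_memory_kb,
        mg.ideal_memory_kb,
        mg.used_memory_kb,
        mg.max_used_memory_kb,
        t.text AS query_text
    FROM sys.dm_exec_query_memory_grants AS mg
        CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t;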
