Curated SQL – Page 509 – A Fine Slice Of SQL Server

Counting Employees by Period with DAX

Published 2022-05-24 by Kevin Feasel

I’m calling this article, “How many employees by period”. Staff come and go for different reasons. In some companies, the number of staff can change over time. The principles used in this article can also be used in other instances. There can be staff moving in and out of departments, on and off of projects, etc. The technique can also be used to work out how many staff were on leave, how many off sick, how many tickets were open in a support queue, or any other concept that has a start and end date in a transactional table.

Read on for Matt’s answer but be sure to check out the comments as there are some other good solutions in there.
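
As a rough illustration of the underlying idea, outside of DAX, here is a minimal Python sketch that counts records whose start/end range overlaps each month. The staff records, names, and the overlap rule (anyone employed at any point in the month counts) are assumptions made for this example; Matt's DAX solution may define "active" differently.

```python
from datetime import date

# Hypothetical staff records: (name, start_date, end_date).
# An end_date of None means the person is still employed.
staff = [
    ("Ana", date(2021, 11, 1), date(2022, 2, 15)),
    ("Ben", date(2022, 1, 10), None),
    ("Cho", date(2022, 3, 1), date(2022, 3, 20)),
]


def month_bounds(year, month):
    """Return the first day of the month and the first day of the next month."""
    first = date(year, month, 1)
    if month == 12:
        next_first = date(year + 1, 1, 1)
    else:
        next_first = date(year, month + 1, 1)
    return first, next_first


def active_in_period(records, period_start, period_end):
    """Count records active at any point in [period_start, period_end)."""
    return sum(
        1
        for _, start, end in records
        if start < period_end and (end is None or end >= period_start)
    )


counts = {}
for month in (1, 2, 3, 4):
    first, next_first = month_bounds(2022, month)
    counts[f"2022-{month:02d}"] = active_in_period(staff, first, next_first)

print(counts)
```

The same overlap test applies to anything with a start and an end date, such as leave records or open support tickets.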


The Move to General Database Platforms

Published 2022-05-23 by Kevin Feasel

Steve Jones muses on specialization in data platforms:

It’s been a decade-plus of the Not-Only-SQL (NoSQL) movement where a large variety of specialized database platforms have been developed and sold. It seems that there are so many different platforms for data stores that you can find one for whatever specialized type of data you are working with. However, is that what people are doing to store data in their applications?
I saw this piece on the return to the general-purpose database, postulating that a lot of the NoSQL database platforms have added additional capabilities that make them less specialized and more generalized. I’ve seen some of this, just as many relational platforms have added features that compete with one of the NoSQL classes of databases. The NoSQL datastores might be adding SQL-like features because some of these platforms are too specialized, and the vendors have decided they need to cover a slightly wider set of use cases.

I see three overlapping forces in play here. First, you have vendors looking at Total Addressable Market for their specific technology (document, key-value, graph, whatever) versus the size of the relational database market and they salivate about getting those general-purpose fat stacks of cash. That’s what Steve is getting at in the graf above.

I think the second force is that specialization is ultimately a sucker’s game when it comes to databases. By specializing in one area, you ultimately sacrifice others. A tool like Elasticsearch is outstanding as a document search engine and it is miserable as an aggregation engine (ask me how I know: it seems like every six months, another product team decides that this time, they’ll get stats aggregation with Elasticsearch to work well…and six months after people actually start to use the thing, they move all the data someplace else that is adequately queryable). Similarly, document databases are excellent for populating details in an application but are not at all excellent at aggregation or arbitrary queries connecting data together. Specialization seems like a great idea until new requirements come in which require advanced reporting.

The third force is that these systems are independent and getting them to talk to each other typically involves writing a lot of ETL/ELT code or using additional third-party tools. To the extent that there are data virtualization platforms, they’re either excruciatingly slow (e.g., PolyBase) or expensive and out of date because they cache the data periodically. A corollary of the third force is that different platforms tend to use different languages and trying to remember which of the three or four different languages you need to use to access data in this case can be a bit painful. This is part of the reason Feasel’s Law exists.

The net result of all of this is that it seems you end up with the same piece of information in several separate places and build complicated systems to keep these separate systems aligned. Each system is (theoretically) optimized for a given use case but you end up with more and more people spending their time gluing together data from disparate systems, ensuring that data in disparate systems matches up, or moving data between disparate systems. If you need to do all of this, then sure, do it. But if there’s a single general-purpose platform which does all of this stuff 90% as well, a large number of companies and use cases will do just fine with the single tool. And that’s why general-purpose database platforms are still so popular and why I believe they will remain popular indefinitely.

The biggest exception I see is caching, but that’s because it’s more a “fire-and-forget” data storage system. If you do it right, you don’t have any ETL/ELT to or from the cache and if the cache dies, your system continues to work (albeit slower than with the cache). It’s also tied to a specific application and only exists temporarily, so data mismatches are (hopefully) transitory enough not to matter.
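
To make the "if the cache dies, your system continues to work" point concrete, here is a minimal cache-aside sketch in Python. Everything in it is hypothetical: `FlakyCache` stands in for a real cache service, and `load_from_database` stands in for a query against the system of record.

```python
class FlakyCache:
    """Hypothetical in-memory cache that can be switched off to simulate an outage."""

    def __init__(self):
        self.data = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise ConnectionError("cache unavailable")
        return self.data.get(key)

    def set(self, key, value):
        if not self.up:
            raise ConnectionError("cache unavailable")
        self.data[key] = value


def load_from_database(key):
    # Stand-in for the slower query against the system of record.
    return f"row-for-{key}"


def fetch(cache, key):
    """Return (value, source), falling back to the database if the cache fails."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached, "cache"
    except ConnectionError:
        pass  # A dead cache is not an error for the caller, just a slower path.

    value = load_from_database(key)
    try:
        cache.set(key, value)  # Best effort: ignore failures when writing back.
    except ConnectionError:
        pass
    return value, "database"


cache = FlakyCache()
print(fetch(cache, "42"))  # first read comes from the database
print(fetch(cache, "42"))  # second read is served from the cache
cache.up = False
print(fetch(cache, "42"))  # cache is down, so the database answers again
```

The cache is never the source of truth here, which is why no ETL/ELT is needed to keep it in sync.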


Object-Level Security in Power BI

Published 2022-05-23 by Kevin Feasel

Chris Webb checks out Object-Level Security:

If you have sensitive data in your Power BI dataset you may need to stop some users seeing the data in certain columns or measures. There is only one way to achieve this: you have to use Object Level Security (OLS) in your dataset. It’s not enough to exclude those measures or columns from your reports or to hide them, because there will always be ways for enterprising users to see data they shouldn’t be allowed to see. However the problem with OLS up to now is that it didn’t play nicely with Power BI reports and so you had to create multiple versions of the same report for different security roles. The good news is that there’s now a way to create one report connected to a dataset with OLS and have it display different columns and measures to users with different permissions.

And then watch as Chris combines Row-Level Security with Object-Level Security to make it nicer for users but probably a mess for maintainers.


Finding Articles in a SQL Server Publication

Published 2022-05-23 by Kevin Feasel

Kenneth Fisher disturbs the slumber of the forces of replication:

The other day I was asked to supply a list of all of the tables being replicated into a given database. Now, for those of you that aren’t aware, if I replicate a group of tables from database SourceDB into DestDB I can still have additional tables in DestDB that have nothing to do with the replication. So this wasn’t just a matter of getting a list of tables from the database.

Click through for queries which work for transactional replication as well as merge replication.


Forcing Color Scheme by Data Element

Published 2022-05-23 by Kevin Feasel

Reza Rad forces a particular color scheme in Power BI:

You can set the color in every visual in Power BI easily. You can also set the color of multiple visuals at the same time using Themes. However, what if you want to set the same color for the same data point? For example, you want the Gender Female to be always colored Orange in all the charts and visuals. In Power BI, as of now, you cannot set a data point color. However, there is an easy solution for that, which I explained in this article and video.

Click through for the answer. Generally I’d say something along the lines of “instead of doing this, just have one color and take advantage of cross-filtering to highlight the element people care about.” But if you do have a multi-measure categorical set with a small number of categories, color can be a differentiator and at least this helps you keep consistent colors across visuals.


Savepoints in Transactions

Published 2022-05-23 by Kevin Feasel

Kevin Wilkie continues a series on transactions in SQL Server:

All right, now that everyone’s back with us, we’ll talk more about everyone’s favorite – transactions. When they deal with transactions, most people only know how to begin one, then either commit it or roll it back. But there’s so much more you can do with a transaction!
This time I want to focus on savepoints for transactions. Yes, the same term you’ve been using in games for years can be used in the workplace!

I think I have actually made use of savepoints in production code…maybe twice? It always seems that whenever I might actually make use of one (rather than simply rolling it all back and starting over), there’s some limitation which makes them not useful.
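
For readers who have not used savepoints, here is a minimal runnable sketch of the general idea. It uses SQLite through Python's `sqlite3` module rather than SQL Server, purely so the example is self-contained; in T-SQL the equivalent statements are `SAVE TRANSACTION` and `ROLLBACK TRANSACTION` with a savepoint name. The table and savepoint names are made up for illustration.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so BEGIN/COMMIT are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO orders (note) VALUES ('kept')")

# Mark a point inside the open transaction.
conn.execute("SAVEPOINT before_risky_step")
conn.execute("INSERT INTO orders (note) VALUES ('discarded')")

# Undo only the work done since the savepoint; the transaction stays open.
conn.execute("ROLLBACK TO SAVEPOINT before_risky_step")

conn.execute("INSERT INTO orders (note) VALUES ('also kept')")
conn.execute("COMMIT")

notes = [row[0] for row in conn.execute("SELECT note FROM orders ORDER BY id")]
print(notes)  # ['kept', 'also kept']
```

Only the insert made after the savepoint is undone; the work before it survives and is committed with the rest of the transaction.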


Memory Fractions in SQL Server

Published 2022-05-23 by Kevin Feasel

Hugo Kornelis explains the notion of memory fractions:

Some time ago a reader reached out to me with a request for help. He showed me a query and accompanying execution plan, and asked if I could help reduce (or, better yet, eliminate) the many hash spills that were killing his performance.
While helping him work through the plan, I was once more reminded of one of my pet peeves with execution plans: we get to see the requested memory for the plan (the Memory Grant and MemoryGrantInfo properties), which is of course based on the estimated total memory usage of operators that are active at the same time. We also get to see the actual memory used by each individual operator (in the Memory Usage property). But there is no way to see how much memory the optimizer estimates for each individual operator.

Read on for a detailed explanation.


Finding Indexing Metrics in Cosmos DB

Published 2022-05-20 by Kevin Feasel

Hasan Savran looks at the numbers:

You might need Composite Indexes to make your queries more efficient, but Cosmos DB does not create any Composite Indexes for you. You need to figure out which properties should have composite indexes and then change the indexing policy file to create them.
Indexing Metrics can help when you are working out an indexing policy. It tells you which indexes the current query uses and it gives you hints about what other indexes you should create to make the query work faster/cheaper. Like many other features of Cosmos DB, you need to write code by using the SDK to see Indexing Metrics. The following example shows how to enable Indexing Metrics for your queries.

Click through for a code sample which shows how to collect index metrics.
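
Hasan's post covers the SDK call itself, so that is not reproduced here. As a complement, this is a sketch of what an indexing policy with one composite index might look like, built as a Python dictionary and printed as JSON. The property paths `/category` and `/price` are hypothetical; substitute whichever properties the indexing metrics suggest for your query.

```python
import json

# Hypothetical property names, chosen only for illustration.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [],
    # Each composite index is an ordered list of path/order pairs.
    "compositeIndexes": [
        [
            {"path": "/category", "order": "ascending"},
            {"path": "/price", "order": "descending"},
        ]
    ],
}

print(json.dumps(indexing_policy, indent=2))
```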


Web Accessibility and Shiny

Published 2022-05-20 by Kevin Feasel

Jamie Owen has a two-parter. First up, why web accessibility standards are important:

An accessible website is more than putting content online. Making a website accessible means ensuring that it can be used by as many people as possible. Accessibility standards such as the Web Content Accessibility Guidelines (WCAG) help to standardise the way in which a website can interact with assistive technologies. Allowing developers to incorporate instructions into their web applications which can be interpreted by technologies such as screen readers helps to maintain a consistent user experience for all.

Second, how Shiny apps tend to stack up:

The great thing about {shiny} is that it allows data practitioners a relatively simple, quick approach to providing an intuitive user interface to their R code via a web application. So effective is {shiny} at this job that it can be done with little to no traditional web development knowledge on the part of the developer. {shiny} and associated packages provide collections of R functions that return HTML, CSS and JavaScript which is then shipped to a browser. The variety of packages giving trivial access to styled front end components and widgets is already large and constantly growing. What this means is that R programmers can achieve a huge amount in the way of building complex, visually attractive web applications without needing to care very much about the underlying generated content that is interpreted by the browser.

As a quick spoiler, not so well. Read on for the full report.


Cost Savings with Azure Data Factory

Published 2022-05-20 by Kevin Feasel

Koen Verbeeck maximizes the savings:

As you might’ve noticed, pricing in ADF is not the same as it was in SSIS for example. In SSIS, you pay your SQL Server license and you’re done (well, and you buy a server to run it on). It doesn’t matter what you do with SSIS, the cost is the same. If you run 1 package or 1000 packages, there’s no difference except in your electricity bill. However, in ADF you pay more if you use it more. You pay for each action you do, you pay for each activity you use and for how long things are running. There are a couple of guidelines you can follow to try to minimize costs:

Read on for those guidelines and some specific helpful items.

