Curated SQL – A Fine Slice Of SQL Server

Automated Partitioned Table Management

Published 2022-05-04 by Kevin Feasel

Eitan Blumin automates creation and deletion of partitions in SQL Server:

Before we begin, there are a few “ground rules” we should understand first:
1 – Partition Functions define the partition ranges
This means that whenever we want to eliminate an old partition range or add a new partition range, the PARTITION FUNCTION is the object that we actually need to modify.

Click through for Eitan’s entire process and a couple of scripts. This is an area where SQL Server could have made things a lot easier, especially for periodic processes, by offering interval options like “Daily”, “Monthly”, or “Weekly (start on Monday)” rather than making people specify every partition boundary separately.
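To make the “modify the partition function” rule concrete, here is a minimal sketch of monthly sliding-window bookkeeping. It is not Eitan’s script. It assumes a RANGE RIGHT partition function on a date column with first-of-month boundaries, and the names pf_monthly, ps_monthly, and the PRIMARY filegroup are hypothetical defaults. The sketch only generates ALTER PARTITION SCHEME ... NEXT USED, ALTER PARTITION FUNCTION ... SPLIT RANGE, and MERGE RANGE statements; it does not run them or handle switching data out.

```python
from datetime import date


def add_months(first_of_month, n):
    """Shift a first-of-month date by n months (n may be negative)."""
    total = first_of_month.year * 12 + (first_of_month.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def plan_monthly_window(existing_boundaries, today, keep_months=12, months_ahead=3,
                        function_name="pf_monthly", scheme_name="ps_monthly",
                        filegroup="PRIMARY"):
    """Return T-SQL statements that keep a monthly RANGE RIGHT partition function
    spanning roughly [today - keep_months, today + months_ahead].
    All object names are hypothetical placeholders."""
    current = today.replace(day=1)
    oldest_kept = add_months(current, -keep_months)
    newest = add_months(current, months_ahead)
    have = set(existing_boundaries)
    statements = []

    # Add missing boundaries from the current month forward. Creating them before
    # data arrives is the usual advice, because splitting a range that already
    # holds rows can require data movement.
    boundary = current
    while boundary <= newest:
        if boundary not in have:
            statements.append(f"ALTER PARTITION SCHEME {scheme_name} NEXT USED [{filegroup}];")
            statements.append(f"ALTER PARTITION FUNCTION {function_name}() "
                              f"SPLIT RANGE ('{boundary.isoformat()}');")
        boundary = add_months(boundary, 1)

    # Remove boundaries that have aged out of the retention window. Merging a range
    # that still holds rows can also move data, so in practice the oldest partition
    # would typically be switched out or truncated before this runs.
    for old in sorted(b for b in have if b < oldest_kept):
        statements.append(f"ALTER PARTITION FUNCTION {function_name}() "
                          f"MERGE RANGE ('{old.isoformat()}');")
    return statements
```

For example, with boundaries for January through May 2022, a run on 2022-05-04 with keep_months=3 and months_ahead=2 would emit splits for June and July 2022 and a merge of the January boundary. Generating the statements rather than executing them makes it easy to review the plan before touching a production table.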

Things Not to Include in Data Warehouses

Published 2022-05-04 by Kevin Feasel

Erik Darling compiles a list:

This is a list of things I see in data warehouses that make me physically ill:
– Unique constraints of any kind: Primary Keys, Indexes, etc. Make things unique during your staging process. Don’t make your indexes do that work.
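As an illustration of “make things unique during your staging process,” here is a minimal sketch using Python’s built-in sqlite3 module rather than SQL Server, so it runs standalone. The table and column names (stg_customer, dim_customer, customer_code, extracted_at) are hypothetical, it needs SQLite 3.25 or later for ROW_NUMBER, and it only inserts new business keys; it does not handle changes to existing rows.

```python
import sqlite3

SETUP = """
CREATE TABLE stg_customer (customer_code TEXT, customer_name TEXT, extracted_at TEXT);
-- No PRIMARY KEY or UNIQUE index on the warehouse table; staging enforces uniqueness.
CREATE TABLE dim_customer (customer_code TEXT, customer_name TEXT);
"""

DEDUPE_AND_LOAD = """
INSERT INTO dim_customer (customer_code, customer_name)
SELECT customer_code, customer_name
FROM (
    -- Rank duplicate extracts of the same business key, newest first.
    SELECT customer_code, customer_name,
           ROW_NUMBER() OVER (PARTITION BY customer_code
                              ORDER BY extracted_at DESC) AS rn
    FROM stg_customer
) AS ranked
WHERE rn = 1
  -- Skip keys that an earlier run already loaded.
  AND NOT EXISTS (SELECT 1 FROM dim_customer AS d
                  WHERE d.customer_code = ranked.customer_code)
"""


def load(conn, staged_rows):
    """Replace the staging contents, then load one row per business key."""
    conn.execute("DELETE FROM stg_customer")
    conn.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)", staged_rows)
    conn.execute(DEDUPE_AND_LOAD)
    conn.commit()


def demo_staging_load():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = [("C1", "Contoso", "2022-05-01"),
            ("C1", "Contoso Ltd", "2022-05-03"),
            ("C2", "Fabrikam", "2022-05-02")]
    load(conn, rows)
    load(conn, rows)  # rerunning the same batch must not create duplicates
    return conn.execute("SELECT customer_code, customer_name FROM dim_customer "
                        "ORDER BY customer_code").fetchall()
```

Here the staging query decides which row survives (the most recent extract), so the target table stays free of unique indexes and a repeated load is still safe.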

Read on for the full list. I agree with everything except clustered row-store indexes. Those make a lot of sense on dimension tables, tied to the Kimball-style surrogate keys you create in the warehouse itself.

The other part I disagree with is non-clustered columnstore indexes, which I’ve rarely found good use for. Clustered columnstore indexes are outstanding, but the non-clustered variety…meh at best. This is primarily because the pattern I tend to use for warehouse queries is to drive from the fact table, aggregate as much as I can there, and connect to the dimensions for further information at the end. If your warehouse access patterns differ radically from this, you might get more out of non-clustered columnstore indexes. Maybe.
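The “drive from the fact table, aggregate, then join to dimensions” pattern looks roughly like the query below. This is a sketch of the query shape only, run through Python’s sqlite3 module with hypothetical fact_sales and dim_product tables; SQLite has no columnstore indexes, so it says nothing about index performance.

```python
import sqlite3

FACT_FIRST = """
WITH sales_by_product AS (
    -- Step 1: aggregate the large fact table on its keys alone.
    SELECT product_key, SUM(amount) AS total_amount, COUNT(*) AS sale_count
    FROM fact_sales
    GROUP BY product_key
)
-- Step 2: join the much smaller aggregate to the dimension for descriptive columns.
SELECT d.product_name, s.total_amount, s.sale_count
FROM sales_by_product AS s
JOIN dim_product AS d ON d.product_key = s.product_key
ORDER BY d.product_name
"""


def demo_fact_first():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER, product_name TEXT);
        CREATE TABLE fact_sales (product_key INTEGER, amount REAL);
        INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
        INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
    """)
    return conn.execute(FACT_FIRST).fetchall()
```

With this sample data the query returns one row per product: Gadget with 7.5 across 1 sale, and Widget with 15.0 across 2 sales. The dimension join only touches the already-aggregated rows, which is the point of the pattern.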

The take Operator in KQL

Published 2022-05-03 by Kevin Feasel

Robert Cain continues a series on KQL:

In this example we took the Perf table, and piped the dataset it generated into the take operator. We indicated we wanted to get 10 rows, which it did as you can see.
It is important to understand that take grabs these rows at random. Further, there is no consistency between each execution of take. You are likely to get a different set of rows with each execution. Let me run the exact same query again, so you can see the different data being returned.

Take if you want a slice, if you want a piece, if it feels alright.

Separating Code from Presentation with Jupyter

Published 2022-05-03 by Kevin Feasel

John Mount disaggregates Jupyter notebook results:

As I switch back and forth between R and Python projects for various clients and partners, I got to thinking: “is there an easy way to separate code from presentations in Jupyter notebooks?”
The answer turns out to be yes. Jupyter itself exposes a rich application programming interface in Python. So it is very easy to organize Jupyter’s power into tools that give me a great data science and analysis workflow in Python.
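To give a rough sense of what “separating code from presentation” can mean, here is a minimal sketch that reads a notebook’s JSON directly with the standard library. It is not John’s tooling, which builds on Jupyter’s own Python API; this version assumes the nbformat v4 file layout (a top-level "cells" list whose entries carry "cell_type", "source", and, for code cells, "outputs"), and the function names are hypothetical.

```python
import json


def _as_text(value):
    """Notebook JSON stores multi-line text either as a string or a list of strings."""
    return "".join(value) if isinstance(value, list) else (value or "")


def split_notebook(nb):
    """Split a parsed .ipynb (nbformat v4) into (code, presentation) strings."""
    code_parts, presentation_parts = [], []
    for cell in nb.get("cells", []):
        source = _as_text(cell.get("source", ""))
        if cell.get("cell_type") == "code":
            code_parts.append(source)
            # Keep plain-text outputs (stream text or text/plain results) as presentation.
            for output in cell.get("outputs", []):
                text = output.get("text") or output.get("data", {}).get("text/plain")
                if text:
                    presentation_parts.append(_as_text(text))
        elif cell.get("cell_type") == "markdown":
            presentation_parts.append(source)
    return "\n\n".join(code_parts), "\n\n".join(presentation_parts)


def split_notebook_file(path):
    with open(path, encoding="utf-8") as f:
        return split_notebook(json.load(f))
```

Images and HTML outputs are ignored here for brevity; a fuller tool would carry them into the presentation side as well.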

Read on to see how.

Alerting on Azure Budget Thresholds

Published 2022-05-03 by Kevin Feasel

Daron Yondem makes a budget:

You can’t imagine how many of us forget to set up the proper alerting mechanisms for our cloud subscription consumption. Here is how to do it in Azure in under 2 minutes.

Read on for the answer. I do like Azure’s budgeting tools except for one big thing: you can’t set a cap. Alerting is great, but I want a “break glass in case of emergency” capability to stop spending altogether once you hit a certain point. I wouldn’t use it in production, but for personal or development accounts, that’s a big deal. You can do it, but only with a subscription that uses Azure credits; as soon as dollars are involved, there are no caps.

Using Extended Events with AWS RDS

Published 2022-05-03 by Kevin Feasel

Grant Fritchey tries out extended events in Amazon’s RDS:

AWS has posted the documentation on what you have to do in order to enable the collection of Extended Events within RDS. Normally, I’d follow along with the documentation. However, I’m going to approach this like I knew that Extended Events support was there, but I wasn’t aware of the docs. So, I’m starting in SSMS and I’m just going to try plugging in the Extended Events GUI to see what happens. Further, I’m going to use the simplest method for launching Extended Events, XEvent Profiler.

Read on for Grant’s findings.

Empty Catch Blocks in T-SQL

Published 2022-05-03 by Kevin Feasel

Erik Darling looks disapprovingly:

So if you execute the above proc, all it returns is an empty result with no error message.
Kinda weird.
Like not having finger or toenails.

I’m dating myself here, but I remember a comment in a thread on The Daily WTF in which somebody put an ASCII art Pikachu in every empty catch block because hey, you gotta catch ’em all.
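The same problem exists outside T-SQL. This Python analogue (not Erik’s procedure; the function names are hypothetical) contrasts an empty handler, which makes the failure vanish and hands the caller an empty result, with one that records the error and re-raises it, roughly what a bare THROW; does inside a T-SQL CATCH block.

```python
import logging


def ratio_swallowing(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError:
        pass  # empty handler: the error disappears and the caller silently gets None


def ratio_reporting(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Record what happened, then re-raise the same exception so the caller sees it.
        logging.exception("ratio failed: numerator=%r, denominator=%r",
                          numerator, denominator)
        raise
```

Calling ratio_swallowing(1, 0) quietly returns None, much like the procedure that returns an empty result with no error message, while ratio_reporting(1, 0) logs the traceback and still raises ZeroDivisionError.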

Securing Azure Storage

Published 2022-05-03 by Kevin Feasel

Craig Porteous continues a series on Azure Data Platform security:

This is the third in a series where I look at all of the resources common to a Data Lakehouse platform architecture and what you need to think about to get it past your security team.
Building upon Azure Databricks, I’ll move from the compute engine to our blob and data lake storage. Things are a little simpler to secure but the plethora of options available can have significant impacts on usability and cost so it’s important to understand the impact before baking them into your design.

Read on for some good advice around securing Azure storage accounts.

Multiple Aggregations with CASE

Published 2022-05-03 by Kevin Feasel

Chad Callihan shows off a good use of aggregate functions and the CASE expression:

Have you been tasked with pulling multiple counts from the same table? Maybe you need to find how many records have a value for a column and how many are NULL. Or maybe you need to see how many records are true and how many are false.
It’s simple enough to run a query to count one set of criteria, run another query for the second set of criteria, and combine them when sending your results. Did you know you can get multiple counts with one query?
Here’s an example of how using COUNT and CASE can speed up your day.

Click through for an example. I mildly disagree with Chad’s conclusion that this is something you’ll rarely do: the more you work with reporting and analytical queries, the more you’ll appreciate this.
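The core trick fits in one query. This sketch runs it through Python’s sqlite3 module so it is self-contained; the customers table and its email and is_active columns are hypothetical, and the COUNT(CASE ...) expressions are written the same way in T-SQL.

```python
import sqlite3

COUNTS = """
SELECT
    -- CASE yields 1 when the condition holds and NULL otherwise (there is no ELSE),
    -- and COUNT skips NULLs, so each column counts only its own condition.
    COUNT(CASE WHEN email IS NOT NULL THEN 1 END) AS with_email,
    COUNT(CASE WHEN email IS NULL THEN 1 END)     AS without_email,
    COUNT(CASE WHEN is_active = 1 THEN 1 END)     AS active,
    COUNT(CASE WHEN is_active = 0 THEN 1 END)     AS inactive
FROM customers
"""


def demo_counts():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (email TEXT, is_active INTEGER);
        INSERT INTO customers VALUES
            ('a@example.com', 1), (NULL, 1), ('b@example.com', 0), (NULL, 0), (NULL, 1);
    """)
    return conn.execute(COUNTS).fetchone()
```

On the five sample rows this returns (2, 3, 3, 2) from a single pass over the table. SUM(CASE WHEN ... THEN 1 ELSE 0 END) is an equivalent spelling some people prefer.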

Power Query XML Inconsistencies

Published 2022-05-03 by Kevin Feasel

Chris Webb walks us through some inconsistencies:

A few months ago one of my colleagues at Microsoft, David Browne, showed me an interesting Power Query problem with how the Xml.Tables and Xml.Document M functions handle null or missing values. I’m posting the details here because the problem seems fairly common, it causes a lot of confusion and it’s not easy to deal with.
In XML there are two ways to represent a null or missing value: <a></a> or omitting the element completely. Unfortunately the Xml.Tables and Xml.Document M functions handle these inconsistently: they treat the <a></a> form as a table but the other as a scalar.

Click through for an example and a workaround for the issue.
