2021-11-22 – Curated SQL

Decision-Making with Bayes’s Theorem

Published 2021-11-22 by Kevin Feasel

Bill Schmarzo lays out a framework to classify decision-making:

In my blog “Making Informed Decisions in Imperfect Situations”, I discussed the importance of properly and objectively framing the decision that we seek to make and how that impacts the data that we gather (and ignore) in an effort to make an informed decision. That is:
Are you trying to gather data to determine the right decisions or are you gathering data to support the decision that you have already made?
In that blog, I introduced two tools that can help us make informed decisions using the best available data, even when that data might be incomplete, conflicting, and/or distorted by others.

Read the whole thing.


Using Enums in PowerShell

Published 2021-11-22 by Kevin Feasel

Robert Cain quietly tells us that PowerShell is a real programming language, even for sysadmins who claim to hate programming:

This post begins a series on using Classes in PowerShell. As a first step, we will cover the use of an Enum, as enums are frequently used in combination with classes.
An Enum is a way to provide a set of predetermined values to the end user. This allows the user to pick from a finite list, and assure a value being passed into a function or class will be valid.

Click through to learn more about enums and how they work in PowerShell.
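
The idea in the excerpt, a fixed set of named values that a function can validate against, is not specific to PowerShell. Here is a minimal Python sketch of the same concept; it is not code from the linked post, and the `DayOfWeek` enum and both helper functions are hypothetical names made up for illustration.

```python
from enum import Enum


class DayOfWeek(Enum):
    # Hypothetical enum for illustration only; not from the linked post.
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3


def describe(day: DayOfWeek) -> str:
    """Accept only members of DayOfWeek and reject anything else."""
    if not isinstance(day, DayOfWeek):
        raise TypeError(f"expected a DayOfWeek, got {day!r}")
    return f"{day.name} has value {day.value}"


def parse_day(text: str) -> DayOfWeek:
    """Convert user input to a member; raises KeyError for unknown names."""
    return DayOfWeek[text.upper()]
```

Calling `parse_day("monday")` returns `DayOfWeek.MONDAY`, while `parse_day("funday")` raises a `KeyError`, so invalid values are caught at the boundary instead of deep inside the function.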


Ranking Window Functions

Published 2021-11-22 by Kevin Feasel

I continue a series on window functions in SQL Server:

The whole concept of ranking window functions is to assign some numeric ordering to a dataset. There are four ranking functions in SQL Server. Three of them are very similar to one another: ROW_NUMBER(), RANK(), DENSE_RANK(). The fourth one, NTILE(), is the odd cousin of the family.
Unlike aggregate window functions, all ranking window functions must have at least an ORDER BY clause in the OVER() operator. The reason is that you are attempting to bring order to the chaos of your data by assigning a number based on the order you specify.

Watch me ramble on about monotonicity and quietly admit that I learned what it was from economics, where the naming feels utterly backward (“strongly monotonic” is the “greater than or equal to” of monotonicity, whereas “weakly monotonic” is the “greater than” of monotonicity). Also, I structured this entire post so that I could get that video from The Prisoner (the good one, not the garbage one) in it.
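
To see how the four functions differ on tied values, here is a small sketch. It runs against SQLite through Python's `sqlite3` module rather than SQL Server, on the assumption that the bundled SQLite is version 3.25 or later (the first release with window functions). The `scores` table, its rows, and the `rank_scores` helper are all hypothetical.

```python
import sqlite3


def rank_scores(rows):
    """Load (player, score) rows and apply all four ranking functions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (player TEXT, score INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)
    query = """
        SELECT player,
               score,
               -- Unique sequence; player breaks ties deterministically
               ROW_NUMBER() OVER (ORDER BY score DESC, player) AS rn,
               -- Ties share a rank and leave a gap afterward
               RANK()       OVER (ORDER BY score DESC) AS rnk,
               -- Ties share a rank with no gap afterward
               DENSE_RANK() OVER (ORDER BY score DESC) AS drnk,
               -- Splits the ordered rows into two buckets
               NTILE(2)     OVER (ORDER BY score DESC, player) AS tile
        FROM scores
        ORDER BY score DESC, player
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [("Ann", 90), ("Bob", 90), ("Cid", 80), ("Dee", 70)]
for row in rank_scores(sample):
    print(row)
```

Ann and Bob tie at 90, so they get different `ROW_NUMBER()` values but the same `RANK()` and `DENSE_RANK()`. Cid then gets rank 3 from `RANK()` and rank 2 from `DENSE_RANK()`.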


Data Types Matter, Even in the Serverless SQL Pool

Published 2021-11-22 by Kevin Feasel

Jovan Popovic has a public service announcement for us:

The serverless SQL pool is a distributed computing system that executes concurrent queries on a set of distributed compute nodes. Multiple compute nodes are running the parts of a distributed query plan that read the underlying files, join the data sets, group, and aggregate results. Different queries might try to use the same compute nodes to execute the parts of the queries.
The oversized column types like VARCHAR(MAX) might trick the compute node to allocate more resources than is needed. However, the allocation is based on the estimate, but these over-allocated resources will not be used in actual execution because they are not needed. If a compute node needs 100MB to sort the results it will use these 100MB although the query optimizer allocated 4GB of memory for the task on the compute node.

Read the whole thing.


Azure Synapse Database Templates

Published 2021-11-22 by Kevin Feasel

Aaron Merrill announces database templates for Azure Synapse Analytics:

The Synapse database template for Agriculture is a comprehensive data model that addresses the typical data requirements of organizations engaged in growing crops, raising livestock, and producing dairy products, including field and pasture management and satellite and drone data.
The Synapse database template for Energy & Commodity Trading is a comprehensive data model that addresses the typical data requirements of organizations engaged in trading energy, commodities, and/or carbon credits, whether as a primary trading business or in support of their supply chains, operating businesses, and hedging activities.

You may remember Microsoft buying ADRM Software a while back. This is why.


A Case for EAV?

Published 2021-11-22 by Kevin Feasel

Erik Darling makes the case:

EAV styled tables can be excellent for certain data design patterns, particularly ones with a variable number of entries.
Some examples of when I recommend it are when users are allowed to specify multiple things, like:

I’m not sure I agree on the examples. When there are specific known things with expected shapes, I’d rather have a separate entity to model each. Even if each table is a single string, I’d still like the separation for logical modeling purposes.

That said, there are cases when EAV ends up being the best approach (unfortunately), particularly when you don’t even know the types of things a customer would wish to include. Just try to fight back hard when the inevitable request comes in to pivot all of that data.
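
As a rough illustration of why that pivot request hurts, here is a minimal EAV sketch using SQLite through Python's `sqlite3` module. Everything in it is an assumption for demonstration: the `customer_attribute` table, the attribute names, and the sample rows are hypothetical and are not taken from Erik's post.

```python
import sqlite3


def build_eav(conn):
    """Create a hypothetical EAV table and load a few sample rows."""
    conn.execute(
        """
        CREATE TABLE customer_attribute (
            customer_id INTEGER NOT NULL,
            attribute   TEXT    NOT NULL,
            value       TEXT    NOT NULL,
            PRIMARY KEY (customer_id, attribute, value)
        )
        """
    )
    conn.executemany(
        "INSERT INTO customer_attribute VALUES (?, ?, ?)",
        [
            (1, "favorite_color", "green"),
            (1, "nickname", "Ace"),
            (2, "nickname", "Bee"),
        ],
    )


def pivot(conn):
    """Pivot to one row per customer; each attribute must be hard-coded."""
    return conn.execute(
        """
        SELECT customer_id,
               MAX(CASE WHEN attribute = 'favorite_color' THEN value END),
               MAX(CASE WHEN attribute = 'nickname' THEN value END)
        FROM customer_attribute
        GROUP BY customer_id
        ORDER BY customer_id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_eav(conn)
pivoted = pivot(conn)
print(pivoted)
```

Adding a new attribute needs no schema change, which is the appeal. The pivot query, though, has to name every attribute up front, and `MAX` keeps only one value when a customer has several for the same attribute.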

