Kevin Feasel – Page 729

Implementing NORM.INV in Power Query

Published 2021-12-31 by Kevin Feasel

Imke Feldmann has another function to implement:

The Excel NORM.INV function returns the inverse of the normal cumulative distribution for the specified mean and standard deviation. So unlike the NORM.DIST function, that returns the probability of a threshold value to occur under the normal distribution (in CDF mode), this function returns the threshold value that matches a given probability.

Click through for the function definition.
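
As a rough illustration of what the quoted description means (this is not Imke's Power Query code), here is a minimal Python sketch. It assumes the standard library's `statistics.NormalDist` is an acceptable stand-in for Excel's calculation; the function names `norm_inv` and `norm_dist_cdf` are my own, chosen to mirror the Excel names.

```python
from statistics import NormalDist


def norm_inv(probability: float, mean: float, standard_dev: float) -> float:
    """Return the threshold x for which P(X <= x) equals the given probability."""
    # Assumption: reject the boundary values 0 and 1, where the inverse is infinite.
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if standard_dev <= 0.0:
        raise ValueError("standard_dev must be positive")
    return NormalDist(mu=mean, sigma=standard_dev).inv_cdf(probability)


def norm_dist_cdf(x: float, mean: float, standard_dev: float) -> float:
    """Return P(X <= x): the probability side of the same relationship."""
    return NormalDist(mu=mean, sigma=standard_dev).cdf(x)


# Round trip: the threshold for a probability maps back to that probability.
threshold = norm_inv(0.975, mean=0.0, standard_dev=1.0)
recovered = norm_dist_cdf(threshold, mean=0.0, standard_dev=1.0)
```

The round trip at the end is the point: one function goes from threshold to probability, the other from probability back to threshold.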


Cleaning SQL Express Databases

Published 2021-12-31 by Kevin Feasel

Kevin Hill knows the pain:

I was contacted by a lawyer that was using a 3rd party application to store emails, keep track of time, etc.
The backend of the application is SQL Server Express edition, which has a hard limit of 10GB for the data file.

One quick note for people with lots of LOB data: remember to reorganize with LOB_COMPACTION = ON, as that’s the only way to be sure. Also, depending on how old your version of SQL Server is, keep in mind that there was a bug with LOB compaction which affected SQL Server 2014 and earlier. But, uh, hopefully you’re patched past that point…

Also, getting up to 2016 SP1 means that Express Edition gets data compression. It wouldn’t directly help in this case, but if you have a lot of non-LOB data on Express Edition, it can work wonders, for some definition of “wonders.” After all, if you’re using Express Edition, wonders are by definition pretty small.


Data Exfiltration Protection and Pip

Published 2021-12-30 by Kevin Feasel

I have a post born of frustration:

I have an Azure Synapse Analytics workspace which uses a managed virtual network and includes data exfiltration protection. I also have a Spark pool. My goal is to import a few packages and use them in a Spark notebook.
Doing so is pretty easy from the Synapse workspace. I navigate to the Manage hub and then choose Apache Spark pools from the Analytics pools menu. Select the ellipsis for my Spark pool and then choose Packages.
From there, because I plan to update Python packages, I can upload a requirements.txt file and have Pip do its job.

But then it doesn’t… Click through to learn why, as well as the workaround for this. It’s stuff like this which makes me say data exfiltration protection is a feature administrators will (mostly) like and developers will hate, especially because the error message itself gives no obvious indication of why this is happening.


Creating Boilerplate Pester Assertions

Published 2021-12-30 by Kevin Feasel

Jeffrey Hicks builds a useful snippet:

During this process, I decided I needed to help myself speed up the test writing phase. I have a standard set of tests that I like to use for functions in my module. But copying and pasting code snippets is tedious. I know I could create a set of VS Code snippets, but that feels limiting and I’d have to make sure the snippets are available on all systems where I might be running VS Code. Instead, I wrote a PowerShell function to accelerate developing Pester 5.x tests.
My function takes a module and extracts all of the public exported functions. For each function, it creates a set of standard Pester assertions. These are the baseline or boilerplate tests that I always want to run for each function. Each function is wrapped in a Describe block. Although, I can opt for a Context block instead. This command will also insert tags. Note that my code for the tag insertion relies on the ternary operator from PowerShell 7.

Click through for the code.


Hierarchical Partition Keys in Cosmos DB

Published 2021-12-30 by Kevin Feasel

Hasan Savran looks at partition keys:

Selecting a partition key for your Cosmos DB is one of the most important choices you need to make for your Cosmos DB project. You really need to take your time and have a plan for your project. Where will this application be in 1 year? 5 years? How much data are you planning to store? If your application becomes popular and you start to have users all over the country or world, do you think your partition key can handle growth like this? These are some of the questions you need to ask yourself. Selecting a partition key is like selecting a life partner for your project. You need a good one that will grow together with your project.
Sometimes, it does not matter how much time you spend trying to find a good partition key. Your document simply does not have a good one. In those cases, usually the best thing you can do is combine multiple properties and generate a unique custom property called a synthetic key.

Read on for a better solution to the problem than a synthetic key.
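
For readers unfamiliar with the synthetic key idea mentioned in the quote, here is a minimal Python sketch of it. It only builds the combined property on a plain dictionary and does not call any Cosmos DB SDK. The property names (`tenantId`, `region`, `partitionKey`) and the helper `synthetic_partition_key` are hypothetical and chosen purely for illustration.

```python
def synthetic_partition_key(document: dict, properties: list[str], separator: str = "-") -> str:
    """Concatenate several property values into one string usable as a partition key."""
    # A missing property raises KeyError, which is deliberate: a silently
    # incomplete key would scatter related documents across partitions.
    return separator.join(str(document[name]) for name in properties)


# Hypothetical document; none of these property names come from Hasan's post.
order = {"id": "1001", "tenantId": "contoso", "region": "us-east", "total": 42.5}

# Store the combined value as its own property before writing the document.
order["partitionKey"] = synthetic_partition_key(order, ["tenantId", "region"])
```

The drawback is that the application has to build and maintain this extra property itself on every write.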


Row Goal Woes with the EXCEPT Operator

Published 2021-12-30 by Kevin Feasel

Nigel Foulkes-Nock ran into a problem:

In many cases, this works well, but recently I’ve seen examples where it becomes troublesome, specifically when trying to process higher volumes of data.
The same code can behave perfectly on a small dataset, but then cause issues on a larger database built in exactly the same format. This results in queries changing from taking a few seconds to struggling to complete.

Read on to see why, as well as one solution that Nigel details.


Automatically Stopping Data Explorer Clusters

Published 2021-12-30 by Kevin Feasel

Gabi Lehner has good news for us:

The Azure Data Explorer team is constantly focused on reducing COGS and making sure users are paying only for the value they are getting.
As part of this initiative, we’re now adding a new automatic capability to stop unused clusters.
If you created a cluster and did not ingest any data into it, or if you ingested data but have not run any queries or ingested new data for days, we will automatically stop that cluster.

I have two thoughts on this. First, good. Frankly, every cloud service should have automatic pausing unless it makes sense not to—that is, pausing should be the default, not a feature you add later. This is especially true for expensive data processing services.

Second, based on the description, I think I’d like a little more control over this, in terms of how long we go before auto-stop kicks in. It’s ten days, which is a reasonably large number, but other numbers could make just as much sense for a given user. I like the idea that we see in Databricks and in Azure Synapse Analytics Spark pools: give me a reasonable default, but let me change it in case the reasonable default can’t cut it for some reason.


Changing the Connected Git Repo in ADF

Published 2021-12-30 by Kevin Feasel

Meagan Longoria finds an oddity with Azure Data Factory:

When I arrived at the Git configuration page, I found the Disconnect button to be disabled. This was confusing as I am an Owner and Data Factory Contributor on this resource.

Read on to see how Meagan was able to fix this issue, and also the underlying cause of the problem.


Tower of Hanoi in T-SQL

Published 2021-12-29 by Kevin Feasel

Tomaz Kastrun would like to play a game:

T-SQL Code for the popular game of “Tower of Hanoi”, that can be played in Microsoft SQL Server, Azure Data Studio or any other T-SQL editor with support of query execution.

Given that this is the game you use to teach students recursion, I figured a T-SQL based solution would be interesting. Well, Tomaz has the solution and the workspace to play it yourself.
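
For anyone who has not seen the classroom version, here is a minimal recursive sketch in Python. It is not Tomaz's T-SQL and says nothing about how his solution works; the function name `hanoi` and the peg labels A, B, and C are my own choices.

```python
def hanoi(disks: int, source: str = "A", target: str = "C", spare: str = "B") -> list[tuple[str, str]]:
    """Return the list of (from_peg, to_peg) moves that transfers all disks to the target peg."""
    if disks == 0:
        return []  # Base case: nothing to move.
    return (
        hanoi(disks - 1, source, spare, target)    # Park the smaller disks on the spare peg.
        + [(source, target)]                       # Move the largest disk directly.
        + hanoi(disks - 1, spare, target, source)  # Bring the smaller disks back on top.
    )


moves = hanoi(3)
for from_peg, to_peg in moves:
    print(f"Move the top disk from {from_peg} to {to_peg}")
```

Each call makes two recursive calls on one fewer disk plus one direct move, so n disks take 2^n - 1 moves.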


Quantifier Predicates in SQL

Published 2021-12-29 by Kevin Feasel

Joe Celko takes us through quantifier predicates:

SQL is based on set theory and logic. The only bad news is that many programmers have never had a class on either of those topics. They muddle through using the Boolean operators in their programming language and think that’s all there is to formal logic.
Let’s flash back to the early days of logic and play catch up. We need to start with syllogisms. Syllogisms are logical forms made up of combinations of two statements about classes of things that lead to a conclusion. They were invented by the Greeks and written up by Aristotle in Prior Analytics. You might have run into them if you had a philosophy class that included lectures on formal fallacies. The three forms of statements allowed are:

Click through to receive a brief primer on formal logic and learn more about how SQL implements these concepts.
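
As a loose analogy for the quantified comparisons SQL offers (ALL, ANY/SOME, and EXISTS), here is a minimal Python sketch using the built-in `all()` and `any()`. The helper names are my own, and this deliberately ignores SQL's three-valued logic: NULL handling is not modeled here.

```python
from typing import Iterable


def greater_than_all(value: float, rows: Iterable[float]) -> bool:
    """Analog of `value > ALL (subquery)`: true when no row fails the comparison."""
    return all(value > row for row in rows)


def greater_than_any(value: float, rows: Iterable[float]) -> bool:
    """Analog of `value > ANY (subquery)`: true when at least one row passes."""
    return any(value > row for row in rows)


def exists(rows: Iterable[object]) -> bool:
    """Analog of `EXISTS (subquery)`: true when the subquery returns at least one row."""
    return any(True for _ in rows)


# Illustrative data only.
salaries = [52000, 61000, 58000]
beats_everyone = greater_than_all(65000, salaries)
beats_someone = greater_than_any(55000, salaries)
```

The empty-set case is the one worth remembering: a universal quantifier over no rows is true and an existential one is false.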



Author: Kevin Feasel