
Month: April 2021

More About Default Parameter Values in PowerShell

Jeffrey Hicks follows up on a prior post:

Last week I shared a little nugget about making PowerShell life easier by using the built-in variable, $PSDefaultParameterValues. This is a special hashtable where you can define default parameter values for any PowerShell command. This means any PowerShell script or function that has defined parameters AND uses [cmdletbinding()]. If you have a simple function that only uses a Param() block, $PSDefaultParameterValues won’t work for that function. I recommend that all PowerShell functions use [cmdletbinding()] so hopefully, this won’t be an issue. Since I received some positive feedback and a few questions on the previous post, I thought a quick follow-up might be in order.

Read on for additional useful information regarding $PSDefaultParameterValues.
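If the hashtable's key syntax is new to you, here is a rough Python sketch (not PowerShell itself, and the command names are invented for illustration) of how "CommandName:ParameterName" keys with wildcards resolve to a default value:

```python
from fnmatch import fnmatch

# Hypothetical defaults table mirroring $PSDefaultParameterValues:
# keys are "CommandName:ParameterName" and may contain wildcards.
defaults = {
    "Invoke-Command:ComputerName": "SRV1",
    "Send-MailMessage:SmtpServer": "mail.example.com",
    "*-AzureRM*:Verbose": True,
}

def resolve_default(command, parameter, table):
    """Return the default value for command/parameter, or None if no key matches."""
    for key, value in table.items():
        cmd_pat, _, param_pat = key.partition(":")
        if fnmatch(command, cmd_pat) and fnmatch(parameter, param_pat):
            return value
    return None

print(resolve_default("Invoke-Command", "ComputerName", defaults))  # SRV1
print(resolve_default("Get-AzureRMVM", "Verbose", defaults))        # True
```

The wildcard key is what makes the feature so handy: one entry can set a default for a whole family of cmdlets.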


Fun with Multiple Indexes

Erik Darling makes a fairly rare multi-index sighting:

Notice! Both of our nonclustered indexes get used, and without a lookup are joined together.

Because the predicates are of the inequality variety, the join columns are not ordered after each seek. That makes a hash join the most likely join type.

I’ve always had this belief that there are probably more cases in which multi-index solutions are useful than the SQL Server optimizer gives us. This belief may be irrational.
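Mechanically, the "joined together" part is just matching up row locators from the two index seeks. Here is a rough Python sketch of that hash join (made-up rows and column names, not the optimizer's actual plumbing):

```python
# Two hypothetical nonclustered "indexes": each seek produces (key, row_locator)
# pairs for rows satisfying one predicate. Joining the two result sets on the
# row locator combines both indexes without a key lookup.
index_on_price = [(19.99, 7), (24.50, 3), (31.00, 9)]   # (price, row_id)
index_on_qty   = [(5, 3), (12, 7), (40, 2)]             # (qty, row_id)

def hash_join_on_locator(build_side, probe_side):
    """Hash join two index scans on their row locator (the second element)."""
    buckets = {}
    for key, row_id in build_side:          # build phase: hash the smaller input
        buckets.setdefault(row_id, []).append(key)
    matches = []
    for key, row_id in probe_side:          # probe phase: look up each locator
        for other_key in buckets.get(row_id, []):
            matches.append((row_id, other_key, key))
    return matches

print(hash_join_on_locator(index_on_price, index_on_qty))
# rows 3 and 7 satisfy both predicates
```

Because neither seek returns its locators in a useful order here, hashing them is cheaper than sorting both sides for a merge.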


Superkeys

Kevin Wilkie knows that not all keys wear capes:

Well, that’s because we sometimes need different ways to describe what we’ve got going on. Of the four different types of keys we’ve discussed so far, they are all different enough that we need to differentiate them and be able to explain what the differences are. For the rest of the keys that we’ll go through today, the same idea exists. They are close to the others but different enough that there is a need for another name for that type of key.

There is a super key.

Read on to learn what a superkey is. That will put you one quarter of the way to understanding Boyce-Codd Normal Form: a relational variable is in Boyce-Codd Normal Form if and only if all functional dependencies have superkey determinants.
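The formal test is simple: a set of attributes is a superkey if its closure under the relation's functional dependencies covers every attribute. Here is a minimal Python sketch of that check, using the standard attribute-closure algorithm on a hypothetical relation:

```python
def closure(attrs, fds):
    """Compute the closure of an attribute set under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the determinant, we also get its dependents.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    return closure(set(attrs), fds) == set(all_attrs)

# Hypothetical relation R(A, B, C) with A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(is_superkey({"A"}, {"A", "B", "C"}, fds))       # True: {A}+ = {A, B, C}
print(is_superkey({"B"}, {"A", "B", "C"}, fds))       # False: {B}+ = {B, C}
print(is_superkey({"A", "C"}, {"A", "B", "C"}, fds))  # True, though not minimal
```

And the BCNF test falls straight out of it: every determinant must itself be a superkey, which fails here because B determines C without determining A.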


Common Query Plan Patterns for Joins

Erik Darling has a new mini-series. First up is degrees of parallelism and bitmaps:

But wait! This query runs at DOP 4. You can tell by looking at the number of executions.

Who runs queries at DOP 4?

Fools.

This DOP goes up to 11.

Next, Erik versus equality predicates:

But not every query has an available equality join predicate. For various reasons, like poor normalization choices, or just the requirements of the query, you may run into things like I’m about to show you.

If there’s a headline point to this post, it’s that joins that don’t have an equality predicate in them only have one join choice: Nested Loops. You cannot have a Hash or Merge join with this type of query. Even with the best possible indexes, some of these query patterns just never seem to pan out performance-wise.

Read on for the implications of this, as well as techniques to improve performance.
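As for why Nested Loops is the only option: a hash table can only answer equality lookups, and a merge needs both inputs ordered on a shared equality column, so an inequality predicate has to be evaluated pair by pair. A small Python sketch of that join shape, with made-up rows:

```python
# Hypothetical tables: join t1 to t2 on an inequality (t1.val < t2.val).
t1 = [("a", 10), ("b", 25)]
t2 = [("x", 15), ("y", 30)]

def nested_loops_theta_join(outer, inner, predicate):
    """For each outer row, scan the inner input and test the predicate.
    With no equality column, there is nothing to hash or merge on."""
    return [
        (o, i)
        for o in outer            # one pass over the outer input...
        for i in inner            # ...re-scans the inner input every time
        if predicate(o, i)
    ]

result = nested_loops_theta_join(t1, t2, lambda o, i: o[1] < i[1])
print(result)
```

That inner re-scan is exactly why good supporting indexes help but can only do so much for these queries.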


Importing Data from ADLS Gen2 into Power BI

Chris Webb summarizes a significant amount of work:

Over the last few months I’ve written a series of posts looking at different aspects of one question: what is the best way to import data from ADLSgen2 storage into a Power BI dataset? For example, is Parquet really better than CSV? Should you use Azure Synapse Serverless? In this post I’m going to summarise my findings and offer some recommendations – although, as always, I need to stress that these are the conclusions I can draw from my test results and not the absolute, incontrovertible “Microsoft-says” truth so please do your own testing too.

Read on and check it out for yourself.


Using Notebooks in Azure Machine Learning Studio

Lina Kovacheva takes us through the process of working with notebooks in Azure Machine Learning Studio:

I discovered Jupyter notebooks not that long ago, but the more I use them, the more I see how powerful they can be. For those of you who are not familiar with Jupyter Notebook: it is an open-source web application where you can combine code, output, visualizations, and explanatory text all in one document, allowing you to write code that tells a story. Now that you have an idea of what a Jupyter notebook is, I will walk you through how you can use it in Azure Machine Learning Studio.

Click through for the process. One advantage to notebooks in an environment like Azure ML over Azure Data Studio is that you have a much wider variety of languages, although Azure Data Studio has a SQL Server kernel, which other platforms currently do not have.


Lessons from using Notebooks

Glenn Berry takes us through some of the past (and sometimes present) challenges of running notebooks in Azure Data Studio:

I have to admit that I do not use Jupyter notebooks or Azure Data Studio (ADS) every day. Last August, I made separate Jupyter notebook versions of my SQL Server Diagnostic Information Queries. There was a separate version for SQL Server 2012 through SQL Server 2019, along with one for Azure SQL Database. This was after a number of requests from people in the community.

Creating these notebooks was a pretty decent amount of work. Luckily, this was right around the time that Azure Data Studio was making it much easier to edit and format markdown for the text blocks. Since then, Azure Data Studio is even easier to use for editing and formatting. Even more fortuitous was the fact that Julie Koesmarno (@MsSQLGirl) volunteered to greatly improve my formatting!

Unfortunately, there has not been as much interest in my Jupyter notebooks as I hoped for. There are probably a number of reasons for this.

Read on for Glenn’s notes.


Power BI Cross-Report Drill-Through

Marc Lelijveld takes us through the benefits and challenges of drilling through to a different report in Power BI:

It is very common to have multiple reports for different audiences, while there is also one (group of) user(s) that needs an overview across all these different insights. The main challenge you will face is providing cross-report interactivity and finding related insights.

Let’s take an example of three different roles, where we have a customer account manager, a reseller manager, and a regional manager. Of course, they should have the same single source of truth, but there is one thing you want to avoid as a report creator! You do not want to create three different reports for the three mentioned audiences. But as they have different roles and responsibilities, you do not want them to see each other’s data, and you want to keep things clean and simple! In this blog I will describe how you can set up cross-report drill-through to jump from one report to another, while respecting applied filters and avoiding building three separate reports!

Click through for the process, as well as potential issues you may hit along the way.


Running a Docker Container as a WSL2 Distribution

Andrew Pruski has a wacky idea that just might work:

I’ve been playing around a bit with WSL2 and noticed that you can import TAR files into it to create your own custom distributions.

This means that we can export docker containers and run them as WSL distros!

So, let’s build a custom SQL Server 2019 docker image, run a container, and then import that container into WSL2…so that we have a custom distro running SQL Server 2019.

Read on to see how.


SCD2 Dimensions on Spark with Apache Hudi

David Greenshtein shows how we can build type-2 slowly changing dimensions using Apache Hudi:

Implementing SCD2 in a data lake without using an additional framework like Apache Hudi introduces the challenge of updating data stored on immutable Amazon S3 storage, and as a result requires the implementor to create multiple copies of intermediate results. This situation may lead to a significant maintenance effort and potential data loss or data inconsistency.

Apache Hudi is an open-source data management framework used to simplify incremental data processing and data pipeline development. Hudi enables you to manage data at the record level in Amazon S3 and helps to handle data privacy use cases requiring record-level updates and deletes. Hudi is supported by Amazon EMR starting from version 5.28 and is automatically installed when you choose Spark, Hive, or Presto when deploying your EMR cluster.

Click through for the process.
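For a sense of the bookkeeping Hudi takes on for you, here is a framework-free Python sketch of a single type-2 change (hypothetical columns; a real pipeline also has to handle late-arriving and duplicate records, which is where Hudi's record-level upserts earn their keep):

```python
from datetime import date

# Hypothetical current dimension rows: one open row per business key.
dim = [
    {"cust_id": 1, "city": "Paris", "valid_from": date(2020, 1, 1),
     "valid_to": None, "is_current": True},
]

def apply_scd2_change(dim_rows, change, as_of):
    """Close the current row for the key (if the value changed)
    and append a new current row."""
    for row in dim_rows:
        if row["cust_id"] == change["cust_id"] and row["is_current"]:
            if row["city"] == change["city"]:
                return dim_rows            # no change, nothing to do
            row["valid_to"] = as_of        # expire the old version
            row["is_current"] = False
    dim_rows.append({"cust_id": change["cust_id"], "city": change["city"],
                     "valid_from": as_of, "valid_to": None, "is_current": True})
    return dim_rows

apply_scd2_change(dim, {"cust_id": 1, "city": "Lyon"}, date(2021, 4, 1))
print([(r["city"], r["is_current"]) for r in dim])
# [('Paris', False), ('Lyon', True)]
```

On immutable S3 storage, that in-place "expire the old version" step is precisely what you cannot do directly, hence the rewritten copies Hudi manages for you.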
