
Curated SQL Posts

SSMS Query Plans and Arrow Sizes

Brent Ozar clarifies what arrow sizes actually mean in execution plans:

That means the entire concept of the arrow is made up by the rendering application – like SQL Server Management Studio, Azure Data Studio, SentryOne Plan Explorer, and all the third party plan-rendering tools. They get to decide arrow sizes – there’s no standard.

SSMS’s arrow size algorithm changed back in SQL Server Management Studio 17, but most folks never took notice. These days, it’s not based on rows read, columns read, total data size, or anything else about the data moving from one operator to the next.

There’s an answer, but it’s not particularly intuitive. I think SentryOne Plan Explorer has the upper hand on this one.


Databricks Runtime 5.4

Todd Greenstein announces Databricks Runtime 5.4:

We’ve partnered with the Data Services team at Amazon to bring the Glue Catalog to Databricks. Databricks Runtime can now use Glue as a drop-in replacement for the Hive metastore. This provides several immediate benefits:
– Simplifies manageability by using the same Glue catalog across multiple Databricks workspaces.
– Simplifies integrated security by using IAM Role Passthrough for metadata in Glue.
– Provides easier access to metadata across the Amazon stack and access to data catalogued in Glue.

There are some interesting changes in here.


Feeding Kubernetes Log Data to Logstash and Kibana

Aayushi Johari shows how you can stand up a Kubernetes cluster and review log data using Logstash and Kibana:

In this article, you will learn how to publish Kubernetes cluster events data to Amazon Elasticsearch Service using the Fluentd logging agent. The data will then be viewed using Kibana, an open-source visualization tool for Elasticsearch. Amazon ES includes built-in Kibana integration.

We will walk you through the following process:
Creating a Kubernetes cluster
Creating an Amazon ES cluster
Deploying the Fluentd logging agent on the Kubernetes cluster
Visualizing Kubernetes data in Kibana

Click through for the full article.


Case-Insensitive Searches in Snowflake

Koen Verbeeck shows how you can perform case-insensitive searches in Snowflake DB:

I’m doing a little series on some of the nice features/capabilities in Snowflake (the cloud data warehouse). In each part, I’ll highlight something that I think is interesting enough to share. It might be some SQL function that I’d really like to be in SQL Server, it might be something else.

Today I have a small blog post about a neat little function I discovered last week – with thanks to my German colleague, who wants to remain anonymous. The function is called ILIKE and it is syntactic sugar for the combination of UPPER and LIKE.

I’m personally not a fan of case-sensitive collations for data; it’s hard for me to understand the meaningful differences between “dog,” “Dog,” and “DOG.”
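To make the "syntactic sugar" point concrete, here is a minimal Python sketch of the equivalence between ILIKE and UPPER combined with LIKE. The helpers `like` and `ilike` are hypothetical names written for this illustration, not Snowflake APIs, and the sketch assumes only the two standard LIKE wildcards (`%` and `_`) with no escape character.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regular expression string."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any run of characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def like(value: str, pattern: str) -> bool:
    """Case-sensitive LIKE: the whole value must match the pattern."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


def ilike(value: str, pattern: str) -> bool:
    """Case-insensitive match, written as UPPER(value) LIKE UPPER(pattern)."""
    return like(value.upper(), pattern.upper())


if __name__ == "__main__":
    for word in ("dog", "Dog", "DOG"):
        print(word, like(word, "dog%"), ilike(word, "dog%"))
```

Under these assumptions, only the lowercase spelling satisfies the case-sensitive match, while all three spellings satisfy the case-insensitive one.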


Multi-Level Unpivoting with Power Query

Teo Lachev shows us how you can unpivot multiple columns in Excel using Power Query:

The user wants to unpivot the data by rotating the three header rows (Scenario Type, Month, and Year) from columns to rows. The issue is that the headers span three rows. If you just select these columns and unpivot, you’ll end up with a mess. And Power Query operates on one row at a time, so you can’t reference previous rows, such as to concatenate Scenario, Month, and Year. We can do the concatenation in Excel so we have one row with column headers, such as Actuals-Jan-2018, Actuals-Feb-2018, and so on, which we can easily unpivot in Power Query. But what if we can’t or don’t want to modify the Excel file, such as to avoid repeating the same steps every time a new file comes in?

Click through for a sample file which shows how you can do this.
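The general idea of a multi-level unpivot can be sketched outside Power Query as well. The Python below is an illustration only, not the approach from the linked post: the sheet layout, the sample values, and the helper names `fill_right` and `unpivot` are all assumptions. It combines the three header rows into one key per value column, then emits one output row per cell.

```python
def fill_right(row):
    """Copy the last non-empty value into blank cells (as with merged headers)."""
    filled, last = [], None
    for cell in row:
        if cell not in (None, ""):
            last = cell
        filled.append(last)
    return filled


def unpivot(rows, header_rows=3, key_cols=1):
    """Turn a sheet with multi-row headers into long (key..., header..., value) rows."""
    # Drop the key columns from each header row and fill blanks from the left.
    headers = [fill_right(r[key_cols:]) for r in rows[:header_rows]]
    # zip(*headers) yields one (scenario, month, year) tuple per value column.
    column_keys = list(zip(*headers))
    result = []
    for row in rows[header_rows:]:
        keys = tuple(row[:key_cols])
        for col_key, value in zip(column_keys, row[key_cols:]):
            result.append(keys + col_key + (value,))
    return result


# Hypothetical sheet: three header rows, then one data row.
sheet = [
    ["", "Actuals", "", "Forecast", ""],
    ["", "Jan", "Feb", "Jan", "Feb"],
    ["", 2018, 2018, 2018, 2018],
    ["Widgets", 10, 12, 11, 13],
]
long_rows = unpivot(sheet)

if __name__ == "__main__":
    for r in long_rows:
        print(r)
```

Keeping the header levels as separate tuple elements, rather than concatenating them into a string such as Actuals-Jan-2018, avoids having to split them apart again afterward.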


Building an AKS Cluster

Mohammad Darab continues a series on Big Data Clusters by creating a Kubernetes cluster in Azure Kubernetes Service:

Next, we will create a resource group by executing the following command:
az group create --name nameOfMyresourceGroup --location eastus2

Once you execute the above command, you can go into the Azure portal and refresh your resource group pane and see the newly created resource group.

Once that is set up, it’s time to create the actual Kubernetes cluster.

Click through for the full set of instructions.


Building a Power BI Accordion Filter

David Eldersveld builds out a Power BI accordion filter:

The Power BI custom accordion relies on Bookmarks and Buttons as key elements. I’ve only created two categories in my accordion. I’ll be honest–it’s probably more work than it’s worth to keep track of different buttons due to positions as well as what’s visible or hidden for each bookmark. The thought of expanding to three categories is a bit daunting. Why is that?

Read on to see why (hint: combinatorial explosion).


Importing Biml Metadata from Excel

David Stein shows how you can take table and column data from Excel and use it to populate Biml flows:

Excel Spreadsheets as a metadata source have a lot going for them.
– Everyone uses Excel and is comfortable with it.
– Excel is incredibly customizable and versatile.
– Excel offers data validation and filtering.

For these reasons, I create customized Excel spreadsheets that function as a lite Graphical User Interface (GUI) for metadata. Of course, Excel isn’t a perfect metadata source. For one thing, you have to own a licensed copy of Excel. Second, because spreadsheets are so easy to customize, users sometimes “improve” them further and break your code.

Read on for an example.


Comparing Iterator Performance in R

Ulrik Stervbo has a performance comparison for for, apply, and map functions in R:

It is usually said that for- and while-loops should be avoided in R. I was curious about just how the different alternatives compare in terms of speed.

The first loop is perhaps the worst I can think of – the return vector is initialized without type and length so that the memory is constantly being allocated.

The performance of map isn’t great, though the benefits to me are less about performance and more about readability. H/T R-bloggers
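The cost of growing a result inside a loop is not unique to R. As a rough analogue (in Python rather than R, so the numbers will not match the linked post), the sketch below contrasts rebuilding an immutable sequence on every iteration with filling a preallocated list and with a comprehension. The function names are made up for this illustration.

```python
def grow_by_copy(n):
    """Rebuild the whole sequence on every iteration, copying all prior items."""
    result = ()
    for i in range(n):
        result = result + (i * i,)  # allocates a brand-new tuple each time
    return list(result)


def preallocate(n):
    """Allocate the full-length result once, then fill it in place."""
    result = [0] * n
    for i in range(n):
        result[i] = i * i
    return result


def comprehension(n):
    """Let the language build the result without an explicit loop body."""
    return [i * i for i in range(n)]


if __name__ == "__main__":
    import timeit

    for fn in (grow_by_copy, preallocate, comprehension):
        seconds = timeit.timeit(lambda: fn(5_000), number=10)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

All three functions return the same list; only the amount of allocation and copying differs, and the timings depend on the machine and interpreter.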
