Press "Enter" to skip to content

Curated SQL Posts

Replacing Bidirectional Filters with Visual Filters

Alberto Ferrari shows how you can replace bidirectional filters with visual filters in Power BI and improve visual quality:

The noticeable thing about the behavior of the slicer is that the two matrices are showing only the brands and colors purchased by Amanda. Yet, the Color slicer is still showing all the colors, even though we know Amanda only purchased three colors: Grey, Silver and White.

The reason is that the matrices, like most Power BI visuals, hide rows if the measure they are showing produces a blank. Because Amanda did not buy any pink product, the value of Sales Amount for Pink results in a blank, therefore the matrix removes the pink color from its result. Prior to the May 2019 release of Power BI, slicers did not display this behavior because slicers did not have a measure to evaluate – they would only show a list of values from a column; Moreover, visual-level filters were not allowed in slicer visuals whereas they were available in other visuals such as charts, tables, and matrices.

Read on to see how to do this.

Comments closed

Kubernetes + AGs + SQL Server 2019 CTP 3

Allan Hirt has a fix for an annoying bug in the latest CTP of SQL Server 2019:

I haven’t written much about them yet (key emphasis there …) but AGs now being supported for containers in SQL Server 2019 is a big deal. Recently, SQL Server 2019 CTP 3.0 was released, but there’s a slight problem: if you try to deploy an AG with Kubernetes, you may see the following errors when trying to deploy the pods with the YAML that contains their definition. The services (i.e. instances of SQL Server) get created, but the pods do not.

Read on for the root cause and the solution.

Comments closed

Diving Into Columnstore Index Scans

Hugo Kornelis continues a series of posts on index scans:

The Columnstore Index Scan is not really an actual operator. You can encounter it in graphical execution plans in SSMS (and other tools), but if you look at the underlying XML of the execution plan, you will see that it is either an Index Scan or a Clustered Index Scan operator.

SQL Server currently supports three types of index storage: rowstore, columnstore, and memory-optimized. Indexes of each of those types can be the target of an Index Scan or Clustered Index Scan, as indicated by the Storage property. When the Storage property is RowStore or MemoryOptimized, then the normal icon for (clustered) index scan is use, but when Storage is ColumnStore than SSMS (and other tools) choose to show a different icon instead.

Click through for more details.

Comments closed

The Power of Hexagonal Binning

Capri Granville explains hexagonal binning to us and gives a few examples:

The reason for using hexagons is that it is still pretty simple, and when you rotate the chart by 60 degrees (or a multiple of 60 degrees) you still get the same visualization.  For squares, rotations of 60 degrees don’t work, only multiples of 90 degrees work. Is it possible to find a tessellation such that smaller rotations, say 45 or 30 degrees, leave the chart unchanged? The answer is no. Octogonal tessellations don’t really exist, so the hexagon is an optimum. 

Every time I see one of these, I think of old-timey strategy war games.

Comments closed

MLflow 1.0 Released

Clemens Mewald and Matei Zaharia announce the release of MLflow 1.0:

Today we are excited to announce the release of MLflow 1.0. Since its launch one year ago, MLflow has been deployed at thousands of organizations to manage their production machine learning workloads, and has become generally available on services like Managed MLflow on Databricks. The MLflow community has grown to over 100 contributors, and the MLflow PyPI package download rate has reached close to 600K times a month. The 1.0 release not only marks the maturity and stability of the APIs, but also adds a number of frequently requested features and improvements.

The release is publicly available starting today. Install MLflow 1.0 using PyPl, read our documentation to get started, and provide feedback on GitHub. Below we describe just a few of the new features in MLflow 1.0. Please refer to the release notes for a full list.

And it looks like they’re going to keep pushing on it from there.

Comments closed

Kafka In Front of ELK

Daniel Berman sets up a simple Elasticsearch-Logstash-Kibana (ELK) stack and throws Kafka in front of it:

To perform the steps below, I set up a single Ubuntu 16.04 machine on AWS EC2 using local storage. In real-life scenarios you will probably have all these components running on separate machines.

I started the instance in the public subnet of a VPC and then set up a security group to enable access from anywhere using SSH and TCP 5601 (for Kibana). Finally, I added a new elastic IP address and associated it with the running instance.

The example logs used for the tutorial are Apache access logs.

This is a great walkthrough on setup and basic configuration. If you don’t have something in place to manage logs, the ELK stack is fine.

Comments closed

Physical Operators: Apply and Nested Loops

Paul Whtie takes us through the Apply operator versus a classic nested loop join operator:

The optimizer’s output may contain both apply and nested loops join physical operations. Both are shown in execution plans as a Nested Loops Join operator, but they have different properties:

Apply
The Nested Loops Join operator has Outer References. These describe parameter values passed from the outer (upper) side of the join to operators on the inner (lower) side of the join. The value of the each parameter may change on each iteration of the loop. The join predicate is evaluated (given the current parameter values) by one or more operators on the inner side of the join. The join predicate is not evaluated at the join itself.

Join
The Nested Loops Join operator has a Predicate (unless it is a cross join). It does not have any Outer References. The join predicate is always evaluated at the join operator.

And to make things tricky, APPLY can generate either of these. Read the whole thing.

Comments closed

Finding Recently Created Objects

Max Vernon has a script to help us find what new objects now exist on your database:

The code below provides a list of all SQL Server objects created in the past “x” number of days. Dynamic T-SQL is used to construct a query for each database, including system databases. Each query provides the schema, name, and date created for each object listed, along with the object type description.

This looks quite useful for auditing. You might want to filter out tempdb on a real system, though.

Comments closed

Dealing with HADR_SYNC_COMMIT Waits

Dmitri Korokevitch walks us through the HADR_SYNC_COMMIT wait type:

The secondary nodes may be configured using asynchronous or synchronous commit. With asynchronous commit, transaction considered to be committed and all locks were released when COMMIT log record is hardened on the primary node. SQL Server sends COMMIT record to secondary node; however, it does not wait for the confirmation that the record had been hardened in the log there.

This behavior changes when you use synchronous commit as shown in Figure 1. In this mode, SQL Server does not consider transaction to be committed until it receives the confirmation that COMMIT log record is hardened in the log on the secondary node. The transaction on primary will remain active with all locks held in place until this confirmation is received. The session on primary is suspended with HADR_SYNC_COMMIT wait type.

Click through for the full story.

Comments closed

Using Power Query to Pivot Text

Matt Allington shows how you can pivot text data from an Excel spreadsheet using Power Query:

It is very common to need to transform data from one “shape” to another “shape” before it can be used inside Power BI for analysis (although many beginners don’t realise this).   One such example is shown below, where the data in the table on the left hand side needs to be transformed into the table on the right hand side.  As you can see on the left, column A contains the attribute and column B contains the value of the attribute.  Every 4 lines of data is 1 record.  This specific problem is very common problem when your only source of data is from an extract (eg csv) from some other system, particularly older systems where you can’t change the format of the data extract.

This is a clever solution.

Comments closed