
Curated SQL Posts

Histograms versus Bar Charts

Alex Velez explains the difference between a histogram and a bar (or column) chart:

Consider the above illustration of two data visualizations. 

A histogram is on the left, and to the right is a bar chart (also known as a bar graph). Histograms and bar charts look almost identical, yet they are dramatically different. Understanding their differences is important, so you know when to use each one and accurately convey—or consume—the insights they contain. 

Let’s take a closer look. 

Click through for that closer look.
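As a rough illustration of the distinction (not taken from Alex's post), here is a minimal Python sketch. It assumes the usual definitions: a histogram counts continuous values into adjacent numeric bins, while a bar chart counts discrete categories. The function names and sample data are made up for the example.

```python
from collections import Counter


def histogram_counts(values, bin_width, start):
    """Count numeric values into adjacent, equal-width bins.

    Each key is the lower edge of a bin covering [edge, edge + bin_width).
    """
    counts = Counter()
    for value in values:
        index = int((value - start) // bin_width)
        counts[start + index * bin_width] += 1
    return dict(sorted(counts.items()))


def bar_chart_counts(labels):
    """Count occurrences of each category; the categories have no inherent order."""
    return dict(Counter(labels))


# Hypothetical sample data, for illustration only.
ages = [21, 23, 25, 31, 34, 38, 42, 47, 49, 55]
colors = ["red", "blue", "red", "green", "blue", "red"]

age_bins = histogram_counts(ages, bin_width=10, start=20)
color_bars = bar_chart_counts(colors)

print(age_bins)
print(color_bars)
```

Changing `bin_width` changes the shape of the histogram, whereas the bar chart has no equivalent knob: its bars are fixed by the categories themselves.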


DEFINE TABLE in DAX Queries

Marco Russo and Alberto Ferrari take us through the DEFINE TABLE statement in DAX:

Introduced in December 2020, the DEFINE TABLE statement lets you define a calculated table local to a query. The table is not persisted in the model; it exists only for the lifetime of the query. Apart from that, it is a calculated table in every sense of the term, albeit with some limitations.

The extension of DAX with the capability to define calculated tables local to a query is needed in order to support composite models (DirectQuery for Power BI datasets and Azure Analysis Services). There are no limitations on the use of the feature, so you can take advantage of local tables in any DAX query. We refer to calculated tables defined in a query as query calculated tables, or query tables for short.

Click through for an example of how it works.


PowerShell Editors and Environments

Greg Moore gives us an overview of the PowerShell IDE landscape:

Before I go too deep into this article, I want to distinguish between editing a file and running it. I’m going to focus on editors here, but most development environments include a way to execute a PowerShell script or PowerShell commands. However, do not confuse the editor with the execution environment.

I used the PowerShell ISE for a long while, but eventually stopped because its settings were just different enough from the shell’s settings that things which worked just fine in the ISE would fail when I set them up as automated tasks. I don’t remember what those things were, though, so further research may be required. Nowadays, I’ll use VS Code when I need a proper editor and just wing it on the shell for one-off stuff.


Searching through PowerShell History

Jess Pomfret describes a helpful module:

I was listening to a podcast last week about PowerShell, when one of the hosts mentioned having to ‘up arrow’ back through your history to find a command you wanted to rerun.  This made me realise that I should write this quick post on using PSReadLine’s interactive search function.  This tip is a serious time saver and I rely on it heavily.

The great news is that if you are using Windows PowerShell on Windows 10 or if you’re using PowerShell 6+, PSReadLine is already installed and you can immediately start using this tip. If you don’t have the module though, it’s easy enough to install from the PowerShell Gallery:

Read on to learn how to use it.


The Limits of Filtered Indexes

Erik Darling lays out the pros and cons of filtered indexes:

Filtered indexes are really interesting things. Just slap a where clause on your index definition, and you can do all sorts of helpful stuff:

– Isolate hot data
– Make soft delete queries faster
– Get a histogram specific to the span of data you care about

Among other things, of course. There are some annoying things about them though.

– They only work with specific ANSI options
– If you don’t include the filter definition columns in the index, it might not get used
– They only work when queries use literals, not parameters or variables

Click through for examples of them in action. I would definitely like to see improvements to filtered indexes along the lines that Erik mentions. They have so much potential, but are really held back by those limitations.
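To make the soft delete case concrete, here is a small runnable sketch. It is not SQL Server and not Erik's code: it uses Python's built-in sqlite3 module, and SQLite's partial indexes are only an analogous feature whose optimizer rules differ from those of SQL Server filtered indexes. The table, column, and index names are hypothetical.

```python
import sqlite3

# Assumption: SQLite partial indexes stand in for SQL Server filtered indexes.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        is_deleted  INTEGER NOT NULL DEFAULT 0
    );
    -- Index only the rows that have not been soft deleted.
    CREATE INDEX ix_orders_active ON orders (customer_id) WHERE is_deleted = 0;
    """
)
conn.executemany(
    "INSERT INTO orders (customer_id, is_deleted) VALUES (?, ?)",
    [(i % 50, int(i % 10 == 0)) for i in range(1000)],
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# The query repeats the index filter as a literal, so the planner can match it.
literal_plan = plan(
    "SELECT id FROM orders WHERE customer_id = 7 AND is_deleted = 0"
)
print(literal_plan)
```

The query plan names `ix_orders_active` because the query's literal predicate matches the index definition exactly. That matching requirement is the same kind of restriction that makes parameters and variables a sore spot in the list above.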


Reporting on Correlation Analysis in R

Petr Baranovskiy continues a series on correlation analysis using R:

This is the second part of the Correlation Analysis in R series. In this post, I will provide an overview of some of the packages and functions used to perform correlation analysis in R, and will then address reporting and visualizing correlations as text, tables, and correlation matrices in online and print publications.

Read the whole thing.


K-Means and K-Medoids Clustering

Niti Sharma explains two clustering algorithms:

K-means and k-medoids are methods used in partitional clustering algorithms whose functionality works based on specifying an initial number of groups or, more precisely, iteratively by reallocation of objects among groups.

The algorithm works by first segregating all the points into an already selected number of clusters. The process is carried out by measuring the distance between the point and the center of each cluster. And because k-means can function only in the Euclidean space, the functionality of the algorithm is limited. Despite the shortcomings the algorithm possesses, k-means is still one of the most powerful tools used in clustering. Its applications can be seen in multiple fields: physical sciences, natural language processing (NLP), and healthcare.

K-means is a fairly common algorithm, but you hear less about k-medoids, which is the alternative that is more robust to outliers.
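The robustness point can be seen in a tiny Python sketch. This is not Niti's code and not a full clustering implementation: it only compares the two kinds of cluster center for a single, hypothetical one-dimensional cluster, assuming absolute difference as the distance measure.

```python
from statistics import mean


def medoid(points):
    """Return the member of `points` with the smallest total distance to all others."""
    return min(points, key=lambda candidate: sum(abs(candidate - p) for p in points))


# Hypothetical cluster with one extreme outlier.
cluster = [1.0, 2.0, 3.0, 4.0, 100.0]

centroid = mean(cluster)         # k-means style center: the arithmetic mean
representative = medoid(cluster) # k-medoids style center: an actual data point

print(centroid, representative)
```

The mean is dragged toward the outlier and lands on a value that is not in the data at all, while the medoid stays on a typical member of the cluster.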


The Production-Readiness of Azure Synapse Analytics

Paul Andrew casts some harsh light:

While I completely share and actually like Microsoft’s vision of an analytics resource…

“that brings together data integration, enterprise data warehousing and big data analytics”

https://azure.microsoft.com/en-gb/services/synapse-analytics/

… the marketing, hype and technical implementation have resulted in a lot of confusion and disappointment.

So, to answer the title of this blog post directly. My opinion, as I write on 29th January 2021, is: No. Azure Synapse Analytics is not ready. Sorry Microsoft, but you’ve had long enough. I can’t hold back the questions and demands from customers anymore on why Synapse still isn’t included in my architecture diagrams.

Paul raises many good points, and the positive takeaway is that these are fixable issues. But as of today, they are definitely things you want to consider before jumping in.
