Press "Enter" to skip to content

Curated SQL Posts

Filter on Aggregate Columns in Spark

Landon Robinson shows off the HAVING clause:

Having is similar to filtering (filter()where() or where, in a SQL clause), but the use cases differ slightly. While filtering allows you to apply conditions on your data to limit the result set, Having allows you to apply conditions on aggregate functions on your data to limit your result set.

Both limit your result set – but the difference in how they are applied is the key. In short: where filters are for row-level filteringHaving filters are for aggregate-level filtering. As a result, using a Having statement can also simplify (or outright negate) the need to use some sub-queries.

Click through for examples of HAVING in use.

Comments closed

Kibana Dashboards on Azure Data Explorer

Guy Reginiano has an announcement for us:

Elasticsearch and Kibana users can now easily migrate to Azure Data Explorer (ADX) while keeping Kibana as their visualization tool, alongside the other Azure Data Explorer experiences and the powerful KQL language.
A new version of K2Bridge (Kibana-Kusto free and open connector) now supports dashboards and visualizations, in addition to the Discover tab which was previously supported.

Click through to see how it works. I’m not the world’s biggest fan of Kibana by any stretch of the imagination but it’s nice to have this ability.

Comments closed

Working with Notebooks in Azure ML

I have started a new series:

In the prior series, Low-Code Machine Learning with Azure ML, we saw how to get started with Azure Machine Learning in a fairly pain-free way, especially for developers getting started with machine learning. In this series, I will assume that you already know all of those details and instead, we’re going to go full-code.

There are a few different ways in which we can go full-code with Azure ML. Today, we’re going to look at the easiest of those methods: using Jupyter notebooks within Azure ML Studio.

Read on for the first post in the series.

Comments closed

Data Masking with Powershell

Jeff Hicks wants to stop a shoulder-surfing attack:

If I print this or send it to a file, I might not want the computername to be shown. Or maybe even my name. I want PowerShell to handle this for me. In short, I need to replace strings like ‘Prospero’ and ‘Jeff Hicks’ with some like XXX and ‘Joe Doe’.

PowerShell has a Replace operator. Or I can use regular expressions. Naturally, I wenbt with the latter.

Read on to see what you’d need to do.

Comments closed

Modifying Existing Objects with Powershell

Robert Cain dives a bit further into objects in Powershell:

So far we’ve covered a lot in this series on PowerShell Objects. We began by creating a class using the class keyword introduced in PowerShell 5. We then looked at using PSCustomObject to create new objects, then add methods to those objects. In the previous post, we used C# code to create our PowerShell objects.

In this installment, we’ll see how to add properties and methods to existing objects created by someone else. For this example we’ll use objects returned by the Get-ChildItem cmdlet, but we could use any objects.

Click through to see what you can do.

Comments closed

Making an R Box Plot from a Picture

Tomaz Kastrun builds a plot:

We create a raster image from a picture and calculating the ratio of the pixels on the scale of grayscale. The more the darker colour is represented in the pixels, the bigger the value. And this value is converted into the vector of values. And each vector is represneted as a violin boxplot.

Click through for an example.

Comments closed

Choosing between Duplicate Indexes

David Alcock wants to know what choice you make when all choices lead to the same conclusion:

As there is an index that has the LastName and FirstName columns the optimiser has opted for an index seek operator using the IX_Person_LastName_FirstName_MiddleName index, and if I look into the Plan XML I can see that it’s using a trivial plan: StatementOptmLevel=”TRIVIAL”.
This basically means there’s one obvious way to return the query results so the optimiser has avoided the cost of going through full optimisation and has elected to use this plan straightaway.
So what happens if I create an identical copy of that particular index, in fact let’s create five indexes that are exactly the same:

There’s a Mass Effect 3 joke in my intro line. But read on for the answer and also check out Barney Lawrence’s comment for a bit more elucidation.

Comments closed

get_json_object and json_tuple in Hive

The Hadoop in Real World team looks at a pair of Hive functions:

Both get_json_object and json_tuple functions in Hive are meant to work with JSON data in Hive. 

Let’s create a table with some JSON data to work with. We have created a table named hirw_courses and loaded the below JSON text into the table.

Click through for that script as well as seeing how you use each of get_json_object() and json_tuple() and which might be better-suited for your purposes.

Comments closed