Press "Enter" to skip to content

Author: Kevin Feasel

TF-IDF in .NET for Spark, Updated

Ed Elliott has been busy:

Apache Spark has had a machine learning API for quite some time and this has been partially implemented in .NET for Apache Spark.

In this post we will look at how we can use the Apache Spark ML API from .NET. This is the second version of this post, the first version was written before version 1 of .NET for Apache Spark and there was a vital piece of the implementation missing which meant although we could build the model in .NET, we couldn’t actually use it. The necessary functionality is now available and so I am updating the post. To see the previous version go to: https://the.agilesql.club/2020/07/tf-idf-in-.net-for-apache-spark-using-spark-ml/

Read on for more information, as well as a call to action.

Comments closed

Small Multiples in Power BI

Chris Webb takes us through a new feature in Power BI:

While the long-awaited small multiples feature that previewed in the December 2020 release is an obvious boost to Power BI’s data visualisation capabilities, did you know that you can use it to improve report performance too?

Earlier this year I wrote blog posts showing how you can improve report performance by showing the same amount of data in fewer visuals (for example by replacing several cards with a single table) and how the number of visuals on a page affects report performance even if they aren’t displaying any data; several other people have written similar posts too. Small multiples are just another way you can replace several visuals with a single visual that displays the same data.

I liked this feature for the visualization improvements, but if you can throw in performance improvements as well, I’m sold.

Comments closed

The Importance of Composite Models

Paul Turley lays out the significance of composite models in Power BI:

There are been many attempts by Microsoft and other vendors to create a data modelling architecture that provides for fast access to cached data, direct access to live data and scaled-out connections to established data models. Remember ROLAP and HOLAP storage in multidimensional cubes? These were great concepts with significant trade-off limitations. No other vendor has anything like this. Way back in the day, Microsoft jumped on the Ralph Kimball bandwagon to promote the idea that a company should have a “one version of the truth” exposed through their data warehouse and cubes or semantic data models. They met customer demand and gave us a BI tool that, in order to bring data together from multiple sources, makes it easy to create a lot of data silos. Arguably, there are design patterns to minimize data duplication but to use governed datasets, self-service report designers are limited to connecting to large, central models that might only be authored and managed by IT. This new feature can restore balance to the force and bring us back to “one version of the truth” again.

Read on for Paul’s early thoughts on the feature.

Comments closed

Sync Logins between Availability Group Replicas

Taryn Pratt has a process:

Always On Availability Groups can support up to nine availability replicas, and while we don’t use anywhere near that many replicas in each of our clusters, we do have 2 replicas per cluster (3 servers total), with the replicas being used as a readable secondary.

Since we use readable secondaries in our environments, the application needs to connect to both the primary and the secondary servers with the same login. The catch is, logins don’t automatically sync across replicas. If the logins don’t sync, the application won’t connect to a secondary, which results in login failures.

Read on for one way to solve the problem.

Comments closed

Correlated Subqueries which Don’t

Daniel Hutmacher gives us an eye test:

The developer wrote this pretty little query to show us which accounts are up for review (which in our case means they have a “30” flag).

SELECT account, balance, 'For review' AS [status]
FROM #accounts WHERE account IN (SELECT account FROM #accountFlags WHERE flag=30) ORDER BY account;

Did you spot it?

I did, but in fairness, I’ve been burned enough times by this that I check for it.

Comments closed

T-SQL Additions to Serverless SQL Pools

Jovan Popvic lays out some of the T-SQL syntax added to serverless SQL pools in Azure Synapse Analytics:

Serverless Synapse SQL pools in Azure Synapse Analytics have a new set of features that will enable you to analyze your Azure data more efficiently. The new Transact-SQL (T-SQL) language features that you can use in serverless SQL pools are STRING_AGGOFFSET/FETCHPIVOT/UNPIVOTSESSION_CONTEXT, and CONTEXT_INFO.

Old T-SQL hands will likely know what all of this does, but click through if something looks unfamiliar. All of this is available in SQL Server 2017 and later (and everything but STRING_AGG() is available going back to 2008).

Comments closed

Foreign Keys and Updating the Parent

Hugo Kornelis conclues a mini-series on foreign key constraints:

Welcome to part fourteen of the plansplaining series, where I wrap up the mini-series on how simple foreign keys have huge effects on execution plans for data modifications.

We already looked at inserting data in the referencing (child) table, and at deleting data from the referenced (parent) table as well as updates in the child table. We did not and will not look at deleting from the child table or inserting in the parent table: those operations can by default never violate the foreign key constraint, so no additional logic is needed.

So that means there is only one thing left to explore: updating the parent. Perhaps surprisingly, this is actually quite complex, so it warrants an entire post of its own.

Read on to see why.

Comments closed

Building an Azure Function in R

David Smith has a demo for us:

It’s important to note that the model prediction is not being generated by the Shiny app: rather, it’s being generated by an Azure Function running R in the cloud. That means you could integrate the model estimate into any application written in any language: a mobile app, or an IoT service, or anything that can call an HTTP endpoint. Furthermore, you don’t need to worry how many apps are running or how often estimates will be requested by the app: Azure Functions will automatically scale to meet the demand as needed.

Read the whole thing. Given that R isn’t naturally supported by Azure Functions, I think this is quite interesting.

Comments closed

December 2020 SQL Tools Releases

Drew Skwiers-Koballa gives us an update on where SQL Server tooling is at:

The December releases of Azure Data Studio 1.25 and SQL Server Management Studio (SSMS) 18.8 are now generally available.  Additionally, the mssql extension for Visual Studio Code has recently been updated to version 1.10.0. Read on to learn more about each of these updates and grab the latest versions of SSMS, Azure Data Studio, or the mssql extension for VS Code.

Read on to learn more.

Comments closed

ADF Switch Activities

Nick Edwards shows off the Switch activity in Azure Data Factory:

Prior to the switch statement I could achieve this using 4 ‘IF’ activities connected to a lookup activity as shown in the snip below using my ‘Wait’ example pipeline.

However a neater solution is to use the ‘Switch’ activity to do this work instead. I’ll now jump straight into a worked example to show you how I achieved this.

Click through for the demo.

Comments closed