Curated SQL – A Fine Slice Of SQL Server

What’s New in SynapseML

Published 2022-08-11 by Kevin Feasel

Nellie Gustafsson and Mark Hamilton share an update:

SynapseML is a massively scalable (feel free to spin up hundreds of machines!) machine learning library built on Apache Spark. SynapseML makes it easy to train production-ready models to solve problems ranging from simple classification and regression to anomaly detection, translation, image analysis, speech to text, and just about any ML challenge you are facing. Under the hood, SynapseML integrates a wide array of ML technologies such as LightGBM, Vowpal Wabbit, ONNX, and the Cognitive Services into a single, easy-to-use API compatible with MLflow. We know, we know, everyone hates when developers invent new APIs, but you can rest easy because SynapseML integrates cleanly into existing Spark ML APIs, so you can embed models directly into existing pipelines. We strive to make SynapseML available to developers wherever they work, and the library is available in a variety of languages such as Python, Scala, Java, and R. As of this release, SynapseML is also usable from .NET (C# and F#).

Saving the best language for last, I see. Click through for the list of updates.


What’s in a Name?

Published 2022-08-11 by Kevin Feasel

Benjamin Smith analyzes a name change:

Recently, RStudio announced its name change to Posit. Many people welcomed the change with open arms, but for some, not so much. Being the statistician that I am, I decided to post a poll on LinkedIn to gauge the sentiment of my network. After running the poll for a week, I had the results:

Read on for the responses as well as an analysis using RStan.


Data Retention: Definition and Policy

Published 2022-08-11 by Kevin Feasel

Joey Jablonski thinks about data retention:

Data retention policies should be defined so that they are easy to understand and easy to implement programmatically, and they should enable engineering teams to operate independently most of the time when working with datasets that are known and already used by the organization. In addition to policy definitions, data governance leaders should make sure changes are part of data literacy plans for training and rollout, so that people across the organization are aware of them.

This is something that most DBAs provide input into but don’t directly control. Still, it’s good to know some of the challenges around data retention and figure out how to apply it to your organization.
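
To make "easy to implement programmatically" a bit more concrete, here is a minimal sketch, assuming a retention policy can be expressed as a dataset name plus a number of days to keep rows. The RetentionPolicy class, the web_logs dataset, and the 90-day window are all hypothetical illustrations, not anything from Joey's post.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy record: keep rows of a dataset for a fixed number of days."""
    dataset: str
    retention_days: int


def is_expired(created_on: date, policy: RetentionPolicy, today: date) -> bool:
    """Return True if a row created on `created_on` falls outside the retention window."""
    cutoff = today - timedelta(days=policy.retention_days)
    return created_on < cutoff


def rows_to_purge(rows, policy: RetentionPolicy, today: date) -> list:
    """Reduce (row_id, created_on) pairs to the ids of rows that have expired."""
    return [row_id for row_id, created_on in rows if is_expired(created_on, policy, today)]


if __name__ == "__main__":
    # Hypothetical example: web logs kept for 90 days.
    policy = RetentionPolicy(dataset="web_logs", retention_days=90)
    today = date(2022, 8, 11)
    rows = [(1, date(2022, 1, 15)), (2, date(2022, 7, 1)), (3, date(2022, 5, 13))]
    print(rows_to_purge(rows, policy, today))  # → [1]
```

Keeping the policy as plain data, separate from the purge logic, is one way to let the governance side own the numbers while engineering teams apply them independently.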


Cross-Platform SQL Server Availability Groups

Published 2022-08-11 by Kevin Feasel

Rajendra Gupta shows how to set up an Availability Group in SQL Server which runs on both Windows and Linux:

Microsoft supports SQL Server on Linux, and it has many of the same features as the Windows version. You can restore databases from Windows to SQL Server on Linux or vice versa. SQL Server on Linux works with Red Hat, Ubuntu, SUSE Linux Enterprise, Kubernetes containers, and Docker.
Windows-based SQL Server instances support Always On Availability Groups for high availability and disaster recovery. If you are not familiar with Windows AG configuration, refer to the extensive series on Always On Availability Groups (TOC at the bottom).
If you have both Windows and Linux SQL Server instances, is it possible to configure an availability group between them? Let’s explore that in this article.

This example uses async mode, which is the easier one to set up. With synchronous, you’re probably looking at using Pacemaker to sort out AG status.


Database-Driven Parameterization for Synapse Pipelines

Published 2022-08-11 by Kevin Feasel

Paul Hernandez does some configuring:

In Synapse in particular, there are not even global parameters like those in Azure Data Factory.
When you want to move your development to another environment, CI/CD pipelines are typically used. These pipelines consume an ARM template together with its parameter file to create a workspace in a target environment. The parameters can be overridden in the CD pipeline as explained here: https://techcommunity.microsoft.com/t5/data-architecture-blog/ci-cd-in-azure-synapse-analytics-part-4-the-release-pipeline/ba-p/2034434
Even so, I have not found a proper way to change the values of a pipeline parameter (the same goes for data flow and dataset parameters). I have seen some custom parameter manipulation to set the default value of a parameter and then deploy it without any value, or even JSON manipulation with PowerShell (the dark side for me).

Read on for an alternative solution which does the job well.


Sharing Power BI Content outside the Organization

Published 2022-08-11 by Kevin Feasel

Mara Pereira wants to share some data:

I am seeing more and more customers trying to use Premium capabilities to create data products that they can incorporate as part of their main product offering. This kind of reporting as a product solution will add a lot more value to their main product, so I can see why this is becoming quite trendy.
However, it became obvious that the current documentation can be a bit overwhelming and confusing at first.
So I decided to compile the process of sharing content outside of your organisation in a blog post. Happy days!

Click through to see how to share within the Power BI Service.


Anomaly Detection over Delta Live Tables

Published 2022-08-10 by Kevin Feasel

Avinash Sooriyarachchi and Sathish Gangichetty show off an interesting scenario:

Anomaly detection poses several challenges. The first is the data science question of what an ‘anomaly’ looks like. Fortunately, machine learning has powerful tools to learn, from data, how to distinguish usual patterns from anomalous ones. In the case of anomaly detection, it is impossible to know what all anomalies look like, so it’s impossible to label a data set for training a machine learning model, even if resources for doing so are available. Thus, unsupervised learning has to be used to detect anomalies, where patterns are learned from unlabelled data.
Even with the perfect unsupervised machine learning model for anomaly detection figured out, in many ways the real problems have only begun. What is the best way to put this model into production such that each observation is ingested, transformed, and finally scored with the model as soon as the data arrives from the source system, in near real time or at short intervals (e.g., every 5-10 minutes)? This involves building a sophisticated extract, load, and transform (ELT) pipeline and integrating it with an unsupervised machine learning model that can correctly identify anomalous records. Also, this end-to-end pipeline has to be production-grade, always running while ensuring data quality from ingestion to model inference, and the underlying infrastructure has to be maintained.

Click through to see their solution using Databricks and Delta Lake.
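
As a tiny illustration of "patterns are learned from unlabelled data," here is a sketch using a deliberately simple, swapped-in technique: the median absolute deviation (MAD) modified z-score. It is not the model or the Databricks pipeline from the post; the function name and the 3.5 threshold are assumptions chosen for illustration.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return the indexes of points whose modified z-score exceeds `threshold`.

    No labels are used: the "normal" range is estimated from the data itself
    via the median and the median absolute deviation (MAD).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most points are identical, so treat anything off the median as anomalous.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales MAD so scores are comparable to standard z-scores for normal data.
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]


if __name__ == "__main__":
    # Hypothetical sensor readings with one obvious spike at index 5.
    readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
    print(mad_anomalies(readings))  # → [5]
```

Median-based statistics are used here because a single extreme value barely shifts them, whereas a mean and standard deviation would be pulled toward the very anomaly you are trying to find.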


Forced Plans Lacking Force

Published 2022-08-10 by Kevin Feasel

Chad Callihan does not fall afoul of Betteridge’s Law of Headlines:

Query Store is useful for forcing a beneficial plan. What if something changes that makes the forced plan impossible to use? Let’s look at an example of that today.

Read on for an example. The answer is, I believe, the best possible outcome given the circumstances—certainly better than a hard failure.


Power BI Desktop August 2022 Updates

Published 2022-08-10 by Kevin Feasel

Matt Allington looks at some recent updates to Power BI:

I’ve been pretty busy over the last few months. The demand for Power BI skills has never been stronger, and my company is super busy. I haven’t written a blog article for a while, but I wanted to take a bit of time out this morning to talk about the August 2022 update to Power BI Desktop. As Power BI matures, there is less and less to get excited about with a new release of Desktop, but there were a couple of things that caught my eye in this release, worthy of calling out.

Read on for a couple of quality-of-life improvements.


Power BI Culture Name errors

Published 2022-08-10 by Kevin Feasel

Gilbert Quevauvilliers is a cultured individual:

I was recently working with an overseas customer, and when I tried to open the PBIX file, I got the following error:
“Culture Name ‘en-IL’ is not valid or is not supported”

Read on to understand what the problem was as well as how Gilbert was able to fix it.

