Day: August 11, 2022

What’s New in SynapseML

Nellie Gustafsson and Mark Hamilton share an update:

SynapseML is a massively scalable (feel free to spin up hundreds of machines!) machine learning library built on Apache Spark. SynapseML makes it easy to train production-ready models to solve problems from simple classification and regression to anomaly detection, translation, image analysis, speech to text, and just about any ML challenge you are facing. Under the hood, SynapseML integrates a wide array of ML technologies such as LightGBM, Vowpal Wabbit, ONNX, and the Cognitive Services into a single easy-to-use API compatible with MLflow. We know, we know, everyone hates when developers invent new APIs, but you can rest easy because SynapseML integrates cleanly into existing Spark ML APIs so you can embed models directly into existing pipelines. We strive to make SynapseML available to developers wherever they work, and the library is available in a variety of languages like Python, Scala, Java, and R. As of this release, SynapseML is also usable from .NET, C#, and F#.

Saving the best language for last, I see. Click through for the list of updates.

What’s in a Name?

Benjamin Smith analyzes a name change:

Recently, RStudio announced its name change to Posit. For many, this name change was accepted with open arms, but for some, not so much. Being the statistician that I am, I decided to post a poll on LinkedIn to see the sentiment of my network. After running the poll for a week, the results were in:

Read on for the responses as well as an analysis using RStan.

Data Retention: Definition and Policy

Joey Jablonski thinks about data retention:

Data retention policies should be defined in a way that they are easy to understand and easy to implement programmatically, and should enable engineering teams to operate independently most of the time when working with datasets that are known and already leveraged by the organization. In addition to policy definitions, data governance leaders should ensure changes are part of data literacy plans for training and rollout to ensure awareness across the organization.

This is something that most DBAs provide input into but don’t directly control. Still, it’s good to know some of the challenges around data retention and figure out how to apply it to your organization.
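
To make "easy to implement programmatically" a bit more concrete, here is a minimal Python sketch of a retention policy expressed as data. It is not taken from Joey's post: the `RetentionPolicy` class, the dataset names, and the retention periods are all hypothetical, and a real policy would likely need legal holds, per-column rules, and so on.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy: keep rows of a dataset for a fixed number of days."""
    dataset: str
    retain_days: int

    def cutoff(self, today: date) -> date:
        # Rows dated strictly before this date are past retention.
        return today - timedelta(days=self.retain_days)

    def is_expired(self, record_date: date, today: date) -> bool:
        return record_date < self.cutoff(today)


# Illustrative values only; real periods come from the organization's policy.
POLICIES = {
    "web_logs": RetentionPolicy("web_logs", retain_days=90),
    "invoices": RetentionPolicy("invoices", retain_days=365 * 7),
}


def expired_records(dataset, record_dates, today):
    """Return the record dates in a known dataset that fall outside retention."""
    policy = POLICIES[dataset]  # a KeyError flags a dataset with no defined policy
    return [d for d in record_dates if policy.is_expired(d, today)]
```

Keeping the policy in one lookup table is one way to let engineering teams work independently with known datasets: they call `expired_records` rather than re-deriving the rules, and an unknown dataset fails loudly instead of silently keeping data forever.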

Cross-Platform SQL Server Availability Groups

Rajendra Gupta shows how to set up an Availability Group in SQL Server which runs on both Windows and Linux:

Microsoft supports SQL Server on Linux, and it has many of the same features as the Windows version. You can restore databases from Windows to Linux SQL or vice versa. The Linux SQL works with Red Hat, Ubuntu, SUSE Enterprise, Kubernetes containers, and Docker.

A Windows-based SQL instance supports SQL Server Always On Availability Groups for high availability and disaster recovery. If you are not familiar with Windows AG configuration, refer to the extensive series on Always On Availability Groups (ToC at the bottom).

If you have both Windows and Linux SQL Server, is it possible to configure an availability group between them? Let’s explore this in this article.

This example uses async mode, which is the easier one to set up. With synchronous, you’re probably looking at using Pacemaker to sort out AG status.


Database-Driven Parameterization for Synapse Pipelines

Paul Hernandez does some configuring:

Particularly in Synapse, there are not even global parameters like in Azure Data Factory.

When you want to move your development to another environment, CI/CD pipelines are typically used. These pipelines consume an ARM template together with its parameter file to create a workspace in a target environment. The parameters can be overridden in the CD pipeline as explained here:

Even so, I have not found a proper way to change the values of a pipeline parameter (the same goes for data flow and dataset parameters). I saw some custom parameter manipulation to set the default value of a parameter and then deploy it without any value, or even JSON manipulation with PowerShell (the dark side for me).

Read on for an alternative solution which does the job well.
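
The general idea of database-driven parameterization can be sketched in a few lines of Python. This is not Paul's implementation: the `pipeline_parameter` table, its columns, the `load_parameters` helper, and the sample values are all made up for illustration, and an in-memory SQLite database stands in for whatever configuration database a Synapse pipeline would actually query.

```python
import sqlite3


def load_parameters(conn, environment, pipeline):
    """Return {parameter_name: value} for one pipeline in one environment."""
    rows = conn.execute(
        "SELECT parameter_name, parameter_value "
        "FROM pipeline_parameter "
        "WHERE environment = ? AND pipeline_name = ?",
        (environment, pipeline),
    ).fetchall()
    return dict(rows)


# In-memory SQLite stands in for the configuration database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pipeline_parameter ("
    " environment TEXT, pipeline_name TEXT,"
    " parameter_name TEXT, parameter_value TEXT,"
    " PRIMARY KEY (environment, pipeline_name, parameter_name))"
)
conn.executemany(
    "INSERT INTO pipeline_parameter VALUES (?, ?, ?, ?)",
    [
        ("dev", "load_sales", "source_container", "raw-dev"),
        ("dev", "load_sales", "batch_size", "100"),
        ("prod", "load_sales", "source_container", "raw-prod"),
        ("prod", "load_sales", "batch_size", "5000"),
    ],
)

# The same pipeline definition resolves different values per environment.
dev_params = load_parameters(conn, "dev", "load_sales")
prod_params = load_parameters(conn, "prod", "load_sales")
```

The appeal of this pattern is that the deployed pipeline definition stays identical across environments. Only the rows in the configuration table differ, so nothing in the ARM template or its parameter file has to be rewritten at deployment time.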

Sharing Power BI Content outside the Organization

Mara Pereira wants to share some data:

I am seeing more and more customers trying to use Premium capabilities to create data products that they can incorporate as part of their main product offering. This kind of reporting as a product solution will add a lot more value to their main product, so I can see why this is becoming quite trendy.

However, it became obvious that the current documentation can be a bit overwhelming and confusing at first.

So I decided to compile the process of sharing content outside of your organisation in a blog post. Happy days!

Click through to see how to share within the Power BI Service.
