Press "Enter" to skip to content

Curated SQL Posts

Data Retention: Definition and Policy

Joey Jablonski thinks about data retention:

Data retention policies should be defined in a way that they are easy to understand, easy to be implemented programmatically, and should enable engineering teams to operate independently most of the time when working with datasets that are known and already leveraged by the organization. In addition to policy definitions, data governance leaders should ensure changes are part of data literacy plans for training and rollout to ensure awareness across the organization.

This is something that most DBAs provide input into but don’t directly control. Still, it’s good to know some of the challenges around data retention and figure out how to apply it to your organization.

Comments closed

Cross-Platform SQL Server Availability Groups

Rajendra Gupta shows how to set up an Availability Group in SQL Server which runs on both Windows and Linux:

Microsoft supports SQL Server on Linux, and it has many of the same features as the Windows version. You can restore databases from Windows to Linux SQL or vice versa. The Linux SQL works with Red Hat, Ubuntu, SUSE enterprise, Kubernetes containers, and Docker.

Windows-based SQL instance supports SQL Server Always On Availability Groups for high availability and disaster recovery. If you are not familiar with Windows AG configuration, refer to the extensive series on Always on Availability Group (Toc at the bottom).

If you have both Windows and Linux SQL Server, is it possible to configure an availability group between them? Let’s explore this in this article.

This example uses async mode, which is the easier one to set up. With synchronous, you’re probably looking at using Pacemaker to sort out AG status.

2 Comments

Database-Driven Parameterization for Synapse Pipelines

Paul Hernandez does some configuring:

Particularly in Synapse, there are even no global parameters like in Azure Data Factory. 

When you want to move your development to another environment, typically CI/CDs pipelines are used. These pipelines consume an ARM template together with its parameter file to create a workspace in a target environment. The parameters can be overriding in the CD pipeline as explain here: https://techcommunity.microsoft.com/t5/data-architecture-blog/ci-cd-in-azure-synapse-analytics-part-4-the-release-pipeline/ba-p/2034434

Even so, I have not found a proper way to change the values of a pipeline parameter (the same for data flows and datasets parameters). I saw some custom parameters manipulation to set the default value of a parameter and then deploy it without any value, or even JSON manipulation with PowerShell (the dark side for me).

Read on for an alternative solution which does the job well.

Comments closed

Sharing Power BI Content outside the Organization

Mara Pereira wants to share some data:

I am seeing more and more customers trying to use Premium capabilities to create data products that they can incorporate as part of their main product offering. This kind of reporting as a product solution will add a lot more value to their main product, so I can see why this is becoming quite trendy.

However, it became obvious that the current documentation can be a bit overwhelming and confusing at first.

So I decided to compile the process of sharing content outside of your organisation in a blog post. Happy days!

Click through to see how to share within the Power BI Service.

Comments closed

Anomaly Detection over Delta Live Tables

Avinash Sooriyarachchi and Sathish Gangichetty show off an interesting scenario:

Anomaly detection poses several challenges. The first is the data science question of what an ‘anomaly’ looks like. Fortunately, machine learning has powerful tools to learn how to distinguish usual from anomalous patterns from data. In the case of anomaly detection, it is impossible to know what all anomalies look like, so it’s impossible to label a data set for training a machine learning model, even if resources for doing so are available. Thus, unsupervised learning has to be used to detect anomalies, where patterns are learned from unlabelled data.

Even with the perfect unsupervised machine learning model for anomaly detection figured out, in many ways, the real problems have only begun. What is the best way to put this model into production such that each observation is ingested, transformed and finally scored with the model, as soon as the data arrives from the source system? That too, in a near real-time manner or at short intervals, e.g. every 5-10 minutes? This involves building a sophisticated extract, load, and transform (ELT) pipeline and integrating it with an unsupervised machine learning model that can correctly identify anomalous records. Also, this end-to-end pipeline has to be production-grade, always running while ensuring data quality from ingestion to model inference, and the underlying infrastructure has to be maintained.

Click through to see their solution using Databricks and delta lake.

Comments closed

Power BI Desktop August 2022 Updates

Matt Allington looks at some recent updates to Power BI:

I’ve been pretty busy over the last few months. The demand for Power BI skills has never been stronger, and my company is super busy. I haven’t written a blog article for a while, but I wanted to take a bit of time out this morning to talk about the August 2022 update to Power BI Desktop. As Power BI matures, there is less and less to get excited about with a new release of Desktop, but there were a couple of things that caught my eye in this release, worthy of calling out.

Read on for a couple of quality of life improvements.

Comments closed

Finding Guidance on Power BI

James Serra puts together a compendium:

Recently there has been a number of great articles published on Power BI that I wanted to make you aware of that go beyond the features descriptions found in the Power BI documentation. These new articles fall under the Power BI guidance documentation and are designed to address common strategic patterns.  Below is my summary of the articles, and check out Power BI guidance from the CAT by Matthew Roach for a more detailed summary.

If you’re interested in Power BI administration and strategic deployment, there’s a lot of good information here.

Comments closed