Press "Enter" to skip to content

Category: Cloud

Restoring SQL Managed Instance Backups to SQL Server 2022

Mladen Andzic previews how we can take a Managed Instance backup and restore it on-premises:

Restoring a backup file is the easiest way to copy a SQL Server database to another instance. It allows you to create a copy of your production database for easier troubleshooting or debugging of an issue, to provide a copy of a database to your end users or eligible third parties, or as a lightweight business continuity/disaster recovery solution to restore functionality on another instance of SQL Server. These are just a few use cases; the list is much longer, and there are some very inventive ways of using backup-restore in the wild.

This article explains the challenges of cross-release restore to an older version of the SQL engine and announces the private preview of a capability to restore a backup of a database taken from Azure SQL Managed Instance to an instance of SQL Server 2022.

So much SQL Server functionality has been built with the idea of getting you from on-premises into the cloud (specifically Azure), but it’s good to see Microsoft spend some development effort on the entirely reasonable and realistic scenario in which Azure is not the best choice for a company. There are many such companies still willing to throw money at Microsoft for a good product.
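To make the scenario concrete, here is a hedged sketch of what such a restore could look like from code, assuming the capability surfaces as an ordinary RESTORE ... FROM URL against the blob container holding the Managed Instance backup. Every name here (server, storage account, database, logical file names) is a placeholder; Mladen's post has the actual preview mechanics.

```python
# A sketch, not the official preview walkthrough; all names are made up.
import pyodbc

# Connect to the on-premises SQL Server 2022 instance. RESTORE cannot
# run inside a transaction, hence autocommit.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=onprem-sql2022;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()

# The instance first needs a credential for the storage container, e.g.:
#   CREATE CREDENTIAL [https://mistorage.blob.core.windows.net/backups]
#   WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '<sas-token>';

cur.execute("""
RESTORE DATABASE [SalesDb]
FROM URL = 'https://mistorage.blob.core.windows.net/backups/SalesDb.bak'
WITH MOVE 'data_0' TO 'D:\\Data\\SalesDb.mdf',
     MOVE 'log' TO 'E:\\Log\\SalesDb_log.ldf';
""")
```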


Software Engineering Practices for Notebooks

Rafi Kurlansik and Austin Ford explain how to get the most out of notebooks, using Databricks as an example:

Notebooks are a popular way to start working with data quickly without configuring a complicated environment. Notebook authors can quickly go from interactive analysis to sharing a collaborative workflow, mixing explanatory text with code. Often, notebooks that begin as exploration evolve into production artifacts. For example,

1. A report that runs regularly based on newer data and evolving business logic.

2. An ETL pipeline that needs to run on a regular schedule, or continuously.

3. A machine learning model that must be re-trained when new data arrives.

Perhaps surprisingly, many Databricks customers find that with small adjustments, notebooks can be packaged into production assets and integrated with best practices such as code review, testing, modularity, continuous integration, and versioned deployment.

Read on for several tips and recommendations.
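One of those small adjustments is simply pulling transformation logic out of the notebook into a plain module that both the notebook and a test suite can import. A minimal sketch of the pattern, with file and function names of my own invention rather than anything from the post:

```python
# transforms.py -- logic lives in an importable module, not the notebook.
from pyspark.sql import DataFrame
import pyspark.sql.functions as F

def add_revenue(df: DataFrame) -> DataFrame:
    """Pure DataFrame-in, DataFrame-out: reviewable and unit-testable."""
    return df.withColumn("revenue", F.col("quantity") * F.col("unit_price"))

# test_transforms.py -- runs under pytest in CI; in a real repo this file
# would start with `from transforms import add_revenue`.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").getOrCreate()

def test_add_revenue(spark):
    df = spark.createDataFrame([(2, 5.0)], ["quantity", "unit_price"])
    assert add_revenue(df).first()["revenue"] == 10.0
```

The notebook itself then shrinks to orchestration and narrative, which is exactly what the article is after.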


Stopping Azure Kubernetes Service Nodes

Andrew Pruski wants to shut the whole thing down:

A while back I wrote a post on Adjusting Pod Eviction Timings in Kubernetes. To test the changes made in that post I had to shut down nodes in an Azure Kubernetes Service cluster.

This can be done easily in the Azure portal.

However, I did a presentation recently and didn’t want to have to keep jumping into the portal from VS Code…so I wanted to be able to shut down the nodes in code.

So here’s how to use the azure-cli to shut down a node in an Azure Kubernetes Service cluster.

Read on to see how, but also read Andrew’s warning/disclaimer so you don’t mess anything up in a production environment.
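For context, the worker nodes behind an AKS cluster are instances of a virtual machine scale set living in the cluster's managed ("MC_") resource group, so shutting down a node amounts to deallocating a scale set instance. Here is a rough sketch of that idea driven from Python, with placeholder resource names; Andrew's post has the real commands and caveats:

```python
# A sketch only: deallocating nodes out from under running pods is
# exactly what the warning in the post is about. Names are placeholders.
import subprocess

NODE_RG = "MC_myRG_myAKSCluster_westeurope"  # managed resource group
VMSS = "aks-nodepool1-12345678-vmss"         # node pool's scale set

def az(*args: str) -> None:
    """Run an Azure CLI command, raising if it fails."""
    subprocess.run(["az", *args], check=True)

# Deallocate instance 0: the node shuts down and releases its compute.
az("vmss", "deallocate",
   "--resource-group", NODE_RG, "--name", VMSS, "--instance-ids", "0")

# ...and bring it back afterwards.
az("vmss", "start",
   "--resource-group", NODE_RG, "--name", VMSS, "--instance-ids", "0")
```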


PHI De-Identification in Databricks with NLP

Amir Kermany, et al., share a set of notebooks:

John Snow Labs, the leader in Healthcare natural language processing (NLP), and Databricks are working together to help organizations process and analyze their text data at scale with a series of Solution Accelerator notebook templates for common NLP use cases. You can learn more about our partnership in our previous blog, Applying Natural Language Processing to Health Text at Scale.

To help organizations automate the removal of sensitive patient information, we built a joint Solution Accelerator for PHI removal that builds on top of the Databricks Lakehouse for Healthcare and Life Sciences. John Snow Labs provides two commercial extensions on top of the open-source Spark NLP library — both of which are useful for de-identification and anonymization tasks — that are used in this Accelerator:

This is a really interesting scenario.
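As a toy illustration of what de-identification means here (the Accelerator itself uses John Snow Labs' licensed, model-based annotators, which catch far more than fixed patterns ever could):

```python
# A toy, regex-only stand-in for PHI masking; not the Spark NLP pipeline.
import re

PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def mask_phi(text: str) -> str:
    """Replace each matched PHI span with its category label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(mask_phi("Seen on 04/12/2021, MRN: 55512, call 555-867-5309."))
# -> Seen on <DATE>, <MRN>, call <PHONE>.
```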


Connecting to Azure SQL DB over VPN

Reitse Eskens has some routing issues:

To make sure the on-premises connection uses the VPN and the private endpoint, we need to make sure the on-premises DNS (it’s always DNS) recognizes the traffic and redirects it to the VPN connection. But whatever we tried on the firewall, the traffic kept going the wrong way. It did have something to do with the on-premises DNS setup in the end.

When we tried to connect to the Azure SQL instance by IP address, it threw an error because the instance wasn’t found. You can only connect to it with the FQDN (dbname.database.windows.net).

Click through to see what the problem was and how Reitse solved it.
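A quick sanity check in that situation is to resolve the FQDN from the on-premises side and see which address comes back, before touching any connection strings. A small hypothetical snippet:

```python
# Resolve the Azure SQL FQDN (placeholder name) and inspect the answer.
import socket

fqdn = "dbname.database.windows.net"
ip = socket.gethostbyname(fqdn)
print(f"{fqdn} resolves to {ip}")
# A private (e.g. 10.x.x.x) address means DNS is steering you to the
# private endpoint over the VPN; a public IP means the on-premises DNS
# forwarding isn't doing its job yet.
```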


Git Native Support for Databricks Workflows

Vaibhav Sethi and Roland Faeustlin make an announcement:

We are happy to announce native support for Git in Databricks Workflows, which enables our customers to build reliable production data and ML workflows using modern software engineering best practices. Customers can now use a remote Git reference as the source for tasks that make up a Databricks Workflow; for example, a notebook from the main branch of a repository on GitHub can be used in a notebook task. By using Git as the source of truth, customers eliminate the risk of accidental edits to production code. They also remove the overhead of maintaining a production copy of the code in Databricks and keeping it updated, and improve reproducibility as each job run is tied to a commit hash. Git support for Workflows is available in Public Preview and works with a wide range of Databricks-supported Git providers including GitHub, GitLab, Bitbucket, Azure DevOps, and AWS CodeCommit.

Read on to see how it works.
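Under the covers this is the git_source block in the Jobs 2.1 API. A hedged sketch of creating such a job straight against the REST API, with the workspace URL, token, repo, and cluster ID all placeholders:

```python
# A sketch against the Jobs 2.1 REST API; all identifiers are placeholders.
import requests

HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

payload = {
    "name": "nightly-report",
    "git_source": {
        "git_url": "https://github.com/org/repo",
        "git_provider": "gitHub",
        "git_branch": "main",
    },
    "tasks": [{
        "task_key": "run_report",
        # Path is relative to the repo root, not the workspace.
        "notebook_task": {"notebook_path": "notebooks/report"},
        "existing_cluster_id": "<cluster-id>",
    }],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # {"job_id": ...}
```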


Delta Live Tables and Power BI Data Modeling

Tahir Fayyaz goes from Delta Lake to Power BI:

To get optimal performance from Power BI, it is recommended to use a star schema data model and to make use of user-defined aggregated tables. However, as you build out your facts, dimensions, and aggregation tables and views in Delta Lake, ready to be used by the Power BI data model, it can become complicated to manage all the pipelines, dependencies, and data quality, as you need to consider the following:

– How to easily develop and manage the data model’s transformation code.

– How to run and scale data pipelines for the model as data volumes grow.

– How to keep all the Delta Lake tables updated as new data arrives.

– How to view the lineage for all tables as the model gets more complex.

– How to actively stop data quality issues that result in incorrect reports.

Read on for recommendations, a couple architectural diagrams, and some sample code.
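To give a flavor of how Delta Live Tables speaks to the dependency and data quality bullets above, here is a minimal hypothetical pipeline fragment (table and column names invented). DLT infers that fact_sales depends on dim_customer from the dlt.read call, tracks the lineage, and enforces the expectation:

```python
# Runs inside a DLT pipeline, where `dlt` and `spark` are provided.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Customer dimension for the Power BI star schema")
@dlt.expect_or_drop("valid_key", "customer_id IS NOT NULL")
def dim_customer():
    return spark.read.table("raw.customers").select("customer_id", "name")

@dlt.table(comment="Sales fact joined to the conformed dimension")
def fact_sales():
    return (
        dlt.read("dim_customer")
        .join(spark.read.table("raw.sales"), "customer_id")
        .withColumn("revenue", F.col("qty") * F.col("price"))
    )
```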


Refreshing SQL Managed Instances which Use TDE

Bradley Ball keeps the dev environment up to date:

Hello Dear Reader! I was working with some friends lately and we needed to set up a process to refresh their Development Environment databases from Production. Additionally, the databases are encrypted using Transparent Data Encryption, the SQL MI instances are in different regions, and the SQL MI instances are in different subscriptions. To duplicate the environment and match our friends’ setup, we did the following.

Click through for a high-level overview, step-by-step guidance, and a whole lot of detail.
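Stripped of the TDE key and cross-subscription plumbing that makes up most of the real work, the core backup-and-restore moves look roughly like this sketch. All names are placeholders; note that on Managed Instance a user-initiated backup must be COPY_ONLY to a URL, and a TDE-protected database needs a customer-managed key (which the post walks through sharing between instances) before that backup is even allowed.

```python
# The skeleton of the refresh only; TDE key setup is a prerequisite.
import pyodbc

def connect(server: str) -> pyodbc.Cursor:
    """Open an autocommit connection (BACKUP/RESTORE need one) to a MI."""
    return pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={server};UID=sqladmin;PWD=<password>;",
        autocommit=True,
    ).cursor()

BACKUP_URL = "https://refreshstore.blob.core.windows.net/backups/AppDb.bak"

# 1. Copy-only backup from the production Managed Instance.
connect("prod-mi.abc123.database.windows.net").execute(
    f"BACKUP DATABASE [AppDb] TO URL = '{BACKUP_URL}' WITH COPY_ONLY;"
)

# 2. Restore on the development Managed Instance; SQL MI places the
#    files itself, so there are no MOVE clauses.
connect("dev-mi.def456.database.windows.net").execute(
    f"RESTORE DATABASE [AppDb] FROM URL = '{BACKUP_URL}';"
)
```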


Extracting Data from DAX Measures into CSV

Gilbert Quevauvilliers builds a process:

In this blog post I am going to demonstrate how to use the new Power Automate Flow to extract data from a DAX measure into a SharePoint CSV file.

I got this idea after reading the blog post from the Microsoft Power BI Team: Unlocking new self-service BI scenarios with ExecuteQueries support in Power Automate.

The great news is that this works on Power BI Pro, Premium Per User and Premium.

Read on to see how.
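The Power Automate action wraps the Power BI ExecuteQueries REST endpoint, so the same trick works without a Flow at all. A hypothetical sketch (dataset ID and token are placeholders, and the CSV lands locally rather than in SharePoint):

```python
# Evaluate a DAX measure via ExecuteQueries and dump the rows to CSV.
import csv
import requests

DATASET = "<dataset-id>"
TOKEN = "<aad-access-token>"

body = {"queries": [{"query": 'EVALUATE ROW("Total Sales", [Total Sales])'}]}
resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{DATASET}/executeQueries",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=body,
)
resp.raise_for_status()

# Response shape: results -> tables -> rows (a list of dicts).
rows = resp.json()["results"][0]["tables"][0]["rows"]
with open("measure.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```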


Point-in-Time Recovery with Azure SQL DB and Managed Instances

Ahmed Mahmoud looks at point-in-time recovery and answers some frequently asked questions:

On some occasions, after the failover is initiated, the current primary DR will start a new backup chain from that point, and old backups are available on the current secondary DR. If we want to restore the backups which exist on the secondary, it will not allow us to perform the restore; apparently, the restore cannot be initiated on the primary as the backup is not available there.

Also, sometimes we observe on the secondary DR that a PITR restore point is available for a few databases, while for others it shows “no restore point available”.

Read on to understand why that happens and what you can do about it.
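For reference, once a restore point does exist, the point-in-time restore itself is a single CLI call. A hedged sketch with placeholder names and timestamp; for Managed Instance databases the analogous command is az sql midb restore:

```python
# Point-in-time restore of an Azure SQL DB to a new database via the CLI.
import subprocess

subprocess.run([
    "az", "sql", "db", "restore",
    "--resource-group", "myRG",
    "--server", "myserver",
    "--name", "AppDb",
    "--dest-name", "AppDb_restored",
    "--time", "2022-06-01T10:00:00",
], check=True)
```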
