Cloud – Page 94 – Curated SQL

Auto-Pausing Dedicated SQL Pools in Azure Synapse Analytics

As a Synapse engineer or Synapse support engineer, you may need to start and test some pools, and you want this to be as cost-efficient as possible. Leaving a Synapse pool with a lot of DWUs running over the weekend because you forgot to pause the DW before shutting down your computer is not a good approach, and we can quickly resolve this by using PowerShell + Automation accounts.

This is also a good introduction to Azure Automation if you aren’t familiar with it.
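The post itself uses PowerShell in an Automation account. As a swapped-in alternative, here is a minimal Python sketch of the two pieces such a runbook needs: deciding when a pool should be paused, and the Azure Resource Manager URL for the pause (or resume) action. The api-version, the "Online" status value, the working-hours window, and all resource names are assumptions for illustration; authentication and the actual HTTP POST are deliberately left out.

```python
from datetime import datetime

# Assumption: a Microsoft.Synapse api-version that exposes the
# sqlPools/{name}/pause and sqlPools/{name}/resume actions.
API_VERSION = "2021-06-01"


def sql_pool_action_url(subscription_id: str, resource_group: str,
                        workspace: str, pool: str, action: str = "pause") -> str:
    """Build the ARM URL used to pause or resume a dedicated SQL pool."""
    if action not in ("pause", "resume"):
        raise ValueError("action must be 'pause' or 'resume'")
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.Synapse"
        f"/workspaces/{workspace}"
        f"/sqlPools/{pool}"
        f"/{action}?api-version={API_VERSION}"
    )


def should_pause(status: str, now: datetime,
                 work_start: int = 8, work_end: int = 19) -> bool:
    """Pause an Online pool on weekends or outside [work_start, work_end)."""
    off_hours = now.weekday() >= 5 or not (work_start <= now.hour < work_end)
    return status == "Online" and off_hours


# Hypothetical placeholders: swap in your own subscription, group, and names.
if should_pause("Online", datetime(2021, 1, 16, 10)):  # a Saturday morning
    print(sql_pool_action_url("<subscription-id>", "rg-synapse",
                              "mysynapsews", "sqlpool01"))
```

Scheduled hourly, a runbook built on this check would POST to the URL with a token from the Automation account's identity whenever `should_pause` returns True.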

Saving Money on Backups to Azure Blob Storage

Published 2021-01-15 by Kevin Feasel

John McCormack has a few tips for saving some cash:

You have 5 databases on a SQL Server instance. You take daily full backups of each database on your instance. You also take log backups every 15 minutes, as each database is in the full recovery model. This means that in 1 week, you will have 35 full backups and 3,360 transaction log backups. This multiplies to 1,820 full and 174,720 t-log backups over 52 weeks. Multiply this by 7 years or more and the costs can get very high.

Click through to see how you can save a considerable amount with a bit of planning.
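The arithmetic in the quote is easy to reproduce and adjust for your own estate. This small sketch uses the quote's scenario as defaults (5 databases, one daily full, a log backup every 15 minutes) and treats 7 years as 52 × 7 weeks, which is an approximation.

```python
MINUTES_PER_DAY = 24 * 60


def backups_per_week(databases: int = 5, fulls_per_day: int = 1,
                     log_interval_minutes: int = 15) -> tuple[int, int]:
    """Return (full backups, log backups) taken across all databases in one week."""
    fulls = databases * fulls_per_day * 7
    logs = databases * (MINUTES_PER_DAY // log_interval_minutes) * 7
    return fulls, logs


def backups_over(weeks: int, **kwargs) -> tuple[int, int]:
    """Scale the weekly counts to a retention period expressed in weeks."""
    fulls, logs = backups_per_week(**kwargs)
    return fulls * weeks, logs * weeks


print(backups_per_week())    # (35, 3360)
print(backups_over(52))      # (1820, 174720)
print(backups_over(52 * 7))  # roughly 7 years: (12740, 1223040)
```

Changing `log_interval_minutes` or the retention period shows quickly how file counts, and therefore storage costs, scale.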

Updates to AzureR

Published 2021-01-14 by Kevin Feasel

Hong Ooi has some updates for us:

This is an update on what’s been happening with the AzureR suite of packages. First, you may have noticed that just before the holiday season, the packages were updated on CRAN to change their maintainer email to a non-Microsoft address. This is because I’ve left Microsoft for a role at Westpac bank here in Australia; while I’m sad to be leaving, I do intend to continue maintaining and updating the packages.
To that end, here are the changes that have recently been submitted to CRAN, or will be shortly:

Read on for the changes. This includes a new package to work with Cosmos DB from R.

Little Things in Azure Data Factory

Published 2021-01-13 by Kevin Feasel

Rayis Imayev has some kind words about small niceties in Azure Data Factory:

Recently, the Microsoft team conducted a brief year-end survey about the “one thing” in Azure Data Factory (ADF) that “made your day in 2020” – https://twitter.com/weehyong/status/1343709921104183296. Responses ranged from global parameters support to the increased limit of ADF instances per subscription.
I personally like the little things that are not easily detected on the surface; the deeper you immerse yourself in data pipeline development, the more your gratitude for them grows.

Click through for a few examples.

Azure Data Factory and Source Control

Published 2021-01-12 by Kevin Feasel

Ahmad Yaseen shows how you can save Azure Data Factory pipelines in source control:

To overcome these limitations, Azure Data Factory provides the ability to integrate with a Git repository, such as an Azure DevOps or GitHub repository. This helps track and version pipeline changes and lets you save those changes incrementally during development without first validating an incomplete pipeline, so the changes are not lost in case of a crash or failure. You can then test the pipeline, revert any change found to introduce a bug, and publish the pipeline to the Data Factory once everything is developed and validated successfully.

Click through for the setup instructions.
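The same Git settings that the setup walks through (repository type, account, repository, collaboration branch, root folder) can also be captured declaratively. This sketch assumes the `repoConfiguration` property of a `Microsoft.DataFactory/factories` ARM resource and its `FactoryGitHubConfiguration` / `FactoryVSTSConfiguration` type names; the organization, repository, and project names are hypothetical placeholders, and the UI walk-through in the post remains the reference path.

```python
import json
from typing import Optional


def adf_repo_configuration(provider: str, account_name: str, repository_name: str,
                           collaboration_branch: str = "main", root_folder: str = "/",
                           project_name: Optional[str] = None) -> dict:
    """Build an ARM-style properties fragment for ADF Git integration (assumed shape)."""
    if provider == "github":
        config = {"type": "FactoryGitHubConfiguration"}
    elif provider == "azuredevops":
        # Azure DevOps repositories live inside a project, so one is required.
        if not project_name:
            raise ValueError("Azure DevOps repositories also need a project name")
        config = {"type": "FactoryVSTSConfiguration", "projectName": project_name}
    else:
        raise ValueError(f"unknown provider: {provider}")
    config.update({
        "accountName": account_name,
        "repositoryName": repository_name,
        "collaborationBranch": collaboration_branch,  # branch changes merge into
        "rootFolder": root_folder,                    # where ADF JSON files are stored
    })
    return {"properties": {"repoConfiguration": config}}


settings = adf_repo_configuration("github", "my-github-org", "adf-pipelines")
print(json.dumps(settings, indent=2))
```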

Running an mlflow Server on Azure

Published 2021-01-08 by Kevin Feasel

Paul Hernandez configures mlflow on Azure using platform-as-a-service offerings:

It is indisputably true that mlflow has made life a lot easier not only for data scientists but also for data engineers and architects, among others. There is a very helpful list of tutorials and examples in the official mlflow docs. You can just download it, open a console, and start using it locally on your computer. This is the fastest way to get started. However, as soon as you progress and introduce mlflow in your team, or you want to use it extensively for yourself, some components should be deployed outside your laptop.
To exercise a deployment setup, and since I have Azure experience, I decided to provision a couple of resources in the cloud to deploy the model registry and store the data produced by the tracking server.

I concur on the power of mlflow.
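The split described above, a database-backed store for runs and the model registry plus blob storage for artifacts, maps onto two options of `mlflow server`. This sketch only assembles the command line; the choice of Azure Database for PostgreSQL as the backend, the port, and every name and password are assumptions for illustration and may differ from the post's setup.

```python
import shlex
from urllib.parse import quote


def mlflow_server_command(db_user: str, db_password: str, db_host: str, db_name: str,
                          storage_account: str, container: str,
                          host: str = "0.0.0.0", port: int = 5000) -> list[str]:
    """Assemble an `mlflow server` invocation with remote backend and artifact stores."""
    # Percent-encode credentials so characters such as "@" (used in some Azure
    # Database for PostgreSQL user names) do not break the URI. The server also
    # needs a PostgreSQL driver such as psycopg2 installed.
    backend_store = (
        f"postgresql://{quote(db_user, safe='')}:{quote(db_password, safe='')}"
        f"@{db_host}:5432/{db_name}"
    )
    # Azure Blob Storage artifact root; MLflow reads the storage credentials from
    # AZURE_STORAGE_CONNECTION_STRING or AZURE_STORAGE_ACCESS_KEY.
    artifact_root = f"wasbs://{container}@{storage_account}.blob.core.windows.net/mlflow"
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store,
        "--default-artifact-root", artifact_root,
        "--host", host,
        "--port", str(port),
    ]


cmd = mlflow_server_command("mlflow@myserver", "s3cret!",
                            "myserver.postgres.database.azure.com", "mlflowdb",
                            "mystorageacct", "artifacts")
print(shlex.join(cmd))
```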

Transforming Arrays in Azure Data Factory

Published 2021-01-08 by Kevin Feasel

Mark Kromer shows off a few functions in Azure Data Factory to modify data in arrays:

The first transformation function is map(), which allows you to apply data flow scalar functions as the second parameter to map(). In my case, I use upper() to uppercase every element in my string array: map(columnNames(),upper(#item))

Read on for more iteration and aggregation functions akin to map, reduce, and filter.
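For readers who know these ideas from general-purpose languages, here is a rough Python analogue of the map, filter, and reduce patterns. This is plain Python, not ADF data flow syntax; the column list is hypothetical, and the loop variable plays the role that #item plays in the ADF expression.

```python
from functools import reduce

column_names = ["id", "firstName", "lastName", "city"]  # hypothetical column list

# Analogue of map(columnNames(), upper(#item)): apply a scalar function to each element.
upper_names = [item.upper() for item in column_names]

# Filter analogue: keep only the elements that satisfy a predicate.
name_columns = [item for item in column_names if item.lower().endswith("name")]

# Reduce analogue: fold the array into a single value via an accumulator.
total_length = reduce(lambda acc, item: acc + len(item), column_names, 0)

print(upper_names)   # ['ID', 'FIRSTNAME', 'LASTNAME', 'CITY']
print(name_columns)  # ['firstName', 'lastName']
print(total_length)  # 23
```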

Cross-Validation in Azure ML Studio

Published 2021-01-05 by Kevin Feasel

Dinesh Asanka takes us through the cross-validation component in Azure ML Studio:

Let us look at implementing cross-validation in Azure Machine Learning, using the sample Adventure Works database that we have used in all the previous articles.
Next, drag and drop Cross Validate Model onto the experiment. Cross Validate Model has two inputs and two outputs. The two inputs are the data and the connection to the machine learning technique. Let us use Two-Class Decision Jungle as the machine learning technique. The first output is then connected to Evaluate Model, as shown in the following figure:

Click through for the process.
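To make the idea behind Cross Validate Model concrete, here is a minimal pure-Python sketch of k-fold cross-validation: split the rows into k folds, train on k−1 of them, score on the held-out fold, and repeat. It is not how Azure ML Studio implements the module; the contiguous (unshuffled) folds, k=5, the toy data, and the majority-class model standing in for Two-Class Decision Jungle are all illustrative assumptions.

```python
from collections import Counter


def k_fold_indices(n: int, k: int) -> list[list[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(X, y, fit, k: int = 5) -> list[float]:
    """Train on k-1 folds, score accuracy on the held-out fold, once per fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def fit_majority(X_train, y_train):
    """Hypothetical stand-in model: always predict the most common training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda x: majority


X = list(range(10))
y = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
scores = cross_validate(X, y, fit_majority, k=5)
print(scores)                     # [1.0, 0.5, 1.0, 0.5, 1.0]
print(sum(scores) / len(scores))  # 0.8
```

The per-fold scores correspond loosely to the module's per-fold evaluation output; their spread shows how stable a model is across different slices of the data.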

Wrapping up the Azure Databricks Advent

Published 2021-01-04 by Kevin Feasel

Tomaz Kastrun laughs at 24-day advent calendars:

In the last two days we have focused on understanding Apache Spark through performance tuning and troubleshooting. Both require a deeper understanding of Spark and Azure Databricks, but they also give great insight to anyone who needs to improve performance and work with Spark.
Today, I would like to list a couple of additional learning materials, documentation, and other resources for further exploration of Azure Databricks.

Click through for links to additional resources on Apache Spark and Databricks, as well as the other 30 entries in the series.

Inlining KQL in Power Query

Published 2020-12-29 by Kevin Feasel

Chris Webb shows you how you can include KQL query fragments in Power Query:

If the title wasn’t enough to warn you, this post is only going to be of interest to M ultra-geeks and people using Power BI with Azure Data Explorer – and I know there aren’t many people in either group. However, I thought the feature I’m going to show you in this post was so cool that I couldn’t resist blogging about it.

Limited in its utility, but still quite interesting.
