Cloud – Page 58 – Curated SQL

Today I would like to share a scenario that I was working on one of my serverless SQL Pool support cases. The customer asked for an advice on how to monitor serverless SQL requests by using log analytics.
The intention of this guide is to help you with choosing the configuration required to easily setup the Synapse Analytics Workspace monitoring and all other considerations about how to monitor serverless SQL requests with Azure Monitor. Spoiler: At the end of this article, I will share the latest version of the serverless workbook posted on the Azure_Synapse_Tool_Box. This includes a really cool way to see query execution information.

Read on for that and definitely check out the Azure Synapse Toolbox if you’re a Synapse user.

Comments closed

Pattern Learning in Amazon SageMaker

Published 2022-10-25 by Kevin Feasel

Vishaal Kapoor, et al, take us through an example of pattern learning in Amazon SageMaker:

Pattern learning automatically analyzes your data and surfaces textual constraints that may apply to your dataset. For the example with phone numbers, pattern learning can analyze the data and identify that the vast majority of phone numbers follow the textual constraint [1-9][0-9]{2}-[0-9][4]. It can also alert you that there are examples of invalid data so that you can exclude or correct them.
In the following sections, we demonstrate how to use pattern learning in Data Wrangler using a fictional dataset of product categories and SKU (stock keeping unit) codes.

Read on for the scenario.

Comments closed

Performance Dashboard vs Azure Portal for CPU Usage

Published 2022-10-25 by Kevin Feasel

Yochanan Rachamim explains why you might see different information between tools:

When using Performance Dashboard on Azure SQL Managed Instance you may find that percent of CPU usage is different (lower) than reported by Azure Portal for your Managed Instance.
You may wonder which is the correct value.

Click through for the answer.

Comments closed

Transferring Data between Dedicated SQL and Spark Pools in Synapse

Published 2022-10-24 by Kevin Feasel

Sidney Cirqueria shows off a connector available to us in Azure Synapse Analytics:

Usually, customers do this kind of operation using Synapse Apache Spark to load data to Dedicated Pool within Azure Synapse Workspace, but today, I would like to reproduce a different scenario that I was working on one of my support cases. Consider a scenario where you are trying to load data from Synapse Spark to Dedicated pool (formerly SQL DW) using Synapse Pipelines, and additionally you are using Synapse Workspace deployed with Managed Virtual Network.
The intention of this guide is to help you with which configuration will be required if you need to load data from Azure Synapse Apache Spark to Dedicated SQL Pool (formerly SQL DW). If you prefer take advantage of the new feature-rich capabilities now available via the Synapse workspace and Studio and load data directly from Azure Apache Spark to Dedicated Pool in Azure Synapse Workspace is recommended that you enable Synapse workspace features on an existing dedicated SQL pool (formerly SQL DW).

Read on for a few tips a nd a step-by-step walkthrough of the process.

Comments closed

Azure SQL DB GP Compute Optimized Performance Comp

Published 2022-10-24 by Kevin Feasel

Reitse Eskens continues a series on comparing the performance of different Azure SQL DB tiers:

The compute optimized tier starts at 8 cores as a minimum and goes up to 72.
The maximum storage starts at 1 TB and goes up to 4 TB. The number of TempDB files is undisclosed but the size starts at 37 GB and goes up to 333 GB. A huge difference with the regular provisioned one! The disk limitations are disclosed as well. Log rate starts at 36 Mbps and maxing out at 50. The same goes for the data; starting at 2560 iops and maxing out at 12.800. One iop (or Input Output oPeration) is connected to the disk cluster size. As these are 4 kb, reading or writing one data page (8Kb) equals two iops. 2560 Iops equals something of 10 MB per second. The top end goes to about 50 MB per second. Keep this in mind when you’re working out which tier you need, because usually disk performance can be essential.

Read on to see how it performs on Reitse’s standardized test workload.

Comments closed

Granular Billing for Azure Data Factory

Published 2022-10-19 by Kevin Feasel

Chenye Charlie Zhu announces a new feature:

By default, Azure Data Factory reports lump sum charges for billing, meaning that at the factory level, we add up charges across all pipelines within a factory, and tell you how much you have spent on these pipelines. In many cases, these aggregate numbers should suffice. But in others, these numbers lack the clarity and transparency that we thrive to provide customers. For instance, if you are running data pipelines for multiple teams, you may want to determine the cost for each pipeline, for proper book-keeping and/or charge backs.
Now, Azure Data Factory will help you with this endeavor, with built-in per pipeline detailed billing view. Moreover, we built the feature on top of the Azure Billing and Cost Analysis platform, allowing you to stay with the cost and budget management tool that you are familiar with to identify spending trends and spot where overspending might have occurred.

Great if you have half a dozen pipelines. Probably less great if you have 500.

Comments closed

Geo-Zone Redundant Storage for SQL MI Backups

Published 2022-10-18 by Kevin Feasel

Niko Neugebauer moves the backups pretty far away:

The new Geo-Zone Redundant Storage (GZRS) backup storage option combines the best of two worlds – Geo-Redundant and Zone-Redundant storage, keeping backups safe from both regional (Geo-Redundant) and Data Center (Zone-Redundant) failures. It provides the highest availability for storage currently offered on Azure, improving recovery speed and enabling Point-In-Time Restore (PITR) of backups in the event of a zone failure.
Geo-Zone Redundant Storage for Azure SQL Managed Instance backups provides 3 synchronous copies in different availability zones within the same primary region, plus an additional asynchronous copy within a single availability zone in the paired secondary region, as shown on the following picture:

Click through for that picture and what it does for expected availability. Basically, a whole bunch of data centers would need to fail before you lose a backup. Or someone messes up DNS and makes everything unavailable for a day, not that that’s ever happened before with a large cloud service provider…

Comments closed

Running Jobs in Azure SQL Database

Published 2022-10-17 by Kevin Feasel

Etienne Lopes gets a job:

In SQL Server, many automated tasks are implemented via SQL Server Agent jobs. If you’re used to it, you may get a bit sad when you learn that SQL Server Agent is not present in Azure SQL Database. However if those automated tasks rely on T-SQL then they can still be easily implemented in Azure SQL Databases using elastic jobs that provide the ability to run T-SQL scripts on a schedule or on-demand

Read on to see how they work.

Comments closed

General Purpose Tier Azure SQL DB Performance

Published 2022-10-17 by Kevin Feasel

Reitse Eskens continues a series on comparing tiers of Azure SQL Database:

In my previous blog, I wrote about the serverless tier, the one that can go to sleep if you’re not using it for more than one hour (minimum). That tier is cheaper as long as you’re not running it for more than 25% of the time. If you need more time, go provisioned.
Another difference between serverless and provisioned is that the provisioned one gets a set number of cores whereas the serverless one has a minimum and a maximum number of cores. So this time, the blog is about the provisioned tier where you choose a fixed number of CPU’s with a fixed monthly cost.

Click through for the analysis. I’ll reiterate here that I really hope Reitse has some graphics at the end (or at least tables) which sort of lay out where the boundaries between tiers are and what the performance and cost profiles look like between them.

Comments closed

Monitoring Azure SQL Backup History

Published 2022-10-14 by Kevin Feasel

Mustafa Ashour wants you to check your backups:

Database backups are an essential part of any business continuity and disaster recovery strategy, because they help protect your data from corruption or deletion. These backups enable database restore to a point in time within the configured retention period. By default, Azure SQL Database & Azure SQL Managed Instance stores data in geo-redundant storage blobs that are replicated to a paired region. Geo-redundancy helps protect against outages that affect backup storage in the primary region. It also allows you to restore your databases/instance to a different region in the event of a regional outage/disaster.

Read on to learn more about how Azure SQL DB and Azure SQL Managed Instance perform backups, their cadence, and how you can find information on backup history.

Comments closed

Category: Cloud

Monitoring the Serverless SQL Pool via Log Analytics

Pattern Learning in Amazon SageMaker

Performance Dashboard vs Azure Portal for CPU Usage

Transferring Data between Dedicated SQL and Spark Pools in Synapse

Azure SQL DB GP Compute Optimized Performance Comp

Granular Billing for Azure Data Factory

Geo-Zone Redundant Storage for SQL MI Backups

Running Jobs in Azure SQL Database

General Purpose Tier Azure SQL DB Performance

Monitoring Azure SQL Backup History