
Category: Cloud

Azure Data Explorer September 2022 Updates

Shaf Mahmood has a few updates for us:

The optimized autoscale feature has been in place using reactive logic. It has helped ADX users by adjusting the cluster size when resource demand increases due to ingestion or query load. The optimized autoscale feature has now been further improved with predictive logic. This logic monitors the same metrics as the reactive logic, builds up the cluster's usage pattern over time, and uses that pattern to forecast and plan the size of the cluster. The reactive logic is still used to ensure that forecast anomalies or changes in the usage pattern are autoscaled appropriately.
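To make the predictive-plus-reactive idea concrete, here is a toy Python sketch. It is a hypothetical illustration, not how Azure Data Explorer actually sizes clusters: the forecasting rule (average load at the same hour on previous days), the per-instance capacity, the minimum instance count, and all function names are assumptions. The forecast plans capacity ahead of time, and the reactive check overrides it when observed load exceeds what was planned.

```python
import math
from statistics import mean

# Hypothetical toy model only: names, thresholds, and the forecasting rule
# are assumptions, not Azure Data Explorer's real autoscale algorithm.


def forecast_load(history: list[list[float]], hour: int) -> float:
    """Predictive part: average the load seen at this hour on previous days."""
    return mean(day[hour] for day in history)


def plan_instances(
    history: list[list[float]],
    hour: int,
    observed_load: float,
    capacity_per_instance: float = 100.0,
    min_instances: int = 2,
) -> int:
    """Size from the forecast, then let the reactive check override it."""
    predicted = forecast_load(history, hour)
    planned = max(min_instances, math.ceil(predicted / capacity_per_instance))
    # Reactive part: if actual load exceeds what the forecast planned for
    # (an anomaly or a changed usage pattern), size for the observed load.
    needed_now = math.ceil(observed_load / capacity_per_instance)
    return max(planned, needed_now)


# Three days of history in which load peaks at 09:00.
history = [[300.0 if h == 9 else 100.0 for h in range(24)] for _ in range(3)]
print(plan_instances(history, hour=9, observed_load=250.0))  # 3: planned from the forecast
print(plan_instances(history, hour=2, observed_load=480.0))  # 5: reactive override for an unforecast spike
```

Taking the maximum of the planned and reactive sizes is one simple way to let the forecast scale out early without ever undersizing the cluster when the pattern breaks.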

There are a few cost-related updates but also updates to visuals, data ingestion, and more.


Basic Tier Performance in Azure SQL DB

Reitse Eskens keeps things basic:

When you look at the documentation, the Basic tier has low CPU (less than one CPU at most), 1 to 4 IOPS per DTU (translating to 5 to 20 IOPS in total for its 5 DTUs), a latency of 5 ms for reads and 10 ms for writes, and a maximum backup retention of 7 days. Even though it’s advertised as a production database, I wouldn’t store essential data there. However, I can use it for metadata storage in a Data Factory or Synapse Analytics environment.

As for IOPS, one I/O operation reads one disk cluster, usually 4 KB. In this case, that means the database can read roughly 20 to 80 KB of data per second.
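The arithmetic behind that 20 to 80 KB/s figure can be checked with a few lines of Python. This is a back-of-the-envelope sketch using the numbers quoted above (5 DTUs, 1 to 4 IOPS per DTU, 4 KB per I/O); the function names are illustrative, and real I/O sizes vary, so treat the result as a rough ceiling rather than a measured throughput.

```python
# Back-of-the-envelope Basic tier throughput, using the quoted figures.
BASIC_TIER_DTUS = 5
IO_SIZE_KB = 4  # assumption from the quote: one I/O reads a 4 KB disk cluster


def iops_range(dtus: int, min_per_dtu: int = 1, max_per_dtu: int = 4) -> tuple[int, int]:
    """Total IOPS range for a database with the given number of DTUs."""
    return dtus * min_per_dtu, dtus * max_per_dtu


def throughput_kb_per_s(iops: int, io_size_kb: int = IO_SIZE_KB) -> int:
    """KB read per second if every I/O operation reads io_size_kb."""
    return iops * io_size_kb


low, high = iops_range(BASIC_TIER_DTUS)
print(f"{low}-{high} IOPS -> {throughput_kb_per_s(low)}-{throughput_kb_per_s(high)} KB/s")
# prints "5-20 IOPS -> 20-80 KB/s"
```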

When they say Basic, they aren’t kidding.


Restoring Azure SQL MI Databases to SQL Server 2022

Garry Bargsley restores a backup:

My post for T-SQL Tuesday #154 is going to be a demo of the new SQL Server feature that allows you to restore an Azure SQL Managed Instance backup to a SQL Server 2022 instance. Actually, I am not sure if this is a feature or just an enhancement that allows for this behavior to work.

Current versions of SQL Server do not allow the restoration of backups taken on Azure SQL Managed Instance. Managed Instances are considered “vNext” and run a different version that is beyond anything on-premises could keep up with, due to the frequent changes being applied to Managed Instance. There has always been a disconnect in restorability between the two platforms, so it is good that Microsoft has found a solution to this limitation.

I’d still like to see the ability to fail back from SQL MI to on-prem (or VM) SQL Server. You can go from on-prem to Azure SQL MI, but it sounds like, for now, failback is a database restore.


Data Virtualization in Azure SQL Managed Instance

Mladen Andzic has an announcement:

We are excited to announce the general availability (GA) of data virtualization capabilities in Azure SQL Managed Instance, with improved query performance and managed identity as a new supported option for authenticating to storage accounts.  

Data virtualization enables you to execute Transact-SQL (T-SQL) queries on files storing data in common data formats in Azure Data Lake Storage Gen2 or Azure Blob Storage and combine that data with relational data stored locally in the managed instance using logical joins. This way you can transparently access external data while keeping it in its original format and location. There is no data duplication or need to run and maintain ETL processes, which means that you can extract and deliver insights faster. The supported file formats are Parquet, CSV, and JSON.

This is similar to PolyBase in SQL Server 2019 but is a different underlying technology. In SQL Managed Instance, it looks like we only get API-based data virtualization, not the ODBC-based PolyBase we saw in SQL Server 2019.


SQL Server 2022 and Azure Data Lake Storage

I continue a series on data virtualization in SQL Server 2022:

Today’s post is a fairly short one. Let’s say you want to connect to Azure Data Lake Storage Gen2 from SQL Server 2022. We saw in a recent post how to connect to Azure Blob Storage. When it comes to Azure Data Lake Storage, the story is almost the same, though there are a couple of pitfalls you will want to avoid.

Read on for that error, which stymied me for a good 10 minutes.


Comparing Performance of Azure SQL Databases

Reitse Eskens begins an interesting series:

The trigger for this feast was a number of questions from my employer: what database tier and SKU (Stock Keeping Unit) should we choose for customer C? Well, the answer depends on their budget and the performance they’re expecting, and we didn’t have a conclusive answer. Microsoft will provide you with their statistics on database performance, but what does that mean in real life? Because no matter what the good people in Redmond come up with as databases, we as data engineers can do worse. Much worse.

So, I created some horrible queries to insert, select, and delete data and fired them off at a database. My own laptop with NVMe disks provided the baseline; after that, I gradually worked my way up from the Basic DTU database to the Hyperscale database.
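The quoted method times insert, select, and delete workloads against each database. Reitse's actual queries are T-SQL against Azure SQL Database and are not shown here. As a rough illustration of the per-phase timing pattern only, this Python sketch uses an in-memory SQLite database as a hypothetical stand-in; the table name `bench`, the row count, and the queries are assumptions. Targeting Azure SQL would mean swapping in a different driver and rewriting the SQL.

```python
import sqlite3
import time
from typing import Callable


def timed(action: Callable[[], None]) -> float:
    """Run one workload phase and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def run_benchmark(conn: sqlite3.Connection, rows: int = 10_000) -> dict[str, float]:
    """Time insert, select, and delete phases against a hypothetical 'bench' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS bench (id INTEGER PRIMARY KEY, payload TEXT)")

    def insert() -> None:
        conn.executemany(
            "INSERT INTO bench (payload) VALUES (?)",
            ((f"row {i}",) for i in range(rows)),
        )
        conn.commit()

    def select() -> None:
        # Filter on an unindexed column so the query has to scan the table.
        conn.execute("SELECT COUNT(*) FROM bench WHERE payload LIKE '%9%'").fetchone()

    def delete() -> None:
        conn.execute("DELETE FROM bench")
        conn.commit()

    # Phases run in order: insert, then select, then delete.
    return {"insert": timed(insert), "select": timed(select), "delete": timed(delete)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for phase, seconds in run_benchmark(conn).items():
        print(f"{phase}: {seconds:.4f} s")
    conn.close()
```

Running the same harness against a laptop baseline and then against each tier would give comparable per-phase timings, although numbers from SQLite say nothing about Azure SQL performance itself.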

This first post introduces the method and lays out expectations. I’m looking forward to the remaining posts in the series.


Deploying an Arc-Enabled SQL Managed Instance

Warwick Rudd continues a series on Azure Arc-enabled data services:

Now that we have our Azure Arc-enabled Data Controller configured and available, we can deploy our first Arc-enabled SQL Managed Instance into our environment. As previously mentioned, the connectivity mode configured for your Arc-enabled Data Controller (directly connected or indirectly connected) dictates which approaches are available for setting up and configuring your Arc-enabled SQL Managed Instance.

Click through for a step-by-step guide.


Inserting into Azure Blob Storage from SQL Server 2022

I continue a series on data virtualization in SQL Server 2022:

Several years ago, I wrote a blog post on how to insert data into Azure Blob Storage from SQL Server using PolyBase. That technique used PolyBase V1: the Java connector for Hadoop. With SQL Server 2022 eliminating that connector, we’re going to learn the new method.

This is one of the larger practical differences in data virtualization with SQL Server 2022.


Deploying a Streamlit App to RStudio Connect

Parisa Gregg wraps up a series:

RStudio Connect is a platform that is well known for providing the ability to deploy and share R applications such as Shiny apps and Plumber APIs, as well as plots, models, and R Markdown reports. However, despite the name, it is not just for R developers (hence their recent announcement). RStudio Connect also supports a growing number of Python applications, API services including Flask and FastAPI, and interactive web-based apps such as Bokeh and Streamlit.

In this post we will look at how to deploy a Streamlit application to RStudio Connect. Streamlit is a framework for creating interactive web apps for data visualisation in Python. Its API makes it very easy and quick to display data and create interactive widgets from just a regular Python script.

Click through for the step-by-step process.


Backup Options for Cosmos DB

Manvendra Singh takes a backup:

This article explores the backup options available in the Azure Cosmos DB service. Backups are essential to safeguard our data in case of data corruption, data deletion, system failure, or unforeseen events that call for disaster recovery (DR). We have planned, configured, and managed backups for our on-prem databases, whether SQL Server, Oracle, or DB2, as well as for system files on various machines. DBAs and infrastructure admins have made sure to keep backups of all these systems to safeguard their data. Similarly, we must also secure data hosted in the cloud for any service, whether Azure VMs, Azure SQL, Cosmos DB accounts, or anything else. Today we will talk about the backup options available to secure Cosmos DB databases and their contents.

Click through for those two options.
