Category: Cloud

PolyBase and Cosmos DB’s Core API

Published 2022-09-01 by Kevin Feasel

I have some fun integrating the Cosmos DB Core API with PolyBase:

PolyBase comes with a few built-in drivers, including Oracle, Teradata, MongoDB, and SQL Server. For everything else in the 2019 “style” of things, there is a generic ODBC route. In this route, you need to obtain a valid ODBC driver, configure it, and let PolyBase know how to access data from that remote source.
Cosmos DB’s Core API just happens to have a working ODBC driver, so the first step is to grab the relevant version of that driver and install it on the machine running SQL Server.

Read on to see how it works and how you can get around some initial pain points. As a quick note, this only works with SQL Server on Windows, as SQL Server on Linux does not support generic ODBC drivers with PolyBase.

Comments closed

Azure Data Explorer UI Updates

Published 2022-09-01 by Kevin Feasel

Michal Bar has a couple of posts for us. First, updates to the desktop app Kusto Explorer:

Query Automation allows you to define a workflow that contains a series of queries with rules and logic that govern the order in which they are executed. Automations can be reused, and users can re-run the workflow, to get updated results. Upon completion, the saved Automation produces an analysis report, summarizing all queries results with additional insights.

Then, updates to the ADX web explorer:

It is now possible to embed Azure Data Explorer dashboards in 3^rd party apps. This comes on top of allowing embedding of the Monaco editor in 3^rd party apps.
Dashboard embedding allows you to easily share data with your customers in a way that allows them to interact and explore it.
Using the various feature flags, you can control the exact controls that will be part of the embedded dashboard experience. For example, you can decide to remove the share, and add connection menu items or others.
To learn more about dashboard embedding, please read this doc Embed dashboards

Read on for the full changelog.

Comments closed

PolyBase in SQL Server 2022: Cosmos DB via MongoDB API

Published 2022-08-31 by Kevin Feasel

I have gotten back on the data virtualization wagon:

Back in the 2019 days, I noted a problem when CU2 of SQL Server 2019 came out. This is because the Cosmos DB collection I was using reported a wire version of 2 rather than the minimum version of 3. The official fix at that time was to create a new collection using the then-latest version of 3.6 but that didn’t work for me. My workaround was to use the old MongoDB drivers that shipped with SQL Server 2019 RTM.
Well, as of 2022, that solution won’t work anymore. The original MongoDB drivers don’t ship with SQL Server 2022, so we can’t use that workaround. I had a Cosmos DB account that was originally built on version 3.6. Even after upgrading to server version 4.2, it still reported wire version 2 when I connected to the endpoint that was relevant 3 years ago. Therein lies the solution to the problem.

It turns out there are two viable solutions now and I show both of them.

Comments closed

Ingestion from S3 into Azure Data Explorer

Published 2022-08-29 by Kevin Feasel

Anshul Sharma announces another source for Azure Data Explorer:

Today we are excited to launch the ability to ingest data from Amazon Simple Storage Service (S3) into Azure Data Explorer (ADX) natively.
Amazon S3 is one of the most popular object storage services. AWS Customers use Amazon S3 to store data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, applications, IoT devices, log analytics and big data analytics.
Azure Data Explorer (ADX) is a fully managed, high-performance, big data analytics platform that makes it easy to analyze high volumes of data in near real time. ADX supports ingesting data from a wide variety of sources such as Azure Blob, ADLS gen2, Azure Event Hub, Azure IoT Hub, and with popular open-source technologies such as Kafka, Logstash, Telegraph. With the new S3 support, customers can bring data from S3 natively without relying on complex ETL pipelines.

Between this, ADF/Synapse pipelines, and SQL Server 2022, it seems that Microsoft got the message that people do use multiple clouds and do want to read AWS data in Azure. Which is good because that directly benefits me…

Comments closed

Overly Large Executors in ElasticMapReduce

Published 2022-08-29 by Kevin Feasel

Dmitry Tolpeko notes a change to Amazon ElasticMapReduce:

So 50 executors were initially requested with the required memory 22528 and 4 vcores as expected, but actually 9 executors were created with 112640 memory and 20 cores that is 5x larger. It should have created 10 executors but my cluster does not have resources to run more containers.
Note: The second log row specifies allocated vCores:5, it is because of using DefaultResourceCalculator in my YARN cluster that ignores CPU and uses memory resource only. Do not pay attention to this, the Spark executor will still use 20 cores as it reported in the third log record above.

Click through for the reason.

Comments closed

Log Truncation Improvements in SQL Managed Instance Backup Restorations

Published 2022-08-29 by Kevin Feasel

Niko Neugebauer reports on some performance improvements:

There are multiple phases of the SQL MI database restore process, which, aside from the usual SQL Server restore operations, includes registering databases as an Azure asset for visibility & management within the Azure Portal. Additionally, the restore process forces a number of specific internal options, and some property changes such as forcing the switch to the FULL recovery model and forcing the database option PAGE_VERIFY to CHECKSUM, as well as eventually performing a full backup to start the log chain and provide full point-in-time- restore options through the combination of full and log database backups.
The restore operation on SQL MI also includes log truncation, and the execution time for the truncation has been vastly improved, which means that customers can expect their entire database restore process to become faster on both service tiers.

Click through to see what kind of performance gains you can expect.

Comments closed

Automating Bring-Your-Own-Key Rotation for TDE in Azure SQL DB

Published 2022-08-26 by Kevin Feasel

Shoham Dasgupta announces a new preview program:

Transparent data encryption (TDE) in Azure SQL Database and Managed Instance helps protect against the threat of malicious offline activity by encrypting data at rest.  TDE with Customer-Managed Key (CMK) enables Bring Your Own Key (BYOK) scenario for data protection at rest, by allowing a key stored in a customer-owned and customer-managed Azure Key Vault to be used as the TDE Protector on the server or managed instance.
When using TDE with Customer-Managed Key, one of the important responsibilities that customers need to perform on a regular basis is key rotation, that is, rotating the TDE Protector on the server by switching to a new key (or new version of the earlier key) from Azure Key Vault. Key rotation is a critical activity for an organization that is required to meet security and compliance objectives.
Automated key rotation for Azure SQL Database and Managed Instance is now available in preview, simplifying key management responsibilities for customers.

Click through to see how this works.

Comments closed

Max Server Memory and AWS

Published 2022-08-26 by Kevin Feasel

Andrea Allred runs into a weird issue on AWS RDS:

We tracked down every job that was touching the server and started to tune it, thinking that was just pushing us over the edge. We worked with AWS and finally one of our engineers noticed that our MAX Server Memory Setting was back at the SQL default. You know that insane default? Yes, it was there. But we had properly set that…three months ago when this stack was put in place.

Click through for the entire story, including symptoms and resolution.

Comments closed

Overview of Arc-Enabled SQL Managed Instances

Published 2022-08-25 by Kevin Feasel

Warwick Rudd continues an overview of Azure Arc-Enabled Data Services:

In our previous post, we mentioned the 2 types of data services that are supported and able to be managed by our newly deployed Data Controller:
– Azure Arc-enabled SQL Managed Instance
– Azure Arc-enabled PostgreSQL Hyperscale
In this pose we are going to have a look at the differences between an installation of Azure SQL Managed Instance and Azure Arc-enabled SQL Managed Instance.

This post doesn’t cover the actual deployment; Warwick promises that in his next post.

Comments closed

Azure Synapse Link for SQL Server 2022 and File Analysis

Published 2022-08-23 by Kevin Feasel

Kevin Chant digs into Azure Synapse Link for SQL Server 2022:

In this post I want to cover some file tests for Azure Synapse Link for SQL Server 2022 that I performed.
Because a while back I spotted something interesting whilst I was doing some initial tests for Azure Synapse Link for SQL Server 2022.
Which is when you add new data after the initial load that a new folder called ‘ChangeData’ appears in the storage account container. I noticed that the new file containing the insert was a comma separated value (csv) file. Whereas the table used for the initial load was a parquet file.

Is there a method to this madness? Click through to see Kevin’s tell-all story.

Comments closed