Press "Enter" to skip to content


Azure SQL Updates for May 2023

Anna Hoffman gives us the latest news:

Let’s start with Azure SQL Managed Instance, which had several general availability (GA) announcements in April. First was the GA of the link feature for Azure SQL Managed Instance for SQL Server 2016 and 2019. This capability allows you to set up near real-time replication between a SQL Server instance and SQL MI. You can use this link for scale, migration, read-only workloads, etc. To learn more, review the announcement blog. The team also announced the GA of CETAS. This stands for Create External Table As Select, which essentially means you can create an external table while in parallel exporting the results of a SELECT statement. This has been a customer ask, and you can learn how to take advantage of it here.

Read on to learn more about what’s new with the rest of the Azure SQL landscape, and some things happening in the community.


WaitTime in Power BI

Chris Webb explains what a new metric means:

What does WaitTime represent? Here’s the technical explanation: it’s the wait time on the query thread pool in the Analysis Services engine before the query starts to run. But what does this mean for you as someone trying to tune DAX queries in Power BI?

Chris provides an explanation of exactly that. This tracking of noisy neighbors is interesting, as it can provide insight if you’re noticing variance in dataset refresh times.


Performing a Cloud Adoption Security Review

Daniel Margetic takes a look:

Security is an ongoing journey of incremental progress and maturity, and not a static destination. The Cloud Adoption Framework provides security guidance for this journey by providing clarity to the processes and best practices. This guidance is based on real world experiences of our customers, of Microsoft’s own security journey and lessons learned, and the work with other organizations like NIST (National Institute of Standards and Technology) or CIS (Center for Internet Security).

The outcome is manifested in the Cloud Adoption Framework Secure Methodology, which provides a vision of the complete end state of your security journey and follows the Zero Trust principle (assume breach, verify explicitly, use least privilege access).

This assessment gives you the opportunity to self-assess the security of your cloud adoption journey against this secure methodology.

Read on to learn more about how CASRs work and how you can perform one yourself.


Adding Microseconds to ADF Timestamps

Rayis Imayev can’t wait for the next second:

The current addToTime function (https://learn.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions#addToTime) in Azure Data Factory (ADF) only supports a specific set of time units ranging from Year to Seconds. Since I needed to increment a timestamp by microseconds, I had to find an alternative solution in ADF. Here are my findings on how to do this using an alternate approach.

Click through for Rayis’s solution to the problem.
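Rayis’s exact solution is in the linked post. As one hedged sketch of the general trick (let addToTime handle whole seconds and rebuild the sub-second portion as a string), the expression below adds a hypothetical 500 microseconds to a string variable named BaseTimestamp, using only documented ADF expression functions:

@concat(
    formatDateTime(
        addToTime(
            variables('BaseTimestamp'),
            div(add(int(formatDateTime(variables('BaseTimestamp'), 'ffffff')), 500), 1000000),
            'Second'
        ),
        'yyyy-MM-ddTHH:mm:ss.'
    ),
    substring(
        concat('000000', string(mod(add(int(formatDateTime(variables('BaseTimestamp'), 'ffffff')), 500), 1000000))),
        length(string(mod(add(int(formatDateTime(variables('BaseTimestamp'), 'ffffff')), 500), 1000000))),
        6
    ),
    'Z'
)

The div(...) term carries any overflow into whole seconds, while the mod(...) term is the remaining microsecond count, left-padded to six digits with the concat/substring trick, since the expression language has no padding function.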


CETAS to Parquet Files in Azure SQL Managed Instance

Michael Bourgon gives CETAS a chance:

TL;DR – the below lines will allow you to query a table on your MI, creating Parquet files in Azure blob storage. And you can query it! Next up is partitioning over time, etc., etc. But this is freaking fantastic. I have a python script I wrote that does it, but it’s nowhere near as nice/easy as this.

Why do you care? Because it’s a fantastically easy way to archive older data to blob storage, and I suspect (need to test) that if you do it right, you can then have it go to cool/archive storage via a lifecycle setup, so that if you need it much later, you can.

Yep, this is historically one of the best use cases for PolyBase. Unfortunately, we can’t do this in SQL Server 2022, though you can in pre-2022 versions using the Hadoop process. Given that it’s now available in SQL MI, I wouldn’t be too shocked to see it on-premises at some point, with the big question being whether that lands in SQL Server 2022 or waits for vNext.
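To make the flow concrete, here is a minimal T-SQL sketch of the pattern Michael describes, assuming CETAS is enabled on the instance and its managed identity can write to the storage account; every object and storage name below is illustrative:

-- One-time plumbing: credential, external data source, and Parquet file format.
CREATE DATABASE SCOPED CREDENTIAL ArchiveCredential
    WITH IDENTITY = 'Managed Identity';

CREATE EXTERNAL DATA SOURCE ArchiveStore
WITH (
    LOCATION = 'abs://archive@mystorageacct.blob.core.windows.net/',
    CREDENTIAL = ArchiveCredential
);

CREATE EXTERNAL FILE FORMAT ParquetFormat
    WITH (FORMAT_TYPE = PARQUET);

-- The CETAS statement: exports the SELECT results to Parquet files in blob
-- storage and creates an external table that remains queryable afterward.
CREATE EXTERNAL TABLE dbo.SalesArchive
WITH (
    LOCATION = 'sales/2019/',
    DATA_SOURCE = ArchiveStore,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT *
FROM dbo.Sales
WHERE OrderDate < '2020-01-01';

From there, queries against dbo.SalesArchive read the Parquet files directly, which is what makes the lifecycle-to-cool/archive idea above so appealing.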


Optimizing Kafka Infrastructure Costs

Addison Huddy saves some money:

In this first blog, we’re going to run through the infrastructure costs of running Kafka—i.e., compute, storage, networking, and the additional tooling you need to keep Kafka up and running smoothly. We won’t bury the lede—if you’re running Kafka in the cloud across multiple AZs (as most do for high availability), networking likely represents over 50% of your Kafka infrastructure costs. Let’s see how this ends up being the case.

Click through for some thoughts on how to reduce network costs, using AWS as an example.
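As a rough, purely illustrative bit of arithmetic behind that claim: with replication factor 3 spread across three AZs, every gigabyte produced is copied to two other zones, and a producer or consumer sitting in a random zone talks to a leader in another zone about two-thirds of the time. A single consumer group therefore implies roughly 2 + 2/3 + 2/3 ≈ 3.3 GB of inter-AZ traffic per gigabyte produced, and each additional consumer group adds more; at the per-gigabyte rates cloud providers charge for cross-AZ transfer, that line item outgrows broker compute and disks quickly.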


Loading WhoIsActive Data on Azure SQL DB

Andrea Allred wants to know who’s doing what on this system:

I needed to collect sp_WhoIsActive into a table, but the twist was that it is on my Azure Managed Database, so I had to get creative with how I did it. We needed an Azure Pipeline to run it, but we wanted to record it every minute, and firing a pipeline every minute adds up fast. So we decided that we would kick it off once an hour and have the process wait for a minute and then fire, repeating until the hour ended. Then it would fire again at the top of the next hour and the same process would repeat.

That’s an interesting way to do it. Another alternative might have been an Azure function app, which you could schedule to run every minute. I think that’d be a lot less expensive than running an Azure Pipeline, and this goes to show you that there are many ways to solve the same problem in Azure.
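For reference, the T-SQL half of that pattern can be quite compact. Here is a minimal sketch of a once-a-minute collection loop for a single hour, assuming sp_WhoIsActive is installed and the destination table already exists (the table name here is made up):

-- Log current activity once a minute, 60 times, then exit.
DECLARE @runs int = 0;

WHILE @runs < 60
BEGIN
    -- Snapshot whatever is running right now into the logging table.
    EXEC dbo.sp_WhoIsActive
        @destination_table = 'dbo.WhoIsActiveLog';

    SET @runs += 1;

    -- Wait for the next collection.
    WAITFOR DELAY '00:01:00';
END;

If you would rather not hand-write the destination table, sp_WhoIsActive’s @return_schema and @schema parameters will generate a matching CREATE TABLE statement for you.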


Unrolling Multiple Arrays in Azure Data Factory

Mark Kromer puts us in disarray:

ADF and Synapse data flows have a Flatten transformation to make it easy to unroll an array as part of your data transformation pipelines. We’ve updated the Flatten transformation to now allow for multiple arrays that can be unrolled in a single transformation step. This will make your ETL jobs much simpler with fewer transformation steps.

Click through for screenshots showing how to use this feature.


Snowflake Data Governance

Enrique Lopez de Lara shares a few ways that Snowflake allows us to protect data in its system:

The role hierarchy in the previous section defines what can be done on different objects and by whom. However, it doesn’t restrict which records within a table a user can see or which values should be masked within a column. That’s where the data governance policies in this section come into play.

All data governance policies and tags are stored in the PROD_DB_GOV database under three schemas: MASKING, ROWACCESS and TAGS. Putting all the policies and tags in a single database allows us to centralize them and better restrict access to them. Please note that only the GOV_ADMIN role has read/write permissions on it.

These are, for the most part, very similar to what we’re used to in relational databases: application and system roles, row-level security, and data classification.
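To give a flavor of what those policies look like, here is a hedged Snowflake SQL sketch that follows the post’s PROD_DB_GOV layout; the policy bodies and the target table are illustrative, not Enrique’s actual definitions:

-- Masking policies live in the governance database and get attached to
-- columns elsewhere; only GOV_ADMIN sees the raw value here.
CREATE MASKING POLICY prod_db_gov.masking.email_mask
  AS (val STRING) RETURNS STRING ->
  CASE
      WHEN CURRENT_ROLE() IN ('GOV_ADMIN') THEN val
      ELSE '***MASKED***'
  END;

ALTER TABLE sales.crm.customers
  MODIFY COLUMN email
  SET MASKING POLICY prod_db_gov.masking.email_mask;

-- Row-level security follows the same pattern with a row access policy,
-- here driven by a hypothetical role-to-region mapping table.
CREATE ROW ACCESS POLICY prod_db_gov.rowaccess.by_region
  AS (region STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'GOV_ADMIN'
  OR EXISTS (
      SELECT 1
      FROM prod_db_gov.rowaccess.region_map m
      WHERE m.role_name = CURRENT_ROLE()
        AND m.region = region
  );

ALTER TABLE sales.crm.customers
  ADD ROW ACCESS POLICY prod_db_gov.rowaccess.by_region ON (region);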
