Press "Enter" to skip to content

Category: Cloud

Cross-Subscription Restore for Dedicated SQL Pools

Steve Howard announces some good news:

We are excited to announce the release of cross-subscription restore. This has been one of our top requested features from customers as it unlocks multiple scenarios from dev/test to simplified billing at the subscription level for restored data warehouses.

Click through to see how you can do this. There was a workaround in the past but this should be quite a bit faster.

Comments closed

Azure Redis Tips

Arun Sirpal enumerates some advice:

My learnings on Redis thus far which you may find useful:

1. Location of Redis should be close to your app.

2. Data structures within Redis, larger key value sizes lead to fragmentation of memory space and these larger memory requirements means more network data transfer, Redis states to use 100KB maximum, this will affect the transfer time allocated from the app. It could time out if the data request is big.

Click through for the rest of Arun’s advice. My advice on the 100KB maximum is that it really should be closer to 100 bytes or 1KB max in practice, especially for storing data which differs by entity (user, customer, organization, whatever your domain uses).

Comments closed

Using the master dacpac in Azure DevOps

Koen Verbeeck makes use of system databases in a database project:

I have a database project in Visual Studio. Inside the database, I use a couple of system views to fetch some metadata about tables. To make the project build successfully, you need to add a reference to the master database in the project.

That all works fine but there’s a bit more you need to do before Azure DevOps can work with the file. Read on to learn what that thing is.

Comments closed

Streaming Data into Synapse Dedicated SQL Pool

Lionel Penuchot loads some data:

This article reviews a common pattern of streaming data (i.e. real-time message ingestion) in Synapse dedicated pool. It opens a discussion on the simple standard way to implement this, as well as the challenges and drawbacks. It then presents an alternate solution which enables optimal performance and greatly reduces maintenance tasks when using clustered column store indexes. This is aimed at developers, DBAs, architects, and anyone who works with streams of data that are captured in real-time.

I’d probably avoid the MERGE statement in there because of how many problems there are with it. That said, this is a useful pattern for trickle-loading columnstore tables.

Comments closed

Costs for Managed Virtual Networks in Azure Data Factory

Martin Schoombee brings up an interesting point:

We were running SSIS in an Azure VM, spinning the VM up and down as required to run the ETL processes. A third-party SSIS component was used to extract data out of Dynamics 365 CRM, and accounted for a significant part of the yearly costs. I blogged about the reasons why I think it’s worth moving from Azure AS to Power BI PPU before, and combined with the move to Azure Data Factory I estimated a cost reduction of almost 35%.

After deploying the solution I noticed that our daily ETL costs were significantly higher than I thought it would be, and that started a little rabbit-hole exercise to figure out why.

I’m used to thinking about managed virtual networks in the case of Azure Synapse Analytics, where I think it makes a lot of sense as a default (especially because you can’t switch after you’ve made a decision).

Comments closed

From Confluent Cloud into Azure Synapse Analytics

Jacob Bogie and Dustin Vannoy show how to integrate Kafka in Confluent Cloud with pools in Azure Synapse Analytics:

Just released this fall, is the fully managed Synapse Connector. Azure Synapse Analytics provides a platform for data analysts and data scientists to analyze and combine data from multiple sources. Within Confluent Cloud, data can be synched to dedicated SQL pools via the fully managed Synapse sink connector and attached to Synapse Analytics workspace. Once added to the Synapse Analytics workspace, analysts have the ability to perform advanced analytics and reporting on data in the Confluent pipeline. The ability to access event-level data enables event-level analytics and data exploration.

Click through for two examples, one of loading data into a dedicated SQL pool and one of streaming data into Spark Streaming running on (naturally) a Spark pool.

Comments closed

Role-Based Access Controls in Redshift

Milind Oke, et al, describe RBAC in Amazon Redshift:

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With Amazon Redshift, you can analyze all your data to derive holistic insights about your business and your customers. One of the challenges with security is that enterprises don’t want to have a concentration of superuser privileges amongst a handful of users. Instead, enterprises want to design their overarching security posture based on the specific duties performed via roles and assign these elevated privilege roles to different users. By assigning different privileges to different roles and assigning these roles to different users, enterprises can have more granular control of elevated user access.

In this post, we explore the role-based access control (RBAC) features of Amazon Redshift and how you can use roles to simplify managing privileges required to your end-users. We also cover new system views and functions introduced alongside RBAC.

Read on to learn about system-defined roles as well as creating user-customizable roles.

Comments closed

Thinking Azure Data Platform Security Architecture

Craig Porteous begins a new series:

Reference architectures are great! You’ve got all of the key components in there, nice and clear. Colourful lines showing how data moves through each stage, product, or service. Great for a slide deck or a proposal to get rid of that old creaking data warehouse and into a shiny new Data Lakehouse.

Not so great for the finer details demanded by security operations teams however.

This promises to be an interesting series.

Comments closed

Restarting Azure Data Factory Triggers

Andy Leonard provides an after-action report:

During delivery of the class, I popped over to a much older data factory and fired up a couple integration runtimes (IRs). You see, on this older data factory, I trigger a couple pipelines that check to see if I’ve left an IR running. If so, each pipeline will shut down its respective IR. The trigger fires each evening. I blogged about the pipeline design almost two years ago in a post titled  Stop an Azure-SSIS Files Integration Runtime (Safely).

Read on for the full report, some takeaways on how to limit the risk, and possible next steps if you find yourself in a situation like Andy did.

Comments closed

Connecting Kafka Cross-Network

Praful Khandelwal sets up a hybrid Kafka cluster:

In this article, we will be talking about a simple set-up involving local machine (macOS) and Azure VM. We’ll discuss the step-by-step procedure to produce events from local machine to Kafka broker hosted on Azure VM and also to consume those events back in local machine. While this does not cover the exact scenario described above, it gives a fair idea about how the Kafka messages can be exchanged across the network.

Kafka is pretty chatty, so I’d hope to have really good network connectivity, such as a Direct Connect (for AWS) or Express Route (Azure) in place.

Comments closed