In my Simple-Talk article "Azure SQL Data Warehouse," I introduced you to SQL Data Warehouse and gave you an overview of the architecture and technologies that drive the service and make it all work. In this article, I go a step further and provide details about getting started with SQL Data Warehouse, demonstrating how to add a sample database and then how to access the server and database settings.
If you want to follow along with my examples and try out SQL Data Warehouse for yourself, you must have an active Azure subscription, even if it’s just the free trial. For those who have already used up their free trial, be aware that SQL Data Warehouse is a pay-as-you-go service, even though it’s still in preview, so unless you’re on an unlimited company budget or happen to have accrued MSDN credits, you’ll want to be judicious in how you try out the service. Fortunately, as you’ll see in this article, you can pause the compute resources when not in use, helping to minimize the costs associated with learning about the service.
This article is all about initial installation and configuration.
So I set about looking for a workaround. This week I think I've finally managed to get something working that approximates the number I need from that view: ms_ticks.
Attached is sp_whoisactive v11.112 — Azure Special Edition v2. Please give it a shot, and I am especially interested in feedback if you use the @get_task_info = 2 option when running sp_whoisactive. That is the main use case that’s impacted by the lack of ms_ticks information and my attempt at a workaround.
If you’re using on-prem SQL Server, this doesn’t add anything new, but if you’re on Azure SQL Database, give it a try.
I do have alerts set up on the Azure portal and in Application Insights to notify me when availability or performance thresholds are violated, but I also need to know if there is a global or regional issue that might affect our app so that I can respond and notify the staff when appropriate. Azure status changes are reported on the Azure Status web page.
This post describes how to use the Azure Status page RSS feeds and Outlook rules to get notified if things go sideways in Microsoft Azure.
This is a good use of Outlook’s built-in RSS reader.
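If you'd rather script the check than rely on Outlook, the same idea can be sketched in a few lines of Python using only the standard library. The sample feed content below is invented for illustration; in practice you would fetch the live Azure Status feed over HTTP with urllib.request instead.

```python
# Minimal sketch: parse an RSS 2.0 feed (like the Azure Status feed) and
# list the item titles. The sample XML is invented for illustration.
import xml.etree.ElementTree as ET

sample_rss = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Azure Status</title>
  <item><title>SQL Database - West US - Investigating</title></item>
  <item><title>Storage - North Europe - Resolved</title></item>
</channel></rss>"""

def item_titles(rss_text):
    """Return the <title> text of every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title") for item in root.iter("item")]

for title in item_titles(sample_rss):
    print(title)
```

From here it's a small step to filter the titles for the regions and services you care about and fire off an email or pager alert, which is essentially what the Outlook-rule approach does declaratively.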
In the field, I see a lot of people using Availability Groups to have a near real-time replica for reporting. I talked a little bit about this above. What isn't mentioned here is that you have to maintain a Windows Failover Cluster, quorum, Active Directory (unless you're using the Windows Server 2016 preview), and more. This gets you a replica that is just a copy of the database. What does this mean? You cannot change database objects like security, indexes, etc. Also, what if you don't need the whole database for reporting? With replication, you can replicate only the data you truly need.
So, let’s recap here. You only have to replicate the data that you need. You can have different security and indexes on your reporting subscriber database(s). The reporting subscriber database can be scaled up or down based on your needs. The reporting database can now be an Azure SQL Database. Folks, I call this a huge win!
There’s a lot more replication love out there than I’d expect. John promises to follow up with a guide on how to implement this, so keep an eye out for that.
As I’m currently planning to migrate the entire BI architecture of one of my customers to the cloud, this made me think: can we ditch SSAS as we know it already in favor of Power BI? What are the alternatives?
To study that, I’ve put some diagrams together to show the possibilities of moving BI to the cloud. First, I’ll discuss the possible architectures, then the impossible architecture (but maybe the situation I was looking for).
One man’s opinion: there will be SSAS for Azure. I have no proof of this, and the nice part about having no proof is that I can throw out wild speculation without fear of violating NDA… But to me, Power BI solves a different problem and acts in conjunction with SSAS rather than as its replacement. I also don’t see any technical reasons why SSAS couldn’t live in the cloud, and so that leads me to believe that it will be there eventually. But hey, it turns out wild speculation is occasionally wrong…
The Extended Events (XEvents) feature is available as a public preview in Azure SQL Database, as announced here. XEvents supports three types of targets: File Target (writes to Azure Blob Storage), Ring Buffer, and Event Counter. Once we’ve created an event session, how do we inspect the event session target properties? This blog post describes how to do this in two ways: using the user interface in SSMS and using T-SQL.
It’s nice to see Extended Events making their way into Azure SQL Database.
There are two types of indicators for linear correlation, positive and negative, as shown on the following charts. The Y axis represents grades, and the X axis is changed to show the positive or negative correlation of X with grades. When X is the number of study hours, there is a positive correlation and the line goes up. When X is changed to hours spent watching cat videos, there is a negative correlation and the line goes down. If you can’t draw a line around the points, there is no correlation. If I were to create a graph where X indicated the number of bags of Cheese Doodles consumed, it would not be possible to draw a straight line that the data points cluster around. Since this is Line-ar regression, if that line doesn’t exist, there is no correlation. Knowing there is no correlation is also useful.
Simple linear regression is a powerful tool and gets you to “good enough” more frequently than you’d think.
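The study-hours and cat-videos examples above can be sketched in a few lines of Python. The numbers below are invented to illustrate the positive and negative cases; the formulas are the standard Pearson correlation coefficient and the least-squares slope and intercept.

```python
# Pearson correlation and simple linear regression, computed from scratch.
# Data is invented: grades vs. study hours (positive correlation) and
# grades vs. hours of cat videos watched (negative correlation).
def pearson_r(xs, ys):
    """Pearson correlation coefficient: covariance over product of std devs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def fit_line(xs, ys):
    """Simple linear regression: returns (slope, intercept) of the best-fit line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = cov / var_x
    return slope, my - slope * mx

study_hours = [1, 2, 3, 4, 5]
grades = [55, 62, 70, 78, 85]
cat_video_hours = [1, 2, 3, 4, 5]
grades_slacking = [85, 78, 70, 62, 55]

print(pearson_r(study_hours, grades))              # close to +1
print(pearson_r(cat_video_hours, grades_slacking)) # close to -1
print(fit_line(study_hours, grades))               # upward-sloping line
```

A Cheese Doodles column of random grades would land near r = 0, which is the "no correlation" case the excerpt describes: no line worth drawing.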
As more and more customers are interested in moving some portion of their BI/analytics workloads to cloud services, one question that comes up occasionally is whether or not you should start with a marketplace image that has SQL Server already installed. So far I’ve noted a few key considerations for this decision:
Do you want to pay for the SQL Server license as part of the VM pricing?
Do you want to configure SQL Server in a specific way (i.e., following best practices)?
Do you want Azure to handle things like automated patching by default?
My rule of thumb is that if it’s Express Edition or just for me to mess around with, I’m typically happy with an image. But for a production setup, I’d want the fine-grained control at installation time that you just won’t get with a preconfigured image.
Jupyter is an easy to use and convenient way of mixing code and text in the same document.
Unlike other reporting systems like RMarkdown and LaTeX, Jupyter notebooks are interactive – you can run the code snippets directly in the document.
This makes it easy to share and publish code samples as well as reports.
Jupyter Notebooks is a fine application, but up until now, you could only integrate it with Azure Machine Learning if you were writing Python code. This move is a big step forward for Azure ML.
First off, the word itself. The Cloud. What is The Cloud? It’s a server that you don’t own. You can’t touch it; it’s in someone else’s data center. It may or may not be virtual. Amazon’s cloud, or Microsoft’s, or Google’s, is several data centers with racks and racks of servers. They are physical, just not at your location. And they’re accessed across the Internet. This is something we’ve been doing for 30 years; it’s called a wide-area network, just scaled up bigger. We had bi-coastal WANs before the World Wide Web came along.
Four or five years ago, I was absolutely in agreement. Today, I’m 50/50, being near 100% for many types of servers (web servers, etc.) and closer to 25-30% for databases. My expectation is that those numbers will continue to shift upward as time goes on, but there will always be reasons not to migrate certain servers to someone else’s data center.