
Category: Cloud

Scaling ADF and Synapse Analytics Pipelines

Paul Andrew has a process for us:

Back in May 2020 I wrote a blog post about ‘When You Should Use Multiple Azure Data Factory’s’. Following on from this post, with more than a full year now passed and having implemented many more data platform solutions for some crazy massive (technical term) enterprise customers, I’ve been reflecting on these scenarios. Specifically considering:

– The use of having multiple regional Data Factory instances and integration runtime services.

– The decoupling of wider orchestration processes from workers.

Furthermore, to supplement this understanding and for added context, in December 2020 I wrote about ‘Data Factory Activity Concurrency Limits – What Happens Next?’ and ‘Pipelines – Understanding Internal vs External Activities’. Both now add to a much clearer picture regarding the ability to scale pipelines for the purposes of large-scale extraction and transformation processes.

Read on for details about the scenario, as well as a design pattern to explain the process. This is a large solution for a large-scale problem.


Deploying Custom Docker Images in Azure ML

Tsuyoshi Matsuzaki shows us how to deploy an Azure ML model via custom Docker image:

In my earlier post, I showed you how to bring your own custom Docker image for training with Azure Machine Learning.
In contrast, here I’ll show you how to bring a custom Docker image for model deployment.

In Azure Machine Learning, the base Docker image used in deployment includes the inferencing assets, such as the Flask server. So you should use an AML-compliant image as the base image, even when you use your own custom Docker image.
The list of these maintained AML images is available at https://github.com/Azure/AzureML-Containers.

Read on for an example.


Deploying Datasets in Azure Analysis Services and Power BI PPU

Gilbert Quevauvilliers continues a series on migrating from Azure Analysis Services to Power BI Premium Per User:

Welcome to part 8, where I am going to compare deploying datasets.

For those who are not exactly sure what deployments are: when you are using Power BI Desktop and you click on Publish, you are effectively deploying your changes to the Power BI Service (which could also be a server in the cloud).

In this blog post I will show the differences when completing a deployment from AAS and then PPU.

Read on to see several techniques for deploying for each technology.


An Overview of Amazon Athena

Aveek Das takes us through the basics of Amazon Athena:

Serverless. Since Amazon Athena is offered as a fully managed cloud service, customers do not need to go through the pain of installing and maintaining separate infrastructure for it. You can start by logging into the AWS Web console and proceeding to Amazon Athena.

Pay Per Query. You only pay for queries you execute. This is very cost-effective, as you can easily figure out your monthly expenses based on your usage pattern. On average, users pay 5 USD for each terabyte of data scanned. This can be further optimized by creating partitions or compressing your dataset.

Interactive Performance. We do not need to worry about the resources that work behind the scenes. When a query is executed, Athena automatically runs it in parallel across multiple resources, returning results faster.
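To make the pay-per-query point concrete, here is a minimal sketch of the arithmetic, assuming the 5 USD per terabyte figure quoted above. The function name, the decimal-terabyte definition, the 2 TB table, the 100 queries per month, and the one-eighth reduction from partitioning and compression are all hypothetical numbers chosen for illustration, not figures from the article or from AWS.

```python
# Rough Athena cost estimate: the price scales with the amount of data scanned.
BYTES_PER_TB = 10**12  # assumption: decimal terabytes; actual billing may define a TB differently


def estimate_query_cost(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Return the estimated cost in USD for a query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * usd_per_tb


# Hypothetical workload: a 2 TB table queried 100 times a month.
full_scan = estimate_query_cost(2 * BYTES_PER_TB)  # every query reads the whole table

# Assumption: partitioning plus compression cuts the scanned data to one eighth.
pruned_scan = estimate_query_cost(2 * BYTES_PER_TB // 8)

monthly_full = 100 * full_scan
monthly_pruned = 100 * pruned_scan

print(f"full scans:   ${monthly_full:,.2f} per month")
print(f"pruned scans: ${monthly_pruned:,.2f} per month")
```

Because the estimate is linear in bytes scanned, anything that shrinks what a query has to read (partition pruning, compression, columnar formats) lowers the bill by roughly the same factor.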

Read on to see an example of Athena in action.


Azure Functions and (Lack of) F# Support

Jamie Dixon has a shaggy dog tale:

When Azure Functions first came out, F# had pretty good support – templates, the ability to run a .fsx file, cool examples written by Don… Fast forward to 2021 and we have bupkis. I recently wrote a dictionary of amino acid weights that would be perfect as a service: pass in a kmer length and weight, get all hits from amino acids.

I first attempted to create a function app, select a .fsx template, and write the code in my browser. Alas, only .csx scripting is available.

Not to be too cutesy about it, but it would be nice if the product which allows for the execution of functions in a cloud service would support the .NET language which most explicitly embraces the notion of functions. If you feel similarly, there is an open feedback ticket.


Getting Started with Citus on Azure

Gauri Mahajan sets up Azure Database for PostgreSQL and picks the really expensive version:

PostgreSQL is an open-source database and one of the most popular relational databases typically used for OLTP systems. One important feature of this database is that it’s supported by a large community, and with it come several extensions that can be applied on the PostgreSQL server to use it for a variety of different applications. Examples of such extensions are AppOS, HypoPG, OpenFTS, PostGIS, TimescaleDB (PostgreSQL for time-series), etc.

One such PostgreSQL extension is Citus, which transforms PostgreSQL into a distributed database that enables use of Postgres in a scale-out or cluster model. With Citus, the PostgreSQL server can be used for high transaction throughput, processing time-series or IoT data, building analytical warehouses, and real-time analytics. Managing the dynamic infrastructure on which PostgreSQL and the Citus extension operate can be quite challenging. Azure recently launched the Citus flavor of PostgreSQL in the form of Azure Database for PostgreSQL – Hyperscale server group. This can be compared to the likes of Azure Synapse or AWS Redshift. In this article, we will learn how to deploy the Hyperscale server group of the Azure Database for PostgreSQL and explore its configuration options.

Read on for setup instructions, as well as some of the benefits you get by using the Citus extension.


Unique Resource Names and Azure

Meagan Longoria gives us a warning:

Each resource type in Azure has a naming scope within which the resource name must be unique. For PaaS resources such as Azure SQL Server (server for Azure SQL DB) and Azure Data Factory, the name must be globally unique within the resource type. This means that you can’t have two data factories with the same name, but you can have a data factory and a SQL server with the same name. Virtual machine names must be unique within the resource group. Azure Storage accounts must be globally unique. Azure SQL Databases should be unique within the server.

Since Azure allows you to create a data factory and a SQL server with the same resource name, you may think this is fine. But you may want to avoid this, especially if you plan on using system-assigned managed identities or using Azure PowerShell/CLI. And if you aren’t planning on using these things, you might want to reconsider.
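The scoping rules in that excerpt can be sketched as a small lookup, purely as an illustration. This is a simplified model based only on the rules quoted above, not an Azure API: the `Resource` class, the type labels, the helper names, the sample resource names, and the case-insensitive comparison are all assumptions made for the example.

```python
from dataclasses import dataclass

# Naming scope per resource type, as summarized in the quoted passage (simplified).
NAMING_SCOPE = {
    "data_factory": "global",
    "sql_server": "global",
    "storage_account": "global",
    "virtual_machine": "resource_group",
    "sql_database": "parent",  # unique within its server
}


@dataclass(frozen=True)
class Resource:
    resource_type: str
    name: str
    resource_group: str
    parent: str = ""  # e.g. the server that owns a SQL database


def uniqueness_key(r: Resource) -> tuple:
    """Build the key that must be unique for this resource's type and scope."""
    scope = NAMING_SCOPE[r.resource_type]
    name = r.name.lower()  # assumption: names compare case-insensitively
    if scope == "global":
        return (r.resource_type, name)
    if scope == "resource_group":
        return (r.resource_type, r.resource_group.lower(), name)
    return (r.resource_type, r.parent.lower(), name)


def conflicts(existing: list, candidate: Resource) -> bool:
    """True if the candidate breaks the uniqueness rule for its own type."""
    key = uniqueness_key(candidate)
    return any(uniqueness_key(e) == key for e in existing)


def same_name_other_type(existing: list, candidate: Resource) -> list:
    """Resources of a different type sharing the name: allowed, but worth a warning."""
    return [
        e for e in existing
        if e.name.lower() == candidate.name.lower()
        and e.resource_type != candidate.resource_type
    ]


# Hypothetical inventory and a new SQL server that reuses a data factory's name.
inventory = [Resource("data_factory", "contoso-dev", "rg-data")]
new_server = Resource("sql_server", "contoso-dev", "rg-data")

print(conflicts(inventory, new_server))             # allowed by the scoping rules
print(same_name_other_type(inventory, new_server))  # but flagged as a shared name
```

A check like `same_name_other_type` is the sort of guard a naming convention could enforce before deployment, so that the allowed-but-risky case never comes up.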

Click through for a demonstration of how you might get into trouble with this.


Azure Test Plan Terminology

Kevin Chant is here with a language lesson:

In this post I want to cover some Azure Test Plans jargon for Data Platform professionals, because I understand it can be confusing.

In addition, I did say I would explain some jargon in my last post about using Azure Test Plans for Data Platform deployments. Of course, these explanations will help with other kinds of deployments as well as Data Platform ones.

By the end of this post, you will have a better understanding of some of the jargon involved in Azure Test Plans, plus a good recommendation of a lab to use.

Click through for those explanations.


Connecting to Azure Blob Storage from Power BI

Kristyna Hughes links Power BI to a data source:

The step-by-step process below walks through connecting to data housed in Azure Blob Storage from Power BI using a SAS token. There are many ways to grab your data from Blob Storage, but this is the most efficient, scalable, and secure way that I found (with some security restrictions from watchful DBAs).

Click through for the solution, which is based on using SAS tokens.


Deploying Azure Policies via Terraform

Jonathan D’Aloia shows how you can use Terraform to work with Azure Policies:

As you may know, Terraform serves a great purpose in deploying resources and infrastructure into your Azure environment. However, Terraform can also be used to automate and consistently deploy Azure Policies, which can be defined prior to any resources being generated. In this blog, I will cover how you can import policies into your Terraform State and then deploy them into an Azure Resource Group in order to secure your landing zone prior to deploying any resources.

Click through for an example.
