Category: Cloud

Costs for Managed Virtual Networks in Azure Data Factory

Martin Schoombee brings up an interesting point:

We were running SSIS in an Azure VM, spinning the VM up and down as required to run the ETL processes. A third-party SSIS component was used to extract data out of Dynamics 365 CRM, and accounted for a significant part of the yearly costs. I blogged about the reasons why I think it’s worth moving from Azure AS to Power BI PPU before, and combined with the move to Azure Data Factory I estimated a cost reduction of almost 35%.

After deploying the solution I noticed that our daily ETL costs were significantly higher than I thought they would be, and that started a little rabbit-hole exercise to figure out why.

I’m used to thinking about managed virtual networks in the case of Azure Synapse Analytics, where I think it makes a lot of sense as a default (especially because you can’t switch after you’ve made a decision).

Comments closed

From Confluent Cloud into Azure Synapse Analytics

Jacob Bogie and Dustin Vannoy show how to integrate Kafka in Confluent Cloud with pools in Azure Synapse Analytics:

Just released this fall is the fully managed Synapse Connector. Azure Synapse Analytics provides a platform for data analysts and data scientists to analyze and combine data from multiple sources. Within Confluent Cloud, data can be synced to dedicated SQL pools via the fully managed Synapse sink connector and attached to a Synapse Analytics workspace. Once added to the Synapse Analytics workspace, analysts have the ability to perform advanced analytics and reporting on data in the Confluent pipeline. The ability to access event-level data enables event-level analytics and data exploration.

Click through for two examples, one of loading data into a dedicated SQL pool and one of streaming data into Spark Streaming running on (naturally) a Spark pool.

Role-Based Access Controls in Redshift

Milind Oke, et al., describe RBAC in Amazon Redshift:

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With Amazon Redshift, you can analyze all your data to derive holistic insights about your business and your customers. One of the challenges with security is that enterprises don’t want to have a concentration of superuser privileges amongst a handful of users. Instead, enterprises want to design their overarching security posture based on the specific duties performed via roles and assign these elevated privilege roles to different users. By assigning different privileges to different roles and assigning these roles to different users, enterprises can have more granular control of elevated user access.

In this post, we explore the role-based access control (RBAC) features of Amazon Redshift and how you can use roles to simplify managing the privileges required by your end users. We also cover new system views and functions introduced alongside RBAC.

Read on to learn about system-defined roles as well as creating user-customizable roles.
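
To make the idea in the quote concrete, here is a minimal in-memory sketch of role-based access: privileges attach to roles, roles attach to users, and a user's effective privileges are the union across their roles. This is plain Python and not Redshift's API or syntax; the role names, user names, and privilege strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A named bundle of privileges (names here are illustrative only)."""
    name: str
    privileges: set[str] = field(default_factory=set)


@dataclass
class User:
    """A user holds roles, never privileges directly."""
    name: str
    roles: list[Role] = field(default_factory=list)

    def effective_privileges(self) -> set[str]:
        # Union of the privileges of every role granted to this user.
        result: set[str] = set()
        for role in self.roles:
            result |= role.privileges
        return result

    def can(self, privilege: str) -> bool:
        return privilege in self.effective_privileges()


# Hypothetical roles: one for loading data, one for reading an audit table.
loader = Role("etl_loader", {"CREATE TABLE", "INSERT"})
auditor = Role("auditor", {"SELECT ON audit_log"})

# Neither user is a superuser; each gets only the roles their duties need.
alice = User("alice", [loader])
bob = User("bob", [loader, auditor])

print(alice.can("INSERT"))
print(alice.can("SELECT ON audit_log"))
print(sorted(bob.effective_privileges()))
```

Changing what a duty requires then means editing one role rather than every user who holds it, which is the simplification the authors are pointing at.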

Thinking Azure Data Platform Security Architecture

Craig Porteous begins a new series:

Reference architectures are great! You’ve got all of the key components in there, nice and clear. Colourful lines showing how data moves through each stage, product, or service. Great for a slide deck or a proposal to get rid of that old creaking data warehouse and into a shiny new Data Lakehouse.

Not so great for the finer details demanded by security operations teams, however.

This promises to be an interesting series.

Restarting Azure Data Factory Triggers

Andy Leonard provides an after-action report:

During delivery of the class, I popped over to a much older data factory and fired up a couple integration runtimes (IRs). You see, on this older data factory, I trigger a couple pipelines that check to see if I’ve left an IR running. If so, each pipeline will shut down its respective IR. The trigger fires each evening. I blogged about the pipeline design almost two years ago in a post titled Stop an Azure-SSIS Files Integration Runtime (Safely).

Read on for the full report, some takeaways on how to limit the risk, and possible next steps if you find yourself in a situation like Andy did.

Connecting Kafka Cross-Network

Praful Khandelwal sets up a hybrid Kafka cluster:

In this article, we will be talking about a simple set-up involving a local machine (macOS) and an Azure VM. We’ll discuss the step-by-step procedure to produce events from the local machine to a Kafka broker hosted on the Azure VM and also to consume those events back on the local machine. While this does not cover the exact scenario described above, it gives a fair idea about how Kafka messages can be exchanged across the network.

Kafka is pretty chatty, so I’d hope to have really good network connectivity, such as Direct Connect (for AWS) or ExpressRoute (for Azure), in place.

The Basics of Azure Storage Explorer

Manvendra Singh takes us through Azure Storage Explorer:

This article will explain Azure Storage Explorer, its installation, and details of how to start working with this application to access Azure storage services. Azure Storage provides a flexible solution to store various types of data at a massive scale in the cloud environment. If you have many storage accounts in Azure Storage, then it will be difficult to manage them. Microsoft has recognized this problem and developed a desktop application, Azure Storage Explorer, to manage Azure storage accounts easily. It can be installed on Windows, Linux, and macOS operating systems.

This is a rather useful tool.

Performance Optimization for Azure Data Explorer

Ashok Anand Kumar has some performance tips:

Azure Data Explorer provides the capability to easily fetch telemetry data from a variety of data sources and run complex analytical queries. Azure Data Explorer supports both batch and streaming ingestion to support near real-time latency requirements. Batch ingestion will have latencies based on the batching policy and query frequency from applications. Streaming ingestion can be leveraged for low latency requirements. Data is cached and indexed for faster query performance and optionally exported out to Azure Data Lake in Parquet format for batch processing and integration with other Big Data and Machine Learning (ML) services.

Read on for several tips.

Azure Delete Locks

Denny Cherry has some advice:

When I’m working in a client’s Azure environment and they don’t have a delete lock on their production environment, I always work on getting them to have one.

This doesn’t always play nicely with everything in Azure, so read on for Denny’s advice when working with Azure Migrate.

An Overview of Azure IoT Central

James Serra looks at IoT Central:

This is a short blog to give you a high-level overview of a product called Azure IoT Central. I saw this fairly new Azure product (GA Sept 2018) in use for the first time at a large manufacturing company that was using it at their manufacturing facility (see Grupo Bimbo takes a bite out of production costs with Azure IoT throughout factories). They have thousands of sensors that are collecting data for all the machines used in producing their products. In short, think of it as an “Application Platform as a Service (aPaaS)” for quickly building IoT solutions. It’s boxing up IoT Hub, Device Provisioning Service (DPS), Stream Analytics, Data Explorer, SQL Database, Time Series Insights and Cosmos DB to make it easy to quickly build a solution and get value out of the IoT data. To get an idea of what this solution would look like, check out the IoT Central sample for calculating Overall Equipment Effectiveness (OEE) of industrial equipment.

I haven’t seen much use of this service, as generally any use case I’ve seen around IoT quickly turns into using IoT Hub and IoT Edge to develop custom code.
