Press "Enter" to skip to content

Category: Cloud

Tracking Azure Resources with Tags

Jess Pomfret explains the value of tags:

One of the vital parts of this learning and experimenting needs to be cleaning up after myself. We all know the risks of leaving things running in Azure – it’s likely to drain your training budget pretty quickly. To be fair, this is also a good lesson for real-world scenarios. Getting used to turning off or scaling down resources based on need is a good way to reduce your Azure spend.

This brings me to one morning last week. I logged in to the portal and got a pop up that my credit was down to under $5, which is not what I was expecting. I started looking around and wondering what I’d left running – it isn’t always easy to spot though.

Read on to see how tags can help with this, as well as other forms of cloud governance. If you remember to set them, that is.
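
If you want to audit for this programmatically rather than eyeballing the portal, here is a minimal sketch (mine, not Jess’s) using the Azure SDK for Python to flag resources missing governance tags; the tag names and credential setup are assumptions on my part.

```python
# Minimal sketch (not from the post): flag resources missing governance tags.
# Assumes azure-identity and azure-mgmt-resource are installed and you are
# already logged in (e.g. via `az login`); "owner" and "expires" are illustrative.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<your-subscription-id>"
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

required_tags = {"owner", "expires"}

for resource in client.resources.list():
    tags = resource.tags or {}
    missing = required_tags - set(tags)
    if missing:
        print(f"{resource.name} ({resource.type}) is missing tags: {', '.join(sorted(missing))}")
```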


Early Thoughts on Dremio

Meagan Longoria gives us a review of Dremio:

I’ve been working on a project for the last few months with a client who has chosen to implement Dremio in Azure. Dremio is a data lake engine that creates a semantic layer and supports interactive queries.

It uses Apache Arrow, Gandiva, and Parquet files under the hood. It runs on either Linux VMs or Kubernetes containers. Like most big data systems, there is at least one coordinator node and one or more executor nodes. These nodes communicate and are managed using Apache Zookeeper. Client applications connect to Dremio via ODBC, JDBC, REST APIs, or Arrow Flight. Dremio can read from storage accounts, external databases, and a few other sources.

Read on for good and bad aspects of the product.
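
To give a flavor of the client connectivity piece, here is a rough sketch (mine, not Meagan’s) of querying Dremio over Arrow Flight with pyarrow; the host name, credentials, and dataset are placeholders, and 32010 is Dremio’s default Flight port.

```python
# Rough sketch (not from the post): query Dremio over Arrow Flight using pyarrow.
# Host, port (32010 is Dremio's default Flight port), credentials, and the dataset
# name are placeholders.
import pyarrow.flight as flight

client = flight.FlightClient("grpc+tcp://dremio-host:32010")

# Basic auth returns a bearer-token header to attach to subsequent calls.
token_header = client.authenticate_basic_token("my_user", "my_password")
options = flight.FlightCallOptions(headers=[token_header])

sql = "SELECT * FROM my_space.my_virtual_dataset LIMIT 10"
info = client.get_flight_info(flight.FlightDescriptor.for_command(sql), options)

# Fetch the result set from the first endpoint as an Arrow table.
reader = client.do_get(info.endpoints[0].ticket, options)
table = reader.read_all()
print(table.to_pandas())
```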


Creating Parquet Files from SQL Server Data

Andy Leonard answers a challenge:

I searched and found some promising Parquet SSIS components available from CData Software and passed that information along. I shared my inexperience in exporting to Parquet format and asked a few friends how they’d done it.

I thought: How many times have I demonstrated Azure Data Factory and clicked right past file format selection without giving Parquet a second thought? Too many times. It was time to change that.

Another route is to use PolyBase. If you’re okay with writing the results to Azure Blob Storage, you can insert the results of a SQL query directly into Parquet files. If that sounds interesting, here are posts on connecting to Azure Blob Storage via PolyBase and inserting into Azure Blob Storage. I insert in CSV format to make it easier for people to follow, but swap the file format to Parquet and it works all the same.
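
And as yet another route, not one Andy covers, here is a hedged sketch of dumping query results to Parquet from Python with pyodbc, pandas, and pyarrow; the connection string, query, and output path are placeholders.

```python
# Hedged sketch (a different route from the ones in the post): export SQL Server
# query results to a Parquet file with pyodbc + pandas + pyarrow.
# The connection string, query, and output path are placeholders.
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=localhost;Database=AdventureWorks;Trusted_Connection=yes;"
)

df = pd.read_sql("SELECT TOP (1000) * FROM Sales.SalesOrderHeader", conn)

# pandas writes Parquet via the pyarrow engine; pip install pyarrow if needed.
df.to_parquet("sales_order_header.parquet", engine="pyarrow", index=False)
```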


Deploying Azure Data Services via Terraform

Chris Adkin has started a series on deploying Azure Arc enabled Data Services. Part 1 serves as an introduction:

One of the most significant things to change the landscape for Azure data professionals will be the general release of Azure Arc enabled Data Services. To provide an expedient means of experiencing all that Azure Arc has to offer, Microsoft has come up with Jumpstart – a collection of GitHub repos for deploying Arc in different scenarios. Last Christmas I had a few vacation days and took the opportunity to try out Jumpstart for Azure Arc enabled data services on AWS. AWS was my choice because it made a certain amount of sense to try out Azure SQL Managed Instances and Postgres Hyperscale on a cloud they are not natively available on. After all, the whole point of Azure Arc enabled Data Services is to bring Azure to you on your terms if for any reason you cannot use the Azure cloud.

Part 2 gives us an introduction to Terraform:

Before diving into what the various Terraform modules that make up the Arc-PX-VMware-Faststart repo do, I’m going to provide an introduction to Terraform in this blog post. Terraform comes from HashiCorp and works on the principle of infrastructure-as-code. Resources are specified in configuration files using HashiCorp Configuration Language (HCL) in a declarative manner, i.e. you state what you want and Terraform attempts, to the best of its ability, to create those resources for you. Providers are used to create resources for particular types of entity; for example, you might use the local file, helm (the Kubernetes package manager), Azure, or VMware providers. Using providers requires plugins, most of which are provided by HashiCorp, though third parties can also write their own.

Check out the first two posts in what promises to be an interesting series.
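
If you would like a tiny, self-contained taste of that declarative workflow before tackling the Arc repos, here is a deliberately trivial sketch (mine, not Chris’s) that writes a one-resource configuration using the local provider and drives init and apply from Python; it assumes the terraform binary is on your PATH.

```python
# Deliberately trivial sketch (not from the series): declare a single resource with
# the "local" provider, then let Terraform's init/apply workflow create it.
# Assumes the terraform binary is installed and on PATH.
import pathlib
import subprocess

workdir = pathlib.Path("tf-demo")
workdir.mkdir(exist_ok=True)

# Declarative configuration: state what you want (a local file); Terraform works
# out how to create it and records the result in its state.
(workdir / "main.tf").write_text("""
terraform {
  required_providers {
    local = {
      source = "hashicorp/local"
    }
  }
}

resource "local_file" "example" {
  filename = "${path.module}/hello.txt"
  content  = "created declaratively by terraform"
}
""")

for cmd in (["terraform", "init"], ["terraform", "apply", "-auto-approve"]):
    subprocess.run(cmd, cwd=workdir, check=True)
```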


Bad Request when Debugging an Azure Data Factory Pipeline

Ed Elliott ran into a problem:

Now, whenever I am troubleshooting something in Azure and I come to the activity logs, I am always hopeful but also always disappointed that they don’t show more details. The bit that really annoys me is that I know Microsoft sees more detailed error information, as I have been screen sharing with a support tech who used log explorer to see more detailed error messages than I see – grrrr, just show us the data! Anyway, I digress – so in the activity log, does it give a clue as to what is wrong?

No, in a word, no it doesn’t.

Read on for the conclusion, which rates as “Should have been an easy fix but the error message was completely unhelpful.”
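
If you would rather pull the activity log programmatically than click around the portal, here is a rough sketch using the Azure SDK for Python; the subscription, resource group, time window, and the hope that anything useful lands in the event properties are all assumptions on my part.

```python
# Rough sketch (not from Ed's post): pull recent activity log events for a resource
# group with the Azure SDK, to see how much detail they actually contain.
# Subscription, resource group, and time window are placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "<your-subscription-id>"
client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

since = (datetime.now(timezone.utc) - timedelta(hours=6)).strftime("%Y-%m-%dT%H:%M:%SZ")
odata_filter = f"eventTimestamp ge '{since}' and resourceGroupName eq 'my-adf-rg'"

for event in client.activity_logs.list(filter=odata_filter):
    print(event.event_timestamp, event.operation_name.value, event.status.value)
    # Whatever extra detail the platform chose to surface ends up in the properties bag.
    if event.properties:
        print("  ", event.properties)
```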


Delivering Data Insights using the Microsoft Data Platform

Paul Andrew has a talk:

Let’s start with a story, not a ‘once upon a time story’, a story for your backlog:

As a solution architect
I need to design and build an Azure data analytics platform end to end
to deliver data insights for my customer.

In February 2021 I delivered a talk as part of the Scottish Summit conference on how you could/should build an end-to-end data platform solution in Azure to deliver data insights and analytics. This is one of my favourite sessions, so I thought it worth re-sharing the recording here.

Click through for the abstract as well as the video.


Ignite 2021 Data and AI Announcements

James Serra has a roundup of the Data & AI announcements at Microsoft Ignite 2021:

Azure Arc enabled machine learning (preview): Build models on-premises, in multi-cloud, and at the edge with Azure Arc. It does this by deploying Azure Machine Learning to a Kubernetes cluster that resides on-prem or in another cloud. More info

Azure Migrate new features: Discover and assess your SQL servers and their databases for migration to Azure from within the Azure Migrate portal, get target SKU recommendations, and estimate monthly costs. More info

Azure SQL enhancements:

Maintenance window – Azure SQL Database and Azure Managed Instance fixed maintenance windows (preview) – More info

Advance notifications – Enable you to prepare for planned maintenance events on your Azure SQL Database resources and minimize the impact of database failover on your sensitive workloads – More info

Click through for details on each of the announcements.


Analyzing XGBoost Training Reports

Simon Zamarin, et al, walk us through using XGBoost training reports in Amazon SageMaker Debugger:

In 2019, AWS unveiled Amazon SageMaker Debugger, a SageMaker capability that enables you to automatically detect a variety of issues that may arise while a model is being trained. SageMaker Debugger captures model state data at specified intervals during a training job. With this data, SageMaker Debugger can detect training issues or anomalies by leveraging built-in or user-defined rules. In addition to detecting issues during the training job, you can analyze the captured state data afterwards to evaluate model performance and identify areas for improvement. This task is made easier with the newly launched XGBoost training report feature. With a minimal amount of code changes, SageMaker Debugger generates a comprehensive report outlining key information that you can use to evaluate and improve the model.

This post shows you an end-to-end example of training an XGBoost model on SageMaker and how to enable the automatic XGBoost report functionality in SageMaker Debugger to quickly and easily evaluate model performance and identify areas of improvement for your model. Even if you don’t have a lot of data science experience, you can still gauge how well the model performs and identify areas of improvement based on information provided by the report. The code from this post is available in the GitHub repo.

Click through for an example of this in action.
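
For a rough sense of what enabling the report involves in the SageMaker Python SDK, here is a hedged sketch; bucket paths, instance type, container version, and hyperparameters are placeholders, and the create_xgboost_report rule is the piece that turns the report on.

```python
# Hedged sketch of enabling the XGBoost training report with the SageMaker Python SDK.
# Bucket names, paths, instance type, container version, and hyperparameters are placeholders.
import sagemaker
from sagemaker.debugger import Rule, rule_configs
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
region = session.boto_region_name
role = sagemaker.get_execution_role()  # assumes this runs inside SageMaker

# The built-in rule that tells SageMaker Debugger to generate the XGBoost report.
rules = [Rule.sagemaker(rule_configs.create_xgboost_report())]

estimator = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", region, version="1.2-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/xgboost-output",
    rules=rules,
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

estimator.fit({
    "train": TrainingInput("s3://my-bucket/train.csv", content_type="text/csv"),
    "validation": TrainingInput("s3://my-bucket/validation.csv", content_type="text/csv"),
})
```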


Using Terraform to Tag Created Date

John Martin has an interesting use case for tagging in Terraform:

One of the key properties missing from Azure resources, in my opinion anyway, is a CreatedDate. This can be largely overcome with Azure Policy, but what if you don’t have access to create a policy that applies a timestamp tag at resource creation?

It is possible to use Terraform to tag the resource and set the value for when the resource is created. A little more work needs to go into it to ensure that, once it is set, Terraform does not overwrite it on subsequent deployments. But it is achievable and brings this into your control if needed.

Click through to see how.


A Warning on Using Distributed Network Names

Allan Hirt has a warning for us:

DNNs are supported as of SQL Server 2019 CU2 and require Windows Server 2016 or later. I wrote more about them in my blog post Configure a WSFC in Azure with Windows Server 2019 for AGs and FCIs. Go there if you want to see what they look like and learn more.

Right now, I cannot wholeheartedly recommend the use of DNNs for listeners or FCIs if you are using Enterprise Edition. Why?

Read on to learn why.
