Press "Enter" to skip to content

Category: Cloud

Choosing the Right Azure VM Type for SQL Server

Glenn Berry walks us through Azure virtual machine classes and picks out good ones for running SQL Server:

For high performance OLTP SQL Server workloads, the memory optimized type of Azure VMs is usually the best choice. According to Microsoft, “Memory optimized VM sizes offer a high memory-to-CPU ratio that are great for relational database servers.” This gives you lower core counts, with more memory, which is usually what you want for SQL Server, to minimize your license costs and still have good performance.

You can go even further down this path with Constrained vCPU capable VM sizes, where you can constrain the VM vCPU count (to one half or one quarter of the original VM size) to reduce the cost of SQL Server licensing, while maintaining the same memory, storage, and I/O bandwidth as a non-constrained VM. These constrained Azure VMs have a suffix in the name that indicates the number of active vCPUs in the VM.

Click through for the comparison.

Comments closed

More with Azure Data Factory

Cathrine Wilhelmsen continues a series on Azure Data Factory. Since we left off, Cathrine has three new posts. First, a look at how we monitor Azure Data Factory pipelines:

In the previous post, we looked at the three different trigger types, as well as how to trigger pipelines on-demand. In this post, we will look at what happens after that. How does monitoring work in Azure Data Factory?

Now, if we want to look at monitoring, we probably need something to monitor first. I mean, I could show you a blank dashboard, but I kind of already did that, and that wasn’t really interesting at all 🤔 So! In the previous post, I created a schedule trigger that runs hourly, added it to my orchestration pipeline, and published it.

Second, using annotations to customize views:

Annotations are additional, informative tags that you can add to specific factory resources: pipelinesdatasetslinked services, and triggers. By adding annotations, you can easily filter and search for specific factory resources.

You need to figure out what kind of annotations make sense to you. Maybe you want to filter on the different logical steps of your solution, so you add the tags extract and transform? Perhaps ingest and prepare? Or maybe you want to tag resources with a business unit or a project name? It’s entirely up to you. All I recommend is that you’re consistent 🙂

That’s a problem for me—the only thing I’m consistent about is inconsistency. Third, Cathrine introduces the different runtimes available to us:

An integration runtime (IR) specifies the compute infrastructure an activity runs on or gets dispatched from. It has access to resources in either public networks, or in public and private networks.

Or, in Cathrine-speak, using less precise words: An integration runtime specifies what kind of hardware is used to execute activities, where this hardware is physically located, who owns and maintains the hardware, and which data stores and services the hardware can connect to.

There’s a lot of good material in each of these three posts.

Comments closed

Azure Data Factory Triggers

Cathrine Wilhelmsen continues a series on Azure Data Factory by looking at triggers:

One important thing to note is that all times are in UTC. And since UTC does not observe daylight saving time… Well, let’s just say that if you need to execute pipelines during the workday and you have business users waiting for data, you may want to plan some trigger maintenance on the days when you fall back or spring forward. I know. Ugh 🙂 I’m hoping for better timezone support in the future 🤞🏻

Schedule triggers and pipelines have a many-to-many relationship. That means that one schedule trigger can execute many pipelines, and one pipeline can be executed by many schedule triggers.

Time-based triggers aren’t the only options, however—Cathrine also looks at the other three possibilities.

Comments closed

Debugging Azure Data Factory Pipelines

Cathrine Wilhelmsen shows us how to debug Azure Data Factory pipelines:

You debug a pipeline by clicking the debug button:

Tadaaa! Blog post done? 😀

I joke, I joke, I joke. Debugging pipelines is a one-click operation, but there are a few more things to be aware of. In the rest of this post, we will look at what happens when you debug a pipeline, how to see the debugging output, and how to set breakpoints.

Turns out there’s more to it than clicking a button.

Comments closed

Orchestrating ADF Pipelines

Cathrine Wilhelmsen continues a series on Azure Data Factory:

The other way to build this solution is by creating an orchestration pipeline with two execute pipeline activities. This gives us a little more flexibility than having a single pipeline, because we can execute each pipeline separately if we want to.

Let’s start by creating a new pipeline and adding two execute pipeline activities to it. In the activity settings, select the pipelines to execute, and check wait on completion:

Read on for the demonstration.

Comments closed

Creating a Gen-2 Azure Data Lake Store

Cecilia Brusatori shares how to build a generation-2 data lake in Azure:

Finally, you’ve decided that Data Lake Gen 2 is good for your Data Analytics Scenario and you’ve started the journey, went to the Azure Portal and searched for it. Mhh you don’t see it in the options to create it, let’s try the search bar [typing Data Lake Gen2….] Nothing… Ok maybe you’ve missed something…. nope!
So what is in fact a Data Lake Gen 2? it is a blob storage account, optimized for Data Analytics.
Let’s take a look at how you are able to create it!

If you’re used to the first generation, where Azure Data Lake Storage was its own thing, it might take a minute to realize where it went.

Comments closed

Azure Data Factory Data Flows

Cathrine Wilhelmsen continues a series on Azure Data Factory:

So far in this Azure Data Factory series, we have looked at copying data. We have created pipelinescopy data activitiesdatasets, and linked services. In this post, we will peek at the second part of the data integration story: using data flows for transforming data.

But first, I need to make a confession. And it’s slightly embarrassing…

I don’t use data flows enough to keep up with all the changes and new features

To be fair to Cathrine, this is a rapidly-changing part of ADF.

Comments closed

Time Series Anomaly Detection with Power BI

Leila Etaati takes us through time series anomaly detection with Cognitive Services and Power Query:

I am excited about this blog post, this is based on the New service in Cognitive Service name “Anomaly Detection” which is now in Preview.
I recorded a video about how it works in cognitive service https://youtu.be/7ZOtZDbn6gM. 

However, I am going to talk about how to use it in Power BI. In this post first, a brief introduction to the anomaly detection will be presented, then how it can be used inside Power BI will be discussed.

It sounds like there are still some rough edges, but they already have the makings of an interesting service.

Comments closed

Azure Data Factory Continued

Cathrine Wilhelmsen continues a series on Azure Data Factory. Catching up from the last time around, we first see the Copy Data activity:

You can copy data to and from more than 80 Software-as-a-Service (SaaS) applications (such as Dynamics 365 and Salesforce), on-premises data stores (such as SQL Server and Oracle), and cloud data stores (such as Azure SQL Database and Amazon S3). During copying, you can define and map columns implicitly or explicitly, convert file formats, and even zip and unzip files – all in one task.

Yeah. It’s powerful 🙂 But how does it really work?

Then Cathrine hits datasets:

But… please, please, please don’t use “source” or “destination” or “sink” or “input” or “output” or anything like that in your dataset names. It makes sense when you have one pipeline with one copy data activity, but as soon as you start building out your solution, it can get messy. Because what if you realize you want to use the original destination dataset as a source dataset in another copy data activity? Yeah… 🙂

So! Let’s rename the datasets.

After that, it’s on to linked services:

Azure Key Vault is a service for storing and managing secrets (like connection strings, passwords, and keys) in one central location. By storing secrets in Azure Key Vault, you don’t have to expose any connection details inside Azure Data Factory. You can connect to “the application database” without directly seeing the server, database name, or credentials used.

Cathrine is rolling with this series and it’s been great so far.

Comments closed

Build and Deploy SSIS Projects with Azure DevOps

Joost van Rossum has a pair of posts on Azure DevOps updates. First, Azure DevOps supports building SSIS projects:

This new task is much easier to use than the PowerShell code and also easier than most of the third party tasks. With a little practice you can now easily create a build task under two minutes which is probably faster than the build itself.

If your build fails with the following error message then you are probably using a custom task or component (like Blob Storage Download Task). These tasks are not installed on the build agents hosted by Microsoft. The solution is to use a self hosted agent where you can install all custom components

Second, Azure DevOps supports deploying SSIS projects:

Microsoft just released the SSIS Deploy task (public preview) which makes it much easier to deploy an SSIS project. Below you will find the codeless steps to deploy artifacts created by the SSIS Build task.

Click through for the step-by-step instructions for each.

Comments closed