Press "Enter" to skip to content

Category: Cloud

Memory Pressure and Azure SQL Managed Instances

Jovan Popovic takes us through determining whether we have enough memory on an Azure SQL Managed Instance:

Managed Instance has memory that is proportional to the number of cores. As an example, in the Gen5 architecture you have 5.1GB of memory per vCore, meaning that an 8-core instance will have 41GB of memory. Periodically, you should check whether this amount of memory is good for your workload.

Do not monitor whether the Managed Instance uses ~100% of available memory. Some people believe that this is an issue (like hitting 100% CPU), but this is expected and desired behavior. Managed Instance should use as much memory as possible to cache pages from disk into the buffer pool. The only case where you will not see near-100% memory usage is when your databases are much smaller than the available memory, so all of them can fit into memory.

The spoiler version is that it’s the same process as on-prem.
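Since it's the same process as on-prem, the usual buffer pool counters apply. As a quick illustration (my own sketch, not from Jovan's post; pyodbc and placeholder connection details assumed), you can spot-check Page Life Expectancy and pending memory grants with the standard DMVs:

```python
# Hypothetical memory-pressure spot check for a Managed Instance.
# Server name and credentials are placeholders; pyodbc is assumed.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-managed-instance.database.windows.net;"
    "DATABASE=master;UID=your_user;PWD=your_password"
)

query = """
SELECT RTRIM(counter_name) AS counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE (counter_name = N'Page life expectancy'
       AND [object_name] LIKE N'%Buffer Manager%')
   OR counter_name = N'Memory Grants Pending';
"""

for counter_name, value in conn.execute(query):
    # Persistently low PLE or any pending memory grants point to pressure.
    print(f"{counter_name}: {value}")
```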


Azure Data Factory: Mapping and Wrangling Data Flows

Cathrine Wilhelmsen explains the difference between Mapping Data Flows and Wrangling Data Flows in Azure Data Factory:

Now, we all know that the consultant answer to “which should I use?” is It Depends ™ 🙂 But what does it depend on?

To me, it boils down to a few key questions you need to ask:
– What is the task or problem you are trying to solve?
– Where and how will you use the output?
– Which tool are you most comfortable using?

Read on to see how they both work.


Auto-Terminating Unused EMR Clusters

Praveen Krishnamoorthy Ravikumar shows how you can use AWS Lambda to terminate Elastic MapReduce clusters that have been idle for a certain amount of time:

To avoid this overhead, you must track the idleness of the EMR cluster and terminate it if it has been idle for a long time. There is the Amazon EMR native IsIdle Amazon CloudWatch metric, which determines the idleness of the cluster by checking whether there’s a YARN job running. However, you should consider additional metrics, such as SSH users connected or Presto jobs running, to determine whether the cluster is idle. Also, when you execute any Spark jobs in Apache Zeppelin, the IsIdle metric remains active (1) for long hours, even after the job has finished executing. In such cases, the IsIdle metric is not ideal for deciding the inactivity of a cluster.

In this blog post, we propose a solution to cut down this overhead cost. We implemented a bash script to be installed in the master node of the EMR cluster, and the script is scheduled to run every 5 minutes. The script monitors the clusters and sends a CUSTOM metric EMR-INUSE (0=inactive; 1=active) to CloudWatch every 5 minutes. If CloudWatch receives 0 (inactive) for some predefined set of data points, it triggers an alarm, which in turn executes an AWS Lambda function that terminates the cluster.
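The Lambda function at the end of that chain can be tiny. This is my own hedged sketch rather than the post's code (in particular, getting the cluster ID from an environment variable is my assumption):

```python
# Hypothetical Lambda handler that terminates an idle EMR cluster once
# the CloudWatch alarm fires. The cluster ID source is an assumption.
import os
import boto3

emr = boto3.client("emr")

def lambda_handler(event, context):
    cluster_id = os.environ["EMR_CLUSTER_ID"]
    # Termination protection has to be off before the cluster can be killed.
    emr.set_termination_protection(
        JobFlowIds=[cluster_id], TerminationProtected=False
    )
    emr.terminate_job_flows(JobFlowIds=[cluster_id])
    return {"terminated": cluster_id}
```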

We went a slightly different route for auto-termination, killing after a fixed number of hours.
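If you'd rather kill on a fixed lifetime like we do, a sketch along these lines works (again my own illustration; the eight-hour cutoff is an arbitrary example):

```python
# Illustrative fixed-lifetime reaper: terminate any running EMR cluster
# older than MAX_AGE. The cutoff is an arbitrary example, not from the post.
from datetime import datetime, timedelta, timezone
import boto3

MAX_AGE = timedelta(hours=8)
emr = boto3.client("emr")

def lambda_handler(event, context):
    clusters = emr.list_clusters(ClusterStates=["RUNNING", "WAITING"])
    for cluster in clusters["Clusters"]:
        created = cluster["Status"]["Timeline"]["CreationDateTime"]
        if datetime.now(timezone.utc) - created > MAX_AGE:
            emr.terminate_job_flows(JobFlowIds=[cluster["Id"]])
```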


Deploying SQL Server Containers to Azure with Terraform

Andrew Pruski has a post covering deployment of SQL Server containers to Azure using Terraform:

What this is going to do is create an Azure Container Instance Group with one container in it, running SQL Server 2019 CTP 2.5. It’ll be publicly exposed to the internet on port 1433 (I’ll cover fixing that in a future post) so we’ll get a public IP that we can use to connect to.

Notice that the location and resource_group_name are set using variables that retrieve the values of the resource group we are going to create.

Cool! We are ready to go!

Fun stuff, and Andrew promises more.
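Since the instance comes up on a public IP with 1433 open, you can connect from anywhere while testing. A minimal smoke test from Python (my sketch, not Andrew's; pyodbc assumed, and the IP and sa password are placeholders for whatever your Terraform run produced):

```python
# Hypothetical smoke test against the freshly deployed container instance.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=20.40.0.10,1433;"  # placeholder: the public IP from the container group
    "UID=sa;PWD=YourStrongPassword1!;"  # placeholder: your Terraform variable
    "DATABASE=master"
)

print(conn.execute("SELECT @@VERSION;").fetchone()[0])
```

That wide-open 1433 is also why Andrew flags locking it down as a topic for a future post.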


Data Cleansing Options with Azure

James Serra tries to answer the question of when you should use different Azure services for data cleansing:

Clean the data and optionally aggregate it as it sits in the source system. The tool used for this would depend on the source system that stores the data (e.g., if it’s SQL Server, you would use stored procedures). The only benefit of this option is that if you aggregate the data, you will move less data from the source system to Azure, which can be helpful if you have a small pipe to Azure and don’t need the row-level details. The disadvantages are:
– the raw source data is not available in the data lake, so you would always need to go back to the source system if you needed it again, and it may not even still exist in the source system;
– you would put extra stress on the source system when doing the cleaning, which could affect end users of the system;
– it could take a long time to clean the data, as the source system may not have fast performance;
– and you would not be able to use other tools (e.g., Hadoop, Databricks) to clean it.
I strongly advise against this option.

Read on for additional options and James’s recommendations.


The CosmosDB Emulator

Hasan Savran has a way to let you play with CosmosDB without dropping any cash on it:

CosmosDB Emulator is a must-have tool if you develop applications for Azure CosmosDB. It’s also a great tool to have if you’d like to learn about Azure CosmosDB but have limited access to Azure for any reason. The Azure CosmosDB team constantly works to make all available tools better, including the emulator. Currently, the emulator supports the SQL, Cassandra, MongoDB, Gremlin, and Table APIs. The Data Explorer feature supports only the SQL API for now. The emulator’s implementation is different from the service, so you should not use it for stress testing, and you cannot test global replication or latency for reads and writes.

It’s also useful for testing scenarios where you want some level of integration testing but don’t want to rely on an external service.
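If you want to poke at it from code, here's a minimal sketch (my illustration, assuming the azure-cosmos Python SDK and the SQL API) against the emulator's fixed local endpoint and its well-known, documented development key:

```python
# Smoke test against a locally running CosmosDB Emulator (SQL API).
# The endpoint and key are the emulator's documented fixed values, so
# hard-coding them is safe: they grant access to nothing in Azure.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(
    "https://localhost:8081",
    credential="C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==",
    connection_verify=False,  # the emulator uses a self-signed certificate
)

db = client.create_database_if_not_exists("TestDb")
container = db.create_container_if_not_exists(
    id="Items", partition_key=PartitionKey(path="/id")
)
container.upsert_item({"id": "1", "message": "hello from the emulator"})

for item in container.query_items(
    "SELECT * FROM c", enable_cross_partition_query=True
):
    print(item["id"], item["message"])
```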


AzureGraph: Microsoft Graph in R

Hong Ooi takes us through AzureGraph:

Microsoft Graph is a comprehensive framework for accessing data in various online Microsoft services, including Azure Active Directory (AAD), Office 365, OneDrive, Teams, and more. AzureGraph is an R package that provides a simple R6-based interface to the Graph REST API, and is the companion package to AzureRMR and AzureAuth.

Currently, AzureGraph aims to provide an R interface only to the AAD part, with a view to supporting R interoperability with Azure: registered apps and service principals, users and groups. Like AzureRMR, it could potentially be extended to support other services.

Just to clarify, this is like the Facebook Graph API for Azure components, not a graph database that you can store your own data in.


Extracting the First Element from an Array in ADF

Rayis Imayev shows how you can find the first element in an array using Azure Data Factory:

A user recently asked me a question on my previous blog post (Setting Variables in Azure Data Factory Pipelines) about the possibility of extracting the first element of a variable if this variable is a set of elements (an array).

So as a spoiler alert, before writing a blog post and adding a bit more clarity to the existing Microsoft ADF documentation, here is a quick answer to this question.

You’ll have to click through even for the quick answer.


Azure Cloud Shell

Mark Broadbent gives us an introduction to Azure Cloud Shell:

There are two ways to access Azure Cloud Shell, the first being directly through the Azure Portal itself. Once authenticated, look to the top right of the Portal and you should see a grouping of icons and, in particular, one that looks very much like a DOS prompt (have no fear, DOS is nowhere to be seen).

The second method to access Azure Cloud Shell is by jumping directly to it via shell.azure.com, which will require you to authenticate to your subscription before launching. There is an ever-so-slight difference between the two methods. Accessing the Shell via the Azure Portal will not require you to specify your Azure directory context (assuming you have several), since your Portal will have already defaulted to one, whereas with the direct URL method that obviously doesn’t happen.

Read the whole thing.


Azure SQL Linux VM Configuration with dbatools

Rob Sewell walks us through configuring SQL Server on an Azure VM running Linux, installing Powershell, and using dbatools:

I had set the Network security rules to accept connections only from my static IP using variables in the Build Pipeline. I use MobaXterm as my SSH client. It’s a free download. I click on sessions

There wasn’t much I could excerpt here, but this is a heavily screenshot-driven tutorial.
