Press "Enter" to skip to content

Category: Cloud

Alerting In Azure SQL Database

Arun Sirpal shows how to set up an alert for an Azure SQL Database:

I keep things simple and like to look at certain performance based metrics but before talking about what metrics are available let’s step through an example.

For this post I want to setup an alert for CPU percentage utilised that when it is greater than 50% over the last 5 minutes I would like to know about it. First step is to navigate to your Azure SQL Database.

Click through for a screenshot-driven guided tour.

Comments closed

Connect(); Announcements, Including Azure Databricks

James Serra has a wrapup of Microsoft Connect(); announcements around the data platform space:

Microsoft Connect(); is a developer event from Nov 15-17, where plenty of announcements are made.  Here is a summary of the data platform related announcements:

  • Azure Databricks: In preview, this is a fast, easy, and collaborative Apache Spark based analytics platform optimized for Azure. It delivers one-click set up, streamlined workflows, and an interactive workspace all integrated with Azure SQL Data Warehouse, Azure Storage, Azure Cosmos DB, Azure Active Directory, and Power BI.  More info

  • Azure Cosmos DB with Apache Cassandra API: In preview, this enables Cassandra developers to simply use the Cassandra API in Azure Cosmos DB and enjoy the benefits of Azure Cosmos DB with the familiarity of the Cassandra SDKs and tools, with no code changes to their application.  More info.  See all Cosmos DB announcements

  • Microsoft joins the MariaDB Foundation: Microsoft is a platinum sponsor – MariaDB is a community of the MySQL relational database management system and Microsoft will be actively contributing to MariaDB and the MariaDB community.  More info

Click through for more.  And if you want more info on Azure Databricks, Matei Zaharia and Peter Carlin have more information:

So how is Azure Databricks put together? At a high level, the service launches and manages worker nodes in each Azure customer’s subscription, letting customers leverage existing management tools within their account.

Specifically, when a customer launches a cluster via Databricks, a “Databricks appliance” is deployed as an Azure resource in the customer’s subscription.   The customer specifies the types of VMs to use and how many, but Databricks manages all other aspects. In addition to this appliance, a managed resource group is deployed into the customer’s subscription that we populate with a VNet, a security group, and a storage account. These are concepts Azure users are familiar with. Once these services are ready, users can manage the Databricks cluster through the Azure Databricks UI or through features such as autoscaling. All metadata (such as scheduled jobs) is stored in an Azure Database with geo-replication for fault tolerance.

I’ve been a huge fan of the Databricks Community Edition.  We’ll see if there will be a Community Edition version for Azure as well.

Comments closed

Dynamic Pivoting In Redshift

Maria Zakourdaev is not Redshift’s biggest fan:

Several days ago I have spent a few hours of my life figuring out how to do dynamic pivot in Amazon Redshift. To tell you the truth, I have expected much more from this DBMS SQL language.

Redshift is based on Postgre SQL 8.0.2 ( which was released in 2005 !!!! )

Anything you would want for this, not too difficult task,  does not exits.  No stored procedures. No JSON datatype. No variables outside of UDF, no queries inside UDFs. “UDF can be used to calculate values but cannot be used to call SQL functions”. Python UDFs also cannot query the data, only perform calculations.

Finally I have found one useful function LISTAGG that helped me to get distinct values of all pivoted columns.

Read on to see how Maria solved this problem.  And to tell the truth, I’m not Redshift’s biggest fan either.

Comments closed

Testing Event Hub To Stream Analytics Performance

Rolf Tesmer tries a few different settings for optimizing performance when streaming data from Azure Event Hub to Azure Stream Analytics:

When you configure Azure Stream Analytics you only have 2 levers;

  • Streaming Units (SU) – Each SU is a blend of compute, memory and throughput between 1 and 48 (or more by contacting support).  The factors that impact SU are query complexity, latency, and volume of data. SU can be used to scale out a job to achieve higher throughput. Depending on query complexity and throughput required, more SU units may be necessary to achieve your performance requirements.  A level of SU6 assigns an entire Stream Analytics node.   For our test we wont change SU

  • SQL Query Design – Queries are expressed in a SQL-like query language. These queries are documented in the query language reference guide and includes several common query patterns.  The design of the query can greatly affect the job throughput, in particular if and/or how the PARTITION BY clause is used.

Rolf tests along three margins:  2 versus 16 input partitions, 2 versus 16 output partitions, and whether to partition the data or not.  Read on to see which combination was fastest.

Comments closed

How Per-Second AWS Billing Helps With Data Processing

Prakash Chockalingam explains how AWS per-second billing can make resource allocation easier:

Because of the hourly increments in billing, users spend a lot of time playing a giant game of Tetris with their big data workloads — figuring out how to pack jobs to use every minute of the compute hour. Examples:

  • If a job could be run on 10 nodes and finished in 20 minutes, it was better to run it on fewer nodes so that it takes around 50 minutes. As a result, you would pay less.
  • Running two 10 node jobs that took 20 minutes in parallel would cost two times more than running them sequentially.

The above problem got compounded if there were many such jobs to be run. To handle this challenge, many organizations turned to a resource scheduler like YARN. Organizations were following the traditional on-premises model of setting up one or more big multi-tenant cluster on the cloud and running YARN to bin-pack the different jobs.

Read on to see how this has changed as a result of per-second billing.

Comments closed

Picking Azure VM Sizes

Glenn Berry helps us pick the right-sized Azure VM for a SQL Server installation:

A common issue with Azure VM sizing for SQL Server has been the fact that you were often forced to select a VM size that had far more virtual CPU cores than you needed or wanted in order to have enough memory and storage performance to support your workload, which increased your monthly licensing cost.

Luckily, Microsoft has recently made the decision process a little easier for SQL Server with a new series of Azure VMs that use some particular VM sizes (DS, ES, GS, and MS), but reduce the vCPU count to one quarter or one half of the original VM size, while maintaining the same memory, storage and I/O bandwidth. These these new VM sizes have a suffix that specifies the number of active vCPUs to make them easier to identify.

For example, a Standard_DS14v2 Azure VM would have 16 vCPUs, 112GB of RAM, and support up to 51,200 IOPS or 768MB/sec of sequential throughput (according to Microsoft). A new Standard_DS14-8v2 Azure VM would only have 8 vCPUs, with the same memory capacity and disk performance as the Standard_DS14v2, which would reduce your SQL Server licensing cost per year by 50%. Both of these Azure VM SKUs would have the same ACU score of 160.

Glenn is, as always, a font of useful information.  Go read the whole thing.

Comments closed

Cosmos DB Limitations

Vincent-Philippe Lauzon points out a few limitations with Cosmos DB:

The original DocumentDB SQL didn’t have any aggregation capacity.  But it did acquire those capacities along the way.

Traditionally, that isn’t the strong spot for document-oriented databases.  They tend to be more about find documents and manipulating the documents as oppose to aggregating metrics on a mass of documents.

Today, DocumentDB SQL implements the following aggregate functions:

  • COUNT

  • SUM

  • MIN

  • MAX

  • AVG

Read on for where the current aggregation limitation is, as well as more.

Comments closed

Is Azure SQL DW A Good Fit For You?

Melissa Coates has a nice choose-your-own-adventure story around Azure SQL Data Warehouse:

Q4: How large is your database?

It is difficult to pinpoint an exact number for the absolute minimum size recommended for Azure SQL DW. Many data professionals in the industry see the minimum “practical” data size for Azure SQL DW in the 1-4TB range. Microsoft documentation has recently stated as low as 250GB for a minimum size. Since Azure SQL DW is an MPP (massively parallel processing) system, you experience a significant performance penalty with small data sizes because of the overhead incurred to distribute and consolidate across the nodes (which are distributions in a “shared-nothing” architecture). We recommend Azure SQL DW for a data warehouse which is starting to approach 1TB and expected to continue growing.

Great advice here.  I’ve heard too often of people looking at the name “Azure SQL Data Warehouse” and figuring that because they have data warehouses on-prem, this is the appropriate analog.  Azure SQL DW is not a typical data warehousing environment; it’s more of a specialized tool than that, so click through to see if it fits your needs.

Comments closed

Automating Azure Data Lake Storage ACLs

Shannon Lowder shows how to automate Azure Data Lake Storage access control lists:

Now that you have these, you can use a for each loop to set your permissions.

foreach ($ACL in $ACLs) { write-host "Grant $useremail " $ACL[1] " access to " $ACL[0]; Set-AzureRmDataLakeStoreItemAclEntry -AccountName $adls -Path $ACL[0] -AceType User -Id $(Get-AzureRmADUser -Mail $useremail ).Id -Permissions $ACL[1] Set-AzureRmDataLakeStoreItemAclEntry -AccountName $adls -Path $ACL[0] -AceType User -Id $(Get-AzureRmADUser -Mail $useremail ).Id -Permissions $ACL[1] -Default
}

Now, for each permission, we’ll set the ACL and the default.  Why set both?  Well, when folders are created under each of the target folders, you want to cascade those permissions down from parent to child, right?  Well, that’s what the Default ACL controls.  If you skip the second Set-AzureRMDataLakeStoreItemAclEntry, then new folders would not inherit the permissions of the containing folder and your users would be unable to access their files properly.

Read the whole thing.  Shannon also has one of the very few valid use cases for 3D pie charts.

Comments closed

Cosmos DB Cheat Sheet

Melody Zacharias shows us a cheat sheet for Cosmos DB:

The Cosmos DB by Microsoft is their globally distributed, horizontally scalable, multi-model database service that is available through Azure.  Released in 2014, it is the ideal DB for globally distributed applications.  Formerly called DocumentDB Cosmos it now supports querying documents using SQL as a JSON query language.  As a Schema-free platform, it provides automatic indexing of JSON documents without requiring an explicit schema or creation of secondary indexes. For those of use not well versed in JSON, this query cheat-sheet, has come to our rescue.  It outlines common queries to retrieve information from 2 JSON documents.

Microsoft has put together this cheat-sheet to help you write your queries faster.  This quick reference is a single page PDF that you can print, or keep in a handy computer file.  This is version 4, so it just keeps getting better!

Click through for the link to the cheat sheet.

Comments closed