Category: Cloud

Basics Of Azure SQL Data Warehouse

Minette Steynberg has an article introducing Azure SQL Data Warehouse:

Azure SQL DW is best used for analytical workloads that make use of large volumes of data and need to consolidate disparate data into a single location.

Azure SQL DW has been specifically designed to deal with very large volumes of data. In fact, if there is too little data it may perform poorly because the data is distributed. You can imagine that if you had only 10 rows per distribution, the cost of consolidating the data will be way more than the benefit gained by distributing it.

SQL DW is a good place to consolidate disparate data, transform, shape and aggregate it, and then perform analysis on it. It is ideal for running burst workloads, such as month-end financial reporting.

Azure SQL DW should not be used when small, row-by-row updates are expected, as in OLTP workloads. It should only be used for large-scale batch operations.

Azure SQL Data Warehouse is fantastic when you’ve got a setup like the one above and are willing to pay a premium to make things faster. And with appropriately distributed data, it certainly does get faster.
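
To make the distribution point concrete, here’s a minimal sketch of a hash-distributed table in Azure SQL DW; the table and column names are made up for illustration:

  CREATE TABLE dbo.FactSales
  (
      SaleId     BIGINT         NOT NULL,
      CustomerId INT            NOT NULL,
      SaleDate   DATE           NOT NULL,
      Amount     DECIMAL(18, 2) NOT NULL
  )
  WITH
  (
      -- Rows are spread across the distributions by hashing CustomerId,
      -- so joins and aggregations on that key avoid data movement.
      DISTRIBUTION = HASH(CustomerId),
      CLUSTERED COLUMNSTORE INDEX
  );

With only a handful of rows per distribution, the consolidation overhead Minette describes outweighs the benefit; at scale, a well-chosen hash key is what keeps data movement down.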

Azure SQL Database Compatibility Level Change

Joe Sack reports that new Azure SQL Databases will have a compatibility level of 140 by default:

Once this new database compatibility default goes into effect, if you still wish to use database compatibility level 130 (or lower), please follow the instructions detailed here: View or Change the Compatibility Level of a Database.  For example, you may wish to ensure that new databases created in Azure SQL Database use the same compatibility level as other databases in Azure SQL Database to ensure consistent query optimization behavior across development, QA and production versions of your databases. We recommend that database configuration scripts explicitly designate COMPATIBILITY_LEVEL rather than rely on the defaults, in order to ensure consistent application behavior.

For new databases supporting new applications, we recommend using the latest compatibility level (140).  For pre-existing databases running at lower compatibility levels, the recommended workflow for upgrading the query processor to a higher compatibility level is detailed in the article, Change the Database Compatibility Mode and Use the Query Store.  Note that this article refers to compatibility level 130 and SQL Server, but the same methodology applies for moves to 140 for SQL Server and Azure SQL DB.

It’s good to hear, and as Joe mentions, you have the ability to move back down to 130 if you need it.
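
Following Joe’s advice to designate the compatibility level explicitly rather than rely on the default, a minimal sketch (the database name is hypothetical):

  -- Check where the database currently sits
  SELECT name, compatibility_level
  FROM sys.databases
  WHERE name = N'MyAppDb';

  -- Pin the level explicitly so behavior stays consistent across dev, QA and production
  ALTER DATABASE [MyAppDb]
  SET COMPATIBILITY_LEVEL = 140;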

Scale-Up In Azure SQL Database

Arun Sirpal walks us through how to bump up the scale of an Azure SQL Database:

We should all be aware that when changing the service tier and/or performance level of a database, Azure creates a replica of the original database at the new performance level and then switches connections over to the replica. No data is lost during this process, but during the brief moment when we switch over to the replica, connections to the database are disabled, so some transactions in flight may be rolled back. So it is probably best that you do this during a period of low activity.

It’s pretty easy to do, but heed Arun’s warning, as you can cause queries to fail during switch-over.  My experience has been that scaling up is pretty slow, too:  I seem to recall it taking 10-20 minutes for a moderately sized database.
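
For reference, the scale operation can also be requested in T-SQL rather than through the portal; here’s a rough sketch with a hypothetical database name and target tier:

  -- Request a new performance level; the switch-over happens asynchronously
  ALTER DATABASE [MyAppDb]
  MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S3');

  -- Check which tier the database is currently running at
  SELECT DATABASEPROPERTYEX('MyAppDb', 'Edition') AS Edition,
         DATABASEPROPERTYEX('MyAppDb', 'ServiceObjective') AS ServiceObjective;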

Automating Index Maintenance On Azure SQL DB

Arun Sirpal shows how to use Azure Automation to rebuild indexes on an Azure SQL Database:

The answer is via Azure Automation.

At a high level this is what I did.

  • Create an Automation Account.

  • Create a credential.

  • Create a PowerShell Runbook which has the code for index rebuilds.

  • Create a schedule and link it to the above.

  • Configure parameters within the schedule (if any).

  • Configure logging level (if desired).

Click through for the detailed steps.
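
The runbook wraps T-SQL along these lines; this is just a rough sketch of the rebuild logic (the 30% fragmentation threshold is a common rule of thumb), not Arun’s actual script:

  DECLARE @sql NVARCHAR(MAX) = N'';

  -- Build ALTER INDEX ... REBUILD statements for fragmented indexes
  SELECT @sql += N'ALTER INDEX ' + QUOTENAME(i.name)
               + N' ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name)
               + N' REBUILD;' + CHAR(13)
  FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
      JOIN sys.indexes AS i
          ON ps.[object_id] = i.[object_id]
         AND ps.index_id = i.index_id
      JOIN sys.objects AS o
          ON i.[object_id] = o.[object_id]
      JOIN sys.schemas AS s
          ON o.[schema_id] = s.[schema_id]
  WHERE ps.avg_fragmentation_in_percent > 30
    AND i.name IS NOT NULL;

  EXEC sys.sp_executesql @sql;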

Building Azure Resource Manager Templates

Ed Elliott gives a brief overview of Azure Resource Manager templates and puts together a sample template:

The ARM API deploys resources to Azure, but doesn’t deploy code onto those resources. For example, you can use ARM to deploy a virtual machine with SQL Server already installed, but you can’t use ARM to deploy a database from an SSDT DacPac.

To save time when designing solutions, it is important to understand that the ARM API is used simply for resources, and we need to use some other technology, such as DSC or PowerShell, to manage deployments onto that infrastructure once it has been provisioned.

This is a nice overview of the topic, and because it’s Ed (who is much better about this than most), he goes into how to test before even getting into how to create.

Taking Control In The Cloud

Arun Sirpal advises you to enjoy the change in control when moving to Azure:

The key for me has been to “embrace the change”. I have come from a traditional DBA background, with backups, consistency checks, and server-level configuration at the heart of things. Yes, I no longer care about SQL backups and things of that nature when operating within Azure, but I have different tasks and things to think about.

Sceptical? Please read on.

Do read on.

Analyzing Twitter Data With Storm In HDInsight

Nischal S shows how to configure an HDInsight cluster to process tweets, followed by loading them into a Power BI dashboard:

When we need to process streams of real-time data, Storm is a great contender. Examples of streaming data are the number of consumer clicks and navigations on a website, IIS or user logs, IoT data, and social network information. In all these scenarios, we use real-time data processing. Apache Storm can process real-time unbounded streams of data.

The term “unbounded” defines streams of data with no start or end. Here, the processing of data is continuous and in real-time. Twitter is a good example. Twitter data is continuous, has no start or end time, and is provided in real-time by millions of Twitter users around the world.

Storm wouldn’t rank in my top three technologies for doing this, but it certainly does the job.

Generating U-SQL Extract Scripts From Visual Studio

Yanan Cai shows a GUI for creating U-SQL EXTRACT scripts via Azure Data Lake Tools for Visual Studio:

One of U-SQL’s core capabilities is to be able to schematize unstructured data on the fly without having to create a metadata object for it. This capability is provided by the EXTRACT expression that will invoke either a user-defined extractor or built-in extractor to process the input file or set of files specified in the FROM clause and produces a rowset whose schema is specified in the EXTRACT clause.

While using the built-in extractor to schematize semi-structured data, like data in a .csv file, the schema definition in U-SQL is slow and error prone, especially when the .csv file contains hundreds of columns.

Recently, we released a new feature in the latest version of Azure Data Lake Tools for Visual Studio to help you generate this U-SQL EXTRACT statement automatically.

Click through for an example as well as a video showing the process.
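
For context, a hand-written EXTRACT of the sort the tool now generates looks roughly like this (the file path and columns are hypothetical):

  @people =
      EXTRACT Name string,
              Age  int,
              City string
      FROM "/input/people.csv"
      USING Extractors.Csv(skipFirstNRows: 1);  // skip the header row

  OUTPUT @people
  TO "/output/people.csv"
  USING Outputters.Csv();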

Cloudera Director And AWS Spot Instances

David Han shows off some new features in Cloudera Director 2.5 to help when building Hadoop clusters on AWS Spot instances:

You can configure Spot instances in Cloudera Director’s instance templates. These instance templates contain a flag indicating whether Spot instances should be used, as well as a field specifying the bid price for those instances.

Each instance group in the cluster template includes a field that indicates the minimum number of instances required in that group for the cluster to be considered successful. Cloudera Director will continue with bootstrapping or growing a cluster if the minimum count for each instance group is satisfied. Spot instances should not be used for instance groups that are required for the normal operation of the cluster, such as HDFS DataNodes. Instance groups configured to use Spot instances should set their minimum number to zero with the expectation that the instances may not be provisioned due to the Spot bid price being lower than the Spot price.

If you’re able to take advantage of spot instances, you can end up saving a pretty good amount of money.

Creating A WSFC Witness On Azure

Jim Donahoe shows how to create a cloud witness for a Windows cluster using Azure:

Well, that is easy to fix, right?  Let’s just spin up a VM in Azure, and host the FSW on that machine.  Problem solved!  Technically yes, that is a viable option.  But, let’s consider the cost of that scenario in the breakdown below:

  1. A VM with the OS licensed and disk space allocated for the FSW
  2. An NSG/firewall to protect said resource from the outside
  3. A VNET

Also, you have to figure in the man hours spent configuring all of those things (let’s say 4 hours total; insert your hourly rate here: rate x 4 = setup fee for the VM in Azure).

Now, here is where Cloud Witness saves the day!  The Cloud Witness WSFC Quorum type will utilize BLOB Storage in Azure to act as the point of arbitration.  Not sure what that means?

There’s a good walkthrough, but it does look quite easy to do, and a simple blob is going to be a lot cheaper than a VM.
