Press "Enter" to skip to content

Category: Cloud

Loading Azure SQL Data Warehouse

Grant Fritchey enumerates various pains he went through to load data into Azure SQL Data Warehouse:

Let’s bottom line this. Data is dirty. Any ETL process is likely to expose the trashy data, so what are the real issues here?

First up, Powershell is my bestest buddy. Scripting everything through Posh right out of the gate made a huge difference in my ability to constantly reset.

The fact that our most common processes spit out UTF16, while not a problem, is something you need to get on top of (see Posh above).

AZCopy works really well and the command line is easy to implement, but you’ll need to go through the process of setting up all the appropriate connections.

Polybase is wicked cool. Yes, I had to hop through hoops to get the data clean, but, what you should note in the above descriptions, at no point did I need to go and recreate my EXTERNAL TABLE from Azure SQL Data Warehouse. I just kept updating the files through my scripts and as soon as they were loaded up to blob storage, I ran a query against them. No additional work required. That’s nice.

You’re going to want to go through your data types on your data before you start the migration.

You’re going to have to plan for a lot of time cleaning your data if you haven’t already.

Grant’s method is the right way, especially for early tests.  In practice, for the type of data you’d put into Polybase, you might want to create the external table so that it allows rejecting a certain number of rows.  Grant didn’t specify the REJECT_TYPE and REJECT_VALUE attributes when creating his external table, so the default of zero rejected rows applied.  In a warehouse with billions of rows, hand-fixing all of that data is a nasty proposition, and if you’re writing queries whose results likely won’t change if a few (dozen?) records get dropped, rejecting bad data might be a good way to keep some of your sanity.
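If you’re curious what that looks like, here’s a minimal T-SQL sketch of a PolyBase external table with the reject options spelled out.  The file format, data source, credential, and table names are placeholders I made up, not anything from Grant’s setup:

-- External file format for comma-delimited text files.
CREATE EXTERNAL FILE FORMAT csv_format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
);

-- External data source pointing at a blob storage container.
-- Assumes a database-scoped credential named blob_credential already exists.
CREATE EXTERNAL DATA SOURCE blob_staging
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://staging@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = blob_credential
);

-- The external table itself.  REJECT_VALUE defaults to 0, so without these two
-- options a single bad row fails the whole query.
CREATE EXTERNAL TABLE dbo.SalesStaging
(
    SaleID   BIGINT,
    SaleDate DATE,
    Amount   DECIMAL(18, 2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = blob_staging,
    FILE_FORMAT = csv_format,
    REJECT_TYPE = VALUE,   -- count bad rows (PERCENTAGE is the other option)
    REJECT_VALUE = 100     -- tolerate up to 100 bad rows before failing
);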


SQL Server Express Images In Azure

Microsoft now offers an Azure image for SQL Server Express:

We just announced that we added images for SQL Server Express with Tools 2014, 2012, and 2008R2 in the Azure Gallery. SQL Server Express is a free version of SQL Server that you can use for dev/test and for web and mobile apps with lightweight relational database needs.

I’m not the world’s biggest fan of Express edition, but if you’re cost-conscious enough, this might be a nice move for you.


Scaling Azure SQL Database

James Serra has a post on scaling Azure SQL Database:

Horizontal scaling refers to adding or removing databases in order to adjust capacity or overall performance. This is also called “scaling out”.  Sharding, in which data is partitioned across a collection of identically structured databases, is a common way to implement horizontal scaling.

Vertical scaling refers to increasing or decreasing the performance level of an individual database—this is also known as “scaling up.”

It’s not free and application changes might be required (especially for horizontal scaling), but scaling with Azure SQL Database is pretty straightforward.
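For the vertical case, the change is a single statement.  Here’s a minimal sketch, run against the logical server’s master database, with a placeholder database name and target tier:

-- Scale the database up to Standard S3; the operation runs asynchronously.
ALTER DATABASE MyDatabase
MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S3');

-- Check on progress from master while the scale operation runs.
SELECT operation, state_desc, percent_complete, start_time
FROM sys.dm_operation_status
ORDER BY start_time DESC;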


HDInsight + Power BI + Spark

Reza Rad has a nice walkthrough on integrating several powerful technologies:

Power BI can connect to many data sources, as you know, and Spark on Azure HDInsight is one of them. In the area of working with Big Data applications you will probably hear names such as Hadoop, HDInsight, Spark, Storm, Data Lake, and many others. Spark and Hadoop are both frameworks for working with big data, though they have some differences. In this post I’ll show you how you can use Power BI (either Power BI Desktop or the Power BI website) to connect to a sample Spark cluster that we built on an Azure HDInsight service. By completing this section you will be able to create a simple Spark cluster on Azure HDInsight, run a few Python scripts from Jupyter to load a sample table into Spark, and finally use Power BI to connect to the Spark server and load and visualize the data.

If you’re totally unfamiliar with Spark but interested in data processing, now’s a good time to start digging into the topic.


Migrating To Azure SQL Database

James Serra has a good post on moving your on-premises SQL Server instance up to Azure SQL Database:

In this migration process you migrate both your schema and your data from the SQL Server database in your current environment into SQL Database, provided the existing database passes compatibility tests.  Fortunately, with SQL Database Version 12 (V12), there are very few remaining compatibility issues other than server-level and cross-database operations (see Azure SQL Database Transact-SQL differences).  Databases and applications that rely on partially supported or unsupported functions will need some re-engineering to fix these incompatibilities before the SQL Server database can be migrated.

Two years ago, I would have laughed at the idea.  Right now, I’m skeptical.  My expectation is that, two years from now, this will be my default answer for non-sensitive data.


Azure Blob Storage Sync Updated

Randolph West reports on some bug fixes to Azure Blob Storage Sync:

During a SQL Server migration this month, I found some inconsistencies in my Azure Blob Storage Sync tool, so I made several improvements, and fixed an outstanding bug.

As you know, it relies on the naming convention provided in Ola Hallengren’s Maintenance Solution and comes in two parts: the AzureBlobStorageSync command-line application, and the AzureBlobStorageRestore command-line application.

Using Azure blob storage (or S3 if you go the Amazon way) as a long-term storage mechanism for database backups is a pretty smart idea.
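If you’d rather skip the local file step entirely, SQL Server 2014 can also write backups straight to blob storage with BACKUP TO URL.  This is separate from Randolph’s tool; it’s just a minimal sketch with placeholder storage account, container, and database names:

-- Credential holding the storage account name and access key.
CREATE CREDENTIAL AzureBackupCredential
WITH IDENTITY = 'mystorageaccount',
     SECRET = '<storage account access key>';

-- Back up directly to a blob in the backups container.
BACKUP DATABASE MyDatabase
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/MyDatabase.bak'
WITH CREDENTIAL = 'AzureBackupCredential',
     COMPRESSION, CHECKSUM;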


HA/DR With Azure SQL Database

James Serra looks into high availability and disaster recovery options for Azure SQL Database:

When you use the Azure portal to create a SQL Database, the pricing tier options fall into three service tiers: Basic, Standard, and Premium.  Here are those three plans with their high-availability (HA) and disaster recovery (DR) options:

Considering the price point, Microsoft offers some pretty good HA & DR capabilities for Azure SQL Databases.
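As one example on the DR side, active geo-replication is set up with plain T-SQL.  A minimal sketch, run in master on the server hosting the primary, with placeholder database and partner server names:

-- Create a readable secondary on another logical server.
ALTER DATABASE MyDatabase
ADD SECONDARY ON SERVER [partner-server]
WITH (ALLOW_CONNECTIONS = ALL);

-- To fail over later, run this in master on the secondary's server:
-- ALTER DATABASE MyDatabase FAILOVER;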


Restoring An Azure SQL Database

Grant Fritchey shows us how to restore a database hosted in Azure SQL Database:

The first, and most important thing to notice here is that it’s supplying me with a new name. I can change that to anything I want as long as it’s not the name of a database already in existence on my Azure SQL Database Server. You read that correctly, you can restore a database to Azure SQL Database, but there is no WITH REPLACE option. The restore creates a new database. This is important. In a recovery scenario, you need to be sure that you’re prepared to deal with this fact. How could you replace the existing database? Immediately run a pair of ALTER DATABASE commands to change the name of the existing database to something else and then change the name of your newly created database to the old name. That’s your choice.

There are a couple of gotchas, so if you are administering Azure SQL Database instances, be aware of these.
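The rename shuffle Grant mentions comes down to two statements run against the master database.  A quick sketch with placeholder database names:

-- Move the existing database out of the way, then promote the restored copy.
ALTER DATABASE MyDatabase MODIFY NAME = MyDatabase_Old;
ALTER DATABASE MyDatabase_Restored MODIFY NAME = MyDatabase;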


Change Azure SQL Database Compatibility Level

Tom LaRock shows us how to change the compatibility level of an Azure SQL Database:

You can change the compatibility level of an Azure SQL Database.

It’s true! I know!

OK, so I’m a little excited about this one. See, I’ve been giving this talk on cardinality for the past couple of years now, so this is a hidden gem to me. When I found out this was possible I took out my demo scripts to see if changing the compatibility level would have any effect.

This is interesting, especially given that Management Studio doesn’t give you that option.  Know your T-SQL, folks.
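If you want to try it yourself, here’s a minimal sketch with a placeholder database name:

-- See where the database sits today.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = 'MyDatabase';

-- 120 turns on the new cardinality estimator; 110 keeps the old behavior.
ALTER DATABASE MyDatabase
SET COMPATIBILITY_LEVEL = 120;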


Why Data Lakes?

James Serra explains why you might want to use a data lake:

To refresh, a data lake is a landing zone, usually in Hadoop, for disparate sources of data in their native format.  Data is not structured or governed on its way into the data lake.  This eliminates the upfront costs of data ingestion, especially transformation.  Once data is in the lake, the data is available to everyone.  You don’t need an a priori understanding of how data is related when it is ingested; rather, it relies on the end-user to define those relationships as they consume it.  Data governance happens on the way out instead of on the way in.  This makes a data lake very efficient in processing huge volumes of data.  Another benefit is the data lake allows for data exploration and discovery, to find out if data is useful or to create a one-time report.

I’m still working on a “data swamp” metaphor, in which people toss their used mattresses and we expect to get something valuable if only we dredge a little more.  Nevertheless, read James’s article; data lakes are going to move from novel to normal over the next few years.
