During a SQL Server migration this month, I found some inconsistencies in my Azure Blob Storage Sync tool, so I made several improvements, and fixed an outstanding bug.
As you know, it relies on the naming convention provided in Ola Hallengren’s Maintenance Solution and comes in two parts: the AzureBlobStorageSync command-line application, and the AzureBlobStorageRestore command-line application.
Using Azure blob storage (or S3 if you go the Amazon way) as a long-term storage mechanism for database backups is a pretty smart idea.
When you use the Azure portal to create a SQL Database, the various plans under the pricing tier include three service tiers: Basic, Standard, and Premium. Here are those three plans with their high-availability (HA) and disaster recovery (DR) options:
Considering the price point, Microsoft offers some pretty good HA & DR capabilities for Azure SQL Databases.
The first, and most important thing to notice here is that it’s supplying me with a new name. I can change that to anything I want as long as it’s not the name of a database already in existence on my Azure SQL Database Server. You read that correctly, you can restore a database to Azure SQL Database, but there is no WITH REPLACE option. The restore creates a new database. This is important. In a recovery scenario, you need to be sure that you’re prepared to deal with this fact. How could you replace the existing database? Immediately run a pair of ALTER DATABASE commands to change the name of the existing database to something else and then change the name of your newly created database to the old name. That’s your choice.
There are a couple of gotchas, so if you are administering Azure SQL Database instances, be aware of these.
You can change the compatibility level of an Azure SQL Database.
It’s true! I know!
OK, so I’m a little excited about this one. See, I’ve been giving this talk on cardinality for the past couple of years now, so this is a hidden gem to me. When I found out this was possible I took out my demo scripts to see if changing the compatibility level would have any effect.
This is interesting, especially given that Management Studio doesn’t give you that option. Know your T-SQL, folks.
To refresh, a data lake is a landing zone, usually in Hadoop, for disparate sources of data in their native format. Data is not structured or governed on its way into the data lake. This eliminates the upfront costs of data ingestion, especially transformation. Once data is in the lake, the data is available to everyone. You don’t need a priority understanding of how data is related when it is ingested, rather, it relies on the end-user to define those relationships as they consume it. Data governorship happens on the way out instead of on the way in. This makes a data lake very efficient in processing huge volumes of data. Another benefit is the data lake allows for data exploration and discovery, to find out if data is useful or to create a one-time report.
I’m still working on a “data swamp” metaphor, in which people toss their used mattresses and we expect to get something valuable if only we dredge a little more. Nevertheless, read James’s article; data lakes are going to move from novel to normal over the next few years.
DocumentDB organizes documents into collections, with each database capable of hosting one or more collection. Because DocumentDB is a cloud service, it offers quick and easy implementations, while delivering the flexibility and scalability necessary to meet the demands of todays web and mobile applications.
Read the whole thing if you’re interested in Microsoft’s competitor to MongoDB.
Detecting fraudulent transactions is a key applucation of statistical modeling, especially in an age of online transactions. R of course has many functions and packages suited to this purpose, including binary classification techniques such as logistic regression.
If you’d like to implement a fraud-detection application, the Cortana Analytics gallery features an Online Fraud Detection Template. This is a step-by step guide to building a web-service which will score transactions by likelihood of fraud, created in five steps
Read through for the five follow-up articles. This is a fantastic series and I plan to walk through it step by step myself.
I’ve made a quick video to demonstrate how it works. By the way, you can just type your questions instead of speaking them to Cortana. Questions are sent to the Power BI Q&A feature for the datasets you chose to integrate from your subscription.
Check out the video. I want Jarvis within 10 years, people.
Using Azure ML and a free subscription to the Text Analytics API, I’m going to show you how to perform sentiment analysis and key phrase extraction on tweets with the hashtag #Colts (after this past Sunday’s 51-16 beat down of the Colts at the hands of the Jacksonville Jaguars, I’m bathing in the tears of Colts fans. Watch the highlights! ). Although my example here is somewhat humorous, the steps can be used to perform sentiment analysis and key phrase extraction on any text data as long as you can get the data into Power Query.
This is a fantastic example of how Azure ML can be used. Read the whole thing.
Once sysprep was done, I needed to find a way to get the VMDK files converted to VHDs. A blog post turned me on to StarWind Software’s V2V Converter. It’s a free tool which allows you to convert virtual hard drive files from one format to another. Installing this tool let me turn my set of VMDKs into one 45GB VHD. One note is that, at least on my machine, I needed to run the V2V Converter from a command prompt; executing the app directly from the Start menu would cause the app to appear for a moment and then disappear, as though some error killed the program. The tool installs by default in “%programfiles(x86)%\StarWind Software\StarWind V2V Image Converter\StarV2V.exe” From there, I just needed to get that big image into Azure.
This VM is really a Plan C or Plan D for me, but it’s good to have layers of redundancy.