To refresh, a data lake is a landing zone, usually in Hadoop, for disparate sources of data in their native format. Data is not structured or governed on its way into the data lake, which eliminates the upfront costs of data ingestion, especially transformation. Once data is in the lake, it is available to everyone. You don’t need an a priori understanding of how data is related at ingestion time; instead, end users define those relationships as they consume the data. Data governance happens on the way out instead of on the way in. This makes a data lake very efficient at processing huge volumes of data. Another benefit is that a data lake allows for data exploration and discovery, to find out whether data is useful or to create a one-time report.
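That "define relationships on the way out" idea is often called schema-on-read, and a few lines of Python can sketch it (a toy illustration only, not a real data lake; the record formats and field names here are invented):

```python
import json

# Toy illustration: raw records land in the "lake" exactly as each
# source produced them -- no upfront structure or transformation.
raw_lake = [
    '{"user": "alice", "amount": "12.50", "ts": "2015-11-01"}',  # JSON from one source
    'bob|7.25|2015-11-02',                                       # pipe-delimited from another
]

def read_with_schema(record):
    """Schema-on-read: the consumer decides how to interpret each record."""
    if record.startswith('{'):
        doc = json.loads(record)
        return (doc['user'], float(doc['amount']))
    user, amount, _ = record.split('|')
    return (user, float(amount))

# Structure is imposed only at consumption time.
rows = [read_with_schema(r) for r in raw_lake]
```

The cost of interpretation is deferred to each consumer, which is exactly why ingestion is cheap and why an undisciplined lake can turn into the swamp described below.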
I’m still working on a “data swamp” metaphor, in which people toss their used mattresses and we expect to get something valuable if only we dredge a little more. Nevertheless, read James’s article; data lakes are going to move from novel to normal over the next few years.
DocumentDB organizes documents into collections, with each database capable of hosting one or more collections. Because DocumentDB is a cloud service, it offers quick and easy implementation while delivering the flexibility and scalability necessary to meet the demands of today’s web and mobile applications.
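The database → collection → document hierarchy can be modeled with plain Python dicts (this sketch only mirrors the shape of the resource model; it does not use the actual DocumentDB client library, and all names here are made up):

```python
# Toy model of DocumentDB's resource hierarchy: a database holds
# collections, and collections hold schema-free JSON documents.
database = {"id": "ToDoDB", "collections": {}}

def create_collection(db, name):
    """Add a named collection to the database."""
    db["collections"][name] = []
    return db["collections"][name]

def insert_document(collection, doc):
    """Documents are schema-free; no two need share the same shape."""
    collection.append(doc)

items = create_collection(database, "Items")
insert_document(items, {"id": "1", "task": "groceries", "done": False})
insert_document(items, {"id": "2", "task": "laundry", "priority": 2})
```

Note that the two documents carry different fields, which is the flexibility the excerpt is describing.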
Read the whole thing if you’re interested in Microsoft’s competitor to MongoDB.
Detecting fraudulent transactions is a key application of statistical modeling, especially in an age of online transactions. R, of course, has many functions and packages suited to this purpose, including binary classification techniques such as logistic regression.
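To make the logistic-regression idea concrete, here is a minimal scoring sketch in Python (the coefficients and features are invented for illustration; in practice you would fit them from labeled data with something like R's glm()):

```python
import math

def fraud_score(amount, foreign, odd_hour):
    """Score a transaction's fraud probability with a logistic model.

    Hypothetical fitted coefficients: intercept, then weights for the
    transaction amount, a foreign-transaction flag, and an odd-hour flag.
    """
    z = -4.0 + 0.002 * amount + 1.5 * foreign + 0.8 * odd_hour
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> probability in (0, 1)

# A large foreign transaction at an odd hour scores high...
score = fraud_score(amount=2500.0, foreign=1, odd_hour=1)
flagged = score > 0.5
```

A routine small purchase (say, `fraud_score(20.0, 0, 0)`) scores near zero, so thresholding the probability gives the binary classification the excerpt mentions.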
If you’d like to implement a fraud-detection application, the Cortana Analytics gallery features an Online Fraud Detection Template. This is a step-by-step guide to building a web service that will score transactions by likelihood of fraud, created in five steps.
Read through for the five follow-up articles. This is a fantastic series and I plan to walk through it step by step myself.
I’ve made a quick video to demonstrate how it works. By the way, you can just type your questions instead of speaking them to Cortana. Questions are sent to the Power BI Q&A feature for the datasets you chose to integrate from your subscription.
Check out the video. I want Jarvis within 10 years, people.
Using Azure ML and a free subscription to the Text Analytics API, I’m going to show you how to perform sentiment analysis and key phrase extraction on tweets with the hashtag #Colts (after this past Sunday’s 51-16 beatdown of the Colts at the hands of the Jacksonville Jaguars, I’m bathing in the tears of Colts fans; watch the highlights!). Although my example here is somewhat humorous, the steps can be used to perform sentiment analysis and key phrase extraction on any text data as long as you can get the data into Power Query.
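For a feel of what a sentiment score is, here is a deliberately crude lexicon-based sketch in Python (this is a stand-in illustration only, not the Text Analytics API, which does real NLP and returns a 0–1 sentiment score per document; the word lists are invented):

```python
# Toy lexicon-based sentiment: count positive vs. negative words
# and return a 0-1 score (0.5 when no sentiment words appear).
POSITIVE = {"great", "win", "awesome", "highlights"}
NEGATIVE = {"tears", "beatdown", "lose", "sad"}

def sentiment(text):
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.5 if total == 0 else pos / total

score = sentiment("bathing in the tears of Colts fans")
```

A tweet with only negative words scores 0.0 and one with only positive words scores 1.0, which is roughly the scale the real service reports.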
This is a fantastic example of how Azure ML can be used. Read the whole thing.
Once sysprep was done, I needed to find a way to get the VMDK files converted to VHDs. A blog post turned me on to StarWind Software’s V2V Converter. It’s a free tool which allows you to convert virtual hard drive files from one format to another. Installing this tool let me turn my set of VMDKs into one 45GB VHD. One note is that, at least on my machine, I needed to run the V2V Converter from a command prompt; executing the app directly from the Start menu would cause the app to appear for a moment and then disappear, as though some error killed the program. The tool installs by default in “%programfiles(x86)%\StarWind Software\StarWind V2V Image Converter\StarV2V.exe”. From there, I just needed to get that big image into Azure.
This VM is really a Plan C or Plan D for me, but it’s good to have layers of redundancy.
While I waited…and waited for my Utility data to be inserted into my Azure database, I did some poking around to see if it was even possible to restore a local SQL Server 2014 backup to an Azure database. Guess what: I found something. (And there was much rejoicing.)
On CodePlex, I found a SQL Database Migration Wizard that can be used to restore the database. They even had Migration Wizards for 2008 R2, 2012, and 2014: SQL Database Migration Wizard v3.15.6, v4.15.6, and v5.15.6.
If you have an MSDN license, go play with this. Even if you don’t, the lowest-tier Azure SQL Database instances cost next to nothing, so there’s no reason not to learn about them.
Once you select the option to create a new server, you see options similar to those in the Azure Management Portal. This time, however, you also see a server name.
At last, you can again name your own database servers! As you can see above in the Azure Preview Portal, the server has been created with the provided name. If we then switch over to the Azure Management Portal, we will see the same.
Choose your names wisely.
Azure has just introduced another tool, known as SQL Database Threat Detection, to help in the fight against SQL injection. You can go and read all the Microsofty bits there, or watch it work in a real live app here.
Firstly, this is threat detection, not prevention. In a nutshell, this feature will tell you when an attack is mounted against your database; for an attack to get that far, the upstream app has to have a vulnerability that lets it through. Now before you give it a bit of “well that’s pretty useless then,” the main reason this makes sense is that you can go and enable it with a single checkbox tick and it won’t break your things. Plus, even if the code is solid and you have a device or a service like a WAF, this is just one more layer that’s good to have in place. Let’s just jump into it.
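For context on the vulnerability class Threat Detection watches for, here is a minimal Python sketch contrasting string-built SQL with a parameterized query (the table and data are invented, and it uses SQLite purely for illustration rather than Azure SQL Database):

```python
import sqlite3

# Set up a throwaway in-memory database with two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0), ('admin', 1)")

user_input = "nobody' OR '1'='1"  # classic injection payload

# Vulnerable: the payload is spliced into the SQL text, so the
# OR '1'='1' clause executes and matches every row.
vulnerable = conn.execute(
    "SELECT name FROM users WHERE name = '%s'" % user_input
).fetchall()

# Safe: the driver treats the payload as a literal value to compare
# against, so no rows match.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
```

Threat Detection flags the first pattern when someone exploits it in the wild; parameterization is what prevents it in the first place, which is why the excerpt stresses detection versus prevention.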
This is a useful tool. If you’re using Azure SQL Databases, go forth and activate this.
At the bottom of the portal, there is a New link and a Delete link. These are for creating and deleting databases.
After clicking the New link, I went through a series of screens to create my database.
The first screen asked me for the name of my database and what size database I wanted to create. This is an important step, since it will affect my monthly charges. Remember, I only have $150 in free credits each month. You can go here to see the pricing for the various service tiers and performance levels. I chose to create the smallest database I could (2 GB and 5 DTUs). I also created this database on a new SQL Database Server (I kind of have to, since it is the first database).
Both products, the on-premises version and the Azure SQL Database version, are part of the relational database family. They share a common base and a common purpose: to work with relational data. They look basically the same, operate mostly the same, and serve (at their core) the very same purposes.
As such, I will make sure that all of the scripts that end up in the final book have been validated on the different editions of SQL Server (as I have always done) and have been executed at least once on Azure SQL Database as well. What I won’t do is go into much detail on how to connect to an Azure SQL Database, mostly because it takes quite a few pages to do so (I just tech edited a book that covers such details, and I will direct readers to it: Peter Carter, “Pro SQL Server Admin,” http://www.springer.com/gp/book/9781484207116).
We’re already seeing Microsoft move to a cloud-first philosophy, so get in on Azure if you’ve avoided it thus far.