Power BI can connect to many data sources as you know, and Spark on Azure HDInsight is one of them. In area of working with Big Data applications you would probably hear names such as Hadoop, HDInsight, Spark, Storm, Data Lake and many other names. Spark and Hadoop are both frameworks to work with big data, they have some differences though. In this post I’ll show you how you can use Power BI (either Power BI Desktop or Power BI website) to connect to a sample of Spark that we built on an Azure HDInsight service. by completing this section you will be able to create simple spark on Azure HDInsight, and run few Python scripts from Jupyter on it to load a sample table into Spark, and finally use Power BI to connect to Spark server, load, and visualize the data.
If you’re totally unfamiliar with Spark but interested in data processing, now’s a good time to start digging into the topic.
In this migration process you migrate both your schema and your data from the SQL Server database in your current environment into SQL Database, provided the existing database passes compatibility tests. Fortunately with SQL Database Version 12 (V12), there are very few remaining compatibility issues other than server-level and cross-database operations (see Azure SQL Database Transact-SQL differences). Databases and applications that rely on partially or unsupported functions will need some re-engineering to fix these incompatibilities before the SQL Server database can be migrated.
Two years ago, I would have laughed at the idea. Right now, I’m skeptical. My expectation is that, two years from now, this will be my default answer for non-sensitive data.
During a SQL Server migration this month, I found some inconsistencies in my Azure Blob Storage Sync tool, so I made several improvements, and fixed an outstanding bug.
As you know, it relies on the naming convention provided in Ola Hallengren’s Maintenance Solution and comes in two parts: the AzureBlobStorageSync command-line application, and the AzureBlobStorageRestore command-line application.
Using Azure blob storage (or S3 if you go the Amazon way) as a long-term storage mechanism for database backups is a pretty smart idea.
When you use the Azure portal to create a SQL Database, the various plans under the pricing tier include three service tiers: Basic, Standard, and Premium. Here are those three plans with their high-availability (HA) and disaster recovery (DR) options:
Considering the price point, Microsoft offers some pretty good HA & DR capabilities for Azure SQL Databases.
The first, and most important thing to notice here is that it’s supplying me with a new name. I can change that to anything I want as long as it’s not the name of a database already in existence on my Azure SQL Database Server. You read that correctly, you can restore a database to Azure SQL Database, but there is no WITH REPLACE option. The restore creates a new database. This is important. In a recovery scenario, you need to be sure that you’re prepared to deal with this fact. How could you replace the existing database? Immediately run a pair of ALTER DATABASE commands to change the name of the existing database to something else and then change the name of your newly created database to the old name. That’s your choice.
There are a couple of gotchas, so if you are administering Azure SQL Database instances, be aware of these.
You can change the compatibility level of an Azure SQL Database.
It’s true! I know!
OK, so I’m a little excited about this one. See, I’ve been giving this talk on cardinality for the past couple of years now, so this is a hidden gem to me. When I found out this was possible I took out my demo scripts to see if changing the compatibility level would have any effect.
This is interesting, especially given that Management Studio doesn’t give you that option. Know your T-SQL, folks.
To refresh, a data lake is a landing zone, usually in Hadoop, for disparate sources of data in their native format. Data is not structured or governed on its way into the data lake. This eliminates the upfront costs of data ingestion, especially transformation. Once data is in the lake, the data is available to everyone. You don’t need a priority understanding of how data is related when it is ingested, rather, it relies on the end-user to define those relationships as they consume it. Data governorship happens on the way out instead of on the way in. This makes a data lake very efficient in processing huge volumes of data. Another benefit is the data lake allows for data exploration and discovery, to find out if data is useful or to create a one-time report.
I’m still working on a “data swamp” metaphor, in which people toss their used mattresses and we expect to get something valuable if only we dredge a little more. Nevertheless, read James’s article; data lakes are going to move from novel to normal over the next few years.
DocumentDB organizes documents into collections, with each database capable of hosting one or more collection. Because DocumentDB is a cloud service, it offers quick and easy implementations, while delivering the flexibility and scalability necessary to meet the demands of todays web and mobile applications.
Read the whole thing if you’re interested in Microsoft’s competitor to MongoDB.
Detecting fraudulent transactions is a key applucation of statistical modeling, especially in an age of online transactions. R of course has many functions and packages suited to this purpose, including binary classification techniques such as logistic regression.
If you’d like to implement a fraud-detection application, the Cortana Analytics gallery features an Online Fraud Detection Template. This is a step-by step guide to building a web-service which will score transactions by likelihood of fraud, created in five steps
Read through for the five follow-up articles. This is a fantastic series and I plan to walk through it step by step myself.
I’ve made a quick video to demonstrate how it works. By the way, you can just type your questions instead of speaking them to Cortana. Questions are sent to the Power BI Q&A feature for the datasets you chose to integrate from your subscription.
Check out the video. I want Jarvis within 10 years, people.