From a logical standpoint, working with Azure SQL databases is not very different from setting up a SQL Server instance and database in your environment. There are 3 main components we will need to deploy:
A firewall (server or database)
This is the second part in his series.
What is this all about? It took me a bit of digging, but what it boils down to is that Microsoft made a fundamental change to how things are managed within Azure. You will now find documentation on these two different deployment models: Classic Deployments and Resource Manager Deployments. These two different set of Powershell cmdlets reflect these different models, as anything for Classic Deployments are handled by cmdlets in the Azure and Azure.Storage modules. All the Resource Manager Deployment stuff is handled by the AzureRM* modules.
This is the first in a series and serves as an introduction to the topic.
If you use SQL Backup to URL to backup your databases to Azure blob storage remember that for the container name case is important
Despite all of my Linux love and C-based language tolerance, case sensitivity for filenames and development languages is a relic of a barbaric past. Nevertheless, that’s the minefield we must traverse.
A limitation with Azure SQL database has been its inability to do cross-database SQL queries. This has changed with the introduction of elastic database queries, now in preview. However, it’s not as easy as on-prem SQL Server, where you can just use the three-part name syntax DatabaseName.SchemaName.TableName. Instead, you have to define remote tables (tables outside your current database), which is similar to how PolyBase works for those of you familiar with PolyBase.
For one-off tables you join to frequently, I suppose this isn’t terrible, but it’s certainly less convenient.
You’ll notice that these results are wildly different from those above. What we’re looking is largely a server versus a database, but not completely. I mean that sys.dm_os_wait_stats is showing the waits for the instance on which my primary Azure SQL Database is currently running. Most of those waits are mine, but because it’s part of the management structure of Azure, sys.dm_os_wait_stats shows some information that’s not applicable, directly, to me. The “server” is not really that. It’s a logical container holding your database. There’s a lot more to it under the covers. To get the waits that are absolutely applicable to me and my databases, I have to go to sys.dm_db_wait_stats.
Azure SQL Database is going to behave a bit differently from on-premise SQL Server, so if you’ve got an Azure SQL Database, pay attention to those differences.
Life we be so much easier if we could just trust everyone, but since we can’t we need solid security for our databases. Azure SQL Database has many security features to make you sleep well at night:
Most of these are exactly the same as the on-premise product—at least the SQL Server 2016 version—but it goes to show that Azure SQL Database has grown up quite a bit.
Let’s bottom line this. Data is dirty. Any ETL process is likely to expose the trashy data, so what are the real issues here? First up, Powershell is my bestest buddy. Scripting everything through Posh right out of the gate made a huge difference in my ability to constantly reset. The fact that our most common processes spit out UTF16, while not a problem, is something you need to get on top of (see Posh above). AZCopy works really well and the command line is easy to implement, but you’ll need to go through the process of setting up all the appropriate connections. Polybase is wicked cool. Yes, I had to hop through hoops to get the data clean, but, what you should note in the above descriptions, at no point did I need to go and recreate my EXTERNAL TABLE from Azure SQL Data Warehouse. I just kept updating the files through my scripts and as soon as they were loaded up to blob storage, I ran a query against them. No additional work required. That’s nice. You’re going to want to go through your data types on your data before you start the migration. You’re going to have to plan for a lot of time cleaning your data if you haven’t already.
Grant’s method is the right way, especially for early tests. In practice, for the type of data you’d put into Polybase, you might want to create the external table to allow rejecting a certain number of rows—Grant didn’t specify the REJECT_TYPE and REJECT_VALUE attributes creating his external table, so the default of 0 rows was used. In a warehouse with billions of rows, hand-fixing all of that data is a nasty proposition, and if you’re writing queries whose results likely won’t change if a few (dozen?) records get dropped, rejecting bad data might be a good way to keep some of your sanity.
We just announced that we added images for SQL Server Express with Tools 2014, 2012, and 2008R2 in the Azure Gallery. SQL Server Express is a free version of SQL Server that you can use for dev/test and for web and mobile apps with lightweight relational database needs.
I’m not the world’s biggest fan of Express edition, but if you’re cost-conscious enough, this might be a nice move for you.
Horizontal scaling refers to adding or removing databases in order to adjust capacity or overall performance. This is also called “scaling out”. Sharding, in which data is partitioned across a collection of identically structured databases, is a common way to implement horizontal scaling.
Vertical scaling refers to increasing or decreasing the performance level of an individual database—this is also known as “scaling up.”
It’s not free and application changes might be required (especially for horizontal scaling), but scaling with Azure SQL Database is pretty straightforward.
Power BI can connect to many data sources as you know, and Spark on Azure HDInsight is one of them. In area of working with Big Data applications you would probably hear names such as Hadoop, HDInsight, Spark, Storm, Data Lake and many other names. Spark and Hadoop are both frameworks to work with big data, they have some differences though. In this post I’ll show you how you can use Power BI (either Power BI Desktop or Power BI website) to connect to a sample of Spark that we built on an Azure HDInsight service. by completing this section you will be able to create simple spark on Azure HDInsight, and run few Python scripts from Jupyter on it to load a sample table into Spark, and finally use Power BI to connect to Spark server, load, and visualize the data.
If you’re totally unfamiliar with Spark but interested in data processing, now’s a good time to start digging into the topic.