The question is: what is the right time period to use? The answer is that it depends on the size of your partitions. Generally, for managed tables in U-SQL, you want to target about 1 GB per partition. So, if you are bringing in, say, 800 MB per day, then daily partitions are about right. If instead you are bringing in 20 GB per day, you should look at hourly partitions of the data.
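As a back-of-the-envelope check, that 1 GB-per-partition target can be turned into a quick calculation. The helper below is purely illustrative (my own rule of thumb, not part of U-SQL):

```python
# Rough heuristic for choosing a time-based partition granularity,
# assuming the ~1 GB-per-partition target mentioned above. Illustrative only.
TARGET_GB = 1.0

def suggested_granularity(daily_gb: float) -> str:
    """Suggest a partition scheme for a given daily ingest volume (in GB)."""
    if daily_gb <= TARGET_GB * 1.5:       # e.g. 800 MB/day -> daily is fine
        return "daily"
    if daily_gb / 24 <= TARGET_GB * 1.5:  # e.g. 20 GB/day -> ~0.83 GB/hour
        return "hourly"
    return "sub-hourly"

print(suggested_granularity(0.8))   # 800 MB/day
print(suggested_granularity(20.0))  # 20 GB/day
```

The 1.5× slack factor is arbitrary; the point is simply that the target partition size, not the calendar, should drive the choice of period.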
In this post, I’d like to take a look at two common scenarios that people run into. The first is a full re-compute of a partition’s data and the second is a partial re-compute of a partition. The examples I will be using are based on the U-SQL Ambulance Demos on GitHub and will be added to the solution for ease of consumption.
The ability to reprocess data is vital in any ETL or ELT process.
Let’s have a look at the biggest news: DocumentDB now has protocol support for MongoDB. From the industry perspective, this is great news. MongoDB is one of the most easily recognised NoSQL databases. Now that DocumentDB can be a great supporting act for MongoDB, it means that architects have a broader range of tools to support business needs in the enterprise data architecture, whilst increasing business capability using the Azure cloud. How does it work? Well, DocumentDB supports MongoDB at the wire protocol level, which means that existing MongoDB drivers will function against DocumentDB. For IT departments, it means that enterprises can persist data in DocumentDB behind the scenes. With it comes the reliability, stability and resilience of the Azure cloud. It also means that these technologies are accessible to small and medium enterprises at a price they can afford, backed up with the stability and support from Microsoft Azure that they may find difficult to provide on their own.
DocumentDB has a long way to go before it catches up to MongoDB, but these are improvements that close the gap a little bit.
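Because the compatibility is at the wire protocol level, client code itself doesn’t change; only the connection string points at the DocumentDB endpoint. A hedged sketch (the account name, key, port, and feed values are placeholders, and the pymongo usage is shown in comments):

```python
from urllib.parse import quote_plus

# Placeholder account name and key -- not real credentials.
account = "myaccount"
key = quote_plus("bXlwYXNzd29yZA==")

# DocumentDB exposes a MongoDB-compatible endpoint, so an ordinary MongoDB
# connection string works; host, port, and TLS flag here are illustrative.
uri = f"mongodb://{account}:{key}@{account}.documents.azure.com:10255/?ssl=true"

# With pymongo installed, the same client code used against MongoDB applies:
#   from pymongo import MongoClient
#   client = MongoClient(uri)
#   client.mydb.mycol.insert_one({"hello": "world"})
print(uri)
```

The key point is that nothing MongoDB-specific is rewritten; the driver believes it is talking to MongoDB.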
Azure Data Catalog is a Software as a Service (SaaS) offering in Azure, part of the Cortana Intelligence Suite, for registering metadata about data sources. Check this post for an overview of Azure Data Catalog key features. (I’m a big fan of what Azure Data Catalog is trying to accomplish.)
There are a couple of particulars about Azure Data Catalog which make it a bit more difficult to set up a Demo/Test/Learning type of environment, including:
- You are required to sign into Azure Data Catalog with an organizational account. Signing in with a Microsoft account (formerly known as a Live account) won’t work for Azure Data Catalog authentication, even if that’s what you normally use for Azure.
- One Azure Data Catalog may be created per organization. Note this is *not* per Azure subscription – if your account has access to multiple subscriptions, it’s still one catalog per organization.
This method seems, shall we say, overly complicated. Here’s hoping that the Azure Data Catalog team finds a way to improve this experience.
DTUs are explained here. To help, there is an Azure SQL Database DTU Calculator. This calculator will help you determine the number of DTUs being used by your existing on-prem SQL Server database(s), as well as recommend the minimum performance level and service tier that you need before you migrate to Azure SQL Database. It does this by using performance monitor counters.
After you use a SQL Database for a while, you can use a pricing tier recommendation tool to determine the best service tier to switch to. It does this by assessing historical resource usage for a SQL database.
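You can picture the recommendation logic as picking the cheapest tier whose DTU limit comfortably covers your observed usage. A toy sketch follows; the tier limits and the 95th-percentile rule are my illustration, not the actual tool’s algorithm:

```python
# Toy tier recommendation: pick the smallest service tier whose DTU limit
# covers the 95th percentile of observed usage. Tier limits below are
# illustrative, not an authoritative list.
TIERS = [("Basic", 5), ("S0", 10), ("S1", 20), ("S2", 50),
         ("S3", 100), ("P1", 125), ("P2", 250)]

def recommend_tier(dtu_samples):
    samples = sorted(dtu_samples)
    p95 = samples[int(0.95 * (len(samples) - 1))]  # simple 95th percentile
    for name, limit in TIERS:
        if limit >= p95:
            return name
    return TIERS[-1][0]

# Hourly DTU usage over a hypothetical day:
usage = [4, 6, 8, 12, 15, 18, 22, 30, 35, 28, 25, 20,
         18, 16, 14, 12, 10, 9, 8, 7, 6, 5, 4, 4]
print(recommend_tier(usage))
```

Using a high percentile rather than the peak avoids sizing for a once-a-day spike, which is the same trade-off the real tools let you weigh.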
For further information, check out this interesting article from a few months ago on V12 performance by Chris Bailiss.
You need to start working on adding Azure knowledge to your skill set. If you have access to an MSDN license, getting into Azure is easy because of the credits available. However, not everyone works for a company that provides MSDN or has purchased a license. In that case, getting into Azure, just for testing and learning, could be expensive (I frequently “spend” $150/month with my MSDN credits). However, Microsoft is very serious about getting everyone moved into this space. They’ve launched a new free program called IT Pro Cloud Essentials. Not only does it give you Azure credit, but you also get access to O365, another set of skills and understanding you need to acquire.
Also check out their Visual Studio Dev Essentials program. Its Azure credit is only $25 a month, but it offers you SQL Server 2014 Developer Edition (and will offer 2016).
In my Simple-Talk article Azure SQL Data Warehouse, I introduced you to SQL Data Warehouse and gave you an overview of the architecture and technologies that drive the service and make it all work. In this article, I go a step further and provide details about getting started with SQL Data Warehouse, demonstrating how to add a sample database and then access the server and database settings.
If you want to follow along with my examples and try out SQL Data Warehouse for yourself, you must have an active Azure subscription, even if it’s just the free trial. For those who have already used up their free trial, be aware that SQL Data Warehouse is a pay-as-you-go service, even though it’s still in preview, so unless you’re on an unlimited company budget or happen to have accrued MSDN credits, you’ll want to be judicious in how you try out the service. Fortunately, as you’ll see in this article, you can pause the compute resources when not in use, helping to minimize the costs associated with learning about the service.
This article is all about initial installation and configuration.
So I set about looking for a workaround. This week I think I’ve finally managed to get something working that approximates the number I need from that view, ms_ticks.
Attached is sp_whoisactive v11.112 — Azure Special Edition v2. Please give it a shot, and I am especially interested in feedback if you use the @get_task_info = 2 option when running sp_whoisactive. That is the main use case that’s impacted by the lack of ms_ticks information and my attempt at a workaround.
If you’re using on-prem SQL Server, this doesn’t add anything new, but if you’re on Azure SQL Database, give it a try.
I do have alerts set up on the Azure portal and in Application Insights to notify me when availability or performance thresholds are violated, but I also need to know if there is a global or regional issue that might affect our app, so that I can respond and notify the staff when appropriate. Azure status changes are reported on the Azure Status web page.
The following will describe how to use the Azure Status page RSS feeds and Outlook rules for notification if things go sideways in Microsoft Azure.
This is a good use of Outlook’s built-in RSS reader.
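Outlook does the heavy lifting in that setup, but the same status feed can also be consumed programmatically. A minimal sketch with Python’s standard library, parsing a sample item (the feed URL and the item’s content are illustrative, not a real outage):

```python
import xml.etree.ElementTree as ET

# A sample RSS item in the general shape a status feed uses. In practice you
# would fetch the live feed instead (URL is illustrative), e.g.:
#   from urllib.request import urlopen
#   rss = urlopen("https://azure.microsoft.com/en-us/status/feed/").read()
rss = """<rss version="2.0"><channel>
  <title>Azure Status</title>
  <item>
    <title>SQL Database - West Europe - Investigating</title>
    <pubDate>Mon, 04 Apr 2016 10:00:00 Z</pubDate>
    <description>Engineers are investigating connectivity errors.</description>
  </item>
</channel></rss>"""

root = ET.fromstring(rss)
for item in root.iter("item"):
    title = item.findtext("title")
    when = item.findtext("pubDate")
    print(f"{when}: {title}")
```

From there it is a short step to filtering on region or service name, much as the Outlook rules do.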
In the field, I see a lot of people using Availability Groups to have a near real-time replica for reporting. I talked a little bit about this above. What isn’t mentioned here is that you have to maintain a Windows Failover Cluster, quorum, Active Directory (unless you’re using the Windows Server 2016 Preview) and more. This gets you a replica that is just a copy of the database. What does this mean? You cannot change database objects like security, indexes, etc. Also, what if you don’t need the whole database(s) for reporting? If not, you can replicate only the data you truly need.
So, let’s recap here. You only have to replicate the data that you need. You can have different security and indexes on your reporting subscriber database(s). The reporting subscriber database can be scaled up or down based on your needs. The reporting database can now be an Azure Database. Folks, I call this a huge win!
There’s a lot more replication love out there than I’d expect. John promises to follow up with a guide on how to implement this, so keep an eye out for that.
As I’m currently planning to migrate the entire BI architecture of one of my customers to the cloud, this made me think: can we already ditch SSAS as we know it in favor of Power BI? What are the alternatives?
To study that, I’ve put some diagrams together to show the possibilities of moving BI to the cloud. First, I’ll discuss the possible architectures, then the impossible architecture (but maybe the situation I was looking for).
One man’s opinion: there will be SSAS for Azure. I have no proof of this, and the nice part about having no proof is that I can throw out wild speculation without fear of violating NDA…. But to me, Power BI solves a different problem and acts in conjunction with SSAS rather than as its replacement. I also don’t see any technical reasons why SSAS couldn’t live in the cloud, and so that leads me to believe that it will be there eventually. But hey, it turns out wild speculation is occasionally wrong…