Up until August 1st, if you had two vNets in the same Azure region (US West, for example), you needed to create a site-to-site VPN between them in order for the VMs within each vNet to be able to see each other. I’m happy to report that this is no longer the case (although it is still the default configuration). On August 1st, 2016, Microsoft released a new version of the Azure portal which allows you to enable vNet peering between vNets within an account.
Now, this feature is in public preview (a.k.a. beta), so you have to turn it on, which is done through Azure PowerShell. Thankfully it uses the Register-AzureRmProviderFeature cmdlet, so you don’t need to have the newest Azure PowerShell installed, just something fairly recent (I have 1.0.7 installed). To enable the feature, just request to be included in the beta like so (don’t forget to log in with add-AzureRmAccount and then select-AzureRmSubscription).
Read the whole thing for details on how to enroll in this feature and how to set it up.
Which data gets stored in which database?
As long as you are doing a simple select on one table and your data is distributed evenly, you shouldn’t care, right? The query will flow to the compute nodes, they will perform the query on each database, and the result will be merged together by the control node.
But once you start joining data from multiple tables, ADW will have to swing data around from one database to another in order to join the data. This is called Data Movement. It is impossible to avoid in general but you should strive to minimize it to obtain better performance.
This is a look primarily at the underlying mechanics rather than testing a particular load. Check it out.
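To make the idea of data movement concrete, here is a minimal Python sketch. It is not how ADW is actually implemented. It assumes rows are assigned to a distribution by hashing one column, and the distribution count of 4, the CRC32 hash, and the names `distribution_for`, `distribute`, and `rows_to_move` are all hypothetical choices made for illustration.

```python
import zlib
from collections import defaultdict

# Hypothetical distribution count, kept small so the output is easy to read.
NUM_DISTRIBUTIONS = 4


def distribution_for(key, n=NUM_DISTRIBUTIONS):
    """Map a key value to a distribution number using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % n


def distribute(rows, key_column, n=NUM_DISTRIBUTIONS):
    """Group rows by the distribution their key_column value hashes to."""
    placement = defaultdict(list)
    for row in rows:
        placement[distribution_for(row[key_column], n)].append(row)
    return placement


def rows_to_move(rows, current_key, join_key, n=NUM_DISTRIBUTIONS):
    """Count rows whose distribution under join_key differs from where
    they currently live (their distribution under current_key)."""
    return sum(
        1
        for row in rows
        if distribution_for(row[current_key], n)
        != distribution_for(row[join_key], n)
    )


orders = [{"order_id": i, "customer_id": i % 7} for i in range(50)]

# Already distributed on the join column: nothing has to move.
print(rows_to_move(orders, "customer_id", "customer_id"))

# Distributed on a different column: some rows must be shuffled first.
print(rows_to_move(orders, "order_id", "customer_id"))
```

In this toy model, a table that is already distributed on the join column needs no shuffling, while a table distributed on some other column has rows that would need to change distribution before a join on that column could run locally.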
We’ve emphasized that Azure Data Lake Store is compatible with WebHDFS. Now that ACLs are fully available, it’s important to understand the ACL model in WebHDFS/HDFS because they are POSIX-style ACLs and not Windows-style ACLs. Before we dive deep into the details on the ACL model, here are key points to remember.
POSIX-STYLE ACLs DO NOT ALLOW INHERITANCE. For those of you familiar with POSIX ACLs, this is not a surprise. For those coming from a Windows background this is very important to keep in mind. For example, if Alice can read files in folder /foo, it does not mean that she can read files in /foo/bar. She must be granted explicit permission to /foo/bar. The POSIX ACL model is different in some other interesting ways, but this lack of inheritance is the most important thing to keep in mind.
ADDING A NEW USER TO DATA LAKE ANALYTICS REQUIRES A FEW NEW STEPS. Fortunately, a portal wizard automates the most difficult steps for you.
This is an interesting development.
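The Alice example can be sketched in a few lines of Python. This is a deliberately simplified toy model, not the Data Lake Store or HDFS API: the `AclStore` class and its `grant` and `can` methods are hypothetical, and the model ignores owner/group/other bits, masks, default ACLs, and the traversal permissions a real system would also check. It only illustrates that a permission lookup consults the entry for the exact path and nothing above it.

```python
class AclStore:
    """Toy per-path ACL table with no inheritance (hypothetical model)."""

    def __init__(self):
        # path -> {user -> set of permission letters such as "r", "w", "x"}
        self._acls = {}

    def grant(self, path, user, perms):
        """Add permission letters for a user on exactly one path."""
        self._acls.setdefault(path, {}).setdefault(user, set()).update(perms)

    def can(self, path, user, perm):
        """Check only the entry for this exact path; parents are not consulted."""
        return perm in self._acls.get(path, {}).get(user, set())


store = AclStore()
store.grant("/foo", "alice", "r")

print(store.can("/foo", "alice", "r"))      # granted explicitly on /foo
print(store.can("/foo/bar", "alice", "r"))  # nothing granted on /foo/bar

store.grant("/foo/bar", "alice", "r")       # the explicit grant the text calls for
print(store.can("/foo/bar", "alice", "r"))
```

Someone coming from Windows-style ACLs might expect the second check to succeed because of the grant on /foo; in this model, as in the text, it does not until /foo/bar gets its own entry.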
Pick the location based on two factors. First, Azure Data Factory is not available everywhere, so you are limited to the locations where it is available. If you pick one where it isn’t available, you will get an error message letting you know why you cannot create the resource. Second, whenever possible within Azure, pick the same location as the one where your data lives. There are charges within Azure if you move data across locations and no charge if you stay in the same one. You may want to look at where the data that will be used in Data Factory lives before deciding where to put it. I always check the Pin to Dashboard option so that I can find the resource later, but it is not required and can be done later. Click on the Create button to create a Data Factory resource. If you have selected Pin to Dashboard, you will see a little window which says Deploying Data Factory. This little window goes away once the Data Factory deployment is completed, and you will have an entry in the list of resources for Data Factory.
Read the whole thing if you’re thinking of getting started with Azure Data Factory.
Until now, the single biggest problem has been that both Azure SQL DB and Amazon RDS SQL Server don’t give you access to backup files. If you wanted to get your data out, you were hassling with things like import/export wizards, BCP, or sync apps.
This is a really, really, really big deal, something Azure SQL DB doesn’t support (and I dearly wish it did). I get even more excited reading this because now Microsoft has to do it in order to remain competitive, and that’ll make Azure SQL DB a much more attractive product for traditional DBAs.
This makes the migration strategy to and from RDS significantly easier. Brent gives a few examples of how this will help customers.
This is great, except for the case I want to talk about today. What if you need to do a live migration of a bigger database with as little downtime as possible (the business wants no downtime)? For example, say an existing 50 GB to 500 GB database. Your only option today is a good old friend of mine called transactional replication. You see, you can configure transactional replication, let the snapshot occur, and have all the data in your current production system syncing live with your Azure SQL Database until it’s time to cut over, which will make your cutover downtime as short as possible.
Below I will give you step-by-step instructions on how you can configure your subscriber. This would be your Azure SQL Database. The publisher would be your existing production database, which could be either on-premises or on an Azure VM.
Ah, replication: the cause of, and solution to, all of life’s problems. Or something. Do read the whole thing.
To test the throughput, I will run a set test with the TempDB database on D:\ (local SSD) and then rerun the test with TempDB moved onto F:\ (P30 premium disk). Between the tests, SQL Server is restarted so we’re starting with a clean cache and state.
The test SQL script will create a temporary table and then run a series of insert, update, select and delete queries against that table. We’ll then capture and record the statistics and time.
The results were interesting; read on to learn more.
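For a feel of the shape of such a test, here is a rough Python sketch. It is a stand-in, not the author's script: it uses an in-memory SQLite database instead of SQL Server and TempDB, and the table name `load_test`, the function `run_workload`, and the row count are all hypothetical. The timings it produces say nothing about Azure disks; the point is only the create, insert, update, select, delete, and measure pattern.

```python
import sqlite3
import time


def run_workload(conn, row_count=10_000):
    """Create a temp table, run insert/update/select/delete against it,
    and return the selected row count plus elapsed seconds per step."""
    timings = {}
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMP TABLE load_test (id INTEGER PRIMARY KEY, payload TEXT)"
    )

    def timed(label, action):
        # Record wall-clock time for one step of the workload.
        start = time.perf_counter()
        result = action()
        timings[label] = time.perf_counter() - start
        return result

    timed("insert", lambda: cur.executemany(
        "INSERT INTO load_test (id, payload) VALUES (?, ?)",
        ((i, "x" * 100) for i in range(row_count)),
    ))
    timed("update", lambda: cur.execute(
        "UPDATE load_test SET payload = payload || 'y'"
    ))
    selected = timed("select", lambda: cur.execute(
        "SELECT COUNT(*) FROM load_test"
    ).fetchone()[0])
    timed("delete", lambda: cur.execute("DELETE FROM load_test"))

    conn.commit()
    cur.execute("DROP TABLE load_test")
    return selected, timings


connection = sqlite3.connect(":memory:")
rows_seen, step_timings = run_workload(connection, row_count=1_000)
for step, seconds in step_timings.items():
    print(f"{step}: {seconds:.6f}s")
connection.close()
```

To compare two storage locations with a harness like this, you would run the same workload once per location from a clean state and compare the per-step timings, which is the same idea as restarting SQL Server between runs.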
The Spark-HBase Connector provides an easy way to store and access data from HBase clusters with Spark jobs. HBase has been really successful at the highest levels of data scale, so existing Spark customers should definitely explore this storage option. Similarly, if customers already have HDInsight HBase clusters and want to access their data from Spark jobs, there is no need to move data to any other storage medium. In both cases, the connector will be extremely useful.
I’m not the biggest fan of HBase, but if it’s part of your environment, you should definitely look at this Spark connector.
Now for the big question, Windows or Linux?
That’s absolutely correct.
As you can see, we now have a new path in our query plan with an operator called “Remote Query”. Basically, the local server runs the remote query and then, using the local primary key, concatenates the results back together to produce the desired result. So can we update the data?
Nope, sure can’t. Once the data lives in Azure, the data is READ ONLY.
Check it out. He’s a bit more sanguine about Stretch than I am, so maybe it will fit your use cases.