
Category: Cloud

Copying Data With Data Factory

Ginger Grant shows how to copy data from an Azure SQL Database to Azure Blob Storage using Data Factory:

Because we need a connection to a database and an Azure Blob, two Linked Services are required, one for each different type. Prior to completing this step, create an Azure Blob storage account by clicking on Add on All Resources. Create the second Linked Service like the first: click on New data store, then select Azure Storage. Using the template for an Azure Blob Storage linked service, I have modified it below, adding the “hubName” as it is required.

There’s a lot of JSON to write here, if you’re into that sort of thing.
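
To make that concrete, here is a minimal sketch of a classic (v1) Data Factory linked service definition for Azure Blob Storage, built up in Python. The account name, key, and hub name are all placeholders; the “hubName” property is the one Ginger calls out as required.

    import json

    # Minimal sketch of an Azure Blob Storage linked service for classic
    # (v1) Data Factory; every name and credential below is a placeholder.
    blob_linked_service = {
        "name": "AzureBlobLinkedService",
        "properties": {
            "type": "AzureStorage",
            "hubName": "mydatafactory_hub",  # required, per the post
            "typeProperties": {
                "connectionString": "DefaultEndpointsProtocol=https;"
                                    "AccountName=<storage-account>;"
                                    "AccountKey=<account-key>"
            }
        }
    }

    print(json.dumps(blob_linked_service, indent=2))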


Optimizing HBase In HDInsight

Ashish Thapliyal links to a 30-minute presentation on HBase optimization:

This session was presented by Nitin Verma (Sr. Software Engineer) and Pravin Mittal (Principal Engineering Manager) @ HBaseCon 2016. The session goes deep into the success story of enabling a big internal customer on HDInsight HBase.

HBase design is a totally different mindset from relational design, so you have to unlearn a lot of habits when moving over to it.
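
To give a feel for that mindset shift, here is a hedged sketch using the happybase Python client: instead of normalized tables and joins, you denormalize into wide rows and design the row key around your read pattern. The host, table, and column names are all invented.

    import happybase  # third-party Python client for HBase's Thrift gateway

    # Hypothetical connection and schema; the point is the access pattern.
    connection = happybase.Connection('hbase-thrift-host')
    orders = connection.table('orders')

    # Row key = customer id + order id, so one customer's orders sort
    # together and "all orders for cust42" is a cheap prefix scan, not a join.
    orders.put(b'cust42|order0001', {b'd:item': b'widget', b'd:qty': b'3'})

    for key, data in orders.scan(row_prefix=b'cust42|'):
        print(key, data)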


Azure ML Updates

David Smith walks us through new language engines supported in Azure ML:

ML Studio now gives you even more flexibility, with new language engines supported in the language modules. Within the Execute Python Script module, you can now choose to use Python 2.7.11 or Python 3.5, both of which run within the Anaconda 4.0 distribution. And within the Execute R Script module, you can now choose Microsoft R Open 3.2.2 as your R engine, in addition to the existing CRAN R 3.1.0 engine. Microsoft R Open 3.2.2 not only gives you a newer R language engine, it also gives you access to a wealth of new R packages for use within ML Studio. Over 400 packages are pre-installed for use with the R Script module, and you can install and use any other R package (including CRAN packages and your own R packages) via the Script Bundle input port.

I’m interested in the Microsoft R Open language support, as Azure ML is otherwise still using a relatively old version of R (3.1.0).
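
For context, the Execute Python Script module wraps your code in an entry-point function like the sketch below, which receives up to two pandas data frames from the module’s input ports and returns a sequence of data frames. The column names here are invented for illustration.

    # Entry point invoked by ML Studio's Execute Python Script module;
    # dataframe1 and dataframe2 arrive as pandas DataFrames from the
    # module's two input ports.
    def azureml_main(dataframe1=None, dataframe2=None):
        # Hypothetical transformation: derive a column from two others.
        dataframe1['total'] = dataframe1['price'] * dataframe1['qty']
        # Return a sequence of DataFrames, one per output port.
        return dataframe1,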


vNet Peering Within An Azure Region

Denny Cherry reports that there is a public preview of a feature to allow vNet peering without setting up a site-to-site VPN connection:

Up until August 1st, if you had two vNets in the same Azure region (USWest, for example), you needed to create a site-to-site VPN between them in order for the VMs within each vNet to be able to see each other.  I’m happy to report that this is no longer the case (though it is still the default configuration).  On August 1st, 2016, Microsoft released a new version of the Azure portal which allows you to enable vNet peering between vNets within an account.

Now, this feature is in public preview (a.k.a. beta), so you have to turn it on, which is done through Azure PowerShell. Thankfully, it uses the Register-AzureRmProviderFeature cmdlet, so you don’t need to have the newest Azure PowerShell installed, just something fairly recent (I have 1.0.7 installed). To enable the feature, just request to be included in the beta like so (don’t forget to log in with Add-AzureRmAccount and then Select-AzureRmSubscription).

Read the whole thing for details on how to enroll in this feature and how to set it up.
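
If you would rather do the enrollment from Python than PowerShell, something like this sketch with the azure-mgmt-resource package should be roughly equivalent; the AllowVnetPeering feature name mirrors the Register-AzureRmProviderFeature call Denny describes, and all credentials are placeholders.

    from azure.common.credentials import ServicePrincipalCredentials
    from azure.mgmt.resource import FeatureClient

    # Placeholder credentials; this is the Python stand-in for logging in
    # with Add-AzureRmAccount / Select-AzureRmSubscription.
    credentials = ServicePrincipalCredentials(
        client_id='<app-id>', secret='<secret>', tenant='<tenant-id>')
    features = FeatureClient(credentials, '<subscription-id>')

    # Request inclusion in the preview, paralleling
    # Register-AzureRmProviderFeature -FeatureName AllowVnetPeering
    #                                 -ProviderNamespace Microsoft.Network
    result = features.features.register('Microsoft.Network', 'AllowVnetPeering')
    print(result.properties.state)  # stays "Registering" until approved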


Scaling Azure Data Warehouse

Vincent-Philippe Lauzon looks at how Azure Data Warehouse scales:

Which data gets stored in which database?

As long as you are doing simple selects on one table and your data is distributed evenly, you shouldn’t care, right?  The query will flow to the compute nodes, they will perform the query on each database, and the results will be merged together by the control node.

But once you start joining data from multiple tables, ADW will have to swing data around from one database to another in order to join the data.  This is called Data Movement.  It is impossible to avoid in general but you should strive to minimize it to obtain better performance.

This is a look primarily at the underlying mechanics rather than testing a particular load.  Check it out.
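
Here is a toy sketch of the idea, with a made-up hash function standing in for the engine’s: each row lands in one of 60 distributions based on its distribution column, and a join is only local when both sides hash to the same place on the join key.

    import zlib

    DISTRIBUTIONS = 60  # ADW spreads each table across 60 distributions

    def distribution_for(key):
        # Stand-in for the engine's distribution hash.
        return zlib.crc32(str(key).encode()) % DISTRIBUTIONS

    orders = [(1, 'cust_a'), (2, 'cust_b')]  # distributed on order_id
    customers = ['cust_a', 'cust_b']         # distributed on cust_id

    # Joining orders to customers on cust_id: any order row living in a
    # different distribution than its customer forces Data Movement.
    for order_id, cust_id in orders:
        if distribution_for(order_id) != distribution_for(cust_id):
            print(f'order {order_id}: data must move to join with {cust_id}')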


Azure Data Lake ACLs

Saveen Reddy introduces file and folder level Access Control Lists for Azure Data Lake storage:

We’ve emphasized that Azure Data Lake Store is compatible with WebHDFS. Now that ACLs are fully available, it’s important to understand the ACL model in WebHDFS/HDFS because they are POSIX-style ACLs and not Windows-style ACLs.  Before we dive deep into the details on the ACL model, here are the key points to remember.

  • POSIX-STYLE ACLs DO NOT ALLOW INHERITANCE. For those of you familiar with POSIX ACLs, this is not a surprise. For those coming from a Windows background, this is very important to keep in mind. For example, if Alice can read files in folder /foo, it does not mean that she can read files in /foo/bar. She must be granted explicit permission to /foo/bar. The POSIX ACL model is different in some other interesting ways, but this lack of inheritance is the most important thing to keep in mind.

  • ADDING A NEW USER TO DATA LAKE ANALYTICS REQUIRES A FEW NEW STEPS. Fortunately, a portal wizard automates the most difficult steps for you.

This is an interesting development.
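
A small Python model of that no-inheritance rule, using the post’s /foo and /foo/bar example (users and grants here are hypothetical): reading a folder needs an explicit grant on that folder itself, and nothing flows down from the parent.

    # Hypothetical ACL table; 'r' = read, 'x' = traverse.
    acls = {
        '/foo':     {'alice': {'r', 'x'}},  # alice may read and traverse /foo
        '/foo/bar': {},                     # no explicit grant for alice here
    }

    def can_read(user, folder):
        # No inheritance: the folder itself must grant 'r', and every
        # ancestor must grant 'x' (traverse).
        if 'r' not in acls.get(folder, {}).get(user, set()):
            return False
        parent = folder.rsplit('/', 1)[0]
        while parent:
            if 'x' not in acls.get(parent, {}).get(user, set()):
                return False
            parent = parent.rsplit('/', 1)[0]
        return True

    print(can_read('alice', '/foo'))      # True
    print(can_read('alice', '/foo/bar'))  # False: needs its own grant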


Getting Started With Azure Data Factory

Ginger Grant provides an introduction to the Azure Data Factory dashboard:

Pick the location based on two factors. Azure Data Factory is not available everywhere, so you are limited to using only the locations where it is available. If you pick one where it isn’t available, you will get an error message letting you know why you cannot create the resource. Whenever possible within Azure, pick the same resource where your data lives. There are charges within Azure if you migrate data across resources and no charge if you stay in the same resource. You may want to go look at where the data which will be used in Data Factory lives before deciding where to put it. I always check the Pin to Dashboard option so that I can find the resource later, but it is not required and can be done later. Click on the Create button to create a Data Factory resource. If you have selected Pin to Dashboard, you will see a little window which says Deploying Data Factory. This little window goes away once the deployment is complete, and you will have an entry in the list of resources for Data Factory.

Read the whole thing if you’re thinking of getting started with Azure Data Factory.


Amazon RDS Supports Native Backup And Restore

Brent Ozar reports that Amazon now allows you to use native SQL Server backups for getting data in and out of RDS:

Until now, the single biggest problem has been that both Azure SQL DB and Amazon RDS SQL Server don’t give you access to backup files. If you wanted to get your data out, you were hassling with things like import/export wizards, BCP, or sync apps.

Today, Amazon RDS SQL Server announced support for native backup/restore to Amazon S3.

This is a really, really, really big deal, something Azure SQL DB doesn’t support (and I dearly wish it did). I get even more excited reading this because now Microsoft has to do it in order to remain competitive, and that’ll make Azure SQL DB a much more attractive product for traditional DBAs.

This makes the migration strategy to and from RDS significantly easier.  Brent gives a few examples of how this will help customers.
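
As a rough sketch of what the new capability looks like from code (the endpoint, credentials, and bucket are placeholders): you call the msdb procedures Amazon exposes for this feature, such as rds_backup_database, and poll rds_task_status to watch the asynchronous job.

    import pyodbc

    # Placeholder endpoint and credentials for an RDS SQL Server instance.
    conn = pyodbc.connect(
        'DRIVER={ODBC Driver 13 for SQL Server};'
        'SERVER=myinstance.abc123.us-east-1.rds.amazonaws.com;'
        'DATABASE=msdb;UID=admin;PWD=<password>',
        autocommit=True)

    # Kick off a native backup straight to S3.
    conn.execute("""
        exec msdb.dbo.rds_backup_database
             @source_db_name = 'MyDatabase',
             @s3_arn_to_backup_to = 'arn:aws:s3:::my-bucket/MyDatabase.bak';
    """)

    # The backup runs as an async task; poll until it reports SUCCESS.
    for row in conn.execute("exec msdb.dbo.rds_task_status @db_name = 'MyDatabase'"):
        print(row)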


Migrating To Azure SQL Database

John Sterrett discusses ways to migrate an on-prem database to Azure SQL Database:

This is great, except for the case I want to talk about today. What if you need to do a live migration with as little downtime as possible (the business wants no downtime) for bigger databases, say an existing 50 GB to 500 GB database? Your only option today is a good old friend of mine called transactional replication. You see, you can configure transactional replication and have the snapshot occur, and all data in your current production system can be syncing live with your Azure SQL Database until it’s time to cut over, which will make your cutover downtime as short as possible.

Below, I will give you step-by-step instructions on how you can configure your subscriber. This would be your Azure SQL Database. The publisher would be your existing production database, which could be either on-premises or in an Azure VM.

Ah, replication:  the cause of, and solution to, all of life’s problems.  Or something.  Do read the whole thing.
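
The subscriber setup John walks through boils down to T-SQL run against the publisher; here is a hedged sketch of the key calls, driven from Python, with every server, database, and login name invented.

    import pyodbc

    # Connect to the existing production publisher (placeholder DSN).
    pub = pyodbc.connect('DSN=production-publisher', autocommit=True)

    # Add the Azure SQL Database as a push subscriber to an existing
    # publication; all names and credentials below are placeholders.
    pub.execute("""
        exec sp_addsubscription
             @publication = 'MyPublication',
             @subscriber = 'myserver.database.windows.net',
             @destination_db = 'MyAzureDb',
             @subscription_type = 'Push',
             @sync_type = 'automatic';
    """)

    # Azure SQL Database only accepts SQL authentication, so the
    # distribution agent must connect with a SQL login.
    pub.execute("""
        exec sp_addpushsubscription_agent
             @publication = 'MyPublication',
             @subscriber = 'myserver.database.windows.net',
             @subscriber_db = 'MyAzureDb',
             @subscriber_security_mode = 0,
             @subscriber_login = 'sqladmin',
             @subscriber_password = '<password>';
    """)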


Azure IaaS Disk Differences

Rolf Tesmer looks at whether to use local SSDs or Premium Storage disks for SQL Server instances on Azure IaaS:

To test the throughput I will run a set test with the TempDB database on D:\ (local SSD) and then rerun the test again with the TempDB moved onto F:\ (P30 premium disk).  Between the tests SQL Server is restarted so we’re starting with a clean cache and state.

The test SQL script will create a temporary table and then run a series of insert, update, select and delete queries against that table.  We’ll then capture and record the statistics and time.

The results were interesting; read on to learn more.
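
For flavor, a rough Python version of the kind of harness Rolf describes (the DSN and row count are invented): run the same DML workload against a temp table, time it, then move tempdb to the other drive and rerun.

    import time
    import pyodbc

    conn = pyodbc.connect('DSN=azure-sql-vm', autocommit=True)  # placeholder DSN
    cur = conn.cursor()

    # The temp table lives in tempdb, so its I/O hits whichever drive
    # (D:\ local SSD or F:\ P30 premium disk) tempdb currently sits on.
    cur.execute("create table #t (id int, payload char(200))")

    start = time.perf_counter()
    for i in range(100000):  # row count is arbitrary
        cur.execute("insert into #t values (?, replicate('x', 200))", i)
    cur.execute("update #t set payload = replicate('y', 200) where id % 10 = 0")
    cur.execute("select count(*) from #t")
    cur.execute("delete from #t where id % 2 = 0")
    print(f'workload took {time.perf_counter() - start:.1f}s')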
