The Forgotten Infrastructure Below Azure BI Architecture Diagrams

Meagan Longoria reminds us that there are several products which Azure BI projects need but which we tend to forget when building architectural diagrams:

Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based identity and access management service. Members of an organization have a user account that can sign in to various services. AAD is used to access Office 365, Power BI, and Dynamics 365, as well as the Azure portal. It can also be used to grant access and permissions to specific Azure resources.

Meagan covers several more of these, so check it out.

AzureR Packages In CRAN

David Smith points out that the Azure packages for R are now in CRAN:

The suite of AzureR packages for interfacing with Azure services from R is now available on CRAN. If you missed the earlier announcements, this means you can now use the install.packages function in R to install these packages, rather than having to install from the GitHub repositories. Updated versions of these packages will also be posted to CRAN, so you can get the latest versions simply by running update.packages.

Read on for a summary of those packages.

Data Transformation Tools In The Azure Space

James Serra gives us an overview of the major tools you would use for ETL and ELT in Azure:

If you are building a big data solution in the cloud, you will likely be landing most of the source data into a data lake.  And much of this data will need to be transformed (i.e. cleaned and joined together – the “T” in ETL).  Since the data lake is just storage (i.e. Azure Data Lake Storage Gen2 or Azure Blob Storage), you need to pick a product that will be the compute and will do the transformation of the data.  There is good news and bad news when it comes to which product to use.  The good news is there are a lot of products to choose from.  The bad news is there are a lot of products to choose from :-).  I’ll try to help your decision-making by talking briefly about most of the Azure choices and the best use cases for each when it comes to transforming data (although some of these products also do the Extract and Load part

The only surprise is the non-mention of Azure Data Lake Analytics, and there is a good conversation in the comments section explaining why.

Creating Cosmos DB Indexes

Hasan Savran explains indexing in Cosmos DB:

In SQL Server you need to pick which columns you would like to index; in CosmosDB you need to pick which columns not to index. It’s kind of the same thing in the end. You might ask “If everything is indexed and working fine, why do you want me to poke the well running system?” When we compare SQL Server indexes to CosmosDB indexes, one thing works exactly the same: the index file size. CosmosDB holds the indexes in a separate file like SQL Server, and if we want to index everything, the index file size is going to get large. Since we need to pay for the file space in CosmosDB, you might need to pay extra for indexes that you might never use. Also, your updates, inserts and deletes might cost you more Request Units since CosmosDB needs to maintain all the indexes in the background.

There’s just enough difference to make you pay the price if you assume Cosmos DB works just like SQL Server.
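
To make the opt-out model concrete, here is a minimal sketch of an indexing policy document built in Python. The overall shape (indexingMode, includedPaths, excludedPaths) follows the Cosmos DB indexing policy format. The helper function and the two excluded property paths are hypothetical and chosen only for illustration.

```python
import json


def build_indexing_policy(excluded_paths):
    """Return an opt-out policy: index everything except the given paths.

    The helper itself is hypothetical; only the dictionary shape mirrors
    the Cosmos DB indexing policy document.
    """
    return {
        "indexingMode": "consistent",
        # "/*" includes every property by default.
        "includedPaths": [{"path": "/*"}],
        # Each excluded path is skipped by the indexer.
        "excludedPaths": [{"path": p} for p in excluded_paths],
    }


# Hypothetical property names: a scalar field and a whole subtree.
policy = build_indexing_policy(["/description/?", "/rawPayload/*"])
print(json.dumps(policy, indent=2))
```

Excluding properties that are never filtered or sorted on is one way to keep index storage and write costs down, which is the trade-off Hasan describes.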

Query Tuning In CosmosDB

Hasan Savran explains how we can tune queries in CosmosDB:

This is the most common question from DBAs in my talks about Cosmos DB. Cosmos DB is a managed database, but this does not mean that you cannot tune up your queries. The way you tune up the queries, though, is nothing like SQL Server.

First you need to be sure that you configured your Cosmos DB containers right. What do I mean by that? You should pick the right partition key before you start to tune up any of your queries. Tuning up your queries is not going to help you in the long run if you selected the wrong partition key when you created your Cosmos DB containers. Throughput is another value you need to worry about. The good news about throughput is that you can change it if you need to. You cannot change your partition key!

It’s a whole different world over there.
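
As a rough illustration of why the partition key matters so much, the sketch below spreads some made-up documents across a fixed number of buckets using a stable hash. This is a toy model, not Cosmos DB's actual partitioning scheme. The field names, the bucket count, and both helper functions are assumptions.

```python
import hashlib
from collections import Counter


def partition_for(key_value, partition_count):
    """Map a partition key value to a bucket with a stable hash (toy model only)."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


def distribution(items, key_name, partition_count):
    """Count how many items land in each bucket for the chosen key."""
    return Counter(partition_for(item[key_name], partition_count) for item in items)


# Hypothetical documents: many distinct user ids, only two distinct countries.
items = [{"userId": i, "country": "US" if i % 10 else "CA"} for i in range(1000)]

by_user = distribution(items, "userId", partition_count=4)
by_country = distribution(items, "country", partition_count=4)

print("userId buckets: ", sorted(by_user.values()))
print("country buckets:", sorted(by_country.values()))
```

A key with many distinct values spreads items fairly evenly, while a key with only two values can never occupy more than two buckets, however many exist. Since the key cannot be changed later, this kind of skew is worth checking before creating the container.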

Using AzCopy To Sync Data

Randolph West is pleased with an update to AzCopy:

As of November 2018 however, the v10 preview of AzCopy is vastly improved. Firstly, it runs cross-platform on Windows, Linux and macOS (it is open-source, and appears to be written in Go). Secondly, it has more sensible command-line switches. Thirdly, and most importantly in my mind, it includes the much-awaited sync option. This has been a long time coming.

Click through for a demonstration of this synchronization option.
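
For a sense of what the sync option looks like in a script, here is a minimal sketch that assembles an `azcopy sync <source> <destination>` call from Python. The local path and the container URL are placeholders, and the helper function is hypothetical. A real destination would also need authentication, such as a SAS token appended to the URL.

```python
import shlex


def build_sync_command(source, destination):
    """Build the argument list for `azcopy sync <source> <destination>`."""
    return ["azcopy", "sync", source, destination]


# Placeholder source folder and destination container URL.
cmd = build_sync_command(
    "/data/backups",
    "https://<account>.blob.core.windows.net/<container>",
)

# Print the command as it would be typed in a shell (requires Python 3.8+).
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming the AzCopy v10 binary is on the PATH.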

Choosing Azure Data Lake Analytics Versus Azure Databricks

Ginger Grant helps us make the decision between using Azure Data Lake Analytics and Azure Databricks:

Databricks is a recent addition to Azure that is greatly influencing the technology choices that people are making when determining how to process data.  Prior to the introduction of Databricks to Azure in March of 2018, if you had a lot of unstructured data which was stored in HDFS clusters, and wanted to analyze it in a scalable fashion, the choice was Data Lake and using USQL with Data Lake Analytics.  With the introduction of Databricks, there is now a choice for analysis between Data Lake Analytics and Databricks for analyzing data.

Click through for the comparison.

The State Of Database Scoped Configurations

Niko Neugebauer takes us through the current state of Database Scoped Configurations in SQL Server:

I have already blogged about the first version of the Database Scoped Configurations for SQL Server 2016, with 4 visible options plus the procedure cache cleaning option, but we have followed in SQL Server 2017 with 5 (listed) & 9 (in practice – DISABLE_INTERLEAVED_EXECUTION_TVF, DISABLE_BATCH_MODE_ADAPTIVE_JOINS, BATCH_MODE_MEMORY_GRANT_FEEDBACK, BATCH_MODE_ADAPTIVE_JOINS are visible and functioning), and in just another year we have received a huge upgrade to the currently available 21 for SQL Server 2019.

It seems like this is a common route the SQL Server teams are going down, and it makes sense: your settings for Mega-DB probably shouldn’t be the same as for the tiny database in the corner. Oh, and that whole Azure SQL Database thing.

Creating An Azure Storage Account

John Morehouse walks us through setting up an Azure Storage Account through the Azure Portal:

Azure offers a lot of features that enable IT professionals to really enhance their environment.  One feature that I really like about Azure is storage accounts.  Since disk is relatively cheap, this continues to hold true in the cloud.  For less than $100 per month, you could get up to 5TB of storage including redundancy to another Azure region.

Read on to learn how to set up one of these.

Switching Azure Portal Accounts

John Morehouse is happy with a change to the Azure Portal:

This means that I could have multiple email accounts that I have to use in order to sign into the portal.  With a password manager such as 1Password, this is not usually a big deal and is more of an annoyance than a headache.

Within the past month or so, Microsoft has updated the portal to allow me to easily switch accounts.  Previously you had to log out of the portal and then log back in.

This is quite convenient. Prior to this change, switching to a different account could goof with other sites I had open (like if I was sending an Outlook e-mail through one account, switching the Azure Portal signed-in account would log me out from Outlook). It’s still not a perfect experience but it’s a lot better.


January 2019