Press "Enter" to skip to content

Category: Cloud

Azure Data Share

James Serra takes us through a new product announcement:

A brand new product by Microsoft called Azure Data Share was recently announced. It is in public preview. To explain the product in short, any data which resides in Azure storage can be securely shared between a data provider and a data consumer. It does this by copying a snapshot of the data to the consumer’s subscription (called snapshot-based copying, and in the future there will be in-place sharing). It currently supports ADLS Gen1, ADLS Gen2, and Blob storage, and eventually will support Azure Data Explorer, SQL DB, and SQL DW. Check out the Documentation and a video, and then go try it out.

You can share the data with a few clicks as long as the user you are trying to share with has access to an Azure Subscription and storage account. The copying and updating of the data is handled for you using the Microsoft backbone for best performance, and is encrypted during transit. You can specify the frequency at which the data consumers receive updates. It also is a simple way to control, manage, and monitor all of your data sharing.

This is a smart idea. Sharing data between companies is a key requirement in a lot of B2B solutions, yet methods for sharing range from high-development and medium-friction (create your own API) to low-development and high-friction (FTP, e-mail).

Comments closed

Calculated Columns and Memory Usage in Analysis Services

Teo Lachev troubleshoots a customer issue:

Scenario: A client reports a memory spike during processing. They have a Tabular semantic model deployed to Azure Analysis Services. They fully process the model daily. The model normally takes less than 50 GB RAM but during processing, it spikes five times and Azure Analysis Services terminates the processing task complaining that it “reached the maximum allowable memory in our pricing tier”. Normally, fully processing the model should take about twice the memory but five times?

Teo gives us the explanation for this problem as well as a recommendation on how to fix it.

Comments closed

The Uniqueness of Cosmos DB Unique Keys

Hasan Savran explains the scope of unique keys in Cosmos DB:

I wrote about Unique Keys and tried to explain how they work in one of my earlier post. It’s common to use SQL Server’s Primary Key or Unique Indexes to explain Unique Keys of Azure Cosmos DB. If you have a Primary Key in a table in SQL Server, the key you defined cannot be in that table more than once. By adding a unique index or unique constraint in a table, you guarantee that no duplicate values can be in your table. The key word in both of those statement is the TABLE.

Azure CosmosDB Unique Keys do not work like Primary Key or Unique Indexes/Constraints

Read on to learn how Cosmos DB differs.

Comments closed

The Databricks File System

Brad Llewellyn takes us through the Azure Databricks File System:

Today, we’re going to talk about the Databricks File System (DBFS) in Azure Databricks.  If you haven’t read the previous posts in this series, IntroductionCluster Creation and Notebooks, they may provide some useful context.  You can find the files from this post in our GitHub Repository.  Let’s move on to the core of this post, DBFS.

As we mentioned in the previous post, there are three major concepts for us to understand about Azure Databricks, Clusters, Code and Data.  For this post, we’re going to talk about the storage layer underneath Azure Databricks, DBFS.  Since Azure Databricks manages Spark clusters, it requires an underlying Hadoop Distributed File System (HDFS).  This is exactly what DBFS is.  Basically, HDFS is the low cost, fault-tolerant, distributed file system that makes the entire Hadoop ecosystem work.  We may dig deeper into HDFS in a later post.  For now, you can read more about HDFS here and here.

Click through for more detail on DBFS.

Comments closed

Writing SQL Against Cosmos DB

Hasan Savran shows how to query Cosmos DB documents with SQL:

CosmosDB’s SQL API uses T-SQL like language to query the database. It looks like T-SQL language. You still need to use SELECT, FROM, WHERE clause to retrieve the data you are looking for. It looks similar but it works different because the data structure is different. CosmosDB is not a relational database that means you cannot join other containers.

     Let’s start with SELECT. You can use * to select all properties but just like T-SQL you shouldn’t, pick only what you need. Charge of a Request Units depends on how much data your queries retrieves from containers. ORMs like to select everything and that’s one of the reasons DBAs don’t like them. I know couple of ORM supports CosmosDB. Think twice if you want to use an ORM with CosmosDB. Don’t let an ORM to generate a query for CosmosDB. You will literally pay for it!

Feasel’s Law in force, with a bonus of “ORMs are expensive.”

Comments closed

Monetizing Performance Tuning in the Cloud

Emanuele Meazzo points out a benefit from being in the cloud—performance tuning has a direct monetary tie:

A single step up for a single DB can cost from 10% to 25% more than the previous sizing, which translates in 15K or more when dealing with higher specs databases; remember that this is for a single scale up, on a single DB, on a single year.

It’s easy to understand that if your database and/or queries are not well tuned because “scaling will take care of it eventually”, then, you’ll be secretly bleeding money.

We can quantify performance improvements in terms of resource savings, but tying that to the specific business benefits can be hard because that hardware is a capital expenditure and so much of tuning benefits are “we avoided this worse case” rather than “we actively scaled something down.”

Comments closed

Running Powershell Against your Azure SQL DBs

Joey D’Antoni has a script which finds all of your Azure SQL Database instances and runs a Powershell script against each one in turn:

This code is pretty simple and looks for an Azure SQL Server in each resource group, and then looks for the databases that aren’t master on each server. In this example I’m setting the storage account for Azure Threat Detection, but you could do anything you wanted in that last loop.

Click through for the script. Things like parallelism and error handling are exercises for the reader.

Comments closed

Clustered Columnstore and Azure SQL DB

Arun Sirpal takes us through online clustered columnstore index creation in Azure SQL Database:

What tier do you need to create one of these things? Let’s see.

CREATE CLUSTERED  COLUMNSTORE INDEX cciSales ON [SalesLT].[ProductModelProductDescription] WITH ( ONLINE = ON )

But I get this message, Msg 40536, Level 16, State 32, Line 1
‘COLUMNSTORE’ is not supported in this service tier of the database. See Books Online for more details on feature support in different service tiers of Windows Azure SQL Database.

Read on to see the minimum tier which allows online creation of clustered columnstore indexes.

Comments closed

Another Look at Cosmos DB Indexing

Hasan Savran revises some indexing recommendations based on changes to Cosmos DB:

Lazy indexing used to be an option. It’s not in any CosmosDB documentation anymore. By using Lazy indexing, you could save 20 to 30 percent for Request Units. Just like anything else in life, you get what you pay for when it comes to Lazy indexing. By selecting Lazy indexing, you are saying that eventually Indexes will be updated. If Indexes are not updated, that means your queries might not return all the data since all data might not be indexed yet. Lazy indexing is still an option, nobody talks about it for a good reason. In my opinion, it should be listed as obsolete feature or it should have a better documentation about how it works or why it might not be a good option for your solutions.

     If you use Lazy Indexing to reduce Request Units in your solution, change it to consistent now unless you have a really good reason!

Read on for more advice in this vein.

Comments closed