Configuring an Azure Databricks Cluster

Jon Gurgul explains cluster settings in Azure Databricks:

We need compute to run our notebooks and this is achieved by creating a cluster. A cluster is merely a number of Virtual Machines behind the scenes used to form this compute resource. The benefit of Azure Databricks is that compute is only chargeable when on.

Let’s now click the Clusters icon and set up a simple cluster. Once you have loaded the page you can use the “Create Cluster” button.

Click through for an explanation of what each of the settings means.

Azure SQL DB Serverless

Kevin Feasel



John Morehouse takes us through the current public preview of Azure SQL Database Serverless:

Microsoft continues to expand the Azure ecosystem rapidly.  One of the most recent delivered improvements was Azure SQL Database Serverless option.    The new serverless model, which is currently in public preview, provides a compute tier for a single database that scales dynamically with the workload.   This new compute tier is optimal for single database that have unpredictable usage patterns.  Previously, you would provision the compute tier to your Azure SQL Database which allowed to have more granular control on scalability. The serverless tier scaling is effectively controlled by the service itself and will scale when needed.

This looks like it’d be good for a dev server, where the occasional startup cost is fine and there can be significant usage requirement differences based on time—after the devs go home, you’re not likely to need much compute.

DBCC CHECKDB Error on Azure SQL Database

Arun Sirpal explains an error message on Azure SQL Database:

msg 7928, Level 16, State 1, Line 3
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.
Msg 8921, Level 16, State 3, Line 3
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

Read on to see what this means, as well as what it means for you.

Azure Dedicated Hosts in Preview

Kevin Feasel



Mine Tokus covers the benefit of Azure Dedicated Hosts:

Recently introducedAzure Dedicated Host Preview provides single-tenant physical servers that can host one or more virtual machines. With this new hosting model, physical server is dedicated to your organization and capacity isn’t shared with other customers. Physical server-level isolation helps to address security and compliance requirements, brings visibility and control over the server infrastructure and enables significant cost savings and licensing flexibility for SQL Server workloads on Azure VMs.

I think this might get some recalcitrant large companies to be willing to adopt cloud technologies.

Connecting to Redshift from Azure Analysis Services

Gilbert Quevauvilliers shows how we can connect to Amazon Redshift from Azure Analysis Services:

I am busy working with a customer and had a challenge when using Azure Analysis Services to connect to Amazon Redshift via an ODBC connection.

The first issue that I encountered was the following error: OLE DB or ODBC error: [Microsoft][ODBC Driver Manager] The specified DSN contains an architecture mismatch between the Driver and Application; AWS PROD. This lead me to a few websites and the one that got me to my solution was Tabular: Error while using ODBC data source for Importing Data

Below are the steps on how I installed, configured and got the connection and refresh working.

Read on for those steps.

Azure Data Share

Kevin Feasel


Cloud, Data

James Serra takes us through a new product announcement:

A brand new product by Microsoft called Azure Data Share was recently announced. It is in public preview. To explain the product in short, any data which resides in Azure storage can be securely shared between a data provider and a data consumer. It does this by copying a snapshot of the data to the consumer’s subscription (called snapshot-based copying, and in the future there will be in-place sharing). It currently supports ADLS Gen1, ADLS Gen2, and Blob storage, and eventually will support Azure Data Explorer, SQL DB, and SQL DW. Check out the Documentation and a video, and then go try it out.

You can share the data with a few clicks as long as the user you are trying to share with has access to an Azure Subscription and storage account. The copying and updating of the data is handled for you using the Microsoft backbone for best performance, and is encrypted during transit. You can specify the frequency at which the data consumers receive updates. It also is a simple way to control, manage, and monitor all of your data sharing.

This is a smart idea. Sharing data between companies is a key requirement in a lot of B2B solutions, yet methods for sharing range from high-development and medium-friction (create your own API) to low-development and high-friction (FTP, e-mail).

Centralized View of SQL in Azure

Kevin Feasel



Meagan Longoria shares thoughts on a change to the Azure portal:

Once your Azure SQL resources are created, you can use the new centralized management hub to administer them. Locate the Azure SQL resources blade to see a list of all of your single databases, database servers, elastic pools, managed instances, and virtual machines running SQL.

If your subscription has a lot of resources, this can be a timesaver.

Calculated Columns and Memory Usage in Analysis Services

Teo Lachev troubleshoots a customer issue:

Scenario: A client reports a memory spike during processing. They have a Tabular semantic model deployed to Azure Analysis Services. They fully process the model daily. The model normally takes less than 50 GB RAM but during processing, it spikes five times and Azure Analysis Services terminates the processing task complaining that it “reached the maximum allowable memory in our pricing tier”. Normally, fully processing the model should take about twice the memory but five times?

Teo gives us the explanation for this problem as well as a recommendation on how to fix it.

The Uniqueness of Cosmos DB Unique Keys

Kevin Feasel



Hasan Savran explains the scope of unique keys in Cosmos DB:

I wrote about Unique Keys and tried to explain how they work in one of my earlier post. It’s common to use SQL Server’s Primary Key or Unique Indexes to explain Unique Keys of Azure Cosmos DB. If you have a Primary Key in a table in SQL Server, the key you defined cannot be in that table more than once. By adding a unique index or unique constraint in a table, you guarantee that no duplicate values can be in your table. The key word in both of those statement is the TABLE.

Azure CosmosDB Unique Keys do not work like Primary Key or Unique Indexes/Constraints

Read on to learn how Cosmos DB differs.

The Databricks File System

Brad Llewellyn takes us through the Azure Databricks File System:

Today, we’re going to talk about the Databricks File System (DBFS) in Azure Databricks.  If you haven’t read the previous posts in this series, IntroductionCluster Creation and Notebooks, they may provide some useful context.  You can find the files from this post in our GitHub Repository.  Let’s move on to the core of this post, DBFS.

As we mentioned in the previous post, there are three major concepts for us to understand about Azure Databricks, Clusters, Code and Data.  For this post, we’re going to talk about the storage layer underneath Azure Databricks, DBFS.  Since Azure Databricks manages Spark clusters, it requires an underlying Hadoop Distributed File System (HDFS).  This is exactly what DBFS is.  Basically, HDFS is the low cost, fault-tolerant, distributed file system that makes the entire Hadoop ecosystem work.  We may dig deeper into HDFS in a later post.  For now, you can read more about HDFS here and here.

Click through for more detail on DBFS.


September 2019
« Aug