
Curated SQL Posts

Building a SQL Cluster: Availability Groups

Ryan Adams continues a series on building a SQL Server cluster as a lab:

You are going to create a multi-subnet Availability Group in Part 3 of our series on how to build a SQL Cluster Lab. First you give the Cluster Name Object rights in Active Directory, and then you install your AG. You will also ensure the AG endpoints are using the dedicated network created for them in Part 1.

Click through for the video.
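If you would rather script the endpoint piece than click through a wizard, here is a rough Python sketch (not Ryan's exact steps; server names and dedicated-network IPs are hypothetical, and it assumes pyodbc plus sysadmin rights) that binds each replica's HADR endpoint to the replication network:

# Rough sketch: create the HADR endpoint on each replica, bound to the
# dedicated replication network from Part 1. Names and IPs are hypothetical.
import pyodbc

replicas = {
    "SQLNODE1": "192.168.2.11",  # dedicated replication NIC on node 1 (hypothetical)
    "SQLNODE2": "192.168.3.11",  # dedicated replication NIC on node 2 (hypothetical)
}

ENDPOINT_SQL = """
CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ({ip}))
    FOR DATABASE_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES);
"""

for server, ip in replicas.items():
    conn = pyodbc.connect(
        f"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};Trusted_Connection=yes;",
        autocommit=True,  # DDL statement; no explicit transaction needed
    )
    conn.execute(ENDPOINT_SQL.format(ip=ip))
    conn.close()
    print(f"Hadr_endpoint on {server} now listens on {ip}:5022")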


High-Value Software Testing

RJ Zaworski has some principles around high-value testing:

Whatever the tools and approach, software testing proves that software does what it claims to do. Tests help developers eliminate defects, build confidence, practice good design, and ideally all three. They also take time to write, run, and update–time that’s no longer available for other development tasks.

High-value testing seeks to maximize the return on that investment. Like much of software development, it’s as much art as science. But a few practical principles can help keep things pointed in the right direction.

There’s a lot of good advice in this post.


Thoughts on Large Datasets in Power BI

Teo Lachev has some early thoughts on large datasets in Power BI:

At Ignite 2019 Microsoft announced the public preview of large datasets in Power BI Premium. This is a significant milestone as now datasets can grow up to the capacity’s maximum memory (previously, the max size was 10 GB with the P3 plan), thus opening the possibility of deploying organizational semantic models to Power BI. I consider this feature mostly suitable for organizational BI, as I don’t imagine business users dealing with such large data volumes. I tested large datasets during the private preview, and I’d like to share some notes.

Teo has some open questions, and I’d like to see this shifted down to SSAS too.


PowerShell Notebooks in Azure Data Studio

Aaron Nelson announces a new feature in Azure Data Studio:

In order to get all the nice intellisense and tab completion features of the PowerShell language inside your PowerShell Notebooks, be sure to install the PowerShell extension from the Azure Data Studio marketplace.

At this point, the biggest remaining language is R, though I’d love to see F# support as well (hey, Azure Notebooks offers F# support).


High Availability Announcements from Microsoft

Allan Hirt looks at a couple of announcements from Microsoft:

I’m going to discuss what I feel are the biggest game changers. I knew licensing was changing as I had conversations with Microsoft around this months ago. I was not sure what the final result was going to be, but I’m fairly pleased. Is it perfect? No, but it’s much better than it was.

You’ll definitely want to read Allan’s thoughts on Microsoft’s SQL Server licensing changes, as well as a private preview of Azure Shared Disks.


Errors with SQL Server TDE and Azure Key Vault

Amit Banerjee takes us through troubleshooting issues when using Azure Key Vault as the key storage mechanism for Transparent Data Encryption:

The first one was a 404 error. When I looked in the application event log, I saw the following error:

Operation: getKeyByName
Key Name: ContosoRSAKey0
Message: [error:112, info:404, state:0] The server responded 404, because the key name was not found. Please make sure the key name exists in your vault.

The simple reason for the above error is that I was using an incorrect key name or the key didn’t exist in my Azure Key Vault. So the remediation is to check if the key exists in your Azure Key Vault. If not, then create the key.

Read on for additional errors you might run into, as well as a link to an Azure Data Studio notebook to set this up yourself.
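Amit's remediation (check that the key exists in the vault, and create it if it does not) is easy to script as well. Here is a minimal Python sketch using the azure-keyvault-keys and azure-identity SDKs; the vault URL is a placeholder and the key name matches the error above:

# Minimal sketch: verify a key exists in Azure Key Vault, creating it if missing.
# The vault URL is hypothetical; authentication uses whatever DefaultAzureCredential finds.
from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient
from azure.core.exceptions import ResourceNotFoundError

client = KeyClient(
    vault_url="https://contoso-vault.vault.azure.net",  # placeholder vault
    credential=DefaultAzureCredential(),
)

key_name = "ContosoRSAKey0"
try:
    key = client.get_key(key_name)
    print(f"Found key: {key.id}")
except ResourceNotFoundError:
    # Same situation as the 404 above: the key name does not exist in the vault.
    key = client.create_rsa_key(key_name, size=2048)
    print(f"Created key: {key.id}")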


Notes on Wrangling Data Flows

Rayis Imayev calculates the distance between two geographical points in an Azure Data Factory Wrangling data flow:

Brian Donovan and Dan Work from the University of Illinois have pointed out that this dataset “contains a large number of errors. For example, there are several trips where the reported meter distances are significantly shorter than the straight-line distance, violating Euclidean geometry“. So, that triggered my interest to add an additional column to this dataset with a straight-line distance between two geo-points of pickup and dropoff locations, and that’s where I wanted Wrangling Data Flows to help me.

Read on for Rayis’s demonstration, as well as a long list of observations (positive and negative) about the current state of Wrangling data flows.
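If you only need the straight-line number and don't mind computing it outside the data flow, the standard haversine (great-circle) formula does the job; this is one way to do it and not necessarily Rayis's exact calculation. A quick Python sketch with illustrative Manhattan-to-JFK coordinates:

# Haversine (great-circle) distance between two latitude/longitude points.
import math

def straight_line_distance_km(lat1, lon1, lat2, lon2):
    """Straight-line (great-circle) distance in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example pickup/dropoff: Times Square to JFK, roughly 22 km in a straight line.
print(straight_line_distance_km(40.7580, -73.9855, 40.6413, -73.7781))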


Securing Data on ElasticMapReduce

Duncan Chen takes us through data encryption options when using ElasticMapReduce:

Data encryption is an effective solution to bolster data security. You can make sure that only authorized users or applications read your sensitive data by encrypting your data and managing access to the encryption key. One of the main reasons that customers from regulated industries such as healthcare and finance choose Amazon EMR is because it provides them with a compliant environment to store and access data securely.

This post provides a detailed walkthrough of two new encryption options to help you secure your EMR cluster that handles sensitive data. The first option is native EBS encryption to encrypt volumes attached to EMR clusters. The second option is Amazon S3 encryption, which allows you to use different encryption modes and customer master keys (CMKs) for individual S3 buckets with Amazon EMR.

Click through for more details on each.
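Both options are applied through an EMR security configuration. Here is a hedged boto3 sketch (the region, KMS key ARNs, and configuration name are placeholders) that turns on EBS encryption for cluster volumes and sets SSE-KMS as the default for EMRFS writes to S3:

# Sketch: create an EMR security configuration covering EBS volumes and S3 data.
# Region, account, and KMS key ARNs below are placeholders.
import json
import boto3

emr = boto3.client("emr", region_name="us-east-1")

security_configuration = {
    "EncryptionConfiguration": {
        "EnableInTransitEncryption": False,
        "EnableAtRestEncryption": True,
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {
                "EncryptionMode": "SSE-KMS",
                "AwsKmsKey": "arn:aws:kms:us-east-1:111122223333:key/s3-cmk-id",
            },
            "LocalDiskEncryptionConfiguration": {
                "EncryptionKeyProviderType": "AwsKms",
                "AwsKmsKey": "arn:aws:kms:us-east-1:111122223333:key/ebs-cmk-id",
                "EnableEbsEncryption": True,  # the native EBS option from the post
            },
        },
    }
}

emr.create_security_configuration(
    Name="emr-at-rest-encryption",
    SecurityConfiguration=json.dumps(security_configuration),
)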


Databricks + Azure Synapse Analytics

David Meyer and Clinton Ford explain how you can integrate Azure Databricks with Azure Synapse Analytics:

In the last two years since it first became available, thousands of companies have adopted Azure Databricks, making it one of the fastest growing data and AI services on Microsoft Azure. Customers now process over 2 exabytes per month with millions of server-hours spinning up every day. All of this is driven by organizations like Electrolux, Shell, and renewables.AI that are using Azure Databricks to process data at massive scale for data science and analytics.

Within this amazing adoption is a specific solution architecture to highlight called the Modern Data Warehouse (MDW). Earlier this year we wrote about the performance and scale benefits of this solution, and part of the pattern’s success has been our close integration to Azure SQL Data Warehouse with a high-performance connector that was jointly engineered to make it fast and easy to move data between the two services.

Something interesting about Synapse is that its implementation of Spark is not the same as the Databricks implementation (perhaps for licensing reasons). But that doesn’t stop us from using Databricks to process and curate data for Synapse Analytics.
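For reference, the jointly engineered connector mentioned above surfaces in Databricks as the com.databricks.spark.sqldw data source. A hedged PySpark sketch (server, database, storage account, table, and credentials are all placeholders; in practice pull them from a secret scope) of pushing a curated DataFrame into SQL DW / Synapse:

# Assumes a Databricks notebook, where `spark` is provided by the session.
# All connection values are placeholders for your own environment.
df = spark.range(1000).withColumnRenamed("id", "SaleId")  # stand-in for curated data

(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;"
                  "database=mydw;user=loader;password=<secret>")
   .option("forwardSparkAzureStorageCredentials", "true")  # reuse the notebook's storage credentials
   .option("dbTable", "dbo.CuratedSales")
   .option("tempDir", "wasbs://tempdata@mystorageacct.blob.core.windows.net/sqldw")
   .mode("overwrite")
   .save())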
