Press "Enter" to skip to content

Month: November 2019

Thoughts on Large Datasets in Power BI

Teo Lachev has some early thoughts on large datasets in Power BI:

At Ignite 2019 Microsoft announced the public preview of large datasets in Power BI Premium. This is a significant milestone as now datasets can grow up to the capacity’s maximum memory (previously, the max size was 10 GB with the P3 plan), thus opening the possibility of deploying organizational semantic models to Power BI. I consider this feature mostly suitable for organizational BI as I don’t imagine business users dealing with such large data volumes. I tested large datasets during the private preview, and I’d like to share some notes.

Teo has some open questions, and I’d like to see this capability make its way down to SSAS as well.


Powershell Notebooks in Azure Data Studio

Aaron Nelson announces a new feature in Azure Data Studio:

In order to get all the nice intellisense and tab completion features of the PowerShell language inside your PowerShell Notebooks, be sure to install the PowerShell extension from the Azure Data Studio marketplace.

At this point, the biggest remaining language gap is R, though I’d love to see F# support as well (hey, Azure Notebooks offers F# support).


High Availability Announcements from Microsoft

Allan Hirt looks at a couple of announcements from Microsoft:

I’m going to discuss what I feel are the biggest game changers. I knew licensing was changing as I had conversations with Microsoft around this months ago. I was not sure what the final result was going to be, but I’m fairly pleased. Is it perfect? No, but it’s much better than it was.

You’ll definitely want to read Allan’s thoughts on Microsoft’s SQL Server licensing changes, as well as a private preview of Azure Shared Disks.


Errors with SQL Server TDE and Azure Key Vault

Amit Banerjee takes us through troubleshooting issues when using Azure Key Vault as the key storage mechanism for Transparent Data Encryption:

The first one was a 404 error. When I looked at the application event log, I saw the following error:

Operation: getKeyByName
Key Name: ContosoRSAKey0
Message: [error:112, info:404, state:0] The server responded 404, because the key name was not found. Please make sure the key name exists in your vault.

The simple reason for the above error is that I was using an incorrect key name or the key didn’t exist in my Azure Key Vault. So the remediation is to check if the key exists in your Azure Key Vault. If not, then create the key.
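For context, the key name in that error comes from the T-SQL step that binds SQL Server to the vault. Here’s a minimal sketch of that setup, assuming the SQL Server Connector for Microsoft Azure Key Vault is installed at its default path; the provider, credential, and vault names are placeholders, and ContosoRSAKey0 is the key name from the error above:

-- Register the EKM provider DLL that the connector installs.
CREATE CRYPTOGRAPHIC PROVIDER AzureKeyVault_EKM
FROM FILE = 'C:\Program Files\SQL Server Connector for Microsoft Azure Key Vault\Microsoft.AzureKeyVaultService.EKM.dll';

-- Map SQL Server to the vault; IDENTITY is the vault name.
CREATE CREDENTIAL sysadmin_ekm_cred
WITH IDENTITY = 'ContosoKeyVault',
SECRET = '<client ID without hyphens followed by client secret>'
FOR CRYPTOGRAPHIC PROVIDER AzureKeyVault_EKM;

-- This is the step where the 404 surfaces: PROVIDER_KEY_NAME must match
-- a key that already exists in the vault.
CREATE ASYMMETRIC KEY ContosoRSAKey0
FROM PROVIDER AzureKeyVault_EKM
WITH PROVIDER_KEY_NAME = 'ContosoRSAKey0',
CREATION_DISPOSITION = OPEN_EXISTING;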

Read on for additional errors you might run into, as well as a link to an Azure Data Studio notebook to set this up yourself.


Notes on Wrangling Data Flows

Rayis Imayev calculates the distance between two geographical points in an Azure Data Factory Wrangling data flow:

Brian Donovan and Dan Work from the University of Illinois have pointed out that this dataset “contains a large number of errors. For example, there are several trips where the reported meter distances are significantly shorter than the straight-line distance, violating Euclidean geometry”. So, that triggered my interest to add an additional column to this dataset with the straight-line distance between the pickup and dropoff geo-points, and that’s where I wanted Wrangling Data Flows to help me.

Read on for Rayis’s demonstration, as well as a long list of observations (positive and negative) about the current state of Wrangling data flows.
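As an aside, if the data already lands in SQL Server, you can get the same straight-line (great-circle) distance with the geography type rather than a Wrangling data flow. A quick sketch, with table and column names assumed from the NYC taxi dataset:

-- geography::Point takes (latitude, longitude, SRID); for SRID 4326,
-- STDistance returns meters, so divide to get miles.
SELECT t.*,
    geography::Point(t.pickup_latitude, t.pickup_longitude, 4326)
        .STDistance(geography::Point(t.dropoff_latitude, t.dropoff_longitude, 4326))
        / 1609.344 AS straight_line_miles
FROM dbo.TaxiTrips AS t;

From there, flagging suspect trips is a simple WHERE comparison of straight_line_miles against the reported meter distance.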


Securing Data on ElasticMapReduce

Duncan Chen takes us through data encryption options when using ElasticMapReduce:

Data encryption is an effective solution to bolster data security. You can make sure that only authorized users or applications read your sensitive data by encrypting your data and managing access to the encryption key. One of the main reasons that customers from regulated industries such as healthcare and finance choose Amazon EMR is because it provides them with a compliant environment to store and access data securely.

This post provides a detailed walkthrough of two new encryption options to help you secure your EMR cluster that handles sensitive data. The first option is native EBS encryption to encrypt volumes attached to EMR clusters. The second option is Amazon S3 encryption, which allows you to use different encryption modes and customer master keys (CMKs) for individual S3 buckets with Amazon EMR.

Click through for more details on each.


Databricks + Azure Synapse Analytics

David Meyer and Clinton Ford explain how you can integrate Azure Databricks with Azure Synapse Analytics:

In the last two years since it first became available, thousands of companies have adopted Azure Databricks, making it one of the fastest growing data and AI services on Microsoft Azure. Customers now process over 2 exabytes per month with millions of server-hours spinning up every day. All of this is driven by organizations like Electrolux, Shell, and renewables.AI that are using Azure Databricks to process data at massive scale for data science and analytics.

Within this amazing adoption is a specific solution architecture to highlight called the Modern Data Warehouse (MDW). Earlier this year we wrote about the performance and scale benefits of this solution, and part of the pattern’s success has been our close integration with Azure SQL Data Warehouse via a high-performance connector that was jointly engineered to make it fast and easy to move data between the two services.

Something interesting about Synapse is that its implementation of Spark is not the same as the Databricks implementation (perhaps for licensing reasons). But that doesn’t stop us from using Databricks to process and curate data for Synapse Analytics.


Capturing Inserts and Updates in MERGE Statements

The Purple Frog folks show us how to collect the counts of insert and update operations when using MERGE statements:

This post shows how you can capture and store the number of records inserted, updated, or deleted by a T-SQL MERGE statement.

This is in response to a question on an earlier post about using Merge to load SCDs in a Data Warehouse.

You can achieve this by using the OUTPUT clause of a merge statement, including the $Action column that OUTPUT returns.

Read on for the answer. If only MERGE weren’t so riddled with problems.
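The short version, as a sketch: route $action into a table variable and aggregate it afterward. Table and column names here are made up for illustration:

-- Capture which action MERGE took for each row.
DECLARE @MergeActions TABLE (MergeAction nvarchar(10));

MERGE dbo.DimCustomer AS tgt
USING dbo.StagingCustomer AS src
    ON tgt.CustomerKey = src.CustomerKey
WHEN MATCHED AND tgt.CustomerName <> src.CustomerName THEN
    UPDATE SET tgt.CustomerName = src.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, CustomerName)
    VALUES (src.CustomerKey, src.CustomerName)
OUTPUT $action INTO @MergeActions;

-- Roll the per-row actions up into counts you can log.
SELECT MergeAction, COUNT(*) AS RowsAffected
FROM @MergeActions
GROUP BY MergeAction;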


The State of SQL Server Tools

Vicky Harp provides a general update on the state of SQL Server tooling:

This week we’re announcing the general availability of SQL Server 2019, a significant milestone for Azure Data and for SQL Server customers. This presents a good moment to give an update on the state of tooling for SQL Server.

Since SQL Server 2016, the tools for SQL Server have been released independently “out of box” from the server product. This allows us to be more agile to the needs of our users, get both features and bug fixes shipped more quickly, stay aligned with the more continuous release cycle of Azure SQL, and in general allows the tools team to innovate in exciting ways. However, one side effect is that it can be difficult to understand what’s happening across the tools landscape, as things change quickly in multiple products that are releasing as frequently as every month.

Azure Data Studio has gotten a lot better. SSMS has gotten a little better. We’ve also seen work around several other tools, including command-line options. It’s a good time to be in SQL Server.
