Press "Enter" to skip to content

Month: October 2019

Improving Post-Join Aggregation Performance in Power Query

Imke Feldmann finds some nice performance improvements with aggregating data after a join using Power Query:

When you join a table to another table in Power Query, the UI gives you the option to either expand the columns (default) or aggregate the contents of the joint tables. That’s useful if multiple rows are returned for the rows of the table that has been joined to (left table):

But this method is extremely slow. Compared to “simply” expanding all values to new rows (which took around 5 seconds), the aggregation took around 50 seconds. The automatically generated code uses the “Table.AggregateTableColumn”-function.

Read on to see two separate attempts to speed things up.

Comments closed

Scripting Out Linked Servers with Actual Passwords

Ajay Dwivedi shows how you can script out a linked server creation statement which includes actual passwords:

For moving Logins/Users, Microsoft provided revlogin script which made it easy for migration of logins without need to know about passwords. But, there is no easy approach for migration LinkedServers with the actual password. This is where dbatools cmdlet Copy-DbaLinkedServer becomes very handy. But, what about the situation where we have to script out LinkedServer beforehand.

For this reason, based on the blog post of Antti Rantasaari, and using his code as the base script, I have created a cmdlet Get-LinkedServer with SQLDBATools module which accepts SqlInstance name as a parameter along with -ScriptOut switch, and gives Drop/Create statements for linked servers present on that local/remote SqlInstance.

As a quick note, SQLDBATools is not the same as dbatools.

Comments closed

Incremental Data Migration to Blob Storage

Ginger Daniel has started a series on data migration into Azure Blob Storage:

Part 1 of this article demonstrates how to upload multiple tables from an on-premise SQL Server to an Azure Blob Storage account as csv files.  I covered these basic steps to get data from one place to the other using Azure Data Factory, however there are many other alternative ways to accomplish this, and many details in these steps that were not covered.  For a deep-dive into the details you can start here https://docs.microsoft.com/en-us/azure/data-factory/introduction, and https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal#create-a-pipeline

Part 1 was chock full of information, and it looks like Part 2 will be as well.

Comments closed

Analyzing DAX DISTINCTCOUNT Performance

Marco Russo takes a look at how fast DISTINCTCOUNT is in DAX:

This article shows how to implement the same DISTINCTCOUNT calculation in two alternative ways, measuring and comparing the performance in different reports. You will see that while DISTINCTCOUNT can be implemented using SUMX / DISTINCT, the DISTINCTCOUNT version is usually better. That is, unless the density of the reports is high and the calculation apply filters to the measures that do not correspond to the grouping granularity of the visualization – as is always the case using time intelligence functions. There are cases where SUMX / DISTINCT can offer better performance, but you have to clarify whether optimizing one report might slow down many others. Measuring performance using DAX Studio is the only way to know what to expect for your model and reports.

Read on for the full test.

Comments closed

Azure AD Credential Passthrough and Databricks

Anna Shrestinian, et al, explain how Azure Databricks enables Azure Active Directory credential passthrough when working with Azure Data Lake Storage Gen2:

Azure Data Lake Storage (ADLS) Gen2, which became generally available earlier this year, is quickly becoming the standard for data storage in Azure for analytics consumption. ADLS Gen2 enables a hierarchical file system that extends Azure Blob Storage capabilities and provides enhanced manageability, security and performance.

The hierarchical file system provides granular access control to ADLS Gen2. Role-based access control (RBAC) could be used to grant role assignments to top-level resources and POSIX compliant access control lists  (ACLs) allow for finer grain permissions at the folder and file level. These features allow users to securely access their data within Azure Databricks using the Azure Blob File System (ABFS) driver, which is built into the Databricks Runtime.

There are some tradeoffs involved, particularly around using High Concurrency clusters (or limiting yourself to one user account), but it’s a nice bit of added value when you’re a heavy Azure user.

Comments closed

Re-Introducing rquery

John Mount has a new introduction to rquery:

rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform(), or dplyr’s dplyr::mutate() and uses a pipe in the style popularized in R with magrittr. The operators themselves follow the selections in Codd’s relational algebra, with the addition of the traditional SQL “window functions.” More on the background and context of rquery can be found here.

The R/rquery version of this introduction is here, and the Python/data_algebra version of this introduction is here.

Check it out.

Comments closed

What’s New with Standard Edition

Niko Neugebauer is jazzed about SQL Server 2019 Standard Edition:

The documentation has been released on the Editions & Features support in Sql Server 2019 and there are huge news that will get a good number of Standard Editions users excited.

The thing that should make every single user of the Sql Server 2019 Standard Edition jump is the presence of TDE (Transparent Database Encryption), the feature that

– every single responsible company would love to be able to use to secure their data
– that every single DBA, Developer & IT Professional kept asking Microsoft to include for so many years
– was already there since Azure SQL Database has it by default a couple of years even for the basic edition

I am beyond happy with this news. Grateful that Microsoft has listened to the common sense voice and made a security feature available for the paying customers, giving them a little bit more possibility to be conformant to such demanding norms as GDPR.

Aside from that, Niko looks at several new features which will be available in Standard Edition, as well as a few Enterprise-only features.

Comments closed

Emergency Mode in SQL Server

Paul Randal answers a reader question:

I had a blog comment question a few days ago that asked why emergency-mode repair requires the database to be in EMERGENCY mode as well as SINGLE_USER mode.

All repair operations that DBCC CHECKDB (and related commands) performs require the database to be in single-user mode so there’s a guarantee that nothing can be changing while the checks and repairs are done. But that doesn’t change the behavior of what repair does – that needs emergency mode too.

Read on for an explanation of what emergency mode is and why we need it to run CHECKDB repair operations.

Comments closed

skip-2.0 and SQL Server Security

K. Brian Kelley has the lowdown on skip-2.0:

Problem
I’ve read recently that there’s a new piece of malware that’s been named skip-2.0 and it targets SQL Server. What exactly is it, where did it come from, and how do I protect myself against it?

Solution
This new piece of malware, skip-2.0, does target SQL Server. Specifically, it targets SQL Server versions 11 and 12, which correspond to the names SQL Server 2012 and SQL Server 2014 respectively. Therefore, if you’re only running SQL Server 2016 or higher, you’re not affected by skip-2.0 (yet another reason to upgrade).

Read on to learn how it works and how you can protect against it.

Comments closed