Press "Enter" to skip to content

Author: Kevin Feasel

Columnstore Versus Page Compression

Aaron Bertrand compares columnstore and page compression on a specific table:

Recently someone at work asked for more space to accommodate a rapidly growing table. At the time it had 3.75 billion rows, presented on 143 million pages, and occupying ~1.14TB. Of course we can always throw more disk at a table, but I wanted to see if we could scale this more efficiently than the current linear trend. Sounds like a great job for compression, right? But I also wanted to try out some other solutions, including columnstore – which people are surprisingly reluctant to try. I am no Niko, but I wanted to make an effort to see what it could do for us here.

Note that I am not focusing on reporting workload or other read query performance at this time – I merely want to see what impact I can have on storage (and memory) footprint of this data.

Here is the original table. I’ve changed table and column names to protect the innocent, but everything else is relatively accurate.

Page compression won, and I’ve got a pretty good idea why (though some of the diagnostic info is gone): Aaron has several VARCHAR and NVARCHAR columns, and those blow up the columnstore dictionary pretty fast. Aaron has more to go in this series, so stay tuned.

Comments closed

Auditing the Power BI Activity Log

Brett Powell shows how to query the Power BI Activity Log:

The new Power BI Activity Log makes it much easier for Power BI administrators to access audit log data to monitor and analyze activities occurring within the tenant they support. This blog post describes one solution for accessing, processing, and loading Power BI activity log data as part of an automated process to support ongoing Power BI administrative analysis and reporting.

Brett has a very nicely defined solution and lays it out step by step for us.

Comments closed

Costs and Benefits of Recompilation

Erik Darling takes us through the pros and cons of slapping that RECOMPILE label on a query:

It’s been a while since SQL Server has had a real RECOMPILE problem. And if you put it up against the performance problems that you can hit with parameter sniffing, I’d have a hard time telling someone strapped for time and knowledge that it’s the worst idea for them.

Obviously, you can run into problems if you (“you” includes Entity Framework, AKA the Database Demolisher) author the kind of queries that take a very long time to compile. But as I list them out, I’m kinda shrugging.

My rule of thumb is that WITH RECOMPILE isn’t the first answer, but I won’t mess around too long before going to it.

Comments closed

Auditing Index Changes

Jason Brimhall gives us a solution to track index changes:

In the previous article, I showed a quick and dirty method to try and capture the indexes that may have been just created. We all know that in the hectic life of the DBA, it is easy enough to forget the indexes created more than just a few hours in the past.

In that article, I mentioned that the solution provided was far from a complete solution to the entire problem. A bigger part of the picture involves audits to know when and by whom indexes were created, altered or dropped.

Keeping track of this kind of thing is important, particularly in an environment where performance suddenly changes even though the devs totally didn’t touch anything, promise.

Comments closed

Monitoring Oracle on Azure

Kellyn Pot’vin-Gorman covers several tools which are available for working with Oracle databases in Azure:

The Oracle SQL Developer product has come a long way since it’s inception and much of that credit needs to go to the incredible team at Oracle, including those that are prevelant in the Oracle community, Like Jeff Smith, Kris Rice and Ashley Chen.  Their willingness to listen to the Oracle community and turn their needs into features has been one of the critical reasons for the product success.

Although this product is more focused towards the developer, unlike the previous three, I want to point out a few areas that hopefully will convince you there are more similarities than differences.

The shortest version of this is “the same tools as exist on-prem” but if you don’t know that answer, Kellyn’s got you covered.

Comments closed

Recovering From Parsing NaN Values

Slava Murygin shows how you can recover from a nasty TRY_PARSE error:

In my previous post I outlined very dangerous SQL Server problem, caused by usually not very harmful commands “PARSE” and “TRY_PARSE“: Having “NaN” value for REAL and FLOAT producing severe error.

This post will be about a major issue it causes and on how to fight it.

Will start from generating the problem.

Attention: Do not run that in production!!!

Slava’s serious about not running the code in a real environment.

Comments closed

Parsing ADF ARM Templates with T-SQL

Paul Andrew shows how you can use T-SQL to read an Azure Data Factory ARM template:

While documenting a customers data platform solution I decided it would be far easier if we could summarise the contents of a fairly complex Data Factory using its ARM Template. So, this is what I’ve done using T-SQL to parse the ARM Template JSON and output of series of tables containing details about the factory components.

That is quite the clever solution.

Comments closed

Data Visualization in R and Python

Michelle Golchert contrasts libraries for visualizing data in R and Python:

Unlike R, Python – as a “general-purpose” programming language – does not include data visualization tools by default. However, Python also provides many libraries for this purpose, such as Matplotlib and Seaborn.

Python now also offers numerous packages (like plotnine and ggpy) which are equivalents of ggplot2 in R, and allow you to create plots in Python according to the same “Grammar of Graphics” principle.

This is an area where I think R has the upper hand at most levels: it’s easier to get started plotting with R (thanks to the built-in plots), it’s easier to do “intermediate-quality” plots (stuff you would use in an internal presentation), and you tend to have more control when building professional-quality plots. You can certainly create beautiful visuals in both languages, though.

Comments closed

SQL Server 2017 and Column-Level Encryption

Steve Jones notes a change between SQL Server 2016 and SQL Server 2017 around column-level encryption:

I discovered recently that there was a change made in SQL Server 2017 to the way that symmetric key passphrases are hashed. There’s a KB article that notes the fix, but basically the passphrases used to be encrypted with SHA1. That’s cryptographically insecure, so the algorithm was updated to SHA2.

This is a problem, and can cause some issues. I’ll show the issue and then how to get around it.

There’s not much detail in the KB article about what happens afterward: if you need to keep the trace flag on forever or if it eventually migrates everything over to using SHA2 for hashes.

Comments closed

Wrapping Up Azure Data Factory

Cathrine Wilhelmsen wraps up a long series on Azure Data Factory with three final posts. First is lookups:

Lookups are similar to copy data activities, except that you only get data from lookups. They have a source dataset, but they do not have a sink dataset. (So, like… half a copy data activity? :D) Instead of copying data into a destination, you use lookups to get configuration values that you use in later activities.

And how you use the configuration values in later activities depends on whether you choose to get the first row only or all rows.

From there, it’s the bottom line question:

Congratulations! You’ve made it through my entire Beginner’s Guide to Azure Data Factory 🤓 We’ve gone through the fundamentals in the first 23 posts, and now we just have one more thing to talk about: Pricing.

And today, I’m actually going to talk! You see, in November 2019, I presented a 20-minute session at Microsoft Ignite about understanding Azure Data Factory pricing. And since it was recorded and the recording is available for free for everyone… Well, let’s just say that after 23 posts, I think we could both appreciate a short break from reading and writing

In case you missed anything, Cathrine has a summary and shows where you can learn a lot more:

After this, I will be taking a break from creating new content. However, I will continue to edit, update, tweak, rewrite, and improve all 25 posts already published. I originally published one post per day as an Azure Data Factory Advent Calendar, and even while writing I noticed things that I didn’t have time to cover or things that I wanted to go back and improve. But! I needed to get all the posts published first. I consider this the first edition of the series. Now, the editing begins. Then, I will do my best to keep the content updated as Azure Data Factory keeps evolving

This was a huge series; kudos to Cathrine for putting it all together.

Comments closed