Press "Enter" to skip to content

Day: December 26, 2019

Data Visualization in R and Python

Michelle Golchert contrasts libraries for visualizing data in R and Python:

Unlike R, Python – as a “general-purpose” programming language – does not include data visualization tools by default. However, Python also provides many libraries for this purpose, such as Matplotlib and Seaborn.

Python now also offers numerous packages (like plotnine and ggpy) which are equivalents of ggplot2 in R, and allow you to create plots in Python according to the same “Grammar of Graphics” principle.

This is an area where I think R has the upper hand at most levels: it’s easier to get started plotting with R (thanks to the built-in plots), it’s easier to do “intermediate-quality” plots (stuff you would use in an internal presentation), and you tend to have more control when building professional-quality plots. You can certainly create beautiful visuals in both languages, though.

Comments closed

Parsing ADF ARM Templates with T-SQL

Paul Andrew shows how you can use T-SQL to read an Azure Data Factory ARM template:

While documenting a customers data platform solution I decided it would be far easier if we could summarise the contents of a fairly complex Data Factory using its ARM Template. So, this is what I’ve done using T-SQL to parse the ARM Template JSON and output of series of tables containing details about the factory components.

That is quite the clever solution.

Comments closed

Wrapping Up Azure Data Factory

Cathrine Wilhelmsen wraps up a long series on Azure Data Factory with three final posts. First is lookups:

Lookups are similar to copy data activities, except that you only get data from lookups. They have a source dataset, but they do not have a sink dataset. (So, like… half a copy data activity? :D) Instead of copying data into a destination, you use lookups to get configuration values that you use in later activities.

And how you use the configuration values in later activities depends on whether you choose to get the first row only or all rows.

From there, it’s the bottom line question:

Congratulations! You’ve made it through my entire Beginner’s Guide to Azure Data Factory 🤓 We’ve gone through the fundamentals in the first 23 posts, and now we just have one more thing to talk about: Pricing.

And today, I’m actually going to talk! You see, in November 2019, I presented a 20-minute session at Microsoft Ignite about understanding Azure Data Factory pricing. And since it was recorded and the recording is available for free for everyone… Well, let’s just say that after 23 posts, I think we could both appreciate a short break from reading and writing

In case you missed anything, Cathrine has a summary and shows where you can learn a lot more:

After this, I will be taking a break from creating new content. However, I will continue to edit, update, tweak, rewrite, and improve all 25 posts already published. I originally published one post per day as an Azure Data Factory Advent Calendar, and even while writing I noticed things that I didn’t have time to cover or things that I wanted to go back and improve. But! I needed to get all the posts published first. I consider this the first edition of the series. Now, the editing begins. Then, I will do my best to keep the content updated as Azure Data Factory keeps evolving

This was a huge series; kudos to Cathrine for putting it all together.

Comments closed

SQL Server 2017 and Column-Level Encryption

Steve Jones notes a change between SQL Server 2016 and SQL Server 2017 around column-level encryption:

I discovered recently that there was a change made in SQL Server 2017 to the way that symmetric key passphrases are hashed. There’s a KB article that notes the fix, but basically the passphrases used to be encrypted with SHA1. That’s cryptographically insecure, so the algorithm was updated to SHA2.

This is a problem, and can cause some issues. I’ll show the issue and then how to get around it.

There’s not much detail in the KB article about what happens afterward: if you need to keep the trace flag on forever or if it eventually migrates everything over to using SHA2 for hashes.

Comments closed

Computed Columns and Temporal Tables

Randolph West shows us how to include computed columns in temporal tables:

As I say in my Back to the future with Temporal Tables session, there are several limitations we should be aware of with this feature. One I had not previously considered is that computed columns are not permitted in a history table. This is a problem if we are converting an existing auditing process to make use of temporal tables, and we have computed columns in our table.

Computed columns are columns that are created from an expression based on one or more existing columns in a table. They are useful if we need to index a portion of a wider column, for example.

Hat tip to Erik Darling for coming up with the solution.

Comments closed

Calculating the Pain of UDFs

Taiob Ali points out something added to SQL Server 2017 (and later 2016 and 2014):

Microsoft SQL Server Management Studio (SSMS) version 17.5 added new showplan attributes UdfCpuTime and UdfElapsedTime to QueryTimeStats. These two attributes will measure the time and CPU spent on user-defined functions within a query execution hence helping to discover the impact of UDF execution within full query execution. This feature was first added in SQL Server 2017 CU3 and was backported to SQL Server 2016 SP2. Finding the execution time and CPU for UDF was always a challenge for Data professionals because the number of times a function will execute will vary.

This was a blind spot for a very long time.

Comments closed