Curated SQL Posts

Column-Level Encryption and Hashing

Eric Rouach shows off a pair of things:

Using as an example the AdventureWorks2014 database, the first script describes the process of encrypting the “CardNumber” column from the Sales.CreditCard table while keeping the data decryptable.

Our pre-requisite is the creation of a Master Key, a Certificate and a Symmetric Key.

Once having those created, we may proceed to the addition of a new column called “CardNumberEnc” (where the suffix “Enc” stands for “Encrypted”). This column has a VARBINARY(250) Data Type and is nullable.

Read on for an example of using column-level encryption, followed by how you’d decrypt the data. Then, Eric discusses hashing, though I disagree with the nomenclature of “encryption and make the data non-decryptable.” The reason is that encryption is, by its nature, a two-way process and necessarily requires the ability to decrypt. Hashing, meanwhile, is a one-way process without a direct means of reversal. Nomenclature aside, the examples are good and I appreciate Eric using one of the larger SHA2 hashing algorithms rather than MD5.
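To make the one-way point concrete, here is a minimal sketch in Python rather than T-SQL. It is not Eric's script: the `hash_card_number` helper, the optional salt, the sample card number, and the choice of SHA-512 specifically are all assumptions for illustration.

```python
import hashlib


def hash_card_number(card_number: str, salt: bytes = b"") -> str:
    """Return the SHA-512 digest of a card number as a hex string.

    Hypothetical helper: the name, the salt parameter, and the choice of
    SHA-512 are illustrative assumptions, not taken from the linked post.
    """
    return hashlib.sha512(salt + card_number.encode("utf-8")).hexdigest()


# Made-up card number, used only for demonstration.
digest = hash_card_number("1111000011110000")

# Hashing is deterministic: the same input always yields the same digest,
# which is what lets you compare values without storing the original.
same_again = hash_card_number("1111000011110000")

# A one-digit change produces a different digest.
different = hash_card_number("1111000011110001")

print(len(digest))           # 128 hex characters for a 512-bit digest
print(digest == same_again)  # True
print(digest == different)   # False
```

Note what is missing: there is no function that takes the digest and hands back the card number. With encryption, that inverse exists for anyone holding the key.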

Categorizing Why Bugs Can Be Tricky

Julia Evans has a list:

Hello! I’m very slowly working on writing a zine about debugging, so I asked on Twitter the other day:

If you’ve run into a bug where it felt “impossible” to understand what was happening – what made it feel that way?

Of course, bugs always happen for logical reasons, but I’ve definitely run into bugs that felt like they might be impossible for me to understand (until I figured them out!)

I got about 400 responses, which I’ll try to summarize here. I’m not going to talk about how to deal with these various kinds of “impossible” bugs in this post, I’ll just try to classify them.

Click through for the major categories, as well as explanations and sub-categories. I think an interesting follow-up to this is to ask why we find ourselves in situations where we get these sorts of bugs and what (if anything) we can do to minimize or eliminate the likelihood of their appearance.

8 Ways to Solve a Problem in R

Holger von Jouanne-Diedrich shows how many ways there are to solve a problem of squares:

This time we want to solve the following simple task with R: Take the numbers 1 to 100, square them, and add all the even numbers while subtracting the odd ones!

If you want to see how to do that in at least seven different ways in R, read on!

There are many different solutions possible, making use of several aspects of the R language. So this blog post can be seen as a fun exercise to recap some of the concepts explained in our introduction to R: Learning R: The Ultimate Introduction (incl. Machine Learning!).

Give it a try and then check out the variety of solutions.
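For comparison outside of R, here is a small Python sketch of the same task done three ways. These are not Holger's solutions, and the variable names are mine.

```python
numbers = range(1, 101)

# 1. Explicit loop: square each number, add even squares, subtract odd ones.
total_loop = 0
for n in numbers:
    square = n * n
    if square % 2 == 0:
        total_loop += square
    else:
        total_loop -= square

# 2. Generator expression with a conditional sign.
#    A square is even exactly when the number itself is even.
total_comprehension = sum(n * n if n % 2 == 0 else -(n * n) for n in numbers)

# 3. Sign trick: (-1) ** n is +1 for even n and -1 for odd n.
total_sign = sum((-1) ** n * n * n for n in numbers)

print(total_loop, total_comprehension, total_sign)  # 5050 5050 5050
```

The answer works out to 5050 because each pair (2k)² − (2k − 1)² simplifies to 4k − 1, and summing that over k = 1 to 50 gives 5050.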

Pre-Sketching Data Visualizations

Laura Ellis explains the benefits of pre-sketching data visualizations:

When you take on a new data visualization project, it can be tempting to jump in and create visualizations right away with the idea that after enough exploring, the final format will present itself to you. And while it is important to dedicate time to EDA (exploratory data analysis), it can also be very beneficial to define a high-level plan early in the process.

Over time, I’ve found that producing an early sketch has been helpful in reducing the total amount of time and iterations taken towards building the end product.

Read on for the reasons why.

Connecting to Cosmos DB via Dedicated Gateway

Hasan Savran introduces us to the Cosmos DB Dedicated Gateway:

Cosmos DB team announced a new way named Dedicated Gateway to connect to Azure Cosmos DB. As you might know there is already a standard gateway to connect to Cosmos DB. Dedicated or Standard gateway means that there is a computer stays between Cosmos DB replica set and your application. Your application request goes to gateway server then goes to Cosmos DB database. The biggest difference between Standard Gateway and Dedicated Gateway is, you do not share the dedicated gateway server with other Cosmos DB customers.

Dedicated Gateway is totally yours and you are responsible for its costs. Depending on your application size, you can select different size of gateway servers.

Read on to learn how expensive it is and the benefits it brings.

GROUP BY ROLLUP

Dinesh Asanka hits on one of the under-utilized grouping operators:

You will see that data is aggregated for the columns provided by the GROUP BY clause. Important to note that the data will not be ordered in the GROUP BY columns and you need to explicitly order them by using the ORDER BY clause as shown in the above query.

In the above query, if you wish to find the total for Australia only, you need to run another GROUP BY with EnglishCountryRegionName and perform a UNION ALL. This will be a very ugly method. By using GROUP BY ROLLUP you can achieve the above-said task as shown in the following query.

If I were to rank grouping operators by how frequently I use them, it’s GROUPING SETS by a country mile, then ROLLUP, and almost never do I use CUBE.
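If it helps to see what ROLLUP produces without a SQL Server instance handy, here is a rough Python sketch of the idea: one total per prefix of the grouping columns, down to a grand total. The `rollup` function and the sample rows are hypothetical and are not Dinesh's query or data.

```python
from collections import defaultdict

# Hypothetical sample rows: (country, province, sales_amount).
rows = [
    ("Australia", "New South Wales", 100),
    ("Australia", "Victoria", 50),
    ("Canada", "Alberta", 30),
    ("Canada", "Ontario", 20),
]


def rollup(rows, n_keys):
    """Aggregate the amount column the way GROUP BY ROLLUP does.

    The first n_keys fields of each row are grouping columns and the next
    field is the amount. None marks a rolled-up column, much like the NULL
    that ROLLUP places in subtotal rows.
    """
    totals = defaultdict(int)
    for row in rows:
        keys, amount = row[:n_keys], row[n_keys]
        # One bucket per prefix: (a, b), then (a, None), then (None, None).
        for depth in range(n_keys, -1, -1):
            group = keys[:depth] + (None,) * (n_keys - depth)
            totals[group] += amount
    return dict(totals)


result = rollup(rows, 2)
for group, total in result.items():
    print(group, total)
```

The `("Australia", None)` entry is the Australia-only subtotal that would otherwise need a second GROUP BY and a UNION ALL, and `(None, None)` is the grand total.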

Using Ola’s Maintenance Solution on RDS

Jack Vamvas takes us through a couple of nuances around using Ola Hallengren’s SQL Server Maintenance Solution on Amazon RDS:

I’ve used the Ola Hallengren Maintenance Solution across various SQL Server environments. I was recently asked by a colleague about how adaptable they are to the AWS RDS SQL Server environment.

I checked the Ola Hallengren FAQ and there is a comment:

Read on to learn the details.

From SQL Server to Excel via R

Kevin Wilkie wraps up a series on data movement between Excel and SQL Server:

In today’s post, we’ll go over how to export the data you have in SQL Server to Excel via one of my favorite computer languages – R. (Since we did have a post on how to Import data, it would seem rather rude not to have one on how to Export data.)

As always, you’ll need to open your R tool of choice. I tend to use RStudio but there are several out there that will accomplish this same goal.

Click through to see how.

Consistency and Completeness in Kafka Streams

Guozhang Wang announces a whitepaper:

Recently, however, some streaming engines, such as Apache Kafka® and its ecosystem component Kafka Streams, have been able to claim strong correctness guarantees, with the primary dual metrics being consistency, a guarantee that a stream processing application can recover from failures to a consistent state such that final results will not contain duplicates or lose any data, and completeness, a guarantee that a stream processing application does not generate incomplete partial outputs as final results even when input stream records may arrive out of order.

Click through for more details and a link to the paper itself. It’s good to understand as much as you can about the distributed system you use, especially because many times, the claims for consistency should come with large asterisks.
