
Curated SQL Posts

A Starting Point for Data Protection

Deborah Melkin asks some questions:

If we start expanding things beyond just the technology and functionality, we can really see where the concept of data protection becomes much larger and more complex.

I admit that I’m not really up-to-speed on the technical aspects of encryption or data protection. That doesn’t fall under a lot of the work that I do. But there’s another side to data protection that’s worth talking about. It’s about knowing your data. This is where I’ve been spending a lot of my time these days.

When I ask if you know your data, I’m asking if you can answer the following questions:

Read on for some of the types of questions you’ll want to think about.


An Overview of Transparent Data Encryption

Chad Callihan looks at one option for securing a SQL Server instance:

This month’s T-SQL Tuesday topic comes from Matthew McGiffen, who asks us to talk about encryption and protecting data in SQL Server. To read the full topic invite, click the T-SQL Tuesday logo to the right.

For this month’s invite, I thought I’d write about Transparent Data Encryption (TDE) and give a reminder about how it relates to tempdb.

Read on for Chad’s reminder.
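For reference, here is a minimal sketch of the standard TDE setup sequence; the database name, certificate name, and password are hypothetical:

    -- In master: create a database master key and a certificate to protect the encryption key.
    USE master;
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
    CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';
    GO
    -- In the user database: create the database encryption key and turn encryption on.
    USE SalesDB;
    CREATE DATABASE ENCRYPTION KEY
        WITH ALGORITHM = AES_256
        ENCRYPTION BY SERVER CERTIFICATE TDECert;
    ALTER DATABASE SalesDB SET ENCRYPTION ON;

Once any user database on the instance is encrypted this way, tempdb is encrypted automatically as well, which is the tempdb wrinkle Chad covers.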


Bionic Reading in R

Tomaz Kastrun says reading is fundamental:

Trick your brain into faster reading with the help of Bionic Reading. With the help of highlighting part of the words, it “guides your eyes over the text and the brain remembers previously learned words more quickly.” (source: br-about)

Here is a beautiful example of how, with the use of opacity, colours, size, and many other elements, text can quickly be adapted for faster reading.

Click through for an example and how to implement it in R.


A Path to Avoid Getting Overwhelmed with Microsoft Fabric

Kurt Buhler tries to limit information overload:

It’s just too much; I don’t have time for all this stuff.

I think this is a big problem. It’s a problem not just because people shouldn’t feel overwhelmed, but also because it says something about how effectively these new features, tools, and resources are being communicated, understood, and used. But what is the problem, exactly? And if you’re in the minority of people not feeling overwhelmed, why should you care?

Perhaps most importantly, how can we approach these new features, tools, and resources to ensure we understand them and can find value without feeling overwhelmed?

Read on for several tips on how to tackle learning about a product with a large surface area. And I’d also note that anybody who is comfortable working in SQL Server had to go through the same process.


The Internals of Backup Compression

Andy Yun continues a series on how backups work in SQL Server:

Welcome back to Part 4 of my Backup Internals series. Today, I’d like to spend a little time exploring backup compression.

When you take a regular FULL BACKUP, SQL Server is literally taking a byte-for-byte copy of your data files. Don’t believe me? Then go read this, then come back. Additionally, Microsoft architected BACKUP operations such that the resource utilization and impact would be minimal (when using default parameters).

This post taught me a few things about the practical impact of enabling compression. Even after reading it, however, I would almost always enable compression, for two reasons. First, storage I/O is usually the bottleneck for organizations, so actions which reduce the number of bytes written to disk can improve overall performance. Second, there are limits to how much we can store, so compressing backups may let me get away with holding more backups on a given LUN or drive.
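If you want to try it, compression is a single option on the BACKUP statement. This is a minimal sketch; the database name and backup path are hypothetical:

    -- WITH COMPRESSION trades some CPU for a smaller backup file and less I/O;
    -- CHECKSUM verifies pages as they are written to the backup.
    BACKUP DATABASE SalesDB
    TO DISK = N'D:\Backups\SalesDB.bak'
    WITH COMPRESSION, CHECKSUM, STATS = 10;

    -- Alternatively, make compression the instance-wide default.
    EXEC sp_configure 'backup compression default', 1;
    RECONFIGURE;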


An Overview of Encryption Options in SQL Server

Rob Farley has a cipher:

Encryption is a funny thing. Since the dawn of communication, whenever people have wanted to keep their secrets secret, they’ve used some sort of encryption. I’m sure parents started spelling things so their kids wouldn’t understand as soon as there was spelling. Using words their kids wouldn’t understand, while the kid sits there thinking “Oh, Dad, you’re so embarrassing, thinking I don’t know what that means…”. Encryption is all about keeping information away from people, particularly those who could use it against you. Ask the folk from Bletchley Park if you don’t realise how this can impact world events.

Rob links to Enigma and the bomba (the Polish machine for attacking Enigma messages, forerunner of the British bombe), but there’s another interesting story out of Bletchley Park as well: the Lorenz cipher, which British cryptanalysts broke early in the war. Decryption by hand was quite slow, though, on the order of a message or so per day. This led to Colossus, the first programmable electronic digital computer. The National Museum of Computing in Bletchley Park has a working rebuild of a Mark 2 Colossus on display and we got to see it (and hear the story behind it) on day 1 of Data Relay this year, so that was fun to see.

As an interesting side note, the British never told the Soviets that they had broken the Lorenz cipher, so when the Soviets captured these machines near the end of World War II, they assumed the code was still secure and continued to use the machines for a while, giving the British access to certain sensitive communications for a time.


Indexing in PostgreSQL

Henrietta Dombrovskaya continues a series on Postgres:

What is an index? One might assume that any person who works with databases knows what an index is. However, a surprising number of people, including database developers, report writers, and, in some cases, even DBAs, use indexes, even create indexes, with only a vague understanding of what indexes are and how they are structured. With this in mind, let’s start by defining what an index is.

Since there are many different index types in PostgreSQL (and new index types are constantly being created), we won’t focus on structural properties to produce an index definition. Instead, we define an index based on its usage.

Indexing is one area in which SQL Server and Postgres differ: SQL Server relies on clustered indexes for storage and “default” operations, whereas Postgres stores every table as a heap and treats all indexes as secondary structures.
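To make the contrast concrete, here is a quick sketch in plain Postgres SQL; the table and column names are hypothetical:

    -- Every Postgres index is a secondary structure pointing into a heap table.
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    -- CLUSTER physically reorders the heap to match the index one time;
    -- unlike a SQL Server clustered index, that order is not maintained afterward.
    CLUSTER orders USING idx_orders_customer_id;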


Conditional Formatting in Power BI with Field Parameters and Calculation Groups

Marco Russo and Alberto Ferrari perform some formatting:

If you want to build a report where the user can choose what measure to show, you have two features available in Power BI: field parameters and calculation groups. There are pros and cons to either technique – however, we are not about to talk about those. We narrow our scenario down to a specific requirement: we want to change the color of the value depending on the measure selected.

For example, suppose we let users choose between Sales Amount, Margin, or Total Cost. In that case, we might provide visual feedback about the measure selected through different colors: black for Sales Amount, green for Margin, and red for Total Cost.

Click through for that example, though I will say that the color choices are hard to differentiate if you have protanopia and even more difficult if you have deuteranopia, so about 2% of the male population would struggle to interpret this color scheme. People with protanomaly and deuteranomaly (about 6% of men) wouldn’t have too much difficulty with this particular color pairing.


Common SQL Server Mistakes: Default Auto-Growth

Hemantgiri Goswami takes a look at auto-growth:

Auto Growth is a feature that allows database files (primary, secondary, and log) to expand when the database file becomes full – without manual intervention.

The Auto Growth feature is handy when we do not want to increase the size of database files manually. There are two ways you can set auto growth – using SQL Server Management Studio (SSMS hereafter) or T-SQL – and auto growth can be configured in percent or in megabytes.

Auto-growth isn’t a problem on its own, though the default growth sizes, especially in older versions of SQL Server, were far too small for medium-sized and large databases.

I don’t particularly like the 2.5 MB example Hemantgiri shows. I have a quick rule of thumb: 64MB for small databases, 256-512MB for medium-sized databases, and 1GB for large databases (assuming my underlying disk is fast). This limits the number of auto-growth events and, for log files in particular, keeps virtual log file counts more reasonable.
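Applying that rule of thumb is a one-line change per file. This is a minimal sketch; the database and logical file names are hypothetical, and the 256MB increment follows the medium-database guideline above:

    -- Check the current growth settings; growth is a count of 8 KB pages
    -- when is_percent_growth = 0.
    SELECT name, is_percent_growth, growth
    FROM sys.master_files
    WHERE database_id = DB_ID(N'SalesDB');

    -- Switch the log file to a fixed 256 MB growth increment.
    ALTER DATABASE SalesDB
        MODIFY FILE (NAME = N'SalesDB_log', FILEGROWTH = 256MB);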


Package Management in Python

Georgia Atkinson wraps things up with a bow:

Python is a general-purpose, high-level language which, thanks to its simplicity and versatility, has become very popular, especially within the data science community. The extensive Python community has developed and contributed thousands of libraries and packages over the years, in a plethora of different disciplines, to aid developers with their applications. Managing these packages can be a challenging task without the correct tools. That’s where Python package managers come in. In this blog post we will explore what a package manager is and why package managers are important. We will then cover some popular examples, including how to use them, how to install them, and the pros and cons of each.

Whilst we will briefly touch on virtual environments in places, we will explore these in more depth in an upcoming post.

Read on for a primer on three options, including how they compare to one another for CI/CD purposes.
