Press "Enter" to skip to content

Author: Kevin Feasel

Enabling Always Encrypted in SQL Server

Scott Klein takes us through the process of enabling Always Encrypted:

Always Encrypted provides a separation between those who own the data and can view it, and those who manage the data but should have no access. Always Encrypted does this by allowing clients to encrypt sensitive data inside client applications while never revealing the encryption key to the database engine. 

One of our clients utilizes Always Encrypted to protect sensitive data. In our work for this client, we realized that a walkthrough documenting the end-to-end process of enabling Always Encrypted would be helpful for everyone.  

Click through for a walkthrough of the process.

Comments closed

Tidying the Confusion Matrix in R

Gary Hutson has a new package for us:

The package aim is to make it easier to convert the outputs of the lists from caret and collapse these down into row-by-row entries, specifically designed for storing the outputs in a database or row by row data frame.

This is something that the CARET library does not have as a default and I have designed this to allow the confusion matrix outputs to be stored in a data frame or database, as many a time we want to track the ML outputs and fits over time to monitor feature slippage and changes in the underlying patterns of the data.

I like the way caret shows the confusion matrix when I’m reviewing result on my own, but I definitely appreciate efforts to make it easier to handle within code—similar to how broom reads linear regression outputs. H/T R-bloggers

Comments closed

The Joy of Parameter Sniffing

Erik Darling points out that parameter sniffing is generally a good thing:

To some degree, I get it. You’re afraid of incurring some new performance problem.

You’ve had the same mediocre performance for years, and you don’t wanna make something worse.

The thing is, you could be making things a lot better most of the time.

We should specifically talk about parameter sniffing problems rather than parameter sniffing as a problem. These sorts of problems are closer to the exception than the rule.

Comments closed

Fixing Those Pesky Wait Stats

David Alcock keeps us from having to think:

CXPACKET

Attempt to pronounce parallellellellellism correctly then set MAXDOP to 1.

PAGELATCH

Mention TempDB and contention in the same sentence. Delete TempDB

BACKUP_

Delete any long running backup jobs. If wait persists then delete all backup jobs.

Click through for plenty of excellent nuggets of advice which definitely won’t land you on the unemployment line.

Comments closed

Fun with Multi-Column Unique Constraints

Aaron Bertrand has an interesting use case:

A problem that comes up occasionally with constraints in SQL Server is when a unique constraint applies to multiple columns, but the values in those columns can be populated in any order. For example, if a table holds interactions between two users, and has columns User1 and User2, there should only be one row allowed for users 100 and 200, regardless of whether the data is entered as 100, 200 or 200, 100.

Click through for one solution. Another solution would be to normalize this down further with a dbo.ConversationParticipants table.

Comments closed

Disorderly Queries

Chad Callihan wants you to think about that ORDER BY clause:

I recently came across a scenario where an application process was not performing correctly on one database but was working fine on others. The process should have been completing in seconds but was taking minutes with no indication of activity. After some investigation, I found that the process was stuck waiting on a SELECT statement to complete. Even worse, it was holding an exclusive lock on a table which was then blocking new information from processing.

One part of the SELECT query that stood out was that it was ordering by a date field. Considering what the process was doing, there was no need to have the data ordered. Generally, it’s better to have the data sorted on the application side instead of SQL Server but in this case not even that was necessary.

There are definitely good cases where you need to use ORDER BY in a database—such as paging scenarios. But if you don’t need things in a particular order, Chad shows that you can potentially save a good deal on performance without an explicit ordering.

Comments closed

Choosing an Image File Type

The folks at Jumping Rivers continue a series on image optimization:

As the JPEG compression algorithm significantly reduces file size, JPEG files are ubiquitous across the web. If you take a photo on your camera, it’s almost certainly using a JPEG storage format. Historically the file extension was .jpg as Microsoft Windows only handled three character file extensions (also .htm vs .html). But today both extensions are used (personally I prefer .jpeg, but I’m not very consistent if I’m totally honest).

If you did a little Googling on which file format to use for images, then the answer you would come across is that JPEG’s are the default choice. But remember, figures are different from standard images!

Click through for a review of three viable image formats.

Comments closed

Research with R and Production with Python

Matt Dancho and Jarrell Chalmers lay out an argument:

The decision can be challenging because they both Python and R have clear strengths.

R is exceptional for Research – Making visualizations, telling the story, producing reports, and making MVP apps with Shiny. From concept (idea) to execution (code), R users tend to be able to accomplish these tasks 3X to 5X faster than Python users, making them very productive for research.

Python is exceptional for Production ML – Integrating machine learning models into production systems where your IT infrastructure relies on automation tools like Airflow or Luigi.

They make a pretty solid argument. I’ve launched success R-based projects using SQL Server Machine Learning Services, but outside of ML Services, my team’s much more likely to deploy APIs in Python, and we’re split between Dash and Shiny for visualization. H/T R-Bloggers

Comments closed

Protecting Excel with Powershell

Mikey Bronowski shows us a few techniques for protecting data in Excel files using Powershell:

Last month we have been hiding things in Excel, so this week we are going to make sure they are protected as well. Excel offers multiple levels of password protection and its options:

– locking file with a password, i.e. without key phrase opening file is not possible
– protecting workbook’s structure
– lastly, protecting individual worksheets from a handful of operations

Read on to see each of those in action.

Comments closed