Press "Enter" to skip to content

Curated SQL Posts

Azure VM Encryption

Melissa Coates looks at different encryption methods available for Azure Virtual Machines:

Initially I opted for Storage Service Encryption due to its sheer simplicity. This is done by enabling encryption when you initially provision the storage account. After setting it up, I proceeded to other configuration items, one of which was setting up backups via the Azure Recovery Vault. It turns out that encrypted backups in the Recovery Vault are not (yet?) supported for VMs encrypted with Storage Service Encryption (as of Feb 2017).

Next I decided to investigate Disk Encryption because it supports encrypted backups in the Recovery Vault. It’s more complex to set up because you need a Service Principal in AAD, as well as Azure Key Vault integration. (More details on that in my next post.)

Click through for a point-by-point comparison between the two methods.


Replication Error When Listing Directory Contents

Andrew Peterson troubleshoots a replication issue:

You’re trying to set up SQL Server Replication on a server, and it fails. Looking through the error message, you find this:

        An exception occurred while executing a Transact-SQL statement or batch.
        (Microsoft.SqlServer.ConnectionInfo)

        Destination path ………….is not valid. Unable to list directory contents. Specify a valid destination path.
        Changed database context to ‘master’. (Microsoft SQL Server, Error: 14430)

Read on for the solution.
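
In the meantime, a quick sanity check is to ask SQL Server itself whether its service account can actually enumerate the folder named in the message. This is only a sketch with a made-up UNC path; the real diagnosis and fix are in the linked post.

    -- sp_helpdistributor shows the directory replication is configured to use.
    EXEC sp_helpdistributor;

    -- xp_dirtree makes SQL Server (running as its service account) enumerate a folder.
    -- Substitute the destination path from the error message for this hypothetical one.
    EXEC master.sys.xp_dirtree N'\\FileServer\ReplData\Snapshots', 1, 1;

    -- No rows returned for a folder you know has contents usually means the
    -- service account lacks permission to list the directory.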


Thoughts On Deprecated Technologies

Dan Guzman discusses a couple of deprecated components that are still hanging around:

The message is loud and clear that ODBC is the supported and preferred path for native applications going forward. The Data Access Technologies Road Map provides an overview and history of Microsoft data access technologies, which I recommend you peruse to ensure you are not inadvertently using deprecated or unsupported technologies for new development and, for existing applications, consider moving from legacy data access technologies to current ones when practical.

The current Microsoft ODBC Driver for SQL Server as of this writing is ODBC Driver 13 for SQL Server. Note that both the 13.0 and 13.1 versions of this driver have the same “ODBC Driver 13 for SQL Server” display name listed under installed programs and ODBC Data Source Administrator. If installed, the driver will be listed under installed programs along with the corresponding driver version (when viewed in detail mode). The 13.1 version adds support for the Always Encrypted feature. These ODBC Drivers are available from the link I mentioned earlier.

They will pry OLE DB from my cold, dead hands.  For my money, it’s still the best SQL Server data access technology for Integration Services, beating out ODBC both in terms of ease of use and performance.
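
If you want to know which data access technologies your existing applications are actually using today, sys.dm_exec_sessions will tell you while they are connected. A quick sketch, not from Dan’s post:

    -- client_interface_name reports the API a session connected with,
    -- e.g. ODBC, OLEDB, or .Net SqlClient Data Provider.
    SELECT s.client_interface_name,
           s.program_name,
           COUNT(*) AS session_count
    FROM sys.dm_exec_sessions AS s
    WHERE s.is_user_process = 1
    GROUP BY s.client_interface_name, s.program_name
    ORDER BY session_count DESC;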


Encryption In ElasticMapReduce

Sai Sriparasa shows how to enable encryption in an ElasticMapReduce cluster:

In this post, I go through the process of setting up the encryption of data at multiple levels using security configurations with EMR. Before I dive deep into encryption, here are the different phases where data needs to be encrypted.

Data at rest

  • Data residing on Amazon S3—S3 client-side encryption with EMR
  • Data residing on disk—the Amazon EC2 instance store volumes (except boot volumes) and the attached Amazon EBS volumes of cluster instances are encrypted using Linux Unified Key Setup (LUKS)

Data in transit

  • Data in transit from EMR to S3, or vice versa—S3 client-side encryption with EMR

  • Data in transit between nodes in a cluster—in-transit encryption via Secure Sockets Layer (SSL) for MapReduce and Simple Authentication and Security Layer (SASL) for Spark shuffle encryption

  • Data being spilled to disk or cached during a shuffle phase—Spark shuffle encryption or LUKS encryption

Turns out this is rather straightforward.


Data Frame Serialization In R

David Smith shows a new contender for serializing data frames in R, fst:

And now there’s a new package to add to the list: the fst package. Like the data.table package (the fast data.frame replacement for R), the primary focus of the fst package is speed. The chart below compares the speed of reading and writing data to/from CSV files (with fwrite/fread), feather, fst, and the native R RDS format. The vertical axis is throughput in megabytes per second — more is better. As you can see, fst outperforms the other options for both reading (orange) and writing (green).

These early numbers look great, so this is a project worth keeping an eye on.


JSON In Clustered Columnstore Indexes

Jovan Popovic gives a use case for JSON data being part of a clustered columnstore index:

This is equivalent to the collections you might find in a classic NoSQL database, because they store each JSON document as a single entity and optionally create indexes on these documents. The only difference is the CLUSTERED COLUMNSTORE index on this table, which provides the following benefits:

  1. Data compression – CCI uses various techniques to analyze your data and choose optimal compression algorithms to compress data.

  2. Batch mode analytics – queries executed on CCI process rows in batches of 100 to 900 rows, which might be much faster than row-mode execution.
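
For context, a minimal sketch of the kind of table the post describes might look like the following. The table and column names here are made up, and a JSON column declared as nvarchar(max) inside a columnstore index requires SQL Server 2017 or later; on 2016 you would cap it at nvarchar(4000).

    -- Hypothetical "JSON collection" table: each row holds one JSON document,
    -- compressed and scanned in batch mode via the clustered columnstore index.
    CREATE TABLE dbo.ProductCatalog
    (
        _id bigint IDENTITY PRIMARY KEY NONCLUSTERED,
        doc nvarchar(max) NOT NULL
            CONSTRAINT ck_doc_is_json CHECK (ISJSON(doc) > 0),
        INDEX cci_ProductCatalog CLUSTERED COLUMNSTORE
    );

    -- The documents are then queried with the built-in JSON functions.
    SELECT JSON_VALUE(doc, '$.name') AS ProductName
    FROM dbo.ProductCatalog
    WHERE JSON_VALUE(doc, '$.color') = 'Red';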

I think it’s worth reading this in conjunction with Niko Neugebauer’s comments regarding strings in columnstore.


Compatibility Level And Forced Plans

Erin Stellato has an experiment with Query Store plan forcing:

A question came up recently about plan guides and compatibility mode, and it got me thinking about forced plans in Query Store and compatibility mode.  Imagine you upgraded to SQL Server 2016 and kept the compatibility mode for your database at 110 to use the legacy Cardinality Estimator.  At some point, you have a plan that you force for a specific query, and that works great.  As time goes on, you do testing with the new CE and eventually are ready to make the switch to compatibility mode 130.  When you do that, does the forced plan continue to use compatibility mode 110?  I had a guess at the answer but thought it was worth testing.
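
The moving parts of such a test are small if you want to sketch it out yourself; the database name and the Query Store ids below are placeholders, and the actual experiment and findings are in Erin’s post.

    -- Stay on the legacy cardinality estimator by keeping the old compatibility level.
    ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 110;

    -- Force a plan that Query Store has already captured for a query.
    EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;

    -- Later, switch to the new cardinality estimator...
    ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 130;

    -- ...and see what the forced plan reports afterward.
    SELECT p.query_id, p.plan_id, p.is_forced_plan, p.compatibility_level
    FROM sys.query_store_plan AS p
    WHERE p.is_forced_plan = 1;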

There are some interesting results here.


Thinking About The GitLab Outage

Brent Ozar shares his thoughts on the recent GitLab outage:

You can read more about the details in GitLab’s outage timeline doc, which they heroically shared while they worked on the outage. Oh, and they streamed the whole thing live on YouTube with over 5,000 viewers.

There are so many amazing lessons to learn from this outage: transparency, accountability, processes, checklists, you name it. I’m not sure that you, dear reader, can actually put a lot of those lessons to use, though. After all, your company probably isn’t going to let you live stream your outages. (I do pledge to you that I’m gonna do my damnedest to do that ourselves with our own services, though.)

There are some good pointers in here.


Understanding DBCC PAGE

David Alcock looks at the undocumented DBCC PAGE command:

This is an error that was picked up on one of my test systems. It indicates that SQL Server has detected a torn page, that is, a page that was incorrectly written by SQL Server, possibly indicating a problem in the IO subsystem.

The problem here is that whilst we know the database and the page where the error has occurred, we don’t know the specific table the page belongs to or, importantly, what type of page is in error. The page type matters because it will drastically impact our recovery process, but the first thing we will do is check a system table to see if any other page errors have been reported:
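
(The excerpt cuts off just before that query. The system table in question is presumably msdb.dbo.suspect_pages, and the page inspection itself looks roughly like the sketch below; the database name, file id, and page id are placeholders.)

    -- Pages SQL Server has flagged as damaged (torn pages, bad checksums, 824 errors).
    SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
    FROM msdb.dbo.suspect_pages;

    -- Inspect a specific page: DBCC PAGE(<database>, <file id>, <page id>, <print option 0-3>).
    -- Trace flag 3604 sends the output to the client instead of the error log.
    DBCC TRACEON (3604);
    DBCC PAGE ('YourDatabase', 1, 12345, 3);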

DBCC PAGE is just about the most-documented undocumented command around.  Worth a read.
