Press "Enter" to skip to content

Curated SQL Posts

Reviewing The 2017 Data Breach Investigations Report

Jen Underwood picks out some interesting tidbits from the Verizon 2017 Data Breach Investigations Report:

Each year Verizon, in conjunction with the VERIS Community Database initiative, releases the annual data breach investigations report. This year’s report is based on analysis of 42,068 security incidents, including 1,935 confirmed data breaches. Within this free report, readers are provided incident analysis universally and by industry, detailed insights, and tips to mitigate cyber security threats. For data professionals, the data breach report is one of those “must at least skim” resources to understand the changing nature of threats that you are most likely to face to help you prepare and prevent them.

Click through for Jen’s summary, and I recommend you check out the report as well.

Comments closed

TensorFlow Tutorial

Ashish Bakshi has a TensorFlow tutorial:

As shown in the image above, tensors are just multidimensional arrays, that allows you to represent data having higher dimensions. In general, Deep Learning you deal with high dimensional data sets where dimensions refer to different features present in the data set. In fact, the name “TensorFlow” has been derived from the operations which neural networks perform on tensors. It’s literally a flow of tensors. Since, you have understood what are tensors, let us move ahead in this TensorFlow tutorial and understand – what is TensorFlow?

The sample here is Python, though there is an R library as well.

Comments closed

Backup-Related Instance Settings

Monica Rathbun explains a few instance-level backup properties:

Default backup media retention in days. Now the first things that comes to my mind is that “hey this is a cleanup job” SCORE! Thinking that maybe this will auto delete old backups. After all isn’t that what retention means? NOPE, not in this case.

In this case it’s just a number of days before that a backup media can be OVERWRITTEN. If the DBA goes to overwrite the media before those days it will give a warning message. You’ll note in every back up action you do the RETAINDAYS option is filled in. In this case it will always reflect to 90 now that we have changed it. In general, this a pointless option to me. I don’t normally OVERWRITE backup media. To me this was more relevant when Tapes were used and disk were harder to come by, so I leave it alone.

Read on for more settings.

Comments closed

Interleaved Execution And Compatibility Levels

Arun Sirpal gives us some helpful information regarding interleaved execution in SQL Server 2017:

I have read-only T-SQL that references the MSTVF. I did have some code that use both data modifications and cross apply but interleaved execution does not occur in those scenarios.

So on my SQL Server 2017 instance I set the database to 110 compatibility mode and set query store on where then I execute my code.

Note that 110 is the compatibility mode for SQL Server 2012.  That becomes an important part of Arun’s story.

Comments closed

Columnstore Memory Pressure And Bulk Loading

Niko Neugebauer shows what happens when you try to bulk load data into a columnstore index but don’t have enough memory available:

After waiting for the 25 seconds (notice the difference between the request_time and grant_time is exactly 25 seconds), the engine decides to grant some minimum amount of memory anyway, allowing the process to carry on, without being cancelled, but the penalisation is very heavy – the inserts will not go into the compressed row groups, but into the Delta-Stores, making this operation not-minimally-logged and in other words, painfully slow and inefficient.
To confirm the final results, let’s check on the Row Groups of our tables, given that we have canceled the inserts into the 2 first tables, we expect 1 row group for the [dbo].[FactOnlineSales_Stage3] table and 1 row group for the [dbo].[FactOnlineSales_Stage4] table, corresponding to the 3rd and 4th threads of data loading:

As Niko points out, this could be the difference between a well-behaved, single compressed rowgroup load versus dumping a million rows into the deltastore.

Comments closed

SQL Server Single-User Mode In Linux

Anthony Nocentino shows us how to get into single-user mode in SQL Server on Linux:

There was a question this morning on the SQL Server Community Slack channel from SvenLowry about how to launch SQL Server on Linux in Single User Mode. Well you’ve heard everyone say, it’s just SQL Server…and that’s certainly true and this is another example of that idea.

The command line parameters from the sqlservr binary are passed through into the SQLPAL managed Win32 SQL Process. So let’s check out how to do this together…

Click through for a demo.

Comments closed

When “Invalid Column Name” Isn’t A Permissions Issue

Kenneth Fisher shares a story of hunting down the cause of an error message:

This time we had a vendor reporting the following error:

Msg 207, Level 16, State 1, Line 7
Invalid column name ‘Name’

Now the vendor was certain this was a permissions issue. It worked fine on their systems, it worked fine on some of ours. So why didn’t it always work? Well, the easy answer is permissions! Particularly since we had denied them db_owner just recently.

So why do I sound so dismissive about permissions as a possibility? I mean it COULD be permissions. It certainly is possible. But first of all, we don’t use column level permissions very often (no one uses them all that often from what I can tell) and secondly it worked on several other systems where they had exactly the same permissions as this system.

Ok, so what is the problem? You guessed it! (I really have to stop asking for guesses after I’ve put the answer in the title.)

I went out of my way not to give the answer here, so you’ll have to look at Kenneth’s title.  And then read the whole thing.

Comments closed

Fooling Neural Networks

Rodrigo Agundez shows how to fool neural networks:

A comprehensive and complete summary can be found in the When DNNs go wrong blog, which I recommend you to read.

All these amazing studies use state of the art deep learning techniques, which makes them (in my opinion) difficult to reproduce and to answer questions we might have as non-experts in this subject.

My intention in this blog is to bring the main concepts down to earth, to an easily reproducible setting where they are clear and actually visible. In addition, I hope this short blog can provide a better understanding of the limitations of discriminative models in general. The complete code used in this blog post can be found here.

This is a great article.

Comments closed

Grouping Data With ROLLUP

Steve Jones shows how easy it is to use the ROLLUP function:

I was editing an article recently that talked about ROLLUP, and I wanted to play with this a bit more. I hadn’t used this much in my career, but it’s a handy operator that’s worth including in your toolkit.

ROLLUP is used to provide additional totals for your aggregates while using GROUP BY. Here’s a little example. I’ve got some data for American Football quaterbacks. In this case, I’ve extracted some stats for a few noteworthy players today.

I’ll probably end up using ROLLUP about once every three months and be very pleased when I remember that it exists.  I use GROUPING SETS more often and almost never use CUBE.

Comments closed