Curated SQL Posts

Highlighting Scatter Charts In Power BI

Jason Thomas has a great post showing how to implement highlighting of scatter/bubble charts in Power BI:

That said, there is one feature from my previous blog that was not implemented in Power BI – highlighting scatter/bubble charts. In Power BI, the scatter charts are not considered as area charts and hence you can only filter them, not highlight them. This feature is useful when you have a lot of data points in your scatter chart and you want to see where a particular data point is with respect to the other data points. That said, you can make use of some nifty DAX and replicate the same behavior.

There are several steps to the process so it’s not point-and-click easy, but Jason has a nice walkthrough showing how to set it up.

Reserved Memory Allocation Waits And Trace Flag 834

Joe Obbish has another post looking at sub-optimal columnstore index performance:

It is possible to see a scalability bottleneck in the form of high average wait time for the RESERVED_MEMORY_ALLOCATION_EXT wait if a highly concurrent workload is run on a server that consumes memory with batch mode operators. I believe that the severity of the bottleneck depends on unknown factors in the server’s initial memory state and the rate of memory actually used by queries to run batch mode operations. This blog post shares a reproduction of the issue along with a call to action.

If you use clustered columnstore indexes, check out Joe’s User Voice entry.
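
If you want to check whether this wait is prominent on one of your own servers, a quick look at the wait stats DMV is a reasonable starting point. The query below is mine, not Joe’s, and simply reports the cumulative numbers since the last restart (or since the stats were cleared):

  SELECT wait_type,
         waiting_tasks_count,
         wait_time_ms,
         wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
  FROM sys.dm_os_wait_stats
  WHERE wait_type = N'RESERVED_MEMORY_ALLOCATION_EXT';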

vtreat

John Mount explains the vtreat package that he and Nina Zumel have put together:

When attempting predictive modeling with real-world data you quickly run into difficulties beyond what is typically emphasized in machine learning coursework:

  • Missing, invalid, or out of range values.
  • Categorical variables with large sets of possible levels.
  • Novel categorical levels discovered during test, cross-validation, or model application/deployment.
  • Large numbers of columns to consider as potential modeling variables (both statistically hazardous and time consuming).
  • Nested model bias poisoning results in non-trivial data processing pipelines.

Any one of these issues can add to project time and decrease the predictive power and reliability of a machine learning project. Many real-world projects encounter all of these issues, which are often ignored, leading to degraded performance in production.

vtreat systematically and correctly deals with all of the above issues in a documented, automated, parallel, and statistically sound manner.

That’s immediately going onto my learn-more list.

R 3.4.4 Now Available

David Smith notes that R 3.4.4 is now generally available:

R 3.4.4 has been released, and binaries for Windows, Mac and Linux are now available for download on CRAN. This update (codenamed “Someone to Lean On”, likely a Peanuts reference, though I couldn’t find which one with a quick search) is a minor bugfix release, and shouldn’t cause any compatibility issues with scripts or packages written for prior versions of R in the 3.4.x series.

Read on to see the change list.

Window Functions In SQL

Eleni Markou explains what window functions are:

What we want is a table with an extra column which will represent the average price of all products belonging to the same category as the one on the current line.

One approach to solve this problem is to calculate the average price per category using an aggregate function and then join the result with the initial table over the Product Type column in order to get a new table looking at which you can easily find out if a product is more expensive than the average of its category.

Although this would definitely do the job, the query would be quite complicated and lengthy and may lack readability. To avoid this, an alternative approach is to make use of a window function, where there is no need to mess with subqueries and joins. When using a window function, you can retrieve both aggregated and non-aggregated values at the same time, while with GROUP BY you can get only the results grouped into a single output row.
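
To make that concrete, here is a minimal sketch of both approaches; the Products table and its columns are invented for illustration:

  -- Subquery-and-join approach: aggregate per category, then join back.
  SELECT p.ProductName, p.ProductType, p.Price, a.AvgCategoryPrice
  FROM Products AS p
  JOIN (
      SELECT ProductType, AVG(Price) AS AvgCategoryPrice
      FROM Products
      GROUP BY ProductType
  ) AS a
      ON a.ProductType = p.ProductType;

  -- Window function approach: row-level and aggregated values in one pass.
  SELECT ProductName,
         ProductType,
         Price,
         AVG(Price) OVER (PARTITION BY ProductType) AS AvgCategoryPrice
  FROM Products;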

I ask questions about window (or windowing) functions whenever I interview someone for a job.  They are extremely useful things, and I highly recommend Itzik Ben-Gan’s windowing functions book for SQL Server 2012 if you want to learn a lot more.

Database Migration With dbatools

Jess Pomfret shows how easy it is to migrate databases from one SQL Server instance to another using dbatools:

Now that there are no connections we can move the database.  Depending on the situation it might be worth setting the database to read only or single user mode first. In my case, I had the application taken down so I felt confident no connections would be coming in.

With one line of code we can select the source and destination servers, the database name, specify that we want to use the backup and restore method, and then provide the path to a file share that both instance service accounts have access to.

The whole process is just five lines of code, so it could hardly be easier.

ORIGINAL_DB_NAME()

Kenneth Fisher explains a couple of database name functions in SQL Server:

I’d never seen ORIGINAL_DB_NAME until recently and I thought it would be interesting to point it out, and in particular the difference between it and DB_NAME. I use DB_NAME and DB_ID fairly frequently in support queries (for example, what database context a query is running from, or what database given DB files are from). So starting with DB_NAME.
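
As a quick illustration of the difference (my example, not Kenneth’s): DB_NAME() follows the current database context, while ORIGINAL_DB_NAME() reports the database specified in the connection string.

  -- Assume the connection string specified Initial Catalog = AdventureWorks.
  USE tempdb;
  SELECT DB_NAME()          AS current_context,     -- tempdb
         DB_ID()            AS current_db_id,       -- 2 (tempdb)
         ORIGINAL_DB_NAME() AS connection_database; -- AdventureWorks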

Click through to know when to use each.

Victimless Deadlocks And SSMS

Michael J. Swart shows a scenario where the deadlock graph fails to open in SQL Server Management Studio:

I recently got this error in Management Studio when trying to view a deadlock graph that was collected with an extended events session:

Failed to initialize deadlock control.
Key cannot be null.
Parameter name: key

I found this error in a session that included the xml_deadlock_report event.
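
For context, a minimal extended events session of that kind might look like the sketch below; the session name and target file name are my placeholders, not Michael’s. The error shows up when opening a deadlock graph captured by such a session in Management Studio.

  CREATE EVENT SESSION deadlock_capture ON SERVER
  ADD EVENT sqlserver.xml_deadlock_report
  ADD TARGET package0.event_file (SET filename = N'deadlock_capture.xel');
  GO

  ALTER EVENT SESSION deadlock_capture ON SERVER STATE = START;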

Read on for more information, and do check the comments where Lonny Niederstadt points out that even a victimless deadlocking scenario can have an ultimate victim:  performance.

Event Sourcing On Kafka

Adam Warski shows how you can use Apache Kafka as your event sourcing data source:

There’s a number of great introductory articles, so this is going to be a very brief introduction. With event sourcing, instead of storing the “current” state of the entities that are used in our system, we store a stream of events that relate to these entities. Each event is a fact: it describes a state change that occurred to the entity (past tense!). As we all know, facts are indisputable and immutable. For example, suppose we had an application that saved a customer’s details. If we took an event sourcing approach, we would store every change made to that customer’s information as a stream, with the current state derived from a composition of the changes, much like a version control system does. Each individual change record in that stream would be an immutable, indisputable fact.

Having a stream of such events, it’s possible to find out what the current state of an entity is by folding all events relating to that entity; note, however, that it’s not possible the other way round: when storing the current state only, we discard a lot of valuable historical information.

Event sourcing can peacefully co-exist with more traditional ways of storing state. A system typically handles a number of entity types (e.g. users, orders, products, …), and it’s quite possible that event sourcing is beneficial for only some of them. It’s important to remember that it’s not an all-or-nothing choice, but an additional possibility when it comes to choosing how state is managed in our application.
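
Adam’s examples are built around Kafka, but the core idea of folding events into current state can be sketched in plain T-SQL as well; the event table and its columns below are hypothetical:

  -- Hypothetical event store: one row per change to a customer attribute.
  -- CustomerEvents(CustomerId, EventSeq, Attribute, NewValue)
  WITH ranked AS (
      SELECT CustomerId, Attribute, NewValue,
             ROW_NUMBER() OVER (PARTITION BY CustomerId, Attribute
                                ORDER BY EventSeq DESC) AS rn
      FROM CustomerEvents
  )
  SELECT CustomerId, Attribute, NewValue AS CurrentValue
  FROM ranked
  WHERE rn = 1;   -- the latest event per attribute wins: the folded state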

It’s a helpful article and works hand in hand with a CQRS pattern.
