Press "Enter" to skip to content

Day: May 17, 2018

Statistical Power And The False Discovery Rate

Brad Klingbenberg has an insightful article on false discovery rate:

A good frequentist would never interpret a p-value as the probability that the null hypothesis is true. But it can be enormously tempting. And despite all your efforts to the contrary it is likely that many of your colleagues don’t appreciate the distinction.

So, really, how wrong is it to treat a p-value as (one minus) the posterior probability that the null hypothesis is true? In general, it’s bad. But in some cases a p-value is a very good approximation to a posterior probability. Here we examine that approximation in a common testing scenario.

Check it out for sure.

Comments closed

Tuning xgboost Models In R

Gabriel Vasconcelos has a new series on tuning xgboost models:

My favourite Boosting package is the xgboost, which will be used in all examples below. Before going to the data let’s talk about some of the parameters I believe to be the most important. These parameters mostly are used to control how much the model may fit to the data. We would like to have a fit that captures the structure of the data but only the real structure. In other words, we do not want the model to fit noise because this will be translated in a poor out-of-sample performance.

  • eta: Learning (or shrinkage) parameter. It controls how much information from a new tree will be used in the Boosting. This parameter must be bigger than 0 and limited to 1. If it is close to zero we will use only a small piece of information from each new tree. If we set eta to 1 we will use all information from the new tree. Big values of eta result in a faster convergence and more over-fitting problems. Small values may need to many trees to converge.

  • colsample_bylevel: Just like Random Forests, some times it is good to look only at a few variables to grow each new node in a tree. If we look at all variables the algorithm needs less trees to converge, but looking at, for example, 2/3 of the variables may result in models more robust to over-fitting. There is a similar parameter called colsample_bytree that re-sample the variables in each new tree instead of each new node.

Read the whole thing.  H/T R-bloggers

Comments closed

Reverting After Impersonation

Kenneth Fisher shows a couple common issues when executing as another user or login:

The problem is that sometimes I’ll forget to revert back.

1
EXECUTE AS LOGIN = 'Doc';

Msg 15406, Level 16, State 1, Line 8
Cannot execute as the server principal because the principal “Doc” does not exist, this type of principal cannot be impersonated, or you do not have permission.

Hey! I know I Doc exists, and I’m sysadmin. How is this not working? Simple enough, I forgot to REVERT. There are two simple options here. I either completely forgot to run it, or I was in the wrong database.

Read the whole thing.

Comments closed

Keeping Database Role Information In Source Control

Louis Davidson has a post on handling database security in source control:

Yeah, things get messy, no matter what model you choose for securing your PROD data:

  1. Create one user and give it all rights to the database
  2. Create specific users and give them the least amount of rights to do what is must, and no more
  3. Somewhere in between the previous 2

Truly, #3 is generally the answer. Let’s say that you give the application all the rights that any user of the system can have, and let the application dole out the rights to individuals. This is not a terrible plan, but I dare say that many databases contain data, or utilities that it is not desirable to give to the users. (My utility schema generally has tools to maintain and release code, something that you don’t want general users to have access to. And lest you have a developer working “with” you like I once did, you don’t want the application to have access to the tools to disable all of the constraints in the database, even if you have ETL uses for that code.)

Check it out for some examples.

Comments closed

RDP Error: CredSSP Encryption Oracle Remediation

Kerry Tyler explains an error message popping up in RDP sessions:

In March, a vulnerability in CredSSP (Credential Security Support Provider) was patched, which would affect authentication via RDP (this is outlined in advisory CVE-2018-0886).  However, it was implemented in such a way that the behavior change didn’t have to be “honored” by either the server or the client involved in an RDP session.

The intent was that this would be controlled by GPO in enterprise environments, and a new GPO setting to activate or deactivate this behavior was released at the same time.

GPO settings have a default value, which they will use when nothing has been explicitly set for a particular setting. In this case, the GPO has three possible values: Force Updated Clients (for servers to only take connections from patched clients), Mitigated (for both, and on a workstation means that it won’t fall back to old/insecure behavior when attaching to unpatched servers), and Vulnerable (for both, and means what it sounds like–anything goes!).

In March, the default behavior was set to “Vulnerable”, which means everything kept working for everyone. But in the May security rollup, the default setting for that GPO was flipped to “Mitigated” if there was not an explicit setting for it…

If you get this error, the best thing is to patch the machines involved, but Kerry shows the workaround you can use if you need to use RDP in the meantime to connect to an unpatched machine.

Comments closed

Date And Time Functions To Avoid

Randolph West shares his thoughts on three functions he’d rather you avoid:

CURRENT_TIMESTAMP is the ANSI-equivalent of GETDATE(). ANSI is an acronym for the American National Standards Institute, and sometimes vendors will include ANSI functions in their products so they can say that they’re ANSI-compliant (which is not a bad thing, in most cases).

There are three main problems with CURRENT_TIMESTAMP:

  • No brackets. It goes against the rules about functions. So much for standards!
  • It’s functionally equivalent to GETDATE(), which uses DATETIME, which we previously identified is old and bad.
  • It’s too similar to the poorly-named TIMESTAMP data type, which has nothing to do with dates and times and should be called ROWVERSION.

Bottom line: don’t use CURRENT_TIMESTAMP.

At one point I used CURRENT_TIMESTAMP over GETDATE() with the thought of portability in mind.  Since then, my thoughts on code portability have changed and regardless, as Randolph mentions, it’s better to use DATETIME2 functions to avoid precision issues with DATETIME.

Comments closed

OpenSSH Now Built Into Windows

Anthony Nocentino is excited about the latest release of Windows 10:

Today is a big day! The OpenSSH client version 7.6p1 is now part of the Windows 10 operating system! Microsoft released Windows 10 Update 1803 and included in that release is the OpenSSH client, which is installed as part of the update.

That’s right an SSH client as part of the Windows operating system by default! Also included with this update is the OpenSSH Server which is included as an Windows Feature on Demand.

Let’s take a look at what this is all made of!

I’m still going to use PuTTY for my SSH needs, but it’s nice to see that there’s a default option if you’re in a pinch and working on an unfamiliar server.

Comments closed