Press "Enter" to skip to content

Curated SQL Posts

Keeping Database Role Information In Source Control

Louis Davidson has a post on handling database security in source control:

Yeah, things get messy, no matter what model you choose for securing your PROD data:

  1. Create one user and give it all rights to the database
  2. Create specific users and give them the least amount of rights to do what is must, and no more
  3. Somewhere in between the previous 2

Truly, #3 is generally the answer. Let’s say that you give the application all the rights that any user of the system can have, and let the application dole out the rights to individuals. This is not a terrible plan, but I dare say that many databases contain data, or utilities that it is not desirable to give to the users. (My utility schema generally has tools to maintain and release code, something that you don’t want general users to have access to. And lest you have a developer working “with” you like I once did, you don’t want the application to have access to the tools to disable all of the constraints in the database, even if you have ETL uses for that code.)

Check it out for some examples.

Comments closed

RDP Error: CredSSP Encryption Oracle Remediation

Kerry Tyler explains an error message popping up in RDP sessions:

In March, a vulnerability in CredSSP (Credential Security Support Provider) was patched, which would affect authentication via RDP (this is outlined in advisory CVE-2018-0886).  However, it was implemented in such a way that the behavior change didn’t have to be “honored” by either the server or the client involved in an RDP session.

The intent was that this would be controlled by GPO in enterprise environments, and a new GPO setting to activate or deactivate this behavior was released at the same time.

GPO settings have a default value, which they will use when nothing has been explicitly set for a particular setting. In this case, the GPO has three possible values: Force Updated Clients (for servers to only take connections from patched clients), Mitigated (for both, and on a workstation means that it won’t fall back to old/insecure behavior when attaching to unpatched servers), and Vulnerable (for both, and means what it sounds like–anything goes!).

In March, the default behavior was set to “Vulnerable”, which means everything kept working for everyone. But in the May security rollup, the default setting for that GPO was flipped to “Mitigated” if there was not an explicit setting for it…

If you get this error, the best thing is to patch the machines involved, but Kerry shows the workaround you can use if you need to use RDP in the meantime to connect to an unpatched machine.

Comments closed

Date And Time Functions To Avoid

Randolph West shares his thoughts on three functions he’d rather you avoid:

CURRENT_TIMESTAMP is the ANSI-equivalent of GETDATE(). ANSI is an acronym for the American National Standards Institute, and sometimes vendors will include ANSI functions in their products so they can say that they’re ANSI-compliant (which is not a bad thing, in most cases).

There are three main problems with CURRENT_TIMESTAMP:

  • No brackets. It goes against the rules about functions. So much for standards!
  • It’s functionally equivalent to GETDATE(), which uses DATETIME, which we previously identified is old and bad.
  • It’s too similar to the poorly-named TIMESTAMP data type, which has nothing to do with dates and times and should be called ROWVERSION.

Bottom line: don’t use CURRENT_TIMESTAMP.

At one point I used CURRENT_TIMESTAMP over GETDATE() with the thought of portability in mind.  Since then, my thoughts on code portability have changed and regardless, as Randolph mentions, it’s better to use DATETIME2 functions to avoid precision issues with DATETIME.

Comments closed

OpenSSH Now Built Into Windows

Anthony Nocentino is excited about the latest release of Windows 10:

Today is a big day! The OpenSSH client version 7.6p1 is now part of the Windows 10 operating system! Microsoft released Windows 10 Update 1803 and included in that release is the OpenSSH client, which is installed as part of the update.

That’s right an SSH client as part of the Windows operating system by default! Also included with this update is the OpenSSH Server which is included as an Windows Feature on Demand.

Let’s take a look at what this is all made of!

I’m still going to use PuTTY for my SSH needs, but it’s nice to see that there’s a default option if you’re in a pinch and working on an unfamiliar server.

Comments closed

Converting Between Time Series Classes In R

Christoph Sax announces a new R library:

tsbox, now freshly on CRAN, provides a set of tools that are agnostic towards existing time series classes. It is built around a set of converters, which convert time series stored as tsxtsdata.framedata.tabletibblezootsibble or timeSeries to each other.

If you have to work with time series data, this will be a useful library.  H/T R-Bloggers

Comments closed

Using AU Analyzer To Lower Data Lake Analytics Costs

Matthew Hicks shows off the Data Lake Analytics AU Analyzer:

The AU Analyzer looks at all the vertices (or nodes) in your job, analyzes how long they ran and their dependencies, then models how long the job might run if a certain number of vertices could run at the same time. Each vertex may have to wait for input or for its spot in line to run. The AU Analyzer isn’t 100% accurate, but it provides general guidance to help you choose the right number of AUs for your job.

You’ll notice that there are diminishing returns when assigning more AUs, mainly because of input dependencies and the running times of the vertices themselves. So, a job with 10,000 total vertices likely won’t be able to use 10,000 AUs at once, since some will have to wait for input or for dependent vertices to complete.

In the graph below, here’s what the modeler might produce, when considering the different options. Notice that when the job is assigned 1427 AUs, assigning more won’t reduce the running time. 1427 is the “peak” number of AUs that can be assigned.

I like this kind of tooling, as it provides a realistic assessment of tradeoffs.

Comments closed

What’s New In Hadoop 3.1?

Wangda Tan, et al, look at some of the new features in Apache Hadoop 3.1:

The diagram below captures the building blocks together at a high level. If you have to tie this back to a fictitious self-flying drone company, the company will collect tons of raw images from the test drones’ built-in cameras for computer vision. Those images can be stored in the Apache Hadoop data lake in a cost-effective (with erasure coding) yet highly available manner (multiple standby namenodes). Instead of providing GPU machines to each of the data scientists, GPU cards are pooled across the cluster for access by multiple data scientists. GPU cards in each server can be isolated for sharing between multiple users.

Support of Docker containerized workloads means that data scientists/data engineers can bring the deep learning frameworks to the Apache Hadoop data lake and there is no need to have a separate compute/GPU cluster. GPU pooling allows the application of the deep learning neural network algorithms and the training of the data-intensive models using the data collected in the data lake at a speed almost 100x faster than regular CPUs.

If the customer wants to pool the FPGA (field programmable gate array) resources instead of GPUs, this is also possible in Apache Hadoop 3.1. Additionally, use of affinity and anti-affinity labels allows us to control how we deploy the microservices in the clusters — some of the components can be set to have anti-affinity so that they are always deployed in separate physical servers.

It’s interesting to see Hadoop evolve over time as the ecosystem solves more real-time problems instead of focusing on giant batch problems.

Comments closed

Using mssql-cli

Prashanth Jayaram shows how to install and use the mssql-cli client:

Switching to the editor mode is pretty simple and straight-forward. At the bottom of the screen, we can see the help bar which guides us through the switching process between the available editor modes. The options available for instant switching are the multiline mode, activated by pressing F3, and the Emacs mode, activated by pressing the F4 button.

To run the multi-line query in the multi-line mode, append the query with a semicolon and then press the enter key to execute it.

Use the same keys as mentioned above to turn on and turn off the editor modes—F3 for the multi-line query mode and F4 for the EMACS mode.

If you’re big on command-line interfaces, you’ll probably enjoy this client.

Comments closed

Gaps And Islands With DAX

Philip Seamark answers one of the classic gaps and islands problems with DAX:

A recent post on the Power BI community website asked if it was possible to compress a group of numbers into text that described the sequential ranges contained within the numbers. This might be a group of values such as 1, 2, 3, 4, 7, 8, 9, 12, 13:  (note there are gaps) with the expected result grouping the numbers that run in a sequence together to produce text like “1-4, 7-9, 12-13”.  Essentially to identify gaps when creating the text.  This seemed like an interesting challenge and here is how I solved it using DAX.

Read on for the solution, which is conceptually very similar to the T-SQL solution but a bit different in implementation.

Comments closed