Press "Enter" to skip to content

Month: December 2016

Optimistic Locking Via HTTP ETags

Kevin Sookocheff diagrams how to implement optimistic concurrency for a server which uses HTTP requests to handle resources like files:

A conditional request is a request that may be executed differently depending on the value of specific HTTP headers. These headers define the precondition that must be true before the server should execute the request. With respect to entity tags, we have two options for making requests conditional.

  1. If-Match: The request will succeed if the ETag of the remote resource is equal to the one listed in this header.
  2. If-None-Match: The request will succeed if the ETag of the remote resource is different to each listed in this header.

By specifying the appropriate ETag and condition header, you can perform optimistic locking for concurrent operations on a resource. Let’s walk through an example of how this works in practice.

Read on for more details.

Comments closed

Linux Data Science Virtual Machine

David Smith mentions the Linux data science virtual machine on Azure:

The Linux Data Science Virtual Machine includes all of the tools a modern data scientist needs, in one easy-to-launch package. With it, you can try exploring data with Apache Drill, train deep neural networks for computer vision with MXNet, develop AI applications with the Cognitive Toolkit, or create statistical models with big data in R with Microsoft R Server 9.0.

They also offer a free trial, so check it out.

Comments closed

Blocked Process Report

Kendra Little has a few Github gists showing how to configure the blocked process report:

I wanted a friendly way to share code to configure and manage the Blocked Process Report, so I’ve created a gist on GitHub sharing TSQL that:

  • Enables the Blocked Process Report (BPR)

  • Collects the BPR with an Extended Events trace

  • Collects the BPR using a Server Side SQL Trace (in case you don’t care XEvents or are running an older version of SQL Server)

  • Lists out the Extended Events and SQL Traces you have running, and gives you code to stop and delete traces if you wish

Click through for the code.

Comments closed

REPLACENULL

Louis Davidson shows a quick SSIS function to replace NULL values:

Which I looked up every..single..time I used it. “?” means THEN…not IF? “:” means ELSE? Huh?  I know this comes from one of those cool languages that I have never mastered, but as I was searching for the syntax again a few days ago, I found REPLACENULL. I had never seen this function before, so I figured I might not be the only one. And perhaps if a commenter feels like telling me how dumb I am to not know about other new expression features I will not be offended. REPLACENULL won’t replace every use of the these and the other symbols one must use for SSIS expressions, it does replace one of the more common ones.

Click through for usage.  It’s a bit easier to understand than the ternary operator.  To answer Louis’s question, a ? b : c comes from C# syntax.

Comments closed

When Is NOLOCK Okay?

Erik Darling wants to figure out the acceptable boundaries for NOLOCK:

Picture a couple developers who started their app in the cloud, where they can’t get fancy with tempdb, fast disks aren’t in the budget yet, along with that beefier server with some extra RAM. They may not be able to turn on RCSI or SI at the drop of a hat; tempdb would keel over with the row versioning as part of a workload that already uses it pretty heavily.

They still need to run reports, either for users, or for higher ups at the company, and they can ask for them at any time. Caching the data when user activity is low and reporting against it when someone asks may raise some questions, like “why doesn’t my sales data show anything from today?”, or worse. You could invalidate the cache every X minutes, but that doesn’t help because then you need to re-run that reporting query every X minutes. That’s only moderately better than letting users query it at will.

Figuring out when read uncommitted is acceptable is a business decision.  As much as I dislike using NOLOCK, as long as the people on the business side understand the risks, it’s their call.

Comments closed

Loading SMO In Powershell

Chrissy LeMaire shows how to load SMO with full names when you don’t know which version is installed:

In a recent version of PowerShell, Publish-Module, which publishes modules to the Gallery began requiring fully qualified Assembly names such as “Microsoft.SqlServer.Smo, Version=$smoversion, Culture=neutral, PublicKeyToken=89845dcd8080cc91”.

Previously, it was sufficient just to use short names such as Microsoft.SqlServer.Smo. This had similar behavior to LoadWithPartialName.

I get that LoadWithPartialName is sketchy, but the solution that Chrissy has to do seems overly complicated to me.  Nonetheless, those are the rules of the game now, I suppose.

Comments closed

Learning Versus Remembering

Via R-Bloggers, a discussion on learning versus remembering with respect to data science:

If you’re like most aspiring data scientists, you’ll try to learn this code by using the copy-and-paste method. You’ll take this code from a blog post like this, copy it into RStudio and run it.

Most aspiring data scientists do the exact same thing with online courses. They’ll watch a few videos, open the course’s sample code, and then copy-and-paste the code.

Watching videos, reading books, and copy-and-pasting code do help you learn, at least a little. If you watch a video about ggplot2, you’ll probably learn how it works pretty quickly. And if you copy-and-paste some ggplot2 code, you’ll probably learn a little bit about how the code works.

Here’s the problem: if you learn code like this, you’ll probably forget it within a day or two.

This is a thought-provoking article that applies to all disciplines, not just data science.

Comments closed

Updating Multiple Statistics Concurrently

SQL Scotsman explains trace flag 7471, which allows you to update multiple statistics on a table concurrently:

Running multiple UPDATE STATISTICS commands for different statistics on a single table concurrently has been available under global Trace Flag 7471 since SQL Server 2014 SP1 CU6 and SQL Server 2016 CU1.  Microsoft have documented this trace flag here and here.

It sounds like, for the most part, you might not want this flag turned on, but read the whole post.

Comments closed

Building Runbooks

Monica Rathbun explains the concept of runbooks:

Don’t try to build job security into what you do. I know many that worry about giving up the knowledge to others. Having the sole “how to” knowledge for some, gives them a sense of job security. While to a point that might be true, it also locks you in to your current position. Many that hoard their knowledge never advance because they find themselves invaluable in their current position. “We can’t move them because they are the only ones who know about such and such”. Why put yourself in that position? If you can’t ever be replaced, you also can’t move up.

As a lone dba, I find this run book to be vital. It allows me to direct someone to the book and I can walk them through running anything I need them to in my absence.  It allows me to take a vacation or a day off while giving others the tools to get things done.

Exactly.  It’s easy to get caught in the trap that your value is in the specific details of some process that you know, and so the company can’t get rid of you because you’re the only person who knows this.  One of the counter-intuitive results of IT culture is that reputation comes from sharing information rather than hoarding it.

Comments closed

Getting Finer-Grained Security In Spark

Vadim Vaks explains how to get finer-grained permissions within Spark using Ranger and LLAP:

With LLAP enabled, Spark reads from HDFS go directly through LLAP. Besides conferring all of the aforementioned benefits on Spark, LLAP is also a natural place to enforce fine grain security policies. The only other capability required is a centralized authorization system. This need is met by Apache Ranger. Apache Ranger provides centralized authorization and audit services for many components that run on Yarn or rely on data from HDFS. Ranger allows authoring of security policies for: – HDFS – Yarn – Hive (Spark with LLAP) – HBase – Kafka – Storm – Solr – Atlas – Knox Each of the above services integrate with Ranger via a plugin that pulls the latest security policies, caches them, and then applies them at run time.

Read on for more details.

Comments closed