Learning Versus Remembering

Via R-Bloggers, a discussion on learning versus remembering with respect to data science:

If you’re like most aspiring data scientists, you’ll try to learn this code by using the copy-and-paste method. You’ll take this code from a blog post like this, copy it into RStudio and run it.

Most aspiring data scientists do the exact same thing with online courses. They’ll watch a few videos, open the course’s sample code, and then copy-and-paste the code.

Watching videos, reading books, and copy-and-pasting code do help you learn, at least a little. If you watch a video about ggplot2, you’ll probably learn how it works pretty quickly. And if you copy-and-paste some ggplot2 code, you’ll probably learn a little bit about how the code works.

Here’s the problem: if you learn code like this, you’ll probably forget it within a day or two.

This is a thought-provoking article that applies to all disciplines, not just data science.

Updating Multiple Statistics Concurrently

SQL Scotsman explains trace flag 7471, which allows you to update multiple statistics on a table concurrently:

Running multiple UPDATE STATISTICS commands for different statistics on a single table concurrently has been available under global Trace Flag 7471 since SQL Server 2014 SP1 CU6 and SQL Server 2016 CU1.  Microsoft have documented this trace flag here and here.

It sounds like you will usually want to leave this flag turned off, but read the whole post.

Building Runbooks

Monica Rathbun explains the concept of runbooks:

Don’t try to build job security into what you do. I know many that worry about giving up the knowledge to others. Having the sole “how to” knowledge for some, gives them a sense of job security. While to a point that might be true, it also locks you in to your current position. Many that hoard their knowledge never advance because they find themselves invaluable in their current position. “We can’t move them because they are the only ones who know about such and such”. Why put yourself in that position? If you can’t ever be replaced, you also can’t move up.

As a lone DBA, I find this run book to be vital. It allows me to direct someone to the book and I can walk them through running anything I need them to in my absence. It allows me to take a vacation or a day off while giving others the tools to get things done.

Exactly.  It’s easy to get caught in the trap that your value is in the specific details of some process that you know, and so the company can’t get rid of you because you’re the only person who knows this.  One of the counter-intuitive results of IT culture is that reputation comes from sharing information rather than hoarding it.

Getting Finer-Grained Security In Spark

Vadim Vaks explains how to get finer-grained permissions within Spark using Ranger and LLAP:

With LLAP enabled, Spark reads from HDFS go directly through LLAP. Besides conferring all of the aforementioned benefits on Spark, LLAP is also a natural place to enforce fine grain security policies. The only other capability required is a centralized authorization system. This need is met by Apache Ranger. Apache Ranger provides centralized authorization and audit services for many components that run on Yarn or rely on data from HDFS. Ranger allows authoring of security policies for:

  • HDFS

  • Yarn

  • Hive (Spark with LLAP)

  • HBase

  • Kafka

  • Storm

  • Solr

  • Atlas

  • Knox

Each of the above services integrate with Ranger via a plugin that pulls the latest security policies, caches them, and then applies them at run time.

Read on for more details.

Azure Management Using R

Kevin Feasel

Alan Weaver introduces AzureSMR:

The AzureSMR functions currently address the following Azure Services:

  • Azure Blob: List, Read and Write to Blob Services

  • Azure Resources: List, Create and Delete Azure Resource. Deploy ARM templates.

  • Azure VM: List, Start and Stop Azure VMs

  • Azure HDI: List and Scale Azure HDInsight Clusters

  • Azure Hive: Run Hive queries against a HDInsight Cluster

  • Azure Spark: List and create Spark jobs/Sessions against a HDInsight Cluster (Livy)

This is useful when, for example, you need to scale up a Spark cluster before running a particularly compute-intensive process.

Linear Gauge Custom Visual

Devin Knight shows off the linear gauge custom visual in Power BI:

In this module you will learn how to use the Linear Gauge Power BI Custom Visual.  The Linear Gauge would often be used to visualize a KPI. It gives you the ability to compare an actual vs target as well as showing up to two trend lines.

This can be a very useful visual.  The tricky part is that the bars aren’t scaled the same, so when your eyes want to compare bar lengths, it can get a little confusing.

Windows 10 IoT Code To Back Up Databases

Drew Furgiuele writes code to back up your databases using a Raspberry Pi 3 and Windows 10 IoT edition:

The trickiest part of wiring a circuit like this is detecting a button press. Most logic boards don’t know if an input circuit should poll at high or low levels. That’s where pull-ups come in. Above, you can see we set one of the pins for the button to be a pull-up (or an input if we were using another board). That means it will pull the current and look for impedance. The other important thing is our debounce. With circuits, one button press can actually turn into lots because as soon as the switch completes (or interrupts) the circuit, it starts sending signals. A debounce is like a referee saying “only look for a signal for this long” and it will filter out extra “presses” based on current that might linger on a press.

Once we detect our button press, we’re calling the function below. All it does is read the current LED pin values, and looks to see which one is currently lit, and then lights the next one.

Go from understanding general purpose input/output pins to calling SMO via a web service all in one post.  If you’ve got an itch for a weekend project, have at it.
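To make the debounce and LED-cycling ideas concrete, here is a minimal sketch in plain Python. It is not Drew's code and it touches no hardware: the function names, the 50 ms window, and the timestamps are all assumptions for illustration. This version accepts a press only if it arrives at least one window after the last accepted press, which is one common way to debounce.

```python
from typing import Iterable, List


def debounce(press_times_ms: Iterable[int], window_ms: int = 50) -> List[int]:
    """Return only the presses that count as distinct button presses.

    A press is accepted when it arrives at least `window_ms` after the
    previously accepted press; anything sooner is treated as switch bounce.
    The 50 ms default is an illustrative assumption, not a hardware constant.
    """
    accepted: List[int] = []
    for t in press_times_ms:
        if not accepted or t - accepted[-1] >= window_ms:
            accepted.append(t)
    return accepted


def next_led(states: List[bool]) -> List[bool]:
    """Read which LED is lit and return new states with the next one lit.

    Wraps around at the end of the list. If no LED is lit, the first one
    is lit.
    """
    current = states.index(True) if True in states else -1
    nxt = (current + 1) % len(states)
    return [i == nxt for i in range(len(states))]


if __name__ == "__main__":
    # Hypothetical raw signal times (ms): three real presses plus bounce.
    raw = [0, 3, 7, 120, 124, 300]
    leds = [True, False, False]
    for _ in debounce(raw):
        leds = next_led(leds)
    print(leds)
```

On a real board the GPIO library typically handles the debounce timeout for you; the point here is only the filtering logic.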

Understanding Data Gateways

James Serra walks us through the different data gateways available in Azure:

On-premises data gateway: Formerly called the enterprise version.  Multiple users can share and reuse a gateway in this mode.  This gateway can be used by Power BI, PowerApps, Microsoft Flow or Azure Logic Apps.  For Power BI, this includes support for both scheduled refresh and DirectQuery.  To add a data source such as SQL Server that can be used by the gateway, check out Manage your data source – SQL Server.  To connect the gateway to your Power BI, you will sign in to Power BI after you install it (see On-premises data gateway in-depth).

Click through for more details on additional gateways.

Pivoting Data

Jana Sattainathan explains the PIVOT operator:

The results are so much easier to look at and comprehend, aren’t they? All object types for a schema are on a single line and it is easy for us to do impact analysis visually.

Sometimes doing it in T-SQL is the best approach, but pivoting is generally cheaper in the application tier, whether you’re building a report, dashboard, or web app.
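As a rough illustration of pivoting in the application tier, here is a minimal Python sketch. It assumes the database returns unpivoted rows of (schema, object type, count); the function name and the sample rows are hypothetical and are not taken from Jana's post.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pivot_counts(
    rows: Iterable[Tuple[str, str, int]]
) -> Tuple[List[str], Dict[str, Dict[str, int]]]:
    """Pivot (schema, object_type, count) rows into one entry per schema.

    Returns the sorted list of object types (the pivoted columns) and a
    mapping of schema -> {object_type: count}, with 0 for missing cells.
    """
    columns = set()
    cells: Dict[str, Dict[str, int]] = defaultdict(dict)
    for schema, object_type, count in rows:
        columns.add(object_type)
        # Sum counts in case the same (schema, type) pair appears twice.
        cells[schema][object_type] = cells[schema].get(object_type, 0) + count
    ordered = sorted(columns)
    table = {
        schema: {col: found.get(col, 0) for col in ordered}
        for schema, found in cells.items()
    }
    return ordered, table


if __name__ == "__main__":
    # Hypothetical unpivoted result set, one row per schema and object type.
    rows = [("dbo", "TABLE", 12), ("dbo", "VIEW", 3), ("sales", "TABLE", 5)]
    columns, table = pivot_counts(rows)
    print("schema", *columns)
    for schema, counts in table.items():
        print(schema, *(counts[c] for c in columns))
```

The column list is discovered from the data, which is the part that takes dynamic SQL when you do it with PIVOT.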

Checking Last CHECKDB Date Using DBCC PAGE

Wayne Sheffield shows how to get the last time DBCC CHECKDB ran on each database:

The “trick” to making this work is to encapsulate the DBCC command as a string, and to call it with the EXECUTE () function. This is used as part of an INSERT INTO / EXECUTE statement, so that the results from DBCC PAGE are inserted into a table (in this case a temporary table is used, although a table variable or permanent table can also be used). There are three simple steps to this process:

  1. Create a table (permanent / temporary) or table variable to hold the output.

  2. Insert into this table the results of the DBCC PAGE statement by using INSERT INTO / EXECUTE.

  3. Select the data that you are looking for from the table.

Read on for his code as well as important caveats.
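The three steps translate loosely to any client language. Here is a minimal Python sketch of the "hold the output, then select what you need" part. It does not talk to SQL Server: it assumes the rows have already been fetched from DBCC PAGE run WITH TABLERESULTS and arrive as dictionaries keyed by the ParentObject, Object, Field, and VALUE columns. The function name and the sample rows are hypothetical, and the choice of dbi_dbccLastKnownGood as the field to look for is my assumption, not something quoted from Wayne's post.

```python
from typing import Dict, Iterable, List, Optional

# Assumed field name for the last clean CHECKDB time in the boot page output.
LAST_GOOD_FIELD = "dbi_dbccLastKnownGood"


def last_known_good(
    rows: Iterable[Dict[str, str]], field: str = LAST_GOOD_FIELD
) -> Optional[str]:
    """Return the VALUE of the first row whose Field matches, else None."""
    # Step 1: a container to hold the output (a list stands in for the table).
    held: List[Dict[str, str]] = []
    # Step 2: load the DBCC PAGE results into it.
    held.extend(rows)
    # Step 3: select the one value we care about.
    for row in held:
        if row.get("Field") == field:
            return row.get("VALUE")
    return None


if __name__ == "__main__":
    # Made-up rows shaped like DBCC PAGE ... WITH TABLERESULTS output.
    sample = [
        {"ParentObject": "DBINFO @0x1", "Object": "DBINFO STRUCTURE:",
         "Field": "dbi_version", "VALUE": "782"},
        {"ParentObject": "DBINFO @0x1", "Object": "DBINFO STRUCTURE:",
         "Field": "dbi_dbccLastKnownGood", "VALUE": "2016-12-01 02:00:00.000"},
    ]
    print(last_known_good(sample))
```

The real work, and the caveats, live in the T-SQL in Wayne's post; this only shows the shape of the final filtering step.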

