wrapr 1.5.0 Now On CRAN

Kevin Feasel



John Mount announces wrapr 1.5.0:

wrapr includes a lot of tools for writing better R code:

John also includes an example using the coalesce operator %?%.

Using The Azure Data Science VM With GPUs

Jennifer Marsman has some tips and tricks around using the Azure Data Science Virtual Machine on an instance running with GPU support:

To get GPU support, you need both hardware with GPUs in a datacenter, as well as the right software – namely, a virtual machine image that includes GPU drivers so you can use the GPU.

The biggest tip is to use the Deep Learning Virtual Machine!  The provisioning experience has been optimized to filter to the options that support GPU (the NC series – see below), which make it easier to set it up correctly.

Read on for the rest of the advice.

Running Hive LLAP As A YARN Service

Kevin Feasel



Gour Saha, et al, demonstrate running Apache Hive LLAP as a YARN service:

Making LLAP as a first-class YARN service also enables us to use some of the other powerful features in YARN that were added in Apache Hadoop 3.0 / 3.1, some of them are noted below.

  1. Advanced container placement scheduling such as affinity and anti-affinity. What Slider used to handle in a custom way is now a core first-class feature (YARN-6592).

  2. Rich APIs for users to fetch/query application details using Timeline Service V2 (YARN-2928 and YARN-5355).

  3. New and improved Services UI in YARN UI2 improving debuggability and log access.

  4. Continuous rolling log aggregation of long running containers (YARN-2443).

  5. Auto-restart of containers by NodeManagers (YARN-4725).

  6. Windowing and threshold based container health monitor (YARN-8122).

  7. In the future, we can also leverage YARN level rolling upgrades for containers and the service as a whole (YARN-7512 and YARN-4726).

Looks like it’s been a fruitful transition.

Preventing Server Manager From Loading

Steve Stedman shows how to prevent the Server Manager app from loading whenever you RDP into a Windows Server machine:

If you frequently connect to many different SQL Server as I do, you are probably used to the Server Manager loading slowly when you log in with Remote Desktop.

The Server Manager has a bad reputation for taking up lots of CPU over time and possibly even bogging down a SQL Server when left open for days on end.

To prevent this from automatically loading you can do the following to quickly disable it for your user session, and your future user sessions.

Read on for the steps.

Auditing Options With Azure SQL Data Warehouse

Janusz Rokicki explores what is available in Azure SQL Data Warehouse when it comes to auditing:

Auditing is disabled by default and the UI experience depends on the region to which the logical server is deployed. For instance, in UK South, the portal offers no options to manage auditing:

In North Europe, the portal allows Table Auditing (table-storage based) to be enabled on the SQL Data Warehouse scope, but it isn’t possible to enable Blob Auditing:

On top of that, Blob Auditing behaves differently when enabled on a logical server level in different regions. In locations that support Table Auditing, turning on Blob Auditing automatically enables it in all databases, including SQL Data Warehouses—and that’s expected. In other regions, Blob Auditing is not automatically enabled and has to be turned on programmatically by calling ARM REST API.

I imagine the plan is to support this across the board but it’s rolling out region by region.

User-Defined Restore Points In Azure SQL DW

Kevin Ngo announces a new feature in Azure SQL Data Warehouse:

Previously, SQL DW supported only automated snapshots guaranteeing an eight-hour recovery point objective (RPO). While this snapshot policy provided high levels of protection, customers asked for more control over restore points to enable more efficient data warehouse management capabilities leading to quicker times of recovery in the event of any workload interruptions or user errors.

Now, with user-defined restore points, in addition to the automated snapshots, you can initiate snapshots before and after significant operations on your data warehouse. With more granular restore points, you ensure that each restore point is logically consistent and limit the impact and reduce recovery time of restoring the data warehouse should this be needed. User-defined restore points can also be labeled so they are easy to identify afterwards.

Creating a user-defined restore point is a one-liner in Powershell, and it’s something you could do after each warehouse load, for example.

Identity Columns And Linked Servers

Kevin Feasel



Kenneth Fisher points out an oddity when inserting data across a linked server into a table with an identity column:

So far so good. Now let’s throw in a twist. Let’s call it through a linked server.

INSERT INTO [(local)\sql2014cs].Test.dbo.IdentTest	VALUES ('Col1','Col2');

Msg 213, Level 16, State 1, Line 4
Column name or number of supplied values does not match table definition.

Well that’s a bit odd, right? I mean I used that exact command in the previous test. Turns out that when you do an insert across a linked server that identity column is not ignored. Which means we just need to include the identity value right? Nope.

INSERT INTO [(local)\sql2014cs].Test.dbo.IdentTest	VALUES (1,'Col1','Col2');

Msg 7344, Level 16, State 1, Line 4
The OLE DB provider “SQLNCLI11” for linked server “(local)\sql2014cs” could not INSERT INTO table “[(local)\sql2014cs].[Test].[dbo].[IdentTest]” because of column “Id”. The user did not have permission to write to the column.

Click through to see how to do this.


June 2018
« May Jul »