2018-05-07 – Curated SQL

Installing R Tools for Visual Studio 2017

In this post we go over the steps to install R Tools for Visual Studio 2017. The free version of RStudio offers a fairly bare-bones development environment. Visual Studio 2017 offers a more robust one if you download the R Tools feature.

Here are the steps to install R Tools for Visual Studio:

R Tools for Visual Studio 2015 is still a separate download.

Single-Node PySpark

Published 2018-05-07 by Kevin Feasel

Gengliang Wang, et al., explain that even a single Spark node can be useful:

It’s been a few years since Intel was able to push CPU clock rates higher. Rather than making a single core more powerful with higher frequency, the latest chips are scaling in terms of core count. Hence, it is not uncommon for laptops or workstations to have 16 cores, and servers to have 64 or even 128 cores. In this manner, these multi-core single-node machines resemble a distributed system more than a traditional single-core machine.

We often hear that distributed systems are slower than single-node systems when data fits in a single machine’s memory. By comparing memory usage and performance between Spark and Pandas using common SQL queries, we observed that this is not always the case. We used three common SQL queries to show a single-node comparison of Spark and Pandas:

Query 1. SELECT max(ss_list_price) FROM store_sales

Query 2. SELECT count(distinct ss_customer_sk) FROM store_sales

Query 3. SELECT sum(ss_net_profit) FROM store_sales GROUP BY ss_store_sk

To demonstrate the above, we measure the maximum data size (both Parquet and CSV) Pandas can load on a single node with 244 GB of memory, and compare the performance of three queries.

Click through for the results.
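
To make concrete what the three queries compute, here is a minimal sketch that runs them against a tiny in-memory table using Python's built-in sqlite3 module. It is not the benchmark from the post: the rows are made up, SQLite stands in for both Spark and Pandas, and Query 3 additionally selects ss_store_sk so that each sum can be labeled.

```python
import sqlite3

# Made-up rows standing in for store_sales; only the four columns
# referenced by the queries are modeled.
rows = [
    # (ss_customer_sk, ss_store_sk, ss_list_price, ss_net_profit)
    (1, 10, 19.99, 4.50),
    (2, 10, 5.25, -1.00),
    (1, 20, 42.00, 10.00),
    (3, 20, 7.75, 2.50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_sales ("
    "ss_customer_sk INTEGER, ss_store_sk INTEGER, "
    "ss_list_price REAL, ss_net_profit REAL)"
)
conn.executemany("INSERT INTO store_sales VALUES (?, ?, ?, ?)", rows)

# Query 1: a single aggregate over one column.
max_price = conn.execute(
    "SELECT max(ss_list_price) FROM store_sales"
).fetchone()[0]

# Query 2: a distinct count, which requires tracking every value seen.
distinct_customers = conn.execute(
    "SELECT count(DISTINCT ss_customer_sk) FROM store_sales"
).fetchone()[0]

# Query 3: a grouped aggregate (ss_store_sk is added to the select list
# here so the result can be keyed by store).
profit_by_store = dict(
    conn.execute(
        "SELECT ss_store_sk, sum(ss_net_profit) "
        "FROM store_sales GROUP BY ss_store_sk"
    ).fetchall()
)

print(max_price, distinct_customers, profit_by_store)
```

The three queries exercise different work: a plain scan, a scan that must remember distinct values, and a scan that maintains one running total per group.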

Relationships Between Numerical Features

Published 2018-05-07 by Kevin Feasel

Stacia Varga continues her exploratory data analysis series using hockey data:

Let’s start with something easy and understandable to analyze. I put age on the horizontal axis and weight on the vertical axis. It’s a common practice to put an explanatory variable on the horizontal axis and a response variable on the vertical axis. In other words, I’m looking to see how an increase in age (explanation) affects – or not – weight (response) for all the hockey players in the current season, regardless of team.

If I put age on the horizontal axis – does this explain weight? Sort of – the combinations of age and weight have some groupings. It almost appears that there is a greater number of younger, heavier players than older, heavier players, but it’s hard to tell here how the age/weight combinations are distributed because I can’t see all the individual points.

Read the whole thing, while keeping in mind that correlation does not imply causation.
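
The two points in the excerpt, overlapping points hiding the distribution and the question of how strongly age relates to weight, can be sketched in a few lines of Python. The (age, weight) pairs below are invented for illustration and are not taken from the hockey data set; the helper name pearson_r is likewise just a label chosen here.

```python
from collections import Counter
from math import sqrt

# Made-up (age in years, weight in pounds) pairs, not data from the post.
players = [
    (20, 190), (20, 190), (22, 205), (25, 200),
    (25, 200), (25, 200), (30, 210), (34, 215),
]


def pearson_r(pairs):
    """Pearson correlation coefficient between the x and y values."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Count how many players share each exact (age, weight) combination.
# On a scatter plot these would be drawn on top of one another.
overlaps = Counter(players)
hidden_points = sum(count - 1 for count in overlaps.values())

r = pearson_r(players)
print(f"points hidden by overplotting: {hidden_points}")
print(f"Pearson r between age and weight: {r:.2f}")
```

A coefficient like this only measures linear association in the sample; it says nothing about whether age causes the difference in weight.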

Windows Containers And Loopback

Published 2018-05-07 by Kevin Feasel

Andrew Pruski notes an improvement with the April 2018 Windows update:

The April 2018 update for Windows brought a few cool things but the best one (imho) is that we can now connect to Windows containers locally using ‘localhost’ and the port specified at container runtime.

Let’s have a look at how this works.

Click through for a walkthrough.
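
One quick way to confirm that a published container port answers on the loopback address is a plain TCP connection attempt. The following is a minimal sketch using only Python's standard library; the function name port_is_open is hypothetical, and it checks reachability only, not whether SQL Server (or anything else) is healthy behind the port.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_is_open("localhost", 15789) would test a container whose port was published on host port 15789; that port number is an assumption chosen for illustration, so substitute whatever was specified when the container was started.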

Alerting On tempdb Growth

Published 2018-05-07 by Kevin Feasel

Lori Brown shows how to use a SQL Agent alert to warn you if tempdb grows beyond a certain size:

Lastly, create a SQL Alert to notify you as soon as tempdb grows past the threshold you stipulate. Using the GUI to create the alert, you need to fill out every field on the General page and make sure the Enabled checkbox is marked. Create a Name for the alert, then specify the Type as SQL Server performance condition alert. The Object should be Databases, the Counter is Data File(s) Size (KB), and the Instance will be tempdb. The alert will trigger if the counter rises above the value. The Value will depend upon the cumulative size of your tempdb files. In this case each tempdb file is 12 GB (or 12,288,000 KB), so the total size across eight files is 98,304,000 KB.

I liked the approach of only firing the SQL Agent job after a trigger was met, rather than running a job which queries and then creates an e-mail afterward.
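
The threshold arithmetic in the excerpt can be checked with a short Python sketch. The per-file and total figures are the ones quoted above; the file count is not stated in the excerpt and is inferred here from those two numbers, and the function name alert_threshold_kb is hypothetical.

```python
# Figures quoted in the excerpt.
PER_FILE_KB = 12_288_000   # the post's KB figure for one 12 GB tempdb file
TOTAL_KB = 98_304_000      # the post's alert threshold

# Inferred, not stated in the excerpt: how many equally sized files
# produce the quoted total.
file_count = TOTAL_KB // PER_FILE_KB


def alert_threshold_kb(file_count, per_file_kb):
    """Cumulative size of equally sized tempdb data files, in KB."""
    return file_count * per_file_kb


# Side note: the quoted per-file figure is 12,288 MB multiplied by 1,000.
# With 1,024 KB per MB, 12 GB would instead be the value below.
per_file_kb_binary = 12 * 1024 * 1024

print(file_count)
print(alert_threshold_kb(file_count, PER_FILE_KB))
print(alert_threshold_kb(file_count, per_file_kb_binary))
```

Because the counter reports kilobytes, it is worth deciding which conversion matches your file sizes before setting the alert value, since the two conventions differ by a few percent.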

Default Displayed Properties In PowerShell

Published 2018-05-07 by Kevin Feasel

Claudio Silva explains the default displayed properties in PowerShell and how you can find non-default properties:

First, let me say that this person knows that Select-Object can be used to select the properties we want, so he tried to guess the property name using a trial-and-error approach.

The person tried:
Get-Service WinRM | Select-Object Startup, Status, Name, DisplayName
and also:
Get-Service WinRM | Select-Object StartupType, Status, Name, DisplayName
But all of them were just empty.

There is a better way.

Switching Between Windows And Linux Containers

Published 2018-05-07 by Kevin Feasel

Chris Taylor demonstrates a couple of ways of switching from Linux to Windows containers in Docker for Windows:

If you are using Docker for Windows and want to switch between Linux and Windows containers, you can do this by right-clicking the Docker “Whale” in the systray and selecting “Switch to Windows containers”:

...but no one likes clicking around, do they?

There is an alternative way to do this which I use in my Docker session demos, which makes things so much easier, and the switch is a lot quicker!

Click through for the PowerShell call, which has the added benefit of being scriptable.

Automatically Updating dbatools

Published 2018-05-07 by Kevin Feasel

Garry Bargsley gives us two ways to update dbatools on a schedule:

I have been using dbatools heavily since I was introduced to it. I have automated processes and created new processes with it. There are new commands that come out almost daily that fill in certain gaps or enhance current commands. One way to stay current with these updates is to update your dbatools install frequently.

How better to do this than to have an auto update process that will run daily and get the latest dbatools version for you…

I have put together two ways of doing this based on your preferred method. One is via a SQL Agent Job and the other is using a Windows Task Scheduler job.

Read on for examples of both techniques.
