Press "Enter" to skip to content

Author: Kevin Feasel

Executing PowerShell Against Multiple Servers

Stuart Moore shows an easy way to execute a PowerShell script against multiple servers:

We set up new PSSessions using New-PSSession; I set ErrorAction to SilentlyContinue just in case a host isn't available for some reason (if I was being good I'd try/catch here).

As we're just using standard PowerShell functionality with Get-Service, there's no need to build a new function; we can call it directly. By calling Invoke-Command against a session pointed at numerous hosts, we let PowerShell handle all the connection management and just assume the command will be run against each host. If we were running against a lot of hosts, we would want to look into using the -ThrottleLimit parameter to limit the number of concurrent hosts we're hitting. The one little trick here is using the using scope modifier so PS pulls in the variable defined in our main scope (gory details on scoping here).
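To make the pattern concrete, here is a minimal sketch of the approach Stuart describes; the host names and service name are placeholders, not his actual script:

```powershell
# Hypothetical host list; substitute your own servers
$servers = 'SQL01', 'SQL02', 'SQL03'

# Open a session per host, quietly skipping any that are unreachable
$sessions = New-PSSession -ComputerName $servers -ErrorAction SilentlyContinue

# A variable defined in the local scope...
$serviceName = 'MSSQLSERVER'

# ...pulled into the remote scope with the $using: modifier.
# -ThrottleLimit caps how many hosts run concurrently.
Invoke-Command -Session $sessions -ThrottleLimit 16 -ScriptBlock {
    Get-Service -Name $using:serviceName
}

# Clean up the sessions when done
$sessions | Remove-PSSession
```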

Click through for the script, and do check out the comments, where Stuart gives a bit of advice when you’re trying to execute against a large number of servers.

Comments closed

GROUP BY vs DISTINCT

Rob Farley looks at how GROUP BY and DISTINCT can lead you down different execution plan paths:

What I want to explore in this post is the particular example that we both used… to bring an important point that could be missed because of the similarity of our examples.

You see, we both happened to use a FOR XML concatenation query, looking back at the same table. We did this to simulate a practical GROUP BY – somewhere that you might feel like GROUP BY is useful, but you know that you're not using an aggregate function like SUM or MAX, because there isn't one available. Ok, for Aaron he could've used the really new STRING_AGG, but for an old-timer like me, having to use SQL Server 2005, that wasn't available.
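As a rough sketch of the query shape both posts work with (the dbo.Orders and dbo.OrderDetails tables here are invented for illustration), note that only the grouping mechanism differs:

```sql
-- DISTINCT: the correlated concatenation can run once per row, then de-duplicate
SELECT DISTINCT
    o.CustomerID,
    (SELECT od.ProductName + N','
       FROM dbo.OrderDetails AS od
      WHERE od.CustomerID = o.CustomerID
        FOR XML PATH('')) AS Products
FROM dbo.Orders AS o;

-- GROUP BY: rows are grouped first, so the concatenation runs once per group
SELECT
    o.CustomerID,
    (SELECT od.ProductName + N','
       FROM dbo.OrderDetails AS od
      WHERE od.CustomerID = o.CustomerID
        FOR XML PATH('')) AS Products
FROM dbo.Orders AS o
GROUP BY o.CustomerID;
```

Comparing the execution plans of the two versions shows where the aggregation happens relative to the subquery, which is the crux of the difference.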

In this post, Rob looks at a different sort of example and sees a more complicated scenario unfold.

Comments closed

Creating Modal Dialogues In Shiny

Dean Attali announces a new Shiny package:

shinyalert uses the sweetalert JavaScript library to create simple and elegant modals in Shiny. Modals can contain text, images, OK/Cancel buttons, an input to get a response from the user, and many more customizable options. A modal can also have a timer to close automatically.

Simply call shinyalert() with the desired arguments, such as a title and text, and a modal will show up. In order to be able to call shinyalert() in a Shiny app, you must first call useShinyalert() anywhere in the app’s UI.
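Here's a minimal sketch of that usage; the app structure is my own, while the shinyalert calls follow the quoted instructions:

```r
library(shiny)
library(shinyalert)

ui <- fluidPage(
  useShinyalert(),                      # must appear somewhere in the UI
  actionButton("show", "Show a modal")
)

server <- function(input, output, session) {
  observeEvent(input$show, {
    shinyalert(title = "Hello", text = "A simple, elegant modal", type = "success")
  })
}

shinyApp(ui, server)
```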

It does look nice.  Check out Dean’s GitHub repo for more information. H/T R-Bloggers

Comments closed

Visualizing Cholesterol Data With ggplot2

Anisa Dhana uses the National Health and Nutrition Examination Survey and visualizes results with ggplot2:

From the plots above, I find that regardless of the different levels of diastolic and systolic blood pressure, there is no substantial correlation between cholesterol and blood pressure. However, it is better to build the correlation line with geom_smooth or to calculate the Spearman correlation, although in this post we focus only on the visualization.

Let's build the correlation line.
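A sketch of the kind of plot the post builds; the data frame and column names (nhanes, chol, bp_systolic) are stand-ins for the NHANES variables, not Anisa's actual code:

```r
library(ggplot2)

# nhanes, chol, and bp_systolic are hypothetical stand-ins here
ggplot(nhanes, aes(x = bp_systolic, y = chol)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm")  # the correlation line the quote mentions

# Spearman correlation, the alternative the quote suggests
cor.test(nhanes$bp_systolic, nhanes$chol, method = "spearman")
```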

Click through for several examples of visuals.

Comments closed

Microsoft + R

David Smith points out a bunch of the ways that Microsoft integrates R into products:

You can call R from within some data oriented Microsoft products, and apply R functions (from base R, from packages, or R functions you’ve written) to the data they contain.

  • SQL Server (the database) allows you to call R from SQL, or publish R functions to a SQL Server for database administrators to use from SQL.

  • Power BI (the reporting and visualization tool) allows you to call R functions to process data, create graphics, or apply statistical models to data.

  • Visual Studio (the integrated development environment) includes R as a fully-supported language with syntax highlighting, debugging, etc.

  • R is supported in various cloud-based services in Azure, including the Data Science Virtual Machine and Azure Machine Learning Studio. You can also publish R functions to Azure with the AzureML package, and then call those R functions from applications like Excel or apps you write yourself.
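As a sketch of the first bullet, calling R from T-SQL goes through sp_execute_external_script; the table and column names here are invented:

```sql
-- Requires SQL Server 2016+ with R Services installed and external scripts enabled
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- data.frame(avg_val = mean(InputDataSet$val))',
    @input_data_1 = N'SELECT val FROM dbo.Measurements'  -- hypothetical table
WITH RESULT SETS ((avg_val FLOAT));
```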

They’re pretty well invested in both R and Python, which is a good thing.

Comments closed

Using cowplot With ggplot2

I have a post on extending ggplot2’s functionality with cowplot:

Notice that I used geom_path().  This is a geom I did not cover earlier in the series.  It’s not a common geom, though it does show up in charts like this where we want to display data for three variables.  The geom_line() geom follows the basic rules for a line:  that the variable on the y axis is a function of the variable on the x axis, which means that for each element of the domain, there is one and only one corresponding element of the range (and I have a middle school algebra teacher who would be very happy right now that I still remember the definition she drilled into our heads all those years ago).

But when you have two variables which change over time, there’s no guarantee that this will be the case, and that’s where geom_path() comes in.  The geom_path() geom does not plot y based on sequential x values, but instead plots values according to a third variable.  The trick is, though, that we don’t define this third variable—it’s implicit in the data set order.  In our case, our data frame comes in ordered by year, but we could decide to order by, for example, life expectancy by setting data = arrange(global_avg, m_lifeExp).  Note that in a scenario like these global numbers, geom_line() and geom_path() produce the same output because we’ve seen consistent improvements in both GDP per capita and life expectancy over the 55-year data set.  So let’s look at a place where that’s not true.

The cowplot library lets you link together different plots of different sizes in a couple of lines of code, which is much easier than using ggplot2 by itself.
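To show the idea, here is a minimal sketch; global_avg, m_lifeExp, and year come from the quoted passage, while m_gdpPercap and the layout are my assumptions:

```r
library(ggplot2)
library(cowplot)

# geom_path() connects points in data-frame order (the implicit third variable);
# geom_line() would instead order points by their x value
p1 <- ggplot(global_avg, aes(x = m_gdpPercap, y = m_lifeExp)) + geom_path()
p2 <- ggplot(global_avg, aes(x = year, y = m_lifeExp)) + geom_line()

# cowplot stitches separate plots together with relative sizing
plot_grid(p1, p2, labels = c("A", "B"), rel_widths = c(2, 1))
```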

Comments closed

Web Scraping With Power BI

Imke Feldmann shows how to use Power BI to scrape multiple tables from a webpage:

I will present 2 methods here:

  1. Append-method: This is the obvious one and is fast for just a few tables.
  2. Add-Column-method: A bit more complicated but will be faster for a large number of tables and is also suitable for a dynamic number of tables.

You will also find 2 options at the end of this article:

  1. Use custom functions for multi-step table transformations

  2. Use dynamic filters to select the desired tables
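Both methods start from the same place in Power Query's M; this rough sketch follows the general shape rather than Imke's exact steps, and the URL and expanded column names are invented:

```m
let
    // Hypothetical URL; Web.Page returns one row per element found on the page
    Source = Web.Page(Web.Contents("https://example.com/stats")),
    // Keep only actual HTML tables
    OnlyTables = Table.SelectRows(Source, each [Source] = "Table"),
    // Add-Column style: transform each nested table in place...
    Promoted = Table.TransformColumns(OnlyTables, {"Data", Table.PromoteHeaders}),
    // ...then expand them into one combined table (column names are assumptions)
    Expanded = Table.ExpandTableColumn(Promoted, "Data", {"Country", "Value"})
in
    Expanded
```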

Read the whole thing.

Comments closed

Using Date Types In Warehouses

Koen Verbeeck argues that date keys in warehouses should be actual date types:

The worst is by far the string representation, as there is no actual check on the contents. It can literally contain anything. And is ’01/02/2018′ the first of February 2018 (as any sane person would read it, because days come before months), or the 2nd of January? So if you have to store dates in your data warehouse, avoid strings at all costs. No excuses.

The integer representation – e.g. 20171208 – is really popular. If I recall Kimball correctly, he said it’s the one exception where you can use smart keys, aka surrogate keys that have a meaning embedded into them. I used them for quite some time, but I believe I have found a better alternative: using the actual date data type.
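To make the alternative concrete, here's a sketch of a date dimension keyed on DATE; the table and column names are invented:

```sql
-- A date dimension keyed on an actual DATE instead of an INT like 20171208
CREATE TABLE dbo.DimDate
(
    DateKey      DATE NOT NULL PRIMARY KEY,    -- 3 bytes, and always a valid date
    CalendarYear AS YEAR(DateKey)  PERSISTED,
    MonthOfYear  AS MONTH(DateKey) PERSISTED
);

-- Fact tables then join on the DATE key directly (hypothetical fact table)
SELECT d.CalendarYear, SUM(f.SalesAmount) AS Sales
FROM dbo.FactSales AS f
JOIN dbo.DimDate   AS d ON d.DateKey = f.OrderDateKey
GROUP BY d.CalendarYear;
```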

I bounce back and forth, but I’m sympathetic to Koen’s argument, which you can read by clicking through.

Comments closed

An Introduction To Splunk

Victoria Holt has some basics on Splunk:

Splunk, a software platform, has the capability to leverage machine data for data management and analytics.  It can be used for:

  • Making data-driven decisions
  • Alerting on network security threats
  • Reporting on system failures
  • Analysing and improving functionality

It enables performance analysis, dashboard creation, monitoring, troubleshooting and investigation of the real-time data collected. An Edureka learning video showed the Splunk components.

Advanced Splunk queries are still a bit like magic to me, but this is a very powerful service once you get a handle on how it works.

Comments closed

Appropriate Data Types And Unicode

Raul Gonzalez on (in)appropriate use of National character strings:

Yes, you have read it… I see dates stored as NVARCHAR(10) and NCHAR(10) on a daily basis; please don’t ask me why.

This case is even worse, because DATE takes 3 bytes where NCHAR(10) takes 20 bytes. Yes, ladies and gentlemen, that’s more than 6 times the space to store the same data.

But wait! How can you be certain that those ten characters are actually a valid date? You can’t, unless you reinvent the wheel and validate that those values are indeed valid dates, paying the performance penalty of doing it.
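A quick demonstration of both points; the nonsense value is deliberate:

```sql
-- An NCHAR(10) accepts anything; only a real DATE validates itself
DECLARE @stored NCHAR(10) = N'2018-13-45';            -- stored without complaint

SELECT TRY_CONVERT(DATE, @stored)            AS parsed_date,  -- NULL: not a valid date
       DATALENGTH(@stored)                   AS nchar_bytes,  -- 20 bytes
       DATALENGTH(CONVERT(DATE, '20180301')) AS date_bytes;   -- 3 bytes
```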

You’d think that picking the right data type for something would be fairly easy and then you find a table with a few dozen NVARCHAR(MAX) columns.

Comments closed