The Theory Behind cdata

Kevin Feasel



John Mount has a video explaining the concepts behind cdata:

We also have two really nifty articles on the theory and methods:

Please give it a try!

Click through for the video, which I found very helpful in tying together a number of data transformation operations (pivoting, unpivoting, one-hot encoding, etc.).
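To make the connection between those operations concrete, here is a minimal sketch in plain Python. It is not cdata's API (cdata is an R package); the sample rows and the helper names `pivot`, `unpivot`, and `one_hot` are hypothetical, chosen only to show that all three are reshapings of the same records.

```python
# Hypothetical sample data in "tall" form: one row per (id, measure) pair.
tall = [
    {"id": 1, "measure": "height", "value": 10},
    {"id": 1, "measure": "width", "value": 4},
    {"id": 2, "measure": "height", "value": 12},
    {"id": 2, "measure": "width", "value": 5},
]


def pivot(rows, key, column, value):
    """Tall to wide: one output row per key, one column per distinct `column` entry."""
    wide = {}
    for row in rows:
        wide.setdefault(row[key], {key: row[key]})[row[column]] = row[value]
    return list(wide.values())


def unpivot(rows, key, column, value):
    """Wide to tall: one output row per (key, non-key column) pair."""
    return [
        {key: row[key], column: name, value: cell}
        for row in rows
        for name, cell in row.items()
        if name != key
    ]


def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct level."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        out = {k: v for k, v in row.items() if k != column}
        for level in levels:
            out[f"{column}_{level}"] = int(row[column] == level)
        encoded.append(out)
    return encoded


wide = pivot(tall, "id", "measure", "value")
round_trip = unpivot(wide, "id", "measure", "value")
indicators = one_hot(tall, "measure")
```

Pivoting and then unpivoting returns the original rows, which is a small illustration of why it helps to treat these operations as one family rather than as unrelated tricks.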

Microsoft R Open 3.4.3

Kevin Feasel


R, Versions

David Smith announces Microsoft R Open 3.4.3:

Microsoft R Open (MRO), Microsoft’s enhanced distribution of open source R, has been upgraded to version 3.4.3 and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to the latest R (version 3.4.3) and updates the bundled packages (specifically: checkpoint, curl, doParallel, foreach, and iterators) to new versions.

MRO is 100% compatible with all R packages. MRO 3.4.3 points to a fixed CRAN snapshot taken on January 1, 2018, and you can see some highlights of new packages released since the prior version of MRO on the Spotlights page. As always, you can use the built-in checkpoint package to access packages from an earlier date (for reproducibility) or a later date (to access new and updated packages).

That brings Microsoft up to speed with base R.

Set Operations In Spark

Fisseha Berhane compares SparkSQL, DataFrames, and classic RDDs when performing certain set-based operations:

In this fourth part, we will see set operators in Spark the RDD way, the DataFrame way and the SparkSQL way.
Also, check out my other recent blog posts on Spark: Analyzing the Bible and the Quran using Spark, and Spark DataFrames: Exploring Chicago Crimes.

The data and the notebooks can be downloaded from my GitHub repository.
The three types of set operators in the RDD, DataFrame, and SQL approaches are shown below.

This is where SparkSQL (and SQL in general) shines, although the DataFrame approach is also compact.
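For readers who want a reminder of what set operators compute, here is a minimal sketch using plain Python sets rather than Spark. It assumes the three operators in question are union, intersection, and difference (the excerpt does not name them), and the sample data is hypothetical.

```python
# Hypothetical sample data standing in for two small datasets.
first = {"Ann", "Bob", "Cid"}
second = {"Bob", "Cid", "Dee"}

# Union: every element that appears in either set, without duplicates.
union = first | second

# Intersection: only the elements that appear in both sets.
intersection = first & second

# Difference: elements of the first set that are missing from the second.
difference = first - second

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Python sets always discard duplicates, so this mirrors the deduplicating form of each operator; whether a given Spark API keeps or drops duplicates is something to check against the linked post.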

Visuals I Like

I continue my series on dashboard visualization:

This leads me to a little bit of advice for choosing bars versus columns.  You will want to choose a bar chart if the following are true:

  1. Category names are long, where by “long” I mean more than 2-3 characters.
  2. You have a lot of categories.
  3. You have relatively few periods—ideally, you’ll only have one period with a bar chart.

By contrast, you would choose a column chart if:

  1. Viewing across periods is important.  For example, I want to see the number of critic reviews fluctuate across the season for each of the TV shows.
  2. You have many periods with relatively few categories.  The more periods and the fewer categories, the more likely you are to want a column chart.
  3. Category names are short, by which I mean approximately 1-3 characters.

Some people will rotate text 90 degrees to try to turn a bar chart into a column chart.  I don’t like that because then people need to rotate the page or crane their necks.  In that case, just use the bar chart.

I like Cleveland dot plots, but they’re not implemented at all in Power BI and the two add-ons in the store aren’t that great either.  Also, there’s bonus material explaining why The Punisher season 1 was better than Daredevil season 1.

Aggregations And Always Encrypted

Monica Rathbun finds trouble with Always Encrypted:

The real challenges started when the client began to test their application code. The first thing we hit was triggers.

The table had several insert triggers associated with the columns that were now encrypted. Since the data was now encrypted, the insert triggers would fail. Again, we lucked out and they were able to recode some things in order to remove the triggers. Of course, since troubles always come in threes, this was no different. First the constraint problem, then the triggers, then we hit the biggest roadblock that halted our Always Encrypted implementation.

Read on for more information about the things you cannot do with Always Encrypted, including some limitations which will eventually go away.

Installing SSRS 2017

Dave Mason shows how to install Reporting Services 2017:

The SSRS 2017 installation media was easy to find and download from Microsoft. When I ran it, the installation process was simple. There were very few choices to make, and none of them were terribly important or impactful. Other than clicking “Next” buttons, the only choices and input required were to choose the SSRS Edition (or enter a product key), check the box to accept license terms, and choose an installation path (if you don’t want the default). It was so easy, it almost feels like a waste of time to post the screenshots. But since I have them, here they are:

Click through for a block of screenshots and more install info.  As for Dave’s question at the end, I think the only way you can have two versions of SSRS 2017 on the same instance is if you have Reporting Services and Power BI Report Server, and they’ll show up in Reporting Services Configuration Manager as SSRS and PBIRS, respectively.

Storing Credentials For Containers

Andrew Pruski shows how to store a credential using PowerShell and pass it into a Docker container:

I work with SQL Server in containers pretty much exclusively when testing code and one of my real bugbears is that SQL Server in containers does not support Windows authentication (unless you’re using Windocks).

So when I’m working I find it quite annoying to have to specify a SA username & password when I want to connect.

OK, I can use Get-Credential, assign to a variable, and then reference that in a connection string, but I want something a bit more permanent, especially as I always use the same password for all my containers.

Read on for Andrew’s method, and check out Rob Sewell’s method in the comments.

Using PowerShell To Deploy Perfmon Collectors

Raul Gonzalez has a bonus post in his Perfmon data series:

As I said, when it’s time to deploy the solution explained in my previous posts to a number of servers, it might get very tedious, especially if we have servers running multiple instances. Each instance has different counter names because the instance name is part of the counter name, so a single template won’t apply to all cases and a lot of manual intervention is needed.

So I decided to do what I like the most and wrote some queries that, combined with some PowerShell, will do the job for me.

Read on for the script and more information.

Why You’re Pestering

Rob Sewell shows off the Because parameter in Pester 4.2.0:

This release adds the Because parameter to all the assertions. This means that you can add a reason why the test has failed. As Jakub Jareš writes:

  • Reasons force you to think more

  • Reasons document your intent

  • Reasons make your TestCases clearer

Click through for examples galore.  4.2.1 should have a -Parent macro which inputs -Because “I said so”.

