
Author: Kevin Feasel

Visuals I Like

I continue my series on dashboard visualization:

This leads me to a little bit of advice for choosing bars versus columns.  You will want to choose a bar chart if the following are true:

  1. Category names are long, where by “long” I mean more than 2-3 characters.
  2. You have a lot of categories.
  3. You have relatively few periods—ideally, you’ll only have one period with a bar chart.

By contrast, you would choose a column chart if:

  1. Viewing across periods is important.  For example, I want to see the number of critic reviews fluctuate across the season for each of the TV shows.
  2. You have many periods with relatively few categories.  The more periods and the fewer categories, the more likely you are to want a column chart.
  3. Category names are short, by which I mean approximately 1-3 characters.

Some people will rotate text 90 degrees to try to turn a bar chart into a column chart.  I don’t like that because then people need to rotate the page or crane their necks.  In that case, just use the bar chart.
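
If it helps to see that heuristic spelled out, here is a minimal sketch in PowerShell.  The function name and the thresholds are my own rough encoding of the advice above, not anything from the original post:

```powershell
# Rough encoding of the bar-vs-column heuristic; tune the thresholds to taste.
function Get-ChartTypeSuggestion {
    param(
        [int]$MaxCategoryNameLength,   # longest category label, in characters
        [int]$CategoryCount,           # number of categories on the axis
        [int]$PeriodCount              # number of time periods displayed
    )

    # Many periods with short category names favor a column chart.
    if ($PeriodCount -gt 1 -and $MaxCategoryNameLength -le 3) { return 'Column' }

    # Long names, lots of categories, or a single period all lean toward bars.
    if ($MaxCategoryNameLength -gt 3 -or $CategoryCount -gt 10 -or $PeriodCount -le 1) { return 'Bar' }

    return 'Either'
}

# Example: a dozen TV shows with long names over a single period -> bar chart.
Get-ChartTypeSuggestion -MaxCategoryNameLength 25 -CategoryCount 12 -PeriodCount 1
```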

I like Cleveland dot plots, but they’re not implemented at all in Power BI and the two add-ons in the store aren’t that great either.  Also, there’s bonus material explaining why The Punisher season 1 was better than Daredevil season 1.


Aggregations And Always Encrypted

Monica Rathbun finds trouble with Always Encrypted:

The real challenges started when the client began to test their application code. The first thing we hit was triggers.

The table had several insert triggers associated with the columns that were now encrypted. Since the data was now encrypted, the insert triggers would fail. Again, we lucked out and they were able to recode some things in order to remove the triggers. Of course, since troubles always come in threes, this was no different. First the constraint problem, then the triggers, and then we hit the biggest roadblock that halted our Always Encrypted implementation.

Read on for more information about the things you cannot do with Always Encrypted, including some limitations which will eventually go away.


Installing SSRS 2017

Dave Mason shows how to install Reporting Services 2017:

The SSRS 2017 installation media was easy to find and download from Microsoft. When I ran it, the installation process was simple. There were very few choices to make, and none of them were terribly important or impactful. Other than clicking “Next” buttons, the only choices and input required were to choose the SSRS Edition (or enter a product key), check the box to accept the license terms, and choose an installation path (if you don’t want the default). It was so easy, it almost feels like a waste of time to post the screenshots. But since I have them, here they are:

Click through for a block of screenshots and more install info.  As for Dave’s question at the end, I think the only way you can have two versions of SSRS 2017 on the same instance is if you have Reporting Services and Power BI Report Server, and they’ll show up in Reporting Services Configuration Manager as SSRS and PBIRS, respectively.
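
As an aside, the SSRS 2017 installer also supports a quiet, unattended install if you want to script it across machines.  Here is a minimal sketch; the edition, install folder, and log path are example values, so check the documented switches before relying on them:

```powershell
# Sketch of an unattended SSRS 2017 install; run from an elevated prompt.
# Edition, folder, and log locations below are examples, not requirements.
.\SQLServerReportingServices.exe /quiet /norestart /IAcceptLicenseTerms `
    /Edition=Dev `
    /InstallFolder="C:\Program Files\SSRS" `
    /log "C:\Temp\ssrs_install.log"
```

Either way, you still finish up in Reporting Services Configuration Manager afterward to create the report server database and URLs.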


Storing Credentials For Containers

Andrew Pruski shows how to store a credential using PowerShell and pass it into a Docker container:

I work with SQL Server in containers pretty much exclusively when testing code, and one of my real bugbears is that SQL Server in containers does not support Windows authentication (unless you’re using Windocks).

So when I’m working, I find it quite annoying to have to specify an SA username and password when I want to connect.

OK, I can use Get-Credential, assign it to a variable, and then reference that in a connection string, but I want something a bit more permanent, especially as I always use the same password for all my containers.

Read on for Andrew’s method, and check out Rob Sewell’s method in the comments.
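
For a bit of flavor, here is a minimal sketch of one common pattern (not necessarily Andrew’s or Rob’s exact approach): capture the credential once, persist it with Export-Clixml, which protects the password with DPAPI for the current user on the current machine, and re-import it whenever you spin up a container.  The paths, port, and image tag are placeholders:

```powershell
# One-time: capture the SA credential and persist it to disk.  Export-Clixml
# encrypts the password via DPAPI, so only the same user on the same machine
# can read it back.  The path is just an example.
Get-Credential -UserName 'sa' -Message 'SQL container SA password' |
    Export-Clixml -Path 'C:\Creds\sqlcontainer.xml'

# Later: rehydrate the credential and use it when running a SQL Server container.
$cred       = Import-Clixml -Path 'C:\Creds\sqlcontainer.xml'
$saPassword = $cred.GetNetworkCredential().Password

docker run -d --name sql1 -p 1433:1433 `
    -e 'ACCEPT_EULA=Y' `
    -e "SA_PASSWORD=$saPassword" `
    mcr.microsoft.com/mssql/server:2017-latest

# Connecting afterward doesn't require retyping the password either:
# sqlcmd -S localhost,1433 -U sa -P $saPassword
```

Because the exported file is tied to the user and machine that created it, this works nicely on a local dev workstation but isn’t a substitute for a proper secrets store.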


Using PowerShell To Deploy Perfmon Collectors

Raul Gonzalez has a bonus post in his Perfmon data series:

As I said, when it’s time to deploy the solution explained in my previous posts to a number of servers, it can get very tedious, especially if we have servers running multiple instances, since each instance has different counter names (the instance name is part of the counter name). One template won’t apply to all cases, so there’s a lot of manual intervention.

So I decided to do what I like most and wrote some queries that, combined with some PowerShell, will do the job for me.

Read on for the script and more information.
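
I haven’t reproduced Raul’s queries here, but as a rough sketch of what the deployment side can look like, you can build the instance-specific counter paths in PowerShell and hand them to logman to create and start a collector on each server.  The server names, instance name, counters, and output path below are placeholders:

```powershell
# Rough sketch: create a perfmon data collector per server.  In practice the
# instance-specific pieces come from queries (the part Raul automates); the
# values here are placeholders.
$servers  = 'SQL01', 'SQL02'
$instance = 'MSSQL$INST1'   # counter-name prefix for a named instance

foreach ($server in $servers) {
    # Counter names embed the instance name, which is why a single template
    # can't cover servers running different instances.
    $counters = @(
        '\Processor(_Total)\% Processor Time'
        "\${instance}:Buffer Manager\Page life expectancy"
        "\${instance}:SQL Statistics\Batch Requests/sec"
    )

    # Create and start a 15-second binary collector on the remote server.
    logman create counter SQLPerf -s $server -c $counters `
        -si 00:00:15 -f bin -o "C:\PerfLogs\SQLPerf_$server"
    logman start SQLPerf -s $server
}
```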


Setting Up SparklyR In Azure

David Smith shows how you can spin up a Spark cluster in Azure and install SparklyR on top of it:

The SparklyR package from RStudio provides a high-level interface to Spark from R. This means you can create R objects that point to data frames stored in the Spark cluster and apply some familiar R paradigms (like dplyr) to the data, all the while leveraging Spark’s distributed architecture without having to worry about memory limitations in R. You can also access the distributed machine-learning algorithms included in Spark directly from R functions.

If you don’t happen to have a cluster of Spark-enabled machines set up in a nearby well-ventilated closet, you can easily set one up in your favorite cloud service. For Azure, one option is to launch a Spark cluster in HDInsight, which also includes the extensions of Microsoft ML Server. While this service recently had a significant price reduction, it’s still more expensive than running a “vanilla” Spark-and-R cluster. If you’d like to take the vanilla route, a new guide details how to set up a Spark cluster on Azure for use with SparklyR.

Read on for more details.


Apache NiFi 1.5 Updates

Tim Spann shows off some nice additions to Apache NiFi:

Another cool processor that I will talk about in greater detail in future articles is the much-requested Spark Processor. The ExecuteSparkInteractive processor with its Livy Controller Service gives you a much better alternative to my hacky REST integration for calling Apache Spark batch and machine learning jobs.

There are a number of enhancements, new processors, and upgrades I’m excited about, but the main reason I am writing today is a new feature that allows for an Agile SDLC with Apache NiFi. This is now enabled by Apache NiFi Registry. It’s as simple as a quick git clone or download, and then you’ll use Apache Maven to install Apache NiFi Registry and start it. This process will become even easier with future Ambari integration for a CLI-free install.

To integrate the Registry with Apache NiFi, you need to add a Registry Client. It’s very simple to add the default local one — see below.

There are several new features in the latest release.


How Meltdown And Spectre Have Affected Spark Performance

Chris Stevens, et al., show how Databricks customers have fared in a post-Meltdown+Spectre world:

On AWS, we have observed a small performance degradation up to 5% since January 4th. On i3-series instance types, where we cache data on the local NVMe SSDs (Databricks Cache), we have observed a degradation up to 5%. On r3-series instance types, in which the benchmark jobs read data exclusively from remote storage (S3), we have observed a smaller increase of up to 3%. The greater percentage slowdown for the i3 instance type is explained by the larger number of syscalls performed when reading from the local SSD cache.

The chart below shows before and after January 3rd in AWS for an r3-series (memory optimized) and an i3-series (storage optimized) cluster.  Both tests were fixed to the same runtime version and cluster size. The data represents the average of the full benchmark’s runtime per day, for a total of 7 days prior to January 3 (before is in blue) and 7 days after January 3 (after is in red). We exclude January 3rd to prevent partial results.  As mentioned, the i3-series has the Databricks Cache enabled on the local SSDs, resulting in roughly half of the total execution time (faster) compared to the r3-series results.

Overall, they’re seeing a degradation of 2-5%.  Click through for some more information on how they collected their metrics.


Comparing Data Lake Job Runs

Yanan Cai shows how to compare stats on different executions of a job:

Troubleshooting issues in a recurring job is a time-consuming task. It starts with searching through the Job Browser to find instances of a recurring job and identifying both baseline and anomalous performance. This is followed by multi-way comparisons between job instances to figure out what has changed in the query, data, or environment, and then by analysis to discover which changes may have a performance impact. While this is happening, production workloads continue to underperform or go down.

Azure Data Lake Tools for Visual Studio now makes it easy to spot anomalies and quickly trace the key characteristics across recurring job instances allowing for an efficient debugging experience. The Pipeline Browser automatically groups recurring jobs to simplify discovery of all runs. The Related Job View collects data about inputs, outputs and execution across multiple runs into a single visualization.

Read on for more.
