Author: Kevin Feasel

Visualizing High-Density Regions with R

The rOpenSci team covers the history of the gghdr package:

This was how being a newcomer to rOpenSci OzUnconf 2019 felt. It was incredible to be a part of such a diverse, welcoming and inclusive environment. I thought it would be fun to blog about how it all began, and the twists and turns we experienced along the way as we developed the gghdr package. The package provides tools for plotting highest density regions with ggplot2 and was inspired by the package hdrcde developed by Rob J Hyndman. The highest density region approach of summarizing a distribution is useful for analyzing multimodal distributions and can be composed of numerous disjoint subsets. For example, the histogram of the highway mileage (hwy) data from the mpg dataset (a) shows that cars with 6 cylinders (cyl) are bimodally distributed, which is reflected in the highest density region (HDR) boxplot (c) but not in the standard boxplot (b). Hence, we see that HDRs are useful in displaying multimodality in the distribution.

Read on for a short history of an interesting package.
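To make the idea of a highest density region concrete, here is a minimal sketch in Python. It is not how gghdr or hdrcde compute HDRs. The sample values, the fixed bandwidth, and the helper names `gaussian_kde` and `highest_density_region` are all assumptions made for illustration. It estimates a density with a Gaussian kernel on a grid, keeps the highest-density grid points until they hold the requested share of the mass, and merges neighbors into intervals.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def highest_density_region(samples, coverage=0.5, bandwidth=1.0, grid_size=400):
    """Approximate the HDR as a list of (low, high) intervals on an even grid."""
    lo = min(samples) - 3 * bandwidth
    hi = max(samples) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    xs = [lo + i * step for i in range(grid_size)]
    density = gaussian_kde(samples, bandwidth)
    ds = [density(x) for x in xs]
    total = sum(ds)

    # Keep grid points from the highest density downward until the kept
    # points hold the requested share of the total mass.
    keep = [False] * grid_size
    mass = 0.0
    for i in sorted(range(grid_size), key=lambda i: ds[i], reverse=True):
        if mass >= coverage * total:
            break
        keep[i] = True
        mass += ds[i]

    # Merge runs of adjacent kept points into intervals.
    intervals = []
    start = None
    for i, flag in enumerate(keep):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            intervals.append((xs[start], xs[i - 1]))
            start = None
    if start is not None:
        intervals.append((xs[start], xs[-1]))
    return intervals


# Made-up bimodal sample (not the mpg data): one cluster near 18, one near 26.
samples = [17, 18, 18, 18, 19, 19, 25, 26, 26, 26, 27, 27]
regions = highest_density_region(samples, coverage=0.5)
for low, high in regions:
    print(f"{low:.2f} to {high:.2f}")
```

With a clearly bimodal sample like this one, the region comes back as two separate intervals, one per cluster. A standard boxplot cannot show that, because it draws a single box between the quartiles.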

Filtered Indexes and Functions

Eitan Blumin looks at filtered indexes:

In fact, absolutely no functions of any kind can be used within the WHERE clause of a filtered index. Not even schema-bound user-defined scalar functions.

Unfortunately, as stated in the Microsoft Docs page about Filtered Indexes, the WHERE clause of a filtered index can only support simple comparison operators.

Well, it’s not entirely true, as you CAN actually use some functions, but on two conditions:

Read the whole thing. Eitan lays out one limitation of filtered indexes and provides a couple of potential workarounds.

Verbalizing a Chart

Alex Velez reminds us of the spoken side of communication:

I’m confident that I could overcome some of these design challenges by effectively explaining the graph to someone else. Will it be a perfect data communication? No—but sometimes, we have to deal with less-than-ideal circumstances like time limitations, or not having control over our designs. Knowing how to verbalize a graph can be a practical solution when faced with these constraints.

I should caveat this by clarifying that my intention is not to say that we shouldn’t spend time on our visualizations. But too often, we focus only on the visual. We believe that a graph or a picture is worth a thousand words. Or maybe we assume that because we created the chart, we will automatically know how to talk through it. I am super guilty of this!

Read on for some tips on vocalizing a visual.

Fun with the SSMS Extended Events UI

Grant Fritchey airs a few grievances:

I like Extended Events and I regularly use the Session Properties window to create and explore sessions. I’m in the window all the time, noting its quirks & odd behaviors, even as it helps me get stuff done. However, I found a new one. Let me tell you about just a few of them.

Click through for some examples of UI oddities when working with session properties.

Unicode and Data Length

Kevin Wilkie lays out an argument:

If you truly need the UNICODE characters in your data, go ahead and use them! If not though, please make your DBA happy by not using them. Since UNICODE characters take up twice as much space as the ASCII versions do, your DBAs will recommend using the ASCII versions if you are not going to be using any UNICODE characters.

Read on for the justification. But I’m still NVARCHAR (Almost) Everywhere.
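As a rough illustration of the size difference, here is a small Python sketch. It assumes Latin-1 as a stand-in for a single-byte VARCHAR code page and UTF-16 little-endian as a stand-in for NVARCHAR storage. The helper name `storage_bytes` is made up for this example, and the counts ignore row overhead and compression.

```python
def storage_bytes(text):
    """Return (single_byte, utf16) byte counts for the given text."""
    # Assumption: Latin-1 stands in for a single-byte VARCHAR code page.
    single_byte = len(text.encode("latin-1"))
    # Assumption: UTF-16 LE stands in for NVARCHAR storage.
    utf16 = len(text.encode("utf-16-le"))
    return single_byte, utf16


narrow, wide = storage_bytes("Curated SQL")
print(narrow, wide)  # 11 22
```

The doubling holds for text that fits in a single-byte code page. Characters outside that code page cannot be stored in the narrow form at all, which is the case for keeping them Unicode.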

Replication in Azure DB for MySQL

Arun Sirpal explains how you can set up replication with Azure DB for MySQL:

No doubt there will be a need for you to split off your analytical queries from the main database for performance reasons.

If you have been following me in the past with Azure SQL DB, you would use failover group read endpoints. With MySQL we would need to build a replica (read-only) to another server. This uses MySQL’s native binlog replication feature, which is great to hear. This form is asynchronous.

Read on to see how.

Power BI Implementation Planning Guidance

Melissa Coates has a plan:

I’m really excited to share with you that we’re working on a new set of Power BI guidance called “Power BI Implementation Planning.”

The first pieces of content are some of the most common Power BI usage scenarios. In terms of scope of content, we’re just getting started – there will be LOTS more content to come. We’ll be iteratively publishing additions to this set of content over the next several months.

There are a lot of smart people working on this project.

A Guide to Dapper

Camilo Reyes shows how to use Dapper:

Dapper is a lightweight shim around ADO.NET for data access via extension methods. To keep this relevant to any real application, there is quite a bit of code, which I won’t be able to show, so I recommend downloading the repo from GitHub. The focus here is to walk through the code API and follow best practices for building a DAL. The set of extension methods can feel overwhelming because there is a lot of functionality to cover.

I recommend general familiarity with LINQ and the List<> generic class in C#. This guide can be used as a reference so you can read one section at a time. At each pause, I encourage you to play with the code and allow the information to sink in.

I tend to like Dapper a lot, as it’s basically a lightweight wrapper around ADO.NET. It’s very efficient as a result and you can also use stored procedures quite easily.

Azure ML and MLOps

I continue a series on Azure ML:

We ended the prior series with model deployment via the Azure ML Studio UI. This is entirely manual and UI-driven. Then, we looked at model deployment via manually-run notebooks. This is still manual but at least offers the possibility of automation as we control the code to run.

From there, we moved to model deployment via the Azure CLI and Python SDK. Now we have the capability to run, train, register, and deploy models via scripts. This leads to the next phase in the process, in which we can perform continuous integration and continuous deployment of models using a tool like Azure DevOps or GitHub Actions. This is where MLOps starts to shine.

Read on for a few thoughts about MLOps and software maturity.

Updates on SSIS Framework Manager

Andy Leonard has a progress report for us:

Kent Bradshaw and I continue to update SSIS Framework Manager, the visual tool for managing SSIS Framework Applications. In our parlance, an SSIS Application consists of one or more SSIS Packages (to which we refer in the framework as application packages) configured to execute in a specific order. To date, enterprises using SSIS Frameworks (well, our SSIS Frameworks, at least) have relied on T-SQL for management functionality. We aim to change that with the next release of our SSIS Framework which will include SSIS Framework Manager.

Click through to see what they’re working on.
