Press "Enter" to skip to content

Author: Kevin Feasel

Using tsoutliers() to Detect Time Series Outliers

Rob J. Hyndman shows off a function in the forecast package in R:

The tsoutliers() function in the forecast package for R is useful for identifying anomalies in a time series. However, it is not properly documented anywhere. This post is intended to fill that gap.

The function began as an answer on CrossValidated and was later added to the forecast package because I thought it might be useful to other people. It has since been updated and made more reliable.

Read on to see how it works. This is one of the reasons I like the R programming language so much for data analysis and statistics: usually, somebody smarter than me has already built a solution to the problem and it’s just a matter of finding the right function. H/T R-Bloggers

Comments closed

Indexed Views and SARGability

Erik Darling shows how you can create indexed views to make life easier when tuning queries:

There are some things that, in the course of normal query writing, just can’t be SARGablized. For example, generating and filtering on a windowing function, a having clause, or any other runtime expression listed here.

There are some interesting ways to use indexed views to our advantage for some of those things. While windowing functions and having clauses can’t be directly in an indexed view, we can give an indexed view a good definition to support them.

It won’t always work, but it is an option to keep in mind.

Comments closed

Solving the Knapsack Problem with PostgreSQL

Francesco Tisiot is packing for a trip:

People have had to pack luggage for a long time, so this optimization dilemma is far from new. It even has a name: the knapsack problem. It can be applied to a variety of use cases where there is a set of items with a defined weight (space occupied in the luggage in our example) and value (item benefits during holiday) and where the total weight is limited (luggage size).

The end goal is to come up with a set of items that fits within the weight constraint and has the maximum possible value.

Francesco provides us one possible solution to the problem. I like the knapsack problem a lot, but I like the Holyfield problem even more.

Comments closed

Adding Report Names to the Log Analytics Report

Gilbert Quevauvilliers feels complete:

I was really excited to use the Power BI Log Analytics for Analysis Services Engine report when it was released along with this blog post from the Power BI Team: Announcing long-term usage and performance insights (Public Preview) | Microsoft Power BI Blog | Microsoft Power BI

This is really a great report when using Log Analytics.

I found the one thing that I wanted to view was my report names. The standard report did not have this.

Protip: displaying GUIDs is not the same as displaying useful information. I recommend reading through this just to see how much pain and effort it takes to make the Log Analytics report actually become useful.

Comments closed

Accessing Network Shares from SQL Server

Daniel Hutmacher engages in chicanery:

Using a local service account for your SQL Server service, your server won’t automatically have permissions to access to other network resources like UNC paths. Most commonly, this is needed to be able to perform backups directly to a network share.

Using a domain account as your SQL Server service account will allow the server to access a network share on the same domain, but if the network share is not on your domain, like an Azure File Share, you need a different solution.

There’s a relatively easy way to make all of this work, though.

Click through to see how, as well as several methods to make it work within SQL Server.

Comments closed

Running .NET Apps on Raspberry Pi 4

Joy George Kunjikkur installs the .NET runtime on a Raspberry Pi 4:

Here we are continuing the experiments with Raspberry Pi 4. As a .Net developer, what is the meaning if we cannot install .Net into RasPi and run one program?

Please note this post is aiming at installing the .Net runtime, not the SDK. Development and compilation will be done outside of RasPi. Also, this is not aiming to run ASP.Net, just simple .Net console apps only.

One other option, which Azure IoT Hub uses, is to install moby and deploy your .NET apps as containers. But if you don’t want to do that, click through for a few techniques.

Comments closed

Using Radar Charts

Mike Cisneros explains what radar (or spider) charts are and how they work:

A spider chart, also sometimes called a radar chart, is often used when you want to display data across several unique dimensions. Although there are exceptions, these dimensions are usually quantitative, and usually range from zero to a maximum value. Each dimension’s range is normalized to one another, so that when we draw our spider chart, the length of a line from zero to a dimension’s maximum value will be the same for every dimension.

Spider charts can be found in lots of industries, but rarely in large numbers. In our experience, they’re most likely to pop up in food science (comparing products across multiple different facets of taste, texture, etc.) and in sports analytics (comparing athletes across several dimensions of performance). In one of our previous #SWDchallenges, several participants found other use cases for spider charts, such as comparing series on a time-cycle, comparing the volume of searches for different terms, or even visualizing the motifs in a piece of music.

My favorite use of the radar chart was in the Madden NFL series, where I spent far too much of my youth comparing attributes between prospects—for example, the quarterback radar chart might have throwing power, throwing accuracy, speed, strength, and awareness. Then, you compare the relative sizes and spikes of players to gauge who would be better. (Except that in the case of Madden, it was all a lie—turns out the radar charts weren’t actually based on anything, so as usual, a youth wasted).

More recently, Bruce Nolan came up with a radar chart to visualize quarterback play across a set of complementary measures:

Image
Comments closed

Large-Scale Database Schema Comaprison

Raul Gonzalez creates a process to compare database schemas across a large number of servers:

There are tools that can compare two databases in great detail, but the caveat I found, it is that they can only compare two databases, so while they’ll be fine to compare DEV<=>PROD or QA<=>PROD, they’re not a good fit to compare and find the differences in, literally, thousands of databases

There it’s where I was, so after some time trying to postpone the inevitable, it was time to roll up my sleeves and get the job done.

Click through for Raul’s explanation and process. At one point, I’d written something which used Red Gate’s Schema Compare DLL to do something similar. The big problem you run into is when there is no single, canonically correct database and you’re trying to generate one from the tangled mess of objects.

Comments closed

SQLCMD and the Question Mark

Solomon Rutzky stops SQLCMD in its tracks:

This is the story of one such time that I stumbled upon something interesting, and all because I made a mistake. I was doing testing for the previous post (mentioned above) and somehow entered an extra character, or forgot the closing quote on a string literal, or something. Either way, I submitted some T-SQL in the sqlcmd utility that should have produced a parsing error. But instead, it cancelled the entire query batch, and without returning any errors.

Read on to learn more and maybe have a little fun with co-workers.

Comments closed

More on SARGability

Erik Darling continues a series on SARGability. First up, max datatype columns aren’t going to cut it:

No matter how much you scream, holler, and curse, when you try to filter data in a column with a max type, that predicate can’t be pushed to when you touch the index.

Leaving aside that max data types can’t be in the key of an index, because that would be insane, even if you stick them in the includes you’re looking at a lot of potential bulk, and not the good kind that makes you regular.

Read on for an example of Erik’s point, and then go to the next post, which covers fixing functions:

The bottom line on scalar UDFs is that they’re poison pills for performance.

They’re bad enough in the select list, but they get even worse if they appear in join or where clause portions of the query.

The example query we’re going to use doesn’t even go out and touch other tables, which can certainly make things worse. It does all its processing “in memory”.

Both of these are worth checking out.

Comments closed