Outlier Detection In R

Giorgio Garziano has an introduction to outlier detection and intervention analysis using R:

Now, we implement a similar representation of the transient change outlier by taking advantage of the arimax() function within the TSA package. The arimax() function requires to specify some ARMA parameters, and that is done by capturing the seasonality as discussed in ref. [1]. Further, the transient change is specified by means of xtransf and transfer input parameters. The xtransf parameter is a matrix with each column containing a covariate that affects the time series response in terms of an ARMA filter of order (p,q). For our scenario, it provides a value equal to 1 at the outliers time index and zero at others. The transfer parameter is a list consisting of the ARMA orders for each transfer covariate. For our scenario, we specify an AR order equal to 1.

Check it out.

What’s In Your Powershell Profile?

Shane O’Neill wants to know what’s in your Powershell profile:

This brings me back to the main point. My profile does 3 things

  1. Changes the default colour of error and warning messages,
  2. Sets an alias for notepad to “n” since I use it so much Set-Alias -Name n -Value notepad , and
  3. loads up the dbatools prompt

I don’t import any modules because the two that I use the most are updated so frequently plus I’m currently using PowerShell 5 so they get automatically loaded when I type in one of their commands.

If you don’t already have a profile, read on and see how you can do it.  And over-do it if you’re not careful.

Warning When Using dplyr Mutate

John Mount has a warning if you are using dplyr’s mutate function and connecting to Spark or a database:

If you are using the R dplyr package with a database or with Apache Spark: I respectfully advise you inspect your code to ensure you are not using any values created inside a dplyr::mutate() statement inside the same dplyr::mutate() statement. This has been my coding advice for some time, and it is a simple and safe re-factoring to break up such statements into safer sequences (simply by introducing more dplyr::mutate()s).

I have since encountered a non-signaling (or silent) result corruption version of the issue. We are now advising code inspection as we now have confirmation that not seeing a thrown error is not a reliable indication of correct execution and correct results.

Thanks to John for reporting, and hopefully the dplyr team can fix it.

Larger Azure SQL Database Standard Tier Sizes

Tim Radney reports on a new Standard tier preview for Azure SQL Database:

Previously, the Standard tier only offered 4 levels: 15, 30, 50, and 100 DTUs, with a database size limit of 250GB, with standard disk. If you had a database that was larger than 250GB, however did not need more than 100 DTUs for CPU, memory, or I/O, you were stuck paying a Premium price just for database size. With the new changes, you can now have up to a 1TB database in the Standard tier; you just have to pay the extra storage. Currently storage is being billed at $0.085/GB during the preview. Increasing from the included size of 250GB to 1TB increases by 774GB at a cost of $65.79 per month.

The new Standard preview DTU sizes support 200, 400, 800, 1,600, and 3,000 DTU options. If you have a SQL Server database workload that is more CPU-bound than I/O, these Standard tier options have the potential to save you a lot of money; however, if your workload is I/O bound, the Premium tier is going to outperform the Standard tier.

Tim follows this up with a couple of quick demos.

Casting And Conversion Defaults

Greg Low is a bit disappointed with TRY_CAST and TRY_CONVERT:

Surprised? I’d have to say that I was. Now as my buddy Adam Machanicpointed out, it’s not the fault of TRY_CAST and TRY_CONVERT because they just TRY to do a CAST and a CONVERT. And it’s the original functions that have the bizarre behavior.

Can’t say that I love this because it means that I can’t use these functions for their purpose, except for decimal. So that then left me wondering which types had this behavior.

Check it out.  One way to get around this default behavior could be to use NULLIF, so TRY_CAST(NULLIF(@InputVar, ”) AS INT).

Process Mapping On Linux With SQL Server And Oracle

Kellyn Pot’vin-Gorman contrasts SQL Server versus Oracle outputs when running a couple common Linux process commands:

In our Oracle environment, we can see every background process, with it’s own pid and along with the process monitor, (pmon)db writer, (dbwr), log writer, (lgwr), we also have archiving, (arcx), job processing, (j00x) performance and other background processing.  I didn’t even grep for the Oracle executable, so you recognize how quickly we can see what is running.

In the SQL Server environment, we only have two processes- our parent process is PID 7 and the child is 9 for SQL Server and nothing to distinguish what they actually are doing.  If we decide to use the pmap utility to view what the parent and child process aredoing, we see only sqlservr as the mapping information.

I imagine that things like this will improve over time for SQL Server, but Oracle definitely has a leg up in this regard.

Spinning Up SQL Server Containers In Jenkins

Chris Adkin has a few tips for loading SQL Server in Jenkins as part of testing or deployment:

Problem 1 Image Tag

There is no image tag specified for the microsoft/mssql-server-linux image, therefore, if Microsoft push a newer version of the image to docker hub, this will be pulled down from docker hub when the build pipeline runs. This is easily fixed by tagging the image with a tag for an explicit version, e.g. microsoft/mssql-server-linux:2017-GA.

Click through for the starting code, two additional issues, and the corrected code.

Policing Database Deployments

Drew Furgiuele has a rather interesting library that he’s released to the general public:

The base return is the TSQLFragment object, which in turn has a Batches object, which in turn holds… well it can hold a lot of different things. When the text is parsed, it will determine what type of object to return based on the statement it determines it is. For example, if it’s an insert statement it will be a certain type of object with a given set of properties and methods, and if it’s, say, a create index statement you’ll get different properties, such as which table or view is getting the index along with the indexed columns and included columns. It really is interesting.

But interesting can a double-edged sword: since the statement object that gets returned can be different for each parsed piece of code, that means to set up any type of intelligence around the stuff we’re dealing with, we need to check for very specific objects.

Unfortunately, I never got past the first animated GIF, whose subtitle was wrong.  You, however, should read the whole thing.

TreeViz Custom Power BI Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the TreeViz Custom Visual. The TreeViz is a breakdown tree that allows you to expand or collapse levels of hierarchical data.

Click through for the video, showing more.  For limited, hierarchical, categorical data, this could work pretty well.

Categories

December 2017
MTWTFSS
« Nov  
 123
45678910
11121314151617
18192021222324
25262728293031