
October 10, 2017

tibbletime: Time-Aware Data Sets In R

At Business Science, they’ve announced a new R package:

We are excited to announce the release of tibbletime v0.0.2 on CRAN. Loads of new functionality have been added, including:

  • Generic period support: Perform time-based calculations by a number of supported periods using a new period formula.
  • Creating series: Use create_series() to quickly create a tbl_time object initialized with a regular time series.
  • Rolling calculations: Turn any function into a rolling version of itself with rollify().
  • A number of smaller tweaks and helper functions to make life easier.

As we further develop tibbletime, it is becoming clearer that the package is a tool that should be used in addition to the rest of the tidyverse. The combination of the two makes time series analysis in the tidyverse much easier to do!
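As a minimal sketch of what that looks like in practice (assuming the v0.0.2 interface described above; the data here is made up):

```r
library(dplyr)
library(tibbletime)

# create_series() builds a tbl_time with a regular daily index for 2017
series <- create_series('2017' ~ '2017-12-31', period = 'daily') %>%
  mutate(value = cumsum(rnorm(n())))

# rollify() turns mean() into a rolling thirty-observation version of itself
rolling_mean <- rollify(mean, window = 30)

series %>%
  mutate(value_30 = rolling_mean(value))
```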

Check out their demos comparing New York and San Francisco weather.  It looks like it’ll be a useful package.  H/T R-bloggers


ANOVA

Mala Mahadevan explains what ANOVA is and why it’s interesting:

ANOVA, or analysis of variance, is a term given to a set of statistical models used to analyze differences among groups and to determine whether those differences are statistically significant. The models were developed by the statistician and evolutionary biologist Ronald Fisher. To give a very simplistic definition, ANOVA is an extension of the two-sample t-test to more than two groups.

ANOVA is an older test and a fairly simple process, but is quite useful to understand.
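For concreteness, a one-way ANOVA is only a couple of lines in R; this sketch uses the built-in PlantGrowth dataset purely as an example:

```r
# One-way ANOVA: do mean weights differ across the three treatment groups?
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)  # reports the F statistic and its p-value
```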


Updating A Collection Of Variables In PowerShell

Klaas Vandenberghe shows how to update multiple variables in the same way using PowerShell:

We turn to two cmdlets: Get-Variable and Set-Variable. They may seem redundant, as we get and set variables all the time without them; maybe you've never even heard of these two cmdlets.
Well, here they prove their usefulness:
we will set the variables using their names, and use the current value as a starting point.
An important rule to remember is this:

$ is a token to indicate we’re dealing with a variable, but it is not part of the variable name!
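A minimal sketch of the pattern (the variable names here are hypothetical):

```powershell
# Three counters we want to update in the same way
$appErrors = 3
$dbErrors  = 5
$netErrors = 2

# Note the $-less names: Get-Variable and Set-Variable take the name
# without the $ token, since $ is not part of the variable name
foreach ($name in 'appErrors', 'dbErrors', 'netErrors') {
    $current = (Get-Variable -Name $name).Value
    Set-Variable -Name $name -Value ($current + 1)
}

$appErrors, $dbErrors, $netErrors   # 4, 6, 3
```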

It’s worth reading the whole thing.


Neural Nets Optimizing For Imperfection

John Cook describes a paradox with neural nets:

Deep neural networks have enough parameters to overfit the data, but there are various strategies to keep this from happening. A common way to avoid overfitting is to deliberately do a mediocre job of fitting the model.

When it works well, the shortcomings of the optimization procedure yield a solution that differs from the optimal solution in a beneficial way. But the solution could fail to be useful in several ways. It might be too far from optimal, or deviate from the optimal solution in an unhelpful way, or the optimization method might accidentally do too good a job.

Conceptually, this feels a little weird, but it isn't really much of a problem, as we have other analogues:  rational ignorance in economics (where we knowingly choose not to learn something because the benefit isn't worth the opportunity cost), OPTIMIZE FOR UNKNOWN in SQL Server (where we knowingly ignore the passed-in parameter because we might otherwise get stuck with a worse query plan), etc.  But the specific process here is interesting.


COALESCE, ISNULL, And Data Types

Shane O’Neill shows a subtle difference between ISNULL and COALESCE:

You may be asking yourself “What the…?!”.

I asked myself the same question but then I thought of a better idea and asked my Senior the question instead (he usually has better answers than I do), who proceeded to tell me that the final test would only work exactly at midnight!

…so I repeated my question to him.

It’s an interesting read, and not something you’d commonly think about.
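The underlying issue is that ISNULL takes its return type from its first argument, while COALESCE uses data type precedence across all of its arguments. Here's a hedged sketch of how that bites (not necessarily Shane's exact repro):

```sql
DECLARE @d date = NULL;

SELECT ISNULL(@d, GETDATE())   AS isnull_result,    -- typed as date: the time portion is truncated
       COALESCE(@d, GETDATE()) AS coalesce_result;  -- typed as datetime: the time portion survives

-- The two results only match when GETDATE() returns exactly midnight
```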


Backup Compression And Encryption

Arun Sirpal shows the combined effects of backup encryption and backup compression in SQL Server 2017:

Do not forget about the certificate! Warning: The certificate used for encrypting the database encryption key has not been backed up. Imagine if you need to recover the backup and you can’t.  You will get the dreaded thumbprint error:

Msg 33111, Level 16, State 3, Line 25
Cannot find server certificate with thumbprint ‘0x78FAB5A2A5D593FD3C4E163C90B745F70AB51233’.
Msg 3013, Level 16, State 1, Line 25
RESTORE DATABASE is terminating abnormally.

So make sure you respect this certificate (and the key): back them up and re-create them on the target server for a successful restore.

In SQL Server 2016 and 2017, there’s no reason not to encrypt backups; the marginal cost is practically nil even if you’re low enough on disk space that you need to do backup compression.
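As a sketch of the certificate backup Arun is warning about (all names, paths, and the password below are placeholders):

```sql
-- Back up the certificate and its private key; without these,
-- the encrypted backup cannot be restored on another server
BACKUP CERTIFICATE BackupCert
    TO FILE = 'C:\Certs\BackupCert.cer'
    WITH PRIVATE KEY (
        FILE = 'C:\Certs\BackupCert.pvk',
        ENCRYPTION BY PASSWORD = '<StrongPasswordHere>'
    );

-- Compression and encryption together in one backup statement
BACKUP DATABASE SalesDb
    TO DISK = 'D:\Backups\SalesDb.bak'
    WITH COMPRESSION,
         ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);
```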


Use Fixed Filegrowth Settings

Andy Galbraith notes that you should use fixed-increment filegrowth settings for log and data files:

As you probably already know, the key flaw to percentage-based FILEGROWTH is that over time the increment grows larger and larger, causing the actual growth itself to take longer and longer.  This is especially an issue with LOG files because they have to be zero-initialized before they can be used, causing excessive I/O and file contention while the growth is in progress.  Paul Randal (blog/@PaulRandal) describes why this is the case in this blog post.  (If you ever get a chance to see it, Paul also does a fun demo in some of his classes and talks on why zero initialization is important, using a hex editor to read the underlying contents of disk even after the old data is “deleted”.)
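The manual fix is a one-liner per file; something like this (database and file names are hypothetical, and the right increment depends on your environment):

```sql
-- Switch the log file from percentage growth to a fixed 512 MB increment
ALTER DATABASE SalesDb
    MODIFY FILE (NAME = SalesDb_log, FILEGROWTH = 512MB);
```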

Andy also has a script to change filegrowth to fixed-increment growth depending upon the size of the file, so check that out.
