Press "Enter" to skip to content

Author: Kevin Feasel

MAPE and Its Flaws

Jan Fischer takes us through Mean Absolute Percentage Error as a measure of forecast quality:

Particularly small actual values bias the MAPE.
If any true values are very close to zero, the corresponding absolute percentage errors will be extremely high and therefore bias the informativity of the MAPE (Hyndman & Koehler 2006). The following graph clarifies this point. Although all three forecasts have the same absolute errors, the MAPE of the time series with only one extremely small value is approximately twice as high as the MAPE of the other forecasts. This issue implies that the MAPE should be used carefully if there are extremely small observations, and it directly motivates the last and often ignored weakness of the MAPE.

Jan also points out a couple of things people incorrectly criticize MAPE for, as well as several things of which it is actually guilty. It’s not a bad measure if you can make certain assumptions about your data, but Jan has a few alternatives which tend to be better than MAPE.
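
To make the small-denominator problem concrete, here is a quick sketch in PowerShell (the series values are invented for illustration): both forecasts miss by exactly 1 at every point, but one near-zero actual inflates the MAPE dramatically.

# Two series of actuals; every forecast misses by exactly 1.
$actualA = 10, 10, 10, 10     # no small values
$actualB = 10, 10, 0.5, 10    # one value close to zero

function Get-Mape($actual, $forecast) {
    $apes = for ($i = 0; $i -lt $actual.Count; $i++) {
        [math]::Abs(($actual[$i] - $forecast[$i]) / $actual[$i])
    }
    100 * ($apes | Measure-Object -Average).Average
}

$forecastA = $actualA | ForEach-Object { $_ + 1 }
$forecastB = $actualB | ForEach-Object { $_ + 1 }

Get-Mape $actualA $forecastA   # 10   (each term is 1/10)
Get-Mape $actualB $forecastB   # 57.5 (the 0.5 actual alone contributes 200%)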

Debugging Spark Applications in Visual Studio

Ed Elliott continues a series on spark-dotnet:

There are two approaches. One I have used for years with dotnet when I want to debug something that is challenging to get a debugger attached to – think apps which spawn other processes and fail in the startup routine. You can add a Debugger.Launch() to your program; then, when Spark executes it, a prompt will be displayed and you can attach Visual Studio to your program. (As an aside, I used to do this manually a lot by writing an __asm int 3 into an app to get it to break at an appropriate point – great memories, but luckily we don’t need to do that anymore.)

The second approach is to start the spark-dotnet driver in debug mode, which, instead of launching your app, starts and listens for incoming requests – you can then run your program as normal (F5), set a breakpoint, and your breakpoint will be hit.

Read on to see how it’s done, as well as a possibly-accidental benefit to this.

Configuring dbatools

Claudio Silva shares some tips on configuring dbatools:

Set a new configuration value
To update a value you need to use the Set-DbatoolsConfig command. Unfortunately, you will not find documentation for this command on our docs page. This is a known issue, and it happens because that command is a cmdlet, so the help is in the dbatools library itself.

For this particular case, you can and should rely on the Get-Help command.

Get-Help -Name Set-DbatoolsConfig -Full

Claudio has a half-dozen or so settings and there are more available to you.
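
As a quick, hedged illustration of the shape of these commands (the configuration name below is just an example; check the output of Get-DbatoolsConfig for what exists in your install):

# List the current configuration names and values
Get-DbatoolsConfig | Select-Object FullName, Value

# Update a value for the current session; sql.connection.timeout is an
# example name only -- pick one from the list above
Set-DbatoolsConfig -FullName sql.connection.timeout -Value 30

# Read a single value back
Get-DbatoolsConfigValue -FullName sql.connection.timeout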

Azure Data Share

James Serra takes us through a new product announcement:

A brand new product by Microsoft called Azure Data Share was recently announced. It is in public preview. To explain the product in short, any data which resides in Azure storage can be securely shared between a data provider and a data consumer. It does this by copying a snapshot of the data to the consumer’s subscription (called snapshot-based copying, and in the future there will be in-place sharing). It currently supports ADLS Gen1, ADLS Gen2, and Blob storage, and eventually will support Azure Data Explorer, SQL DB, and SQL DW. Check out the Documentation and a video, and then go try it out.

You can share the data with a few clicks as long as the user you are trying to share with has access to an Azure Subscription and storage account. The copying and updating of the data is handled for you using the Microsoft backbone for best performance, and is encrypted during transit. You can specify the frequency at which the data consumers receive updates. It also is a simple way to control, manage, and monitor all of your data sharing.

This is a smart idea. Sharing data between companies is a key requirement in a lot of B2B solutions, yet methods for sharing range from high-development and medium-friction (create your own API) to low-development and high-friction (FTP, e-mail).

Accelerated Database Recovery and Filegroups

Randolph West shows a change to Accelerated Database Recovery in SQL Server 2019 CTP 3.2 and later:

ADR makes use of a per-database version store, instead of putting everything in the transaction log and TempDB. In most cases, the payoff more than makes up for valuable transaction log and TempDB storage. In my testing, enabling ADR requires around 10% additional storage in your database file.

The reason for this post, however, is to mention that SQL Server 2019 CTP 3.2 introduced the ability to choose which filegroup you want to use for the version store, which will help with performance.

Read on to see how you enable this.
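
Based on the documented ALTER DATABASE syntax, the change looks roughly like this (server, database, and filegroup names are placeholders, and the filegroup must already exist in the database); here it is wrapped in dbatools’ Invoke-DbaQuery:

# Placeholders throughout -- point these at your own instance, database,
# and pre-created filegroup
$query = @"
ALTER DATABASE [MyDatabase]
SET ACCELERATED_DATABASE_RECOVERY = ON
    (PERSISTENT_VERSION_STORE_FILEGROUP = [VersionStoreFG]);
"@
Invoke-DbaQuery -SqlInstance 'MyServer' -Database master -Query $query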

Building SQL Agent Dates and Times

Kenneth Fisher goes over one of the things in SQL Agent which make me shudder:

Occasionally I’ve seen date and time stored separately as integers. This had some practical applications back before we had date and time data types, but there’s still lots of legacy code out there that uses them (I’ll give you a really, really common example in just a minute).

Unfortunately, you can’t convert datetime to date and time ints directly, but it isn’t all that difficult.

Kenneth notes the function you can use as well as a quick query to calculate duration.
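
The integer format in question packs the date as yyyyMMdd and the time as HHmmss, the same shape SQL Agent uses for run_date and run_time. A quick PowerShell sketch of the round trip:

$now = Get-Date

# Date packs as yyyyMMdd, time as HHmmss
$dateInt = [int]$now.ToString('yyyyMMdd')   # e.g. 20190722
$timeInt = [int]$now.ToString('HHmmss')     # e.g. 93015 -- the leading zero drops!

# Going back: pad the time to six digits before parsing
$roundTrip = [datetime]::ParseExact(
    "$dateInt $($timeInt.ToString('000000'))", 'yyyyMMdd HHmmss', $null)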

Disabling All Triggers on a Database

Jess Pomfret is on a mission:

Sometimes it’s best not to ask why. However, if for some reason you have a number of triggers on tables within a database that you would like to temporarily disable, read on.

I came across this exact scenario recently while automating a process to refresh test environments. As part of the process, several scripts were run to obfuscate production data. While these ran, all the UPDATE triggers were firing. Not only were the triggers adding a significant amount of time to the process, they were also updating dates and other values that we’d have preferred kept their original values.

Jess has a tiny bit of PowerShell code which does this work for you.
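
Jess’s exact script is in the post; as a rough sketch of the idea (not necessarily her code) using dbatools’ Get-DbaDbTable and the SMO objects it returns, with placeholder instance and database names:

# Placeholders -- point these at your own instance and database
$tables = Get-DbaDbTable -SqlInstance 'MyServer' -Database 'MyTestDb'

foreach ($table in $tables) {
    foreach ($trigger in $table.Triggers) {
        $trigger.IsEnabled = $false   # flip back to $true to re-enable
        $trigger.Alter()              # push the change to the server
    }
}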

Locking Azure Resources

Stuart Moore shows us how we can lock Azure resources to prevent accidental mistakes:

A resource deletion may not sound like too big a thing if you’re deploying Infrastructure as Code – hey, we’ll just terraform apply again and it’ll pop back up.

In theory that’s a great idea, just with one big problem: the new resource isn’t the old resource!

For example, an Azure SQL Database server is a unique resource. If you delete one, you lose any backups you’ve taken, as they’re hosted on the server. Spinning up a new one isn’t going to get them back!

Stuart shows us how to take these locks. Because I have my doubts that everyone has all of their infrastructure prepped as Terraform scripts, Stuart’s point is even more relevant.
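
For instance, with the Az PowerShell module, a delete lock on a resource group looks like this (the names are placeholders):

# CanNotDelete still allows reads and changes; ReadOnly blocks those too
New-AzResourceLock -LockName 'NoDeletes' `
    -LockLevel CanNotDelete `
    -LockNotes 'Protects the SQL server and its backups' `
    -ResourceGroupName 'MyResourceGroup'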

From Excel to R: Three Examples

Abdul Majed Raja has a few examples of things which are easy to do in Excel and how you can do them in R:

Create a difference variable between the current value and the next value
This is also known as lead and lag – especially in a time series dataset, this variable becomes very important in feature engineering. In Excel, this is simply done by creating a new formula field that subtracts the current cell from the next cell (or the previous cell from the current cell) and dragging the formula down to the last cell.

These things aren’t as easy to do as in Excel (it’s hard to get simpler than "push a button" or "click and drag your mouse"), but they are useful to know in R. H/T R-bloggers
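
Abdul’s examples are in R, but the lag-difference idea itself is tiny; here is a language-agnostic sketch (in PowerShell, with invented values):

$values = 12, 15, 11, 18, 20

# Difference between each value and the one before it (a lag-1 difference);
# index 0 has no predecessor, so the output series is one element shorter
$diff = for ($i = 1; $i -lt $values.Count; $i++) {
    $values[$i] - $values[$i - 1]
}
$diff   # 3, -4, 7, 2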
