Day: October 13, 2020

Sliding Windows in R

Bryan Shalloway shows off some new functionality in the rsample package:

For some problems you may want to take a traditional regression or classification based approach while still accounting for the date/time-sensitive components of your data. In this post I will use the tidymodels suite of packages to:

– build lag-based and non-lag-based features
– set up appropriate time series cross-validation windows
– evaluate performance of linear regression and random forest models on a regression problem

For my example I will use data from Wake County food inspections. I will try to predict the SCORE for upcoming restaurant food inspections.

Click through to see it in action.
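
For a rough feel of what time series cross-validation windows look like, here is a minimal sketch in plain Python. It is not the rsample API and not Bryan's code; the function name `sliding_windows` and its `lookback`, `assess`, and `step` parameters are hypothetical, chosen only for illustration. Each split trains on a fixed-width block of time-ordered rows and evaluates on the rows immediately after it, so no future data leaks into training.

```python
def sliding_windows(n, lookback, assess, step=1):
    """Return (analysis, assessment) index lists over n time-ordered rows.

    Hypothetical helper for illustration; not part of rsample.
    """
    splits = []
    start = 0
    # Stop once a full analysis + assessment window no longer fits.
    while start + lookback + assess <= n:
        analysis = list(range(start, start + lookback))
        assessment = list(range(start + lookback, start + lookback + assess))
        splits.append((analysis, assessment))
        start += step
    return splits


# Ten rows, train on 4, evaluate on the next 2, slide forward 2 rows at a time.
splits = sliding_windows(n=10, lookback=4, assess=2, step=2)
for analysis, assessment in splits:
    print(analysis, "->", assessment)
```

Every assessment index is later than every analysis index in the same split, which is the property that separates this from ordinary shuffled k-fold cross-validation.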

Gussying Up R Tables in GitHub

Laura Ellis solves a problem:

One thing I love about performing analysis in .Rmd (R Markdown) files is how easy it is to share your results publicly on GitHub. Create your analysis in the .Rmd file, set your output variant as below, knit to .md format and then add your files to GitHub!

There is only one problem with the .md output: PRETTY TABLES! Most of the pretty tables packages that I like to use either don’t display all of the formatting, or don’t display at all in .md format.

Click through to see how to solve this, including demonstration videos.

Set Functions in DAX

Marco Russo and Alberto Ferrari walk us through three important set functions in DAX:

In this article we refer to “set functions” as functions that operate on sets. The three set functions available in DAX are: UNION, INTERSECT, and EXCEPT. Their behavior is very intuitive:

UNION performs the union of two or more tables.
INTERSECT performs the set intersection between two tables.
EXCEPT removes the rows of the second argument from the first one.

These functions take two or more tables as parameters and return a table. They prove useful not only to write DAX queries; a developer can also use these functions to prepare complex filters when implementing measures.

Read on to see how these work in DAX.
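
As a loose analogy for the three definitions above, here is a small Python sketch that treats a table as a list of row tuples. It is not DAX, and the helper names `union`, `intersect`, and `except_` are made up for this example. It assumes, as I understand the DAX behavior, that duplicate rows are retained rather than collapsed; check the linked article for the exact semantics.

```python
def union(*tables):
    """Concatenate the rows of two or more tables, keeping duplicates."""
    return [row for table in tables for row in table]


def intersect(left, right):
    """Keep the rows of `left` that also appear in `right`."""
    right_rows = set(right)
    return [row for row in left if row in right_rows]


def except_(left, right):
    """Keep the rows of `left` that do not appear in `right`."""
    right_rows = set(right)
    return [row for row in left if row not in right_rows]


# Two single-column "tables" of colors (hypothetical sample data).
a = [("Red",), ("Blue",), ("Blue",)]
b = [("Blue",), ("Green",)]

print(union(a, b))
print(intersect(a, b))
print(except_(a, b))
```

Note that in this sketch `intersect(a, b)` and `intersect(b, a)` can differ, because the result is drawn from the first argument's rows.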

Diagnosing and Solving tempdb Bottlenecks

Ameena Lalani shares some good info on the tempdb database:

Tempdb is a SQL Server temporary system database. Every time SQL Server restarts, a new tempdb database is created. Whatever data was stored in the tempdb is lost. When tempdb gets recreated, its data and log files reset to their last configured size. SQL Server records only enough information in the tempdb transaction log to roll back a transaction, but not to redo transactions during database recovery. This feature increases the performance of INSERT statements in tempdb. Tempdb database is always in Simple recovery mode. If you try to change it to Full Recovery mode, you will receive the following error message.

Click through for more info on how the database is special, types of issues you can run into as concurrency grows, and ways to resolve those issues.

Finding the Most Costly Statement in a Stored Procedure

Grant Fritchey takes us through one method of figuring out which statement you’re waiting to finish when running a stored procedure:

A lot of stored procedures have multiple statements and determining the most costly statement in a given proc is a very common task. After all, you want to focus your time and efforts on fixing the things that cause you the most pain. You simply don’t have the time to tune every single statement in every single procedure. So, identifying the most costly statement is vital.

Happily, Extended Events are here to help.

Click through to see how you can use extended events to figure this out.

Downloading Power BI Reports from a Workspace

Shabnam Watson has a helpful script for us:

You can use PowerShell to download all of your PBI reports in a workspace all at once without having to go through the PBI service UI one at a time. As an added bonus, you may notice that downloading a report with PowerShell is faster than downloading it through the PBI Service UI.

This script is useful for admins to take backups of reports deployed to PBI Service. It can be easily extended to loop through all/several workspaces. It is also useful for developers to take a backup of their report before publishing a new version.

Click through for the script.


Erik Darling has started a new series:

If you go to open a new tab in SSMS and you hear a creaking sound, it might be time to revisit some of your hardware choices.

But that’s a problem for future you. Your problem is right now.

– How to check it: sp_PressureDetector
– What to look for: Stuck queries, Resource Usage, Wait Stats

By default, it’ll look at both CPU and Memory counters. If you don’t know what’s going on, just hit F5.

Read on to see what it looks like for a server hitting memory limits and for a server hitting CPU limits.

Visualizing Parallelism in Power Query Diagnostics Data

Chris Webb wants to track query concurrency when loading data into Power BI:

Most of the time I’ve spent looking at Power Query Query Diagnostics data, I’ve been looking at data for a single query. Over the past few days though I’ve spent some time investigating what this data shows for all the queries that are executed for a single dataset refresh. To help me do this I wrote the following M function:

Click through for the function, as well as ways of visualizing the results.

