2019-05-07 – Curated SQL

Horizontal Labels with ggplot

Published 2019-05-07 by Kevin Feasel

Michael Toth shows us how to ensure we use horizontal text labels in ggplot:

There are several things we could do to improve this graph, but in this guide let’s focus on rotating the y-axis label. This simple change will make your graph so much better. That way, people won’t have to tilt their heads like me to understand what’s going on in your graph:

It may not seem like much when you’re creating the visual, but it can make a difference for a viewer.

Repeated Cross-Validation in R

Published 2019-05-07 by Kevin Feasel

Ludvig Olsen walks us through a couple of nice R packages:

The benefits of using groupdata2 to create the folds are 1) that it allows us to balance the ratios of our output classes (or simply a categorical column, if we are working with linear regression instead of classification), and 2) that it allows us to keep all observations with a specific ID (e.g. participant/user ID) in the same fold to avoid leakage between the folds.
The benefit of cvms is that it trains all the models and outputs a tibble (data frame) with results, predictions, model coefficients, and other sweet stuff, which is easy to add to a report or do further analyses on. It even allows us to cross-validate multiple model formulas at once to quickly compare them and select the best model.

Ludvig also gives us some examples of how both packages can help you out. H/T R-Bloggers
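To make the fold-grouping idea concrete, here is a minimal sketch in plain Python rather than R. It is not the groupdata2 or cvms API; the function names and the participant data are hypothetical. It only illustrates the second benefit from the quote (keeping every observation for an ID in the same fold) and repeats the assignment with a different shuffle each time. It does not attempt class balancing.

```python
import random

def grouped_folds(ids, k, seed):
    """Assign each unique ID to one of k folds; return one fold number per observation."""
    unique_ids = sorted(set(ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    # Deal shuffled IDs round-robin so the number of IDs per fold differs by at most one.
    fold_of_id = {uid: i % k for i, uid in enumerate(unique_ids)}
    return [fold_of_id[uid] for uid in ids]

def repeated_grouped_folds(ids, k, repeats, seed=0):
    """Return one fold assignment per repetition, each built from a different shuffle."""
    return [grouped_folds(ids, k, seed + r) for r in range(repeats)]

# Hypothetical data: six participants with two observations each.
participant_ids = ["p1", "p1", "p2", "p2", "p3", "p3",
                   "p4", "p4", "p5", "p5", "p6", "p6"]
fold_columns = repeated_grouped_folds(participant_ids, k=3, repeats=2)
```

Each entry in `fold_columns` plays the role of one fold column; a model would be trained and evaluated once per fold within each repetition, and the results averaged across repetitions.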

Using Power Query to Expand Out Missing Dates

Published 2019-05-07 by Kevin Feasel

Matt Allington solves a problem in Power Query:

Suppose you have data in the form of dates (not consecutive) with a value for each of the dates (see the table below, left side). You need to expand the rows of the table (create the missing rows) so that you will have all the consecutive dates in the given range and each of the dates has the previous updated value (see the table below, right side).

The solution has a pretty large number of steps but is straightforward.
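The same transformation can be sketched outside Power Query. The following is a rough Python equivalent, not Matt's M code: the function name and the sample values are made up for illustration. It walks every day between the first and last date and carries the most recent value forward.

```python
from datetime import date, timedelta

def expand_dates(rows):
    """Fill every missing day between the first and last date with the last known value."""
    rows = sorted(rows)  # order by date
    if not rows:
        return []
    known = dict(rows)
    current, last_day = rows[0][0], rows[-1][0]
    filled = []
    value = None
    while current <= last_day:
        # Use the value recorded on this date if there is one;
        # otherwise carry the previous value forward.
        value = known.get(current, value)
        filled.append((current, value))
        current += timedelta(days=1)
    return filled

# Hypothetical sparse input: three dated values with gaps between them.
sparse = [(date(2019, 5, 1), 10), (date(2019, 5, 4), 25), (date(2019, 5, 6), 30)]
expanded = expand_dates(sparse)
```

With this input, `expanded` covers May 1 through May 6, with May 2 and 3 repeating the May 1 value and May 5 repeating the May 4 value.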

Rewriting Expensive Updates

Published 2019-05-07 by Kevin Feasel

Erik Darling takes us through an experiment:

Let’s also say that bad query is taking part in a modification.
UPDATE u2
SET u2.Reputation *= 2
FROM Users AS u
JOIN dbo.Users AS u2
    ON CHARINDEX(u.DisplayName, u2.DisplayName) > 0
WHERE u2.Reputation >= 100000
AND u.Id <> u2.Id;
This query will run for so long that we’ll get sick of waiting for it. It’s really holding up writing this blog post.

Erik rewrites this query a couple of times. Click through to learn what he does and why he does it.

Triggers and Multi-Record Changes

Published 2019-05-07 by Kevin Feasel

Brent Ozar points out a common problem with trigger design:

When you declare variables and set them using one row from the INSERTED or DELETED virtual table, you have no idea which row you’re going to get. Even worse, sometimes this trigger will update one row, and sometimes it won’t – because it might happen to grab a row with a reputation under 1,000!

It’s an easy mistake to make and one which can have a major impact.
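The failure mode can be simulated in a few lines of plain Python. This is only an analogy, not T-SQL and not Brent's trigger: the `inserted` list stands in for the INSERTED virtual table, the two functions are hypothetical, and the 1,000 reputation threshold is taken from the quote.

```python
def single_row_trigger(inserted, flagged):
    """Buggy pattern: read one row into 'variables' and act only on that row."""
    # Taking the first element stands in for "whichever row the engine hands back";
    # SQL Server makes no promise about which one that is.
    row = inserted[0]
    if row["reputation"] >= 1000:
        flagged.add(row["id"])

def set_based_trigger(inserted, flagged):
    """Safer pattern: handle every row touched by the statement."""
    flagged.update(row["id"] for row in inserted if row["reputation"] >= 1000)

# Hypothetical multi-row change: three users modified by a single statement.
inserted = [
    {"id": 1, "reputation": 50},
    {"id": 2, "reputation": 5000},
    {"id": 3, "reputation": 1200},
]
buggy_result, fixed_result = set(), set()
single_row_trigger(inserted, buggy_result)
set_based_trigger(inserted, fixed_result)
```

Here the single-row version happens to look at a user under the threshold and flags nobody, while the set-based version flags both qualifying users. In a real trigger, the equivalent fix is to join to the INSERTED or DELETED table instead of copying one row into variables.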

Enabling Large Memory Pages in SQL Server

Published 2019-05-07 by Kevin Feasel

David Klee talks us through large memory pages:

SQL Server Enterprise Edition can leverage large memory pages to reduce the number of memory pointers required for larger SQL Server deployments. Reducing the number of pointers makes the database engine more efficient, especially for SQL Servers with more than 32GB of RAM. A normal memory block is 4KB, and many thousands of pointers are required to manage the memory underneath a larger SQL Server. Large memory pages can change the block size to 2MB, greatly reducing the number of pointers required for memory management.

Read on to see what effect this has, as well as when to use them and, more importantly, when not to use them.
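A quick back-of-the-envelope calculation shows the scale of the difference. This sketch assumes one tracking entry per memory block, which is a simplification for illustration rather than a description of SQL Server's internals, and uses the 32GB, 4KB, and 2MB figures from the quote.

```python
KB, MB, GB = 1024, 1024**2, 1024**3

def page_count(memory_bytes, page_bytes):
    """Number of pages needed to cover memory_bytes, rounded up."""
    return -(-memory_bytes // page_bytes)

memory = 32 * GB
small_pages = page_count(memory, 4 * KB)  # 8,388,608 blocks at 4KB
large_pages = page_count(memory, 2 * MB)  # 16,384 blocks at 2MB
reduction = small_pages // large_pages    # 512 times fewer blocks to track
```

The ratio is simply 2MB divided by 4KB, so it stays at 512 regardless of how much memory the server has; only the absolute counts grow.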
