Building Test Data Following A Normal Distribution In T-SQL

I (finally) have a technical blog post:

In order to show you the solution, I want to build up a reasonably sized sample.  Any solution looks great when reading five records, but let’s kick that up a notch.  Or, more specifically, a million notches:  I’m going to use a CTE tally table and load 5 million rows.
I want some realistic-looking data, so I’ve adapted Dallas Snider’s strategy to build a data set that approximates a normal distribution.
Because this is a little complicated, I wanted to take the time to explain the data load process in detail in its own post, and then apply it in the follow-up post.  We’ll start with a relatively small number of records for this demonstration:  50,000.  The reason is that you can generate 50K records almost instantaneously, but once you get a couple of orders of magnitude larger, things slow down some.
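
To make the shape of that approach concrete, here is a minimal sketch of the two pieces described above: a chained-CTE tally table to generate rows, and the Box-Muller transform to turn uniform random numbers into approximately normal values.  This is not the post’s exact code; the variable names, the mean and standard deviation, and the #TestData table are illustrative placeholders.

```sql
-- A minimal sketch, assuming a hypothetical #TestData target table; the mean,
-- standard deviation, and column names are placeholders, not the post's code.
DECLARE
    @RowCount INT = 50000,     -- 50K for the demo; raise this for bigger tests
    @Mean FLOAT = 500.0,
    @StdDev FLOAT = 75.0;

WITH
    t1 AS (SELECT 1 AS n UNION ALL SELECT 1),                -- 2 rows
    t2 AS (SELECT 1 AS n FROM t1 AS a CROSS JOIN t1 AS b),   -- 4 rows
    t3 AS (SELECT 1 AS n FROM t2 AS a CROSS JOIN t2 AS b),   -- 16 rows
    t4 AS (SELECT 1 AS n FROM t3 AS a CROSS JOIN t3 AS b),   -- 256 rows
    t5 AS (SELECT 1 AS n FROM t4 AS a CROSS JOIN t4 AS b),   -- 65,536 rows
    t6 AS (SELECT 1 AS n FROM t5 AS a CROSS JOIN t5 AS b),   -- ~4.3 billion rows, capped by TOP below
    tally AS
    (
        SELECT TOP (@RowCount)
            ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
        FROM t6
    )
SELECT
    t.N,
    -- Box-Muller transform: two independent uniform draws become one
    -- (approximately) standard normal draw, which we then scale and shift.
    -- RAND(CHECKSUM(NEWID())) forces a fresh random value per row; a bare
    -- RAND() evaluates only once for the entire result set.
    @Mean + @StdDev
        * SQRT(-2.0 * LOG(RAND(CHECKSUM(NEWID()))))
        * COS(2.0 * PI() * RAND(CHECKSUM(NEWID()))) AS ApproxNormalValue
INTO #TestData
FROM tally AS t;
```

Bumping @RowCount is all it takes to scale the same query from 50,000 rows to the 5 million mentioned above.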

If you do custom data generation for lower environments, I’d recommend checking this out. Your production data probably doesn’t follow a normal distribution exactly, but a normal distribution is probably closer to reality than the uniform distribution you get with functions like RAND().
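
As a quick sanity check on data generated this way (again assuming the hypothetical #TestData table and ApproxNormalValue column from the sketch above), bucketing the values and counting rows per bucket should show a roughly bell-shaped profile, whereas raw RAND() output would produce near-equal counts in every bucket:

```sql
-- Histogram-style sanity check: 25-unit buckets over the generated values.
SELECT
    FLOOR(ApproxNormalValue / 25.0) * 25 AS BucketFloor,
    COUNT(*) AS NumberOfRows
FROM #TestData
GROUP BY FLOOR(ApproxNormalValue / 25.0) * 25
ORDER BY BucketFloor;
```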
