
Author: Kevin Feasel

More On DBCC CLONEDATABASE

Melissa Connors digs into the DBCC CLONEDATABASE command:

Finally, keep in mind that because the clone is a read-only, empty database, you should be able to test repeatedly without updating statistics and skewing your results. Since I wanted to see this for myself, I executed a set of updates and selects against the SQLSentryData and SQLSentryDataClone databases. As a result of the lack of data and read-only database status, there were no actual updates in SQLSentryDataClone. Consequently, the statistics were updated in the SQLSentryData database, but remained the same in the SQLSentryDataClone database.

Read the whole thing.

Comments closed

Indexes And Stats

Brent Ozar looks at a case when adding a suggested index monkeys with stats:

The query runs faster, make no mistake – but check out the estimates:

  • Estimated number of rows = 1
  • Actual number of rows = 165,367

Those estimates are built by SQL Server’s cardinality estimator (CE), and there have been major changes to it over the last couple of versions. You can control which CE you’re using by changing the database’s compatibility level. This particular StackOverflow database is running in 2016 compat mode – so what happens if we switch it back to 2012 compat mode?

Based on this result, there might be further optimizations available.  Read on for more of Brent’s thoughts.
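
To make the compatibility-level switch concrete, here is a minimal sketch of a helper that builds the T-SQL statement for it. The helper name and the version lookup are hypothetical conveniences, not something from Brent's post; the levels themselves (110 for SQL Server 2012, 120 for 2014, 130 for 2016) are the documented values, and the legacy cardinality estimator applies at level 110 and below.

```python
# SQL Server version -> database compatibility level.
COMPAT_LEVELS = {"2012": 110, "2014": 120, "2016": 130}


def set_compat_level_sql(database, version):
    """Build (not run) an ALTER DATABASE statement for the given version."""
    level = COMPAT_LEVELS[version]
    # Bracket-quote the database name, doubling any closing bracket.
    quoted = "[" + database.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET COMPATIBILITY_LEVEL = {level};"


# Database name taken from the excerpt; run the statement yourself in SSMS.
statement = set_compat_level_sql("StackOverflow", "2012")
print(statement)
```

The function only produces text, so nothing changes until the statement is run against the server.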


Documenting Replication Using PowerShell

Shane O’Neill learned a bit of PowerShell in the process of documenting replication:

Now, due to an unfortunate incident when I was a Software Support Engineer that involved a 3 week old backup and a production database, I prefer not to use the GUI if I can help it.

I’m not joking about that, either: if there is ANY way that I can accomplish something with scripts instead of the GUI, I will take it!

Especially when the need was to document the properties of over 100 articles, I was particularly not looking forward to opening the article properties window for each of the articles and copying them out individually.

Check it out.


Use Source Control

James Anderson wants you to use source control:

SSC and SSDT require the use of compare tools to build deployment scripts. This is referred to as a state-based migration. I’d done deployments like this in the past and saw that people reviewing the release found it difficult to review these scripts when the changes were more than trivial. For this reason, I decided to look at some migration-based solutions. Migration solutions generate scripts during the development process that will be used to deploy changes to production. This allows the developer to break the changes down into small, manageable individual scripts, which in turn makes code reviews easier and deployments feel controlled. These scripts sit in the VS project and are therefore source controlled in the same way as the database.

James recommends Git here.  I’m not Git’s biggest fan, but it’s much, much better than not having any source control at all.
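
For anyone who has not seen the migration-based approach, here is a minimal sketch of the idea in Python against an in-memory SQLite database. The table names, the scripts, and the `schema_version` tracking table are all hypothetical, and real tools do considerably more; the point is only that small, ordered scripts are applied once each and recorded.

```python
import sqlite3

# Hypothetical migration scripts, keyed by an ordered version number.
MIGRATIONS = {
    1: "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    2: "ALTER TABLE customer ADD COLUMN email TEXT",
    3: "CREATE INDEX ix_customer_email ON customer (email)",
}


def migrate(conn):
    """Apply any migrations not yet recorded; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version in sorted(MIGRATIONS):
        if version in applied:
            continue  # already deployed, skip it
        conn.execute(MIGRATIONS[version])
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies every script in order
second_run = migrate(conn)  # nothing left to apply
print(first_run, second_run)
```

Because each script is small and lives in source control, a reviewer reads three short statements rather than one generated diff script.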


The Basics Of Notebooks

I have a quick walkthrough of notebooks:

Remember chemistry class in high school or college?  You might remember having to keep a lab notebook for your experiments.  The purpose of this notebook was two-fold:  first, so you could remember what you did and why you did each step; second, so others could repeat what you did.  A well-done lab notebook has all you need to replicate an experiment, and independent replication is a huge part of what makes hard sciences “hard.”

Take that concept and apply it to statistical analysis of data, and you get the type of notebook I’m talking about here.  You start with a data set, perform cleansing activities, potentially prune elements (e.g., getting rid of rows with missing values), calculate descriptive statistics, and apply models to the data set.

I didn’t realize just how useful notebooks were until I started using them regularly.
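
As a rough illustration of the workflow described above, here is the sort of cell sequence a notebook might hold, written in plain Python with only the standard library. The data set and its values are made up for the example, and the sketch stops short of the modeling step.

```python
import statistics

# Hypothetical raw data set: (city, temperature in °C); None marks a missing value.
raw = [
    ("Raleigh", 21.5),
    ("Durham", None),
    ("Cary", 19.0),
    ("Apex", 23.5),
    ("Garner", None),
]

# Cleansing / pruning step: drop rows with missing values.
clean = [(city, temp) for city, temp in raw if temp is not None]

# Descriptive statistics on the rows that remain.
temps = [temp for _, temp in clean]
summary = {
    "n": len(temps),
    "mean": statistics.mean(temps),
    "median": statistics.median(temps),
    "stdev": statistics.stdev(temps),
}
print(summary)
```

In a notebook, each of those steps would sit in its own cell with a note explaining why it was done, which is what makes the analysis repeatable by someone else.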


Going From Pig To Spark

Philippe de Cuzey introduces Spark to people already familiar with Pig:

I like to think of Pig as a high-level Map/Reduce commands pipeline. As a former SQL programmer, I find it quite intuitive, and at my organization our Hadoop jobs are still mostly developed in Pig.

Pig has a lot of qualities: it is stable, scales very well, and integrates natively with the Hive metastore HCatalog. By describing each step atomically, it minimizes conceptual bugs that you often find in complicated SQL code.

But sometimes, Pig has some limitations that make it a poor programming paradigm to fit your needs.

Philippe includes a couple of examples in Pig, PySpark, and SparkSQL.  Even if you aren’t familiar with Pig, this is a good article to help familiarize yourself with Spark.


Comparing Nullable Columns

Daniel Hutmacher shows an elegant way to compare multiple nullable columns on two tables:

Because we’ve added OR conditions into the mix, we’re forced to use the Nested Loop join, which loops over table B for every single row in A. That’s a lot of index scans and it comes with a hefty price tag.

Here’s an absolutely eye-wateringly beautiful pattern that I found on the Interwebs (though I forgot where) the other day.

This is an interesting use of INTERSECT.  Check it out.
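
If you want to see why INTERSECT helps, here is a small sketch using Python's built-in sqlite3 module. The tables and data are hypothetical, SQLite stands in for SQL Server, and the EXISTS-plus-INTERSECT form shown is one common way to write a null-safe comparison; I am assuming, not asserting, that it matches the pattern in Daniel's post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, c1 TEXT, c2 TEXT);
    CREATE TABLE b (id INTEGER, c1 TEXT, c2 TEXT);
    INSERT INTO a VALUES (1, 'x', NULL), (2, NULL, NULL), (3, 'y', 'z');
    INSERT INTO b VALUES (10, 'x', NULL), (20, NULL, NULL), (30, 'y', 'q');
""")

# Plain equality: NULL = NULL is not true, so rows containing NULLs never match.
equality_matches = conn.execute("""
    SELECT a.id, b.id
    FROM a JOIN b ON a.c1 = b.c1 AND a.c2 = b.c2
    ORDER BY a.id
""").fetchall()

# INTERSECT compares whole rows and treats NULLs as equal to each other,
# so no "OR (x IS NULL AND y IS NULL)" conditions are needed.
intersect_matches = conn.execute("""
    SELECT a.id, b.id
    FROM a CROSS JOIN b
    WHERE EXISTS (SELECT a.c1, a.c2 INTERSECT SELECT b.c1, b.c2)
    ORDER BY a.id
""").fetchall()

print(equality_matches)   # []
print(intersect_matches)  # [(1, 10), (2, 20)]
```

The query plans and performance characteristics discussed in the article are specific to SQL Server, so this only demonstrates the NULL-handling semantics.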
