Curated SQL Posts

Creating Indexed Views

Eduardo Pivaral shows how to create a fairly simple indexed view:

Views help our query writing by saving us from writing the same statements and/or aggregations over and over again, but they have a drawback: views just store our query definition, so performance is not improved by using them.

The option to create an index over a view has existed since SQL Server 2000. Of course, there are some limitations, but if your view can work within them, the performance improvement can be great!

I will show you how to create a simple index over a view.

I’ve used indexed views in the past, but they’re a much less common tool in the belt these days.
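
For reference, a minimal sketch of the pattern (the table and column names here are hypothetical): the view must be schema-bound, reference its tables by two-part names, and include COUNT_BIG(*) when it aggregates, and it is the unique clustered index that actually materializes the results.

CREATE VIEW dbo.SalesByProduct
WITH SCHEMABINDING
AS
SELECT
    ProductID,
    SUM(Quantity) AS TotalQuantity,  -- the aggregated column must be non-nullable
    COUNT_BIG(*)  AS RowCnt          -- required in an indexed view with GROUP BY
FROM dbo.OrderLines                  -- two-part name required under SCHEMABINDING
GROUP BY ProductID;
GO

-- The unique clustered index persists the view’s results to disk.
CREATE UNIQUE CLUSTERED INDEX IX_SalesByProduct
    ON dbo.SalesByProduct (ProductID);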

Finding The Last Database Restore Time

Lori Brown shows how you can see the last time a particular database was restored:

I have a client that uses a lot of disconnected log shipping on a few servers.  They do this because they want a local copy, for reporting purposes, of databases that are actually hosted by software vendors.  So, disconnected log shipping it is!  We pull down files that have been FTP’d to the corporate FTP site, apply those logs locally, and leave the databases in standby so that they can be queried.

(If you want to review how to set up disconnected log shipping complete with FTP download scripts, all SQL jobs and log shipping monitoring, then hop on over to https://www.sqlrx.com/setting-up-disconnected-log-shipping/ to check it out!)

Of course, every so often there is a glitch, and one or all databases can get out of sync pretty easily.  When we receive a notification that tlogs have not been restored in a while, the hunt is on to see what happened and get it corrected.  Since I have disconnected log shipping set up, I can’t get a log shipping report from SSMS that will tell me the errors and the last log applied.  So, I made a query that will give me the most recent file that was restored, along with the file name, date, database, etc.  I can pull back all databases or, by uncommenting a line, filter on one or more databases.

Lori also includes a helpful script.
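
If you only need the basic lookup, the shape of such a query (a sketch against msdb’s history tables, not Lori’s script) looks something like this:

-- Most recent restores, with the backup file that was applied.
SELECT
    rh.destination_database_name AS DatabaseName,
    rh.restore_date,
    rh.restore_type,                        -- D = full, I = differential, L = log
    bmf.physical_device_name AS BackupFile
FROM msdb.dbo.restorehistory AS rh
    INNER JOIN msdb.dbo.backupset AS bs
        ON rh.backup_set_id = bs.backup_set_id
    INNER JOIN msdb.dbo.backupmediafamily AS bmf
        ON bs.media_set_id = bmf.media_set_id
--WHERE rh.destination_database_name IN (N'YourDatabase')  -- uncomment to filter
ORDER BY rh.restore_date DESC;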

What Happens With Data Compression + Backup Compression

Jess Pomfret tests what happens when you enable backup compression for databases with already-compressed tables in SQL Server:

What happens if I use data compression and backup compression? Do I get double compression?

This is a great question, and without diving too deeply into how backup compression works, I’m going to do a simple experiment on the WideWorldImporters database.  I’ve restored this database to my local SQL Server 2016 instance and I’m simply going to back it up several times under different conditions.

After restoring the database, it’s about 3GB in size, so our testing will be on a reasonably small database.  It would be interesting to see how the results change as the database size increases; perhaps that’s a future blog post.

Click through for the answer.
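
The experiment is easy to reproduce yourself (a sketch; the backup paths are placeholders): back the database up with and without backup compression, then compare the sizes msdb records.

BACKUP DATABASE WideWorldImporters
    TO DISK = N'C:\Backups\WWI_Uncompressed.bak'
    WITH INIT, NO_COMPRESSION;

BACKUP DATABASE WideWorldImporters
    TO DISK = N'C:\Backups\WWI_Compressed.bak'
    WITH INIT, COMPRESSION;

-- Compare raw vs. compressed sizes for the two most recent backups.
SELECT TOP (2)
    backup_finish_date,
    backup_size / 1048576.0            AS BackupSizeMB,
    compressed_backup_size / 1048576.0 AS CompressedSizeMB
FROM msdb.dbo.backupset
WHERE database_name = N'WideWorldImporters'
ORDER BY backup_finish_date DESC;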

No Curation Today

I’m somewhere deep in South Dakota at this point, so no curation today.  We will be back on the regular schedule Monday.

[Photo: South Dakota at dusk]

Last-Click Attribution With Databricks Delta

Caryl Yuhas and Denny Lee give us an example of building a last-click digital marketing attribution model with Databricks Delta:

The first thing we will need to do is establish the impression and conversion data streams.  The impression data stream provides us a real-time view of the attributes associated with the customers who were served the digital ad (impression), while the conversion stream denotes customers who have performed an action (e.g., clicked the ad, purchased an item) based on that ad.

With Structured Streaming in Databricks, you can quickly plug into the stream, as Databricks supports direct connectivity to Kafka (Apache Kafka, Apache Kafka on AWS, Apache Kafka on HDInsight) and Kinesis, as noted in the following code snippet (this is for impressions; repeat this step for conversions).

This is definitely an interesting approach to the problem.  Check it out.
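
The streaming plumbing is in the post itself; as a rough batch-mode illustration of just the last-click rule (the impressions and conversions tables here are hypothetical), each conversion is attributed to that user’s most recent prior impression:

-- Last-click attribution: rank each user’s impressions per conversion,
-- newest first, and keep only the latest one.
WITH ranked AS (
    SELECT
        c.user_id,
        c.conversion_id,
        i.ad_id,
        ROW_NUMBER() OVER (
            PARTITION BY c.conversion_id
            ORDER BY i.impression_time DESC) AS rn
    FROM conversions AS c
        INNER JOIN impressions AS i
            ON i.user_id = c.user_id
           AND i.impression_time <= c.conversion_time
)
SELECT user_id, conversion_id, ad_id AS attributed_ad
FROM ranked
WHERE rn = 1;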

Bayesian Neural Networks

Yoel Zeldes thinks about neural networks from a different perspective:

The term log P(w), which represents our prior, acts as a regularization term. If you choose a Gaussian distribution with mean 0 as the prior, you get the mathematical equivalent of L2 regularization.

Now that we start thinking about neural networks as probabilistic creatures, we can let the fun begin. For a start, who says we have to output one set of weights at the end of the training process? What if, instead of learning the model’s weights, we learn a distribution over the weights? This will allow us to estimate uncertainty over the weights. So how do we do that?

It’s an interesting approach to the problem.
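
To spell out the prior-as-regularizer step (a standard derivation, with \sigma^2 denoting the prior variance): with a zero-mean Gaussian prior,

\log P(w) = \log \mathcal{N}(w \mid 0, \sigma^2 I) = -\frac{1}{2\sigma^2} \lVert w \rVert_2^2 + \text{const},

so maximizing \log P(D \mid w) + \log P(w) is the same as minimizing the usual training loss plus an L2 penalty with coefficient \lambda = \frac{1}{2\sigma^2}.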

Microsoft R Open 3.5.1

David Smith announces Microsoft R Open 3.5.1:

Microsoft R Open 3.5.1 has been released, combining the latest R language engine with multi-processor performance and tools for managing R packages reproducibly. You can download Microsoft R Open 3.5.1 for Windows, Mac and Linux from MRAN now. Microsoft R Open is 100% compatible with all R scripts and packages, and works with all your favorite R interfaces and development environments.

This update brings a number of minor fixes to the R language engine from the R core team. It also makes available a host of new R packages contributed by the community, including packages for downloading financial data, connecting with analytics systems, applying machine learning algorithms and statistical models, and many more. New R packages are released every day, and you can access packages released after the 1 August 2018 CRAN snapshot used by MRO 3.5.1 using the checkpoint package.

Read on for more and check out the updates.

Dealing With CheckDB Error Message 824 Level 24

Steve Stedman has a post on fixing a database which has experienced an incorrect pageid error:

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:2806320; actual 0:0).  It occurred during a read of page (1:xxxxx) in database ID 5 at offset 0x00000xxxxx0000 in file ‘C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\YourDatabaseName.mdf’.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Since this is one of those things that I regularly work with, I thought I would see what other people are saying about this error message, and boy oh boy did I find some crazy and outright damaging suggestions.

Steve puts together a bunch of really bad advice and explains why you shouldn’t follow it.  Read the whole thing and listen to Steve’s advice, not the bad advice.
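
For contrast with the bad advice: the first step the error message itself prescribes is a full consistency check, and msdb’s suspect_pages table is also worth a look (a sketch; substitute your database name):

-- Get the full picture of the corruption first.
DBCC CHECKDB (N'YourDatabaseName') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- See which pages SQL Server has already flagged as suspect.
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;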

Getting An Accurate Query Execution Time

Grant Fritchey shares some tips on accurately measuring query execution time:

Before we get into all the choices and compare them, let’s baseline on methodology and a query to use.

Not sure why, but many people give me blowback when I say “on average, this query runs in X amount of time.” The feedback goes “You can’t say that. What if it was just blocking or resources or…” I get it. Running a query one time, changing something, running that query again, and declaring the problem solved is not what I’m suggesting. Notice the key word and trick phrase “on average.” I don’t run the query once. I run it several times, capture them all, then get the average of the durations.

The observer effect is in full force with a couple of the techniques Grant shows, but the rest are generally stable, which is a good thing.
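
If Query Store is enabled, one low-observer-effect way to get that average (a sketch, not necessarily one of Grant’s techniques; the LIKE filter is a placeholder) is to run the query several times and then aggregate what Query Store captured:

-- Weighted-average duration across all runs Query Store has recorded.
SELECT
    qt.query_sql_text,
    SUM(rs.avg_duration * rs.count_executions)
        / SUM(rs.count_executions) / 1000.0 AS AvgDurationMs,  -- microseconds to ms
    SUM(rs.count_executions) AS Executions
FROM sys.query_store_query_text AS qt
    INNER JOIN sys.query_store_query AS q
        ON q.query_text_id = qt.query_text_id
    INNER JOIN sys.query_store_plan AS p
        ON p.query_id = q.query_id
    INNER JOIN sys.query_store_runtime_stats AS rs
        ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE N'%YourQueryText%'
GROUP BY qt.query_sql_text;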

Performing Linear Regression With Power BI

Jason Cantrell shows how to create a simple linear regression in Power BI:

Linear Regression is a very useful statistical tool that helps us understand the relationship between variables and the effects they have on each other. It can be used across many industries in a variety of ways – from spurring value to gaining customer insight – to benefit business.

The Simple Linear Regression model allows us to summarize and examine the relationship between two variables. It uses a single independent variable and a single dependent variable and finds a linear function that predicts the dependent variable’s values as a function of the independent variable.

If you want real linear regression, drop in an R or Python script.
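
For reference, the model and its ordinary least squares estimates (the standard formulas, not anything specific to the post):

y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.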
