Press "Enter" to skip to content

Month: April 2017

Diagnosing Database Restore Wait Times

Bob Ward notes that the “100 percent processed” message doesn’t mean everything is quite finished yet in a database restoration:

Notice the “100 percent…” message has detailed about “bytes processed”. Since my data is around 13Mb this tells me that the progress indicators are all about the data transferred step of RESTORE. Notice the time gap in the messages to “Waiting for Log zeroing…” and “Log Zeroing is complete”. That gap in time is around 2 minutes. Exactly the time it took between the the 100% Complete message in the SSMS Window and the final restore message.

From this evidence I can conclude that having a transaction log file size that takes a long time to initialize can be a possible cause of seeing the behavior of getting the 100% Complete Message quickly but the overall RESTORE takes longer to complete.

There’s a lot worth reading packed into this post, as you’d expect from Bob.  Read the whole thing.

Comments closed

Fitness In Modeling

Leila Etaati notes the Scylla and Charybdis of models:

However, in the most machine learning experiences, we will face two risks :Over fitting and under fitting.
I will explain these two concepts via an example below.
imagine that we have collected information about the number of coffees that have been purchased in a café from 8am to 5pm.

Overfitting tends to be a bigger problem in my experience, but they’re both dangerous.

Comments closed

Dealing With NULL

Jeff Mlakar has a pair of comparisons for NULL handling, with ISNULL vs COALESCE and CONCAT vs + for concatenation:

We expect this much from IsNull. However, coalesce is a little different. will take the data type from the first non-null value passed and use that for the table definition. This might not always be what you want because if you pass bits you might get integers. If you pass an array of integers and floats you will get numeric. Be aware if this isn’t what you wanted.

Read the whole thing.

Comments closed

Resumable Online Index Rebuilds

Niko Neugebauer introduces a new feature in SQL Server 2017:

After multiple executions, the first process (Resumable Online Index Rebuild) on the average took 65.8 seconds, while the second one (a simple online) took only 60.8 seconds, representing 8% of the improvement of the overall performance. I can’t say if it looks acceptable to you or not, but for me this is something I will be definitely considering to be as an advantage for the cases where the resumable process is needed.

I decided to run a test on much bigger table, the lineitem which for 10GB TPCH database contains 60 Million Rows. My expectation here was to see if the percentage would stay the same or will jump to a whole new level (please make sure that you do execute the following script at least a couple of times, to get the real results and not the results of your disk-drive prefetching :)):

The big table example result was somewhat surprising.  Niko is his normal, informative self, so definitely read the whole thing.

Comments closed

Graph Data In SQL Server

Terry McCann has a first look at SQL Server 2017’s graph data capabilities:

SQL Graph is a similar concept to what is described above, but built in to the core SQL Server engine. This means 2 new table types NODE and EDGE and a few new TSQL functions in particular MATCH(). SQL Graph at the time of writing is only available in SQL 2017 ctp 2.0. You can read more and download ctp2.0 here https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/19/sql-server-2017-community-technology-preview-2-0-now-available/. Once ctp 2.0 is installed there is nothing else you need to do to enable the new graph syntax and storage.

There is an example you can download from Microsoft which is a similar set up to the example in the image above. However I have used some real data shredded from IMDB the internet movie database. This data is available to download from Kaggle https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset

Click through for a video demonstration as well.

Comments closed

Global Trace Flags Are Global

Kendra Little points out that when you turn a trace flag on globally, it cannot be off for individual sessions:

In SQL Server 2016, you can now enable the very same optimizer hotfixes controlled by Trace Flag 4199 at the database scope by using ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES=ON.

If you have the setting configured at the database level, it’s much easier to test what would happen if the setting was NOT enabled, because you can compile your query from a different database.

Interesting results.  Check it out.

Comments closed

SQL Server Graph Database

The SQL Server team announces graph extensions in SQL Server 2017:

Graph extensions are fully integrated in the SQL Server engine. Node and edge tables are just new types of tables in the database. The same storage engine, metadata, query processor, etc., is used to store and query graph data. All security and compliance features are also supported. Other cutting-edge technologies like columnstore, ML using R Services, HA, and more can also be combined with graph capabilities to achieve more. Since graphs are fully integrated in the engine, users can query across their relational and graph data in a single system.

This is interesting.  One concern I have had with graph databases is that graphs are storing the same information as relations but in a manner which requires two distinct constructs (nodes and edges) versus one (relations).  This seems to be a hybrid approach, where the data is stored as a single construct (relations) but additional syntax elements allow you to query the data in a more graph-friendly manner.  I have to wonder how it will perform in a production scenario compared to Neo4j or Giraph.

Comments closed

Upgrading SQL On Linux

Steve Jones has a post on upgrading SQL Server on Linux:

I’m cutting off part of the path, since I think it’s probably NDA. No worries, apparently the old location for me hasn’t been updated with new packages, which makes sense.

I decided to check the MS docs and see how a new user would get SSoL running? At the new docs.microsoft site, I found the Install SQL Server on Ubuntu doc.

Following the instructions, I updated the GPG keys and registered the repository with curl:

curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -

curl https://packages.microsoft.com/config/ubuntu/16.04/mssql-server.list | sudo tee /etc/apt/sources.list.d/mssql-server.list

My expectation is that upgrading SQL Server on Linux is going to be a lot less painful than upgrading on Windows.

Comments closed

Comparing Tables With Powershell

Shane O’Neill compares table columns with T-SQL and Powershell:

That is not really going to work out for us…
So I’m not liking the look of this, and going through the results, it seems to me that these results are just not useful. This isn’t the computers fault – it’s done exactly what I’ve told it to do – but a more useful result would be a list of columns and then either a simple ‘Yes’, or a ‘No’.

There’s syntax for this…PIVOT

This is helpful for normalizing a bunch of wide, related tables into a subclass/superclass pattern.

Comments closed

Logistic Regression In R

Steph Locke has a presentation on performing logistic regression using R:

Logistic regressions are a great tool for predicting outcomes that are categorical. They use a transformation function based on probability to perform a linear regression. This makes them easy to interpret and implement in other systems.

Logistic regressions can be used to perform a classification for things like determining whether someone needs to go for a biopsy. They can also be used for a more nuanced view by using the probabilities of an outcome for thinks like prioritising interventions based on likelihood to default on a loan.

It’s a good introduction to an important statistical method.

Comments closed