Day: November 9, 2018

Kaggle-Maintained Data

Noah Daniels announces “Maintained by Kaggle” data sets:

The “Maintained by Kaggle” badge means that Kaggle is now and will continue to actively maintain that dataset. This includes regular updates to descriptions and metadata, quicker response rates in discussion, and accurate current data from the source. Our goal is to create seamless workflows that allow everyone to do data science on Kaggle and be confident in the data they work with.

Kaggle maintains data sets drawn from the open data projects of several cities, as well as from NOAA and the World Bank. If you’re looking for data sets to play with, this is a good option.

Faster Scalar Functions In SQL Server 2019

Brent Ozar looks at improvements the SQL Server team has made to scalar functions in SQL Server 2019:

My database has to be in 2019 compat mode to enable Froid, the function-inlining magic. Run the same query again, and the metrics are wildly different:

  • Runtime: 4 seconds

  • CPU time: 4 seconds

  • Logical reads: 3,247,991 (which still sounds bad, but bear with me)

My bias tells me that I still want to avoid scalar functions, but they’re no longer the automatic deal-killer they once were.

The Basics Of Kubernetes

Chris Adkin gives us a rundown on Kubernetes:

With the announcement of SQL Server 2019 big data clusters at Ignite, Kubernetes (often abbreviated to K8s) now stands front and center as part of Microsoft’s data platform vision. The obvious inference being that this is something that the Microsoft data platform community is going to show an increased interest in. The post aims to provide some context around:

  • why container orchestration is required

  • how Kubernetes is architected

  • the basics of working with Kubernetes

  • and why embracing open source software should be approached in an eyes wide open manner

Kubernetes is another technology worth learning now, since SQL Server 2019 big data clusters build on it.

The Table Spool Operator In SQL Server

Hugo Kornelis digs into table spools:

The Table Spool operator is one of the four spool operators that SQL Server supports. It retains a copy of all data it reads in a worktable (in tempdb) and can then later return extra copies of these rows without having to call its child operators to produce them again. These copies can be made available in the same part of the execution plans, or in another part.

Table Spool is probably the most basic of the spool operators. The Index Spool operator is very similar to it, but indexes its data to allow it to return only a subset of the stored rows. The Row Count Spool operator is optimized for specific cases where the rows to be returned are empty. And the Window Spool operator is used to support the ROWS and RANGE specifications of windowing functions.

Typical use cases of a Table Spool are: to reproduce the same input multiple times without having to re-execute its child nodes (e.g. in the inner input of a Nested Loops); to make the same input available in multiple branches of an execution plan (e.g. in wide update plans); or to ensure that an original copy of the data is available after an insert, update, or delete operator changes the base data (“Halloween protection”).

Click through for a great deal more detail.
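
To make the first use case concrete, here is a minimal Python sketch of the idea, not a description of how SQL Server implements it. The names `TableSpool`, `nested_loops`, and `scan_products` are hypothetical, and an in-memory list stands in for the worktable in tempdb. The spool reads its child once, then replays the stored rows each time the inner side of the join is rescanned.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Row = Tuple  # a row is just a tuple of column values in this toy model


class TableSpool:
    """Toy spool: runs its child once, keeps the rows, replays them on request."""

    def __init__(self, child: Callable[[], Iterable[Row]]) -> None:
        self._child = child
        self._worktable: Optional[List[Row]] = None  # stand-in for the tempdb worktable
        self.child_executions = 0  # how many times the child was actually run

    def rows(self) -> Iterator[Row]:
        if self._worktable is None:
            # First request: execute the child and store a copy of every row.
            self.child_executions += 1
            self._worktable = list(self._child())
        # Every request, including the first, is served from the stored copy.
        return iter(self._worktable)


def nested_loops(
    outer: Iterable[Row],
    inner: TableSpool,
    predicate: Callable[[Row, Row], bool],
) -> Iterator[Row]:
    """Naive nested loops join that rescans the inner input once per outer row."""
    for outer_row in outer:
        for inner_row in inner.rows():
            if predicate(outer_row, inner_row):
                yield outer_row + inner_row


def scan_products() -> List[Row]:
    # Hypothetical child operator; imagine an expensive scan here.
    return [(1, "widget"), (2, "gadget")]


orders = [(101, 1), (102, 2), (103, 1)]  # (order_id, product_id)
spool = TableSpool(scan_products)
joined = list(nested_loops(orders, spool, lambda o, p: o[1] == p[0]))

for row in joined:
    print(row)
print("child executions:", spool.child_executions)
```

Without the spool, the join would call `scan_products` once per order. With it, the child runs a single time and the remaining rescans are served from the stored copy.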

Accelerated Database Recovery In SQL Server 2019

Frank Gill notes an exciting new feature in SQL Server 2019:

“Any sufficiently advanced technology is indistinguishable from magic.” -Arthur C. Clarke

In this morning’s keynote session at PASS Summit 2018, public preview of a new feature in Azure SQL Database and SQL Server 2019 called Accelerated Database Recovery (ADR) was announced.  This changes the way that SQL Server handles recovery of a SQL Server instance on start up.

This looks really good for large databases, where recovery can sometimes be measured in hours.

Azure Data Studio November Release

Alan Yu announces this month’s Azure Data Studio update:

In November’s version of the monthly release blog, the emphasis was on fixing customer issues and adding and improving existing extensions.

This includes:

Read on for the details.  This product is getting closer and closer to a state where it can be a daily driver.
