Press "Enter" to skip to content

Curated SQL Posts

Sampling the Population of SQL Server Instances

Brent Ozar has a breakdown of SQL Server adoption by version:

Ever wonder how fast people are adopting new versions of SQL Server, or what’s “normal” out there for SQL Server adoption rates, hardware sizes, or numbers of databases? Let’s find out in the summer 2020 version of our SQL ConstantCare® population report.

Out of the 3,650 servers sending in data recently, the most popular version of SQL Server is still 2016. 

It’s an interesting survey, but it’s important to remember with any survey what the biases are before drawing conclusions on the general population of all SQL Server instances. This is among ConstantCare users, so if I had a hunch about them, I’d say they’re going to blow out the middle versus gen pop—in other words, I’d expect 2019 and 2017 usage to be a bit higher across the world (and not just among ConstantCare users), but also 2005 (not pictured), 2008, and 2008 R2 to be higher.

My hunch here is that the former are shops which have their act together and figure that they don’t need ConstantCare, and the latter are all of those “hidden” instances of SQL Server acting as the back end for small business websites and the like, where nobody even thinks about having a database administrator, much less paying for tooling. Nonetheless, these are interesting trends and even if I think the general population might look different, my data-less guess is that these numbers look reasonable.


Ignoring Bad Dates when Moving to Spark 3

Robert Blackburn shows us one way to handle bad dates when moving to Spark 3:

Moving from a Spark 2 to a Spark 3 runtime has a lot of benefits, including big performance improvements through adaptive query execution, dynamic partition pruning, and other optimizations. Some updates may require you to refactor your code. One of them is that Delta tables now use the proleptic Gregorian calendar. Isn’t a calendar a calendar? Unfortunately, no. The Julian calendar has discrepancies with old dates: specifically, dates before 1582 and timestamps before 1900. Here we will dynamically update these dates for incoming source files.

If you would like to follow along in detail, I have a sample notebook that uses the community edition of Databricks. The DBC Archive file is here and the source file is here.

Fortunately, this change is unlikely to affect most of us, with perhaps the most common issue being that you used 0001-01-01 as a default date.
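As a rough sketch of the idea (not Robert's exact code; the column name, paths, and sentinel value below are made up for illustration), that kind of ingestion-time cleanup might look like this in PySpark:

```python
# Minimal sketch, assuming a CSV source with an event_date column.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Spark 3.0/3.1 knob for reading legacy Parquet written with the old hybrid
# Julian/Gregorian calendar; "CORRECTED" reads stored values as-is.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")

df = spark.read.option("header", True).csv("/mnt/dropzone/source.csv")

# Normalize anything before the Gregorian cutover (including the common
# 0001-01-01 default date) to a safe sentinel before writing to Delta.
cleaned = (df
    .withColumn("event_date", F.col("event_date").cast("date"))
    .withColumn("event_date",
        F.when(F.col("event_date") < F.to_date(F.lit("1582-10-15")),
               F.to_date(F.lit("1900-01-01")))
         .otherwise(F.col("event_date"))))

cleaned.write.format("delta").mode("overwrite").save("/mnt/delta/clean_events")
```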


Using Notebooks to Load Data into the Databricks File System

Tomaz Kastrun is putting together an Advent of Azure Databricks:

Yesterday we started working towards data import and how to use the drop zone to import data to DBFS. We also created our first notebook, and this is where I would like to start today: with a light introduction to notebooks.

Read on for a depiction of notebooks, as well as an example which loads data into the Databricks File System (DBFS).
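For a sense of what such a notebook cell does, here is a minimal sketch (paths and the table name are illustrative; spark and dbutils are the globals every Databricks notebook gets):

```python
# List what landed in the drop zone; /FileStore/tables is where UI uploads go.
files = dbutils.fs.ls("/FileStore/tables/")
for f in files:
    print(f.path, f.size)

# Read a CSV out of DBFS into a DataFrame.
df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .csv("/FileStore/tables/sample_data.csv"))

# Persist it back into DBFS as a Delta table for later cells to query.
df.write.format("delta").mode("overwrite").saveAsTable("sample_data")
```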


Moving Power BI Dataflows Across Workspaces

Mark Lelijveld has updated a script for us:

Over a year ago, I wrote a blog post about moving dataflows across workspaces using a PowerShell script. This is especially useful if you want to move dataflow logic from your development workspace to a test, acceptance, or production workspace.

I received a bunch of feedback on this script and lately ran into some issues with it myself as well. It was about time for an update! Below, I share the issues addressed in this new version, as well as the new additions to the script.

Click through for details on the update.
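Mark's script is PowerShell; as a rough Python analogue of the underlying idea (all IDs are placeholders and token acquisition is omitted), the export/import round trip through the Power BI REST API looks something like this:

```python
# Hedged sketch, not Mark's script: pull a dataflow's model.json from one
# workspace and push it into another via the Power BI REST API.
import requests

API = "https://api.powerbi.com/v1.0/myorg"
access_token = "<AAD token with dataflow permissions>"
source_ws, target_ws = "<source-workspace-id>", "<target-workspace-id>"
dataflow_id = "<dataflow-id>"
headers = {"Authorization": f"Bearer {access_token}"}

# Export: Get Dataflow returns the dataflow definition (model.json).
model_json = requests.get(
    f"{API}/groups/{source_ws}/dataflows/{dataflow_id}", headers=headers
).content

# Import: push the definition into the target workspace via the Imports API.
resp = requests.post(
    f"{API}/groups/{target_ws}/imports?datasetDisplayName=model.json",
    headers=headers,
    files={"file": ("model.json", model_json, "application/json")},
)
resp.raise_for_status()
print("Import accepted:", resp.json())
```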


Batch Mode with Temp Tables

Erik Darling continues receiving big paydays from Big Temp Table:

When you have queries that need to process a lot of data, and probably do some aggregations over that lot-of-data, batch mode is usually the thing you want.

Originally introduced to accompany columnstore indexes, it works by allowing CPUs to apply instructions to up to 900 rows at a time.

It’s a great thing to have in your corner when you’re tuning queries that do a lot of work, especially if you find yourself dealing with pesky parallel exchanges.

Read on to see how you can create a temp table which triggers batch mode processing fairly easily.
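One well-known version of the trick (not necessarily Erik's exact approach) is an empty, filtered nonclustered columnstore index on a temp table, which makes the plan eligible for batch mode on SQL Server 2016 and 2017. A hedged sketch, driven from Python with illustrative database and table names:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=StackOverflow;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()

# The contradictory filter keeps the index permanently empty; its mere
# presence is enough for the optimizer to consider batch mode.
cur.execute("""
    CREATE TABLE #batch_mode_helper (i int NOT NULL);

    CREATE NONCLUSTERED COLUMNSTORE INDEX ncci
        ON #batch_mode_helper (i)
        WHERE i = -1 AND i = -2;
""")

# Join the helper with a predicate that never matches: no extra rows, but
# the aggregation over the big table can now run in batch mode.
rows = cur.execute("""
    SELECT    p.OwnerUserId, COUNT_BIG(*) AS posts
    FROM      dbo.Posts AS p
    LEFT JOIN #batch_mode_helper AS h ON 1 = 0
    GROUP BY  p.OwnerUserId;
""").fetchall()
```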


Creating an Azure Purview Catalog Instance

Wolfgang Strasser wants to try out Azure Purview:

Basics – resource group, Purview account name (this cannot be changed afterwards), and the location.

As of today (2020-12-06), there are only 5 Azure regions you can choose from to store the Purview metadata. But in-region scanning from 16 other Azure regions is available in the preview (source).

This is part one of a multi-part series, so stay tuned for more.


BULK INSERT and Advent of Code

Thomas Rushton performs BULK INSERT the way I imagine most of us do, through trial and lots of error:

Step one in solving these problems in SQL Server is loading the data into a database so you can do something with it.

I chose to use BULK INSERT, because, according to the documentation, it:

Imports a data file into a database table or view in a user-specified format in SQL Server

Ideal, right?

Click through for several tips around BULK INSERT.
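For the general shape (not Thomas's exact code; the database, table, and file path are illustrative), loading a one-line-per-row Advent of Code input might look like this:

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=AdventOfCode;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()

# Note: the file path must be visible to the SQL Server service account,
# not just to the client machine running this script.
cur.execute("""
    CREATE TABLE dbo.Day01_Input (line nvarchar(4000) NOT NULL);

    BULK INSERT dbo.Day01_Input
    FROM 'C:\\AoC\\day01.txt'
    WITH (
        ROWTERMINATOR = '0x0a',  -- bare LF; puzzle inputs use Unix line endings
        CODEPAGE = '65001'       -- read the file as UTF-8
    );
""")
print(cur.execute("SELECT COUNT(*) FROM dbo.Day01_Input").fetchval(), "rows loaded")
```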


Joining Data Streams in Flink

Kundan Kumarr crosses the streams:

Apache Flink offers a rich set of APIs and operators which make Flink application developers productive in dealing with multiple data streams. Flink provides many multi-stream operations like Union, Join, and so on. In this blog, we will explore the Window Join operator in Flink with an example. It joins two data streams on a given key and a common window.

Click through for an example of the fluent API approach. It’s not as nice as proper SQL, but it does the job.
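Kundan's example uses the Java fluent API; as a rough analogue in PyFlink's Table API (assuming Flink 1.14+ for window table-valued functions; the schemas and datagen sources are invented for illustration), a tumbling-window join looks like this:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Two demo streams with single-character keys so the join finds matches.
for name in ("clicks", "orders"):
    t_env.execute_sql(f"""
        CREATE TABLE {name} (
            user_id STRING,
            ts TIMESTAMP(3),
            WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
        ) WITH (
            'connector' = 'datagen',
            'rows-per-second' = '5',
            'fields.user_id.length' = '1'
        )
    """)

# Join the two streams on user_id within the same ten-second tumbling window.
result = t_env.sql_query("""
    SELECT c.user_id, c.window_start, c.window_end
    FROM (
        SELECT * FROM TABLE(
            TUMBLE(TABLE clicks, DESCRIPTOR(ts), INTERVAL '10' SECOND))
    ) AS c
    JOIN (
        SELECT * FROM TABLE(
            TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '10' SECOND))
    ) AS o
    ON  c.window_start = o.window_start
    AND c.window_end   = o.window_end
    AND c.user_id      = o.user_id
""")
result.execute().print()
```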
