Fun with Metaphors: Data Lakehouses

Ben Lorica, et al., have a new metaphor to try out:

Over the past few years at Databricks, we’ve seen a new data management paradigm that emerged independently across many customers and use cases: the lakehouse. In this post we describe this new paradigm and its advantages over previous approaches.

The Data Lake’s Aristotelian counterpart is the Data Swamp. I’m working on a similar comp for the Data Lakehouse (Data Swampboat? Data Swamphouse is too easy), but in the meantime, that one person who goes and slaughters your application’s performance by butchering the data in your Data Lakehouse? That’s a Data Jason.

Quick Hits on Azure Databricks Performance

Rayis Imayev has a few thoughts on optimizing Delta table-based workloads in Azure Databricks:

2) Enable the Delta cache – spark.databricks.io.cache.enabled = true
There is a very good resource available on configuring this Spark config setting: https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/delta-cache

And this will be very helpful in your Databricks notebook’s queries when you try to access a similar dataset multiple times. Once you read this dataset for the first time, Spark places it into an internal local storage cache and will speed up further references to it for you.

Click through for several more along these lines.
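If you want to flip that setting from a notebook, here’s a minimal PySpark sketch. It assumes a Databricks-backed session (the Delta cache only takes effect on Databricks runtimes with supported worker types), and the table path is hypothetical:

```python
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` already exists; getOrCreate() just makes
# the sketch self-contained.
spark = SparkSession.builder.getOrCreate()

# Enable the Delta cache so repeated reads of the same files are served from
# fast local storage on the workers instead of remote storage.
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# The first read populates the cache; later reads of the same data are faster.
df = spark.read.format("delta").load("/mnt/data/sales")  # hypothetical path
df.count()
```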

Cleaning Up Schema Ownership

Pamela Mooney doesn’t like user-owned schemas:

My colleagues and I take schema ownership seriously.  The owner (with few exceptions) should always be “dbo”.  Certainly, it should not be a user.  Why?  Because if the user leaves and their account is disabled or deleted, we have a problem.  If you’re a DBA, you have enough problems without adding this one to your list. 

So, how do you find these offenders, much less fix them?

That’s what you’ll find out, but only if you click through.
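Pamela’s actual script is behind the link; as a rough sketch of the general idea (not her code, and with an assumed connection string), you can query sys.schemas for schemas owned by a user and script out the ALTER AUTHORIZATION fix:

```python
import pyodbc

# Connection string is an assumption; point it at the database you want to audit.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};Server=localhost;"
    "Database=MyDatabase;Trusted_Connection=yes;"
)

# Schemas owned by a SQL user, Windows user, or Windows group rather than dbo.
# The s.name <> dp.name filter skips built-in schemas (sys, guest, db_owner, etc.)
# that are owned by their matching principals.
query = """
SELECT s.name AS schema_name, dp.name AS owner_name
FROM sys.schemas AS s
JOIN sys.database_principals AS dp
    ON s.principal_id = dp.principal_id
WHERE dp.type IN ('S', 'U', 'G')
  AND dp.name <> 'dbo'
  AND s.name <> dp.name;
"""

for schema_name, owner_name in conn.execute(query):
    # Generate the fix as a script to review rather than running it blindly.
    print(f"-- [{schema_name}] is owned by [{owner_name}]")
    print(f"ALTER AUTHORIZATION ON SCHEMA::[{schema_name}] TO dbo;")
```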

Showing a Calendar in Your PowerShell Prompt

Jeffrey Hicks has fun with calendars in PowerShell:

Some of you may be aware of my PSCalendar module which you can install from the PowerShell Gallery. The module contains commands that you can use to display a console-based calendar.  The calendar commands let you specify days to highlight. These might be days with special events or appointments. I typically use the Show-Calendar command as it writes to the host and colorizes output.

This command also has a parameter that lets you specify a position in your console. In other words, you can tell PowerShell where to display the calendar. I recently fixed a bug with the command that was producing less than optimal results. Now, I can use my PowerShell prompt function to display a calendar. 

The calendar module and functions are quite helpful, and the calendar prompt merits the Wacky Ideas category.

When Identity Columns Skip Values

Kevin Hill explains why you might see SQL Server skip 1000 values in an identity column:

Video shows a walk-through of before and after each fix, plus a “Two guys walk into a bar” joke when I disappeared to troubleshoot a broken demo…

I’d probably avoid the fix and live with gaps. You also get gaps when you roll back an operation which inserted into an identity column, or if you have merge replication enabled on a table keyed by an identity column and SQL Server bumps the range on you. All of these are normal and good reasons not to expect contiguous numbering.
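For reference, the fixes you’ll usually see for the 1000-value jump are trace flag 272 on older versions or, from SQL Server 2017 onward, the IDENTITY_CACHE database-scoped configuration. A minimal sketch of the latter, with connection details assumed (and note the trade-off: less caching can slow down high-volume inserts):

```python
import pyodbc

# Connection string is an assumption; adjust for your environment.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};Server=localhost;"
    "Database=MyDatabase;Trusted_Connection=yes;",
    autocommit=True,
)

# Disabling the identity cache reduces the big jumps after unexpected
# restarts or failovers, at some cost to insert throughput.
conn.execute("ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;")
```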

Using Windows Subsystem for Linux 2 in Windows 10

Max Trinidad is excited about Windows Subsystem for Linux 2:

First, I love WSL (Windows Subsystem for Linux)! It’s a great addition to Windows 10, and everyone should learn how to use it.

To get started, follow the instructions on how to get your WSL 1 Linux Distro installed. And, begin with installing Ubuntu 18.04.

Now, get Docker Desktop, which can be installed in Windows 10 RTM Build 18363 with WSL 1.

One of the key benefits of WSL 2 is that your Docker containers will run natively rather than through a VM. That’s a pretty big deal in terms of performance and production-readiness. That Docker capability is currently in preview, but I’d expect it to make its way to production sooner rather than later.

Azure SQL Hyperscale Auto-Scaling

Davide Mauri explains how to scale Azure SQL Hyperscale automatically:

Azure SQL Hyperscale is the latest architectural evolution of Azure SQL, which has been natively designed to take advantage of the cloud. One of the key features of this new architecture is the complete separation of Compute Nodes and Storage Nodes. This allows for independent scaling of each service, making Hyperscale more flexible and elastic.

In this article I will describe how it is possible to implement a solution to automatically scale your Azure SQL Hyperscale database up or down, to dynamically and automatically adapt to different workload levels without requiring manual intervention.

Davide has some test measurements of how much downtime you see, and gives a couple of thoughts on how you can track when it’s time to scale up or down.
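Davide’s implementation is behind the link; the core of any such approach is the ALTER DATABASE statement that changes the service objective. A minimal sketch (not Davide’s code; the server, database, credentials, and target objective are assumptions):

```python
import pyodbc

# Server, database, credentials, and target service objective are assumptions.
SERVER = "myserver.database.windows.net"
DATABASE = "mydb"
TARGET_SLO = "HS_Gen5_8"  # e.g. scale up from HS_Gen5_4

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    f"Server={SERVER};Database=master;UID=scaling_user;PWD=...;",
    autocommit=True,
)

# The statement returns almost immediately; the scale operation itself runs
# in the background and finishes with a short reconnect blip, which is the
# downtime Davide measures.
conn.execute(
    f"ALTER DATABASE [{DATABASE}] MODIFY (SERVICE_OBJECTIVE = '{TARGET_SLO}');"
)
```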

Tracking Query Store Changes

Erin Stellato shows how to watch for Query Store changes, whether due to settings modifications or running out of space:

The Query Store feature is a bit unique in that its status can change without user interference, which means it is important to understand how changes to Query Store are logged.  A couple weeks ago John Deardurff posted a question on Twitter asking specifically whether the SQL Server ERRORLOG is written to when the OPERATION_MODE changes to READ_ONLY because  MAX_STORAGE_SIZE_MB is exceeded.  I had never tested to confirm, but I know there is an event in Extended Events that will fire when the limit is reached.  I also know that when a user makes a change to a Query Store setting, it is logged in the ERRORLOG.

Click through to see how to watch for this and what the changes look like.
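If you want to keep an eye on this yourself between those events, one lightweight option (a sketch with an assumed connection string, not Erin’s method) is to poll sys.database_query_store_options and compare the desired and actual states:

```python
import pyodbc

# Connection string is an assumption; point it at the database whose
# Query Store you want to watch.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};Server=localhost;"
    "Database=MyDatabase;Trusted_Connection=yes;"
)

row = conn.execute("""
SELECT desired_state_desc,
       actual_state_desc,
       readonly_reason,
       current_storage_size_mb,
       max_storage_size_mb
FROM sys.database_query_store_options;
""").fetchone()

# If desired and actual state drift apart (e.g. READ_WRITE vs READ_ONLY),
# Query Store has changed its own behavior, often because storage is full.
if row.desired_state_desc != row.actual_state_desc:
    print(
        f"Query Store is {row.actual_state_desc}, wanted {row.desired_state_desc} "
        f"(reason code {row.readonly_reason}; "
        f"{row.current_storage_size_mb}/{row.max_storage_size_mb} MB used)"
    )
```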
