Author: Kevin Feasel

Partition Switching of Staging Data

Aaron Bertrand shares a technique to make table refreshes easier for end users:

So, what is a staging table in SQL? A staging table can be more easily understood using a real-world example: Let’s say you have a table full of vegetables you’re selling at the local farmer’s market. As your vegetables sell and you bring in new inventory:

– When you bring a load of new vegetables, it’s going to take you 20 minutes to clear off the table and replace the remaining stock with the newer product.

– You don’t want customers to sit there and wait 20 minutes for the switch to happen, since most will get their vegetables elsewhere.

Now, what if you had a second empty table where you load the new vegetables, and while you’re doing that, customers can still buy the older vegetables from the first table? (Let’s pretend it’s not because the older vegetables went bad or are otherwise less desirable.)

Read on for some techniques Aaron used for a long time and why he switched to partition switching.
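
The "second table" idea can be sketched with SQLite from Python. SQLite has no partition switching, so this shows only the general pattern with a rename-based swap: load a staging table while readers keep using the live one, then swap names in a short transaction. The table names and the `refresh_via_staging` helper are hypothetical, not taken from Aaron's post.

```python
import sqlite3

def refresh_via_staging(conn, new_rows):
    """Load new_rows into a staging table, then swap it in with quick renames.

    Assumes conn is in autocommit mode (isolation_level=None) so the
    transaction around the swap is controlled explicitly.
    """
    cur = conn.cursor()
    # Slow part: fill the second table while readers still use "vegetables".
    cur.execute("DROP TABLE IF EXISTS vegetables_staging")
    cur.execute(
        "CREATE TABLE vegetables_staging (name TEXT PRIMARY KEY, quantity INTEGER)"
    )
    cur.executemany(
        "INSERT INTO vegetables_staging (name, quantity) VALUES (?, ?)", new_rows
    )
    # Fast part: swap the names inside a single transaction.
    cur.execute("BEGIN")
    try:
        cur.execute("ALTER TABLE vegetables RENAME TO vegetables_old")
        cur.execute("ALTER TABLE vegetables_staging RENAME TO vegetables")
        cur.execute("DROP TABLE vegetables_old")
        cur.execute("COMMIT")
    except sqlite3.Error:
        cur.execute("ROLLBACK")
        raise

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE vegetables (name TEXT PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO vegetables (name, quantity) VALUES ('carrot', 3)")

refresh_via_staging(conn, [("carrot", 40), ("kale", 25)])
current_stock = conn.execute(
    "SELECT name, quantity FROM vegetables ORDER BY name"
).fetchall()
print(current_stock)
```

The slow load happens outside the swap transaction, so the window in which readers could be blocked is limited to the renames.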

Power BI 101

Soheil Bakhshi is starting some 101-level training on Power BI:

Many people talk about Power BI, its benefits and common challenges, and many more want to learn Power BI, which is excellent indeed. But there are many misconceptions and misunderstandings amongst the people who think they know Power BI. In my opinion, it is a significant risk in using tools without knowing them, and using the technology is no different. The situation is even worse when people who must know the technology well don’t know it, but they think they do. These people are potential risks to the businesses that want to adopt Power BI as their primary analytical solution across the organisation. As a part of my day-to-day job, I communicate with many people interacting with Power BI. Amongst many knowledgeable users are some of those who confuse things pretty frequently, which indicates a lack of understanding of the basic concepts.
So I decided to write a series of Power BI 101 to explain the basics of the technology that we all love in simple language. Regardless of your usage of Power BI, I endeavour to help you know what to expect from Power BI. This is the first part of this series.

Read on for the start of this series, asking the question “What is Power BI?”

Database Deployment with External References

Sebastian Meine and Liz Baron try to untangle the Gordian knot:

Most database developers are dealing with databases that contain external references. Even if the database code is in source control, these external references can make it very difficult to deploy to new environments. In these multi-database environments, tools like SQLCompare and SQL Change Automation do not automatically resolve object-order across databases, resulting in errors during deployment.

One way to tackle this, which works especially well for CI pipelines, is to create facades for all externally referenced databases. A facade in this context is a database with the expected name, with the expected objects, but those objects are hollowed out and do not contain any dependencies. You can compare this concept to an interface in an object-oriented language. Once you have these facades, they can be used in a pre-deployment step, simplifying the rest of the deployment by effectively removing object-order dependencies with these external databases.

This is one of the most painful parts of converting existing databases into model-driven database development, especially once you start having to deal with cross-dependencies and rapidly changing databases.

Calculating Lead Time from Jira and GitHub

Maria Zakourdaev wants to measure agility:

Do you want to visualize your RnD team performance to drive business value? Is there anything that is slowing down your development pipeline? How agile is your team? How long are your customers waiting for the features?

There are many things that can hold you back: backlog management, code review delays, resource provisioning, manual testing, and deployment automation efficiency. In this article I will show you my method of measuring one of the metrics described in this book called LeadTime.

Read on to see how you can do this.
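
As a rough illustration of the arithmetic, here is a minimal sketch that assumes lead time means the elapsed time between a Jira issue being created and its change being merged in GitHub. That definition, the record layout, and the sample timestamps are all assumptions for illustration; the linked article may measure it differently.

```python
from datetime import datetime
from statistics import median

# Hypothetical records joining a Jira issue to the merge of its pull request.
work_items = [
    {"issue": "PROJ-1", "created": "2021-09-01T09:00:00", "merged": "2021-09-03T15:00:00"},
    {"issue": "PROJ-2", "created": "2021-09-02T10:00:00", "merged": "2021-09-02T16:00:00"},
    {"issue": "PROJ-3", "created": "2021-09-06T08:00:00", "merged": "2021-09-10T08:00:00"},
]

def lead_time_hours(item):
    """Hours between issue creation and merge (the assumed definition)."""
    created = datetime.fromisoformat(item["created"])
    merged = datetime.fromisoformat(item["merged"])
    return (merged - created).total_seconds() / 3600

lead_times = {item["issue"]: lead_time_hours(item) for item in work_items}
median_lead_time = median(lead_times.values())

print(lead_times)
print(median_lead_time)
```

The median is used here rather than the mean so that one long-running issue does not dominate the summary.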

What is Parquet and Why Use It?

The folks at Jumping Rivers explain what the Parquet file format is and how you can use it in R:

Apache Parquet is a popular column storage file format used by Hadoop systems, such as Pig, Spark, and Hive. The file format is language independent and has a binary representation. Parquet is used to efficiently store large data sets and has the extension .parquet. This blog post aims to explain how Parquet works and the tricks it uses to efficiently store data.

Read on for that explanation and plenty of sample code.
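
To give a feel for why column storage helps, here is a toy sketch in plain Python. It is not the Parquet format or any Parquet library: it only pivots rows into columns and applies run-length encoding, one of the tricks columnar formats use on repetitive columns. The sample rows and function names are made up for illustration.

```python
from itertools import groupby

# Hypothetical sample data, stored row by row.
rows = [
    {"city": "Leeds", "year": 2021, "sales": 10},
    {"city": "Leeds", "year": 2021, "sales": 12},
    {"city": "Leeds", "year": 2021, "sales": 9},
    {"city": "York", "year": 2021, "sales": 7},
    {"city": "York", "year": 2021, "sales": 11},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def run_length_decode(runs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]

columns = to_columns(rows)
encoded_city = run_length_encode(columns["city"])
encoded_year = run_length_encode(columns["year"])

print(encoded_city)
print(encoded_year)
```

Once values of the same column sit next to each other, repeated values compress well; the same data stored row by row would interleave cities, years, and sales and hide those runs.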

Error Handling in Scala

Ashish Chaudhary gives us three different ways of handling errors in Scala:

Error handling is the process of handling the possibility of failure. For example, failing to read a file and then continuing to use that bad input would clearly be problematic. Noticing and explicitly managing these errors saves the rest of the program from various pitfalls.

Exceptions in Scala work the same way as in C++ or Java. When an exception occurs, say an ArithmeticException, the current operation is aborted, and the runtime system looks for an exception handler that can accept an ArithmeticException. Control resumes with the innermost such handler. If no such handler exists, the program terminates.

Put another way, the choice is between structured programming with try/catch and functional programming via monads (Option and Either).
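
The three styles can be sketched in Python as a rough analog; the linked post works in Scala. The function names are hypothetical, `None` stands in for Scala's `None`, and a tagged tuple only approximates `Either` (there is no `map` or `flatMap` here).

```python
from typing import Optional, Tuple, Union

def divide_or_raise(numerator: int, denominator: int) -> float:
    """Exception style: a failure propagates until a handler catches it."""
    return numerator / denominator

def divide_option(numerator: int, denominator: int) -> Optional[float]:
    """Option style: None signals 'no result' without saying why."""
    if denominator == 0:
        return None
    return numerator / denominator

def divide_either(numerator: int, denominator: int) -> Tuple[str, Union[str, float]]:
    """Either style: 'left' carries an error description, 'right' the value."""
    if denominator == 0:
        return ("left", "cannot divide by zero")
    return ("right", numerator / denominator)

# Exception style: the caller has to remember to wrap the call.
try:
    exception_result = divide_or_raise(10, 0)
except ZeroDivisionError:
    exception_result = None  # the innermost matching handler takes over

# Option and Either styles: the failure is part of the return value.
option_result = divide_option(10, 4)
either_result = divide_either(10, 0)

print(exception_result, option_result, either_result)
```

In the last two styles the possibility of failure is visible in the function's return type, so the caller cannot use the result without deciding what to do about the failure case.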

Tips on using Included Columns on Indexes

Chad Callihan shares some advice:

In my previous blog post, we saw how using INCLUDE to add a column to an index can make a difference compared to a key column. Let’s do a quick overview of INCLUDE and when it should be used.

Included columns are columns that can be added to an index as non-key columns. They are only added to the leaf nodes of an index and have a bit more flexibility. Having trouble adding a particular data type to an index? Included columns can be data types unable to be added as key columns. Are you possibly maxed out on index key columns? Use INCLUDE to add any necessary columns.

Read on for an example and note the warning that you shouldn’t just add all of the columns to the INCLUDE clause.

Unique Constraints vs Unique Indexes

Erik Darling calls out unique key constraints:

I do love appropriately applied uniqueness. It can be helpful not just for keeping bad data out, but also help the optimizer reason about how many rows might qualify when you join or filter on that data.

The thing is, I disagree a little bit with how most people set them up, which is by creating a unique constraint.

Data modeling Kevin wants to use unique key constraints because that’s the correct thing to do. Implementation Kevin uses unique nonclustered indexes for the reasons Erik describes. Not mentioned in Erik’s post but potentially relevant is that operations on unique nonclustered indexes can be done online, whereas unique key constraint operations (creation and alteration via drop+create) are offline.

The Alt Key in SSMS

Barney Lawrence shows off vertical selection in SQL Server Management Studio:

I went years without knowing how useful the Alt key was. It’s not well documented, but I can guarantee that once you know about it and give it a little practice you won’t be able to live without it.

While I’m filing this under SSMS Tips and Tricks, it works equally well in Visual Studio, VS Code, Azure Data Studio and even Notepad++ (but not plain old vanilla Notepad). It’s worth a test in other places too.

As a quick note, it does not work quite the same way in Visual Studio Code or Azure Data Studio as in Visual Studio or SSMS: Alt+Shift+Down arrow copies the current row and pastes it in the row below. Holding Alt and Shift while using the mouse does work, but if you’re big on keyboard commands, you’ll be a bit disappointed.

Create and Update Columns on Tables

Kenneth Fisher adds four columns to a table:

One of the easiest ways to collect information about table activity is to add a series of audit columns to the table. The most common set consists of four columns.

– When was the row created?

– Who created it?

– When was the row last updated?

– Who last updated it?
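
A minimal sketch of those four columns, using SQLite from Python rather than SQL Server. The `customer` table, the column names, and the helper functions are hypothetical, and this is not necessarily how the linked post implements them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- when created
        created_by  TEXT NOT NULL,                            -- who created it
        updated_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- when last updated
        updated_by  TEXT NOT NULL                             -- who last updated it
    )
    """
)

def create_customer(conn, name, user):
    """Insert a row; the timestamps come from the column defaults."""
    cur = conn.execute(
        "INSERT INTO customer (name, created_by, updated_by) VALUES (?, ?, ?)",
        (name, user, user),
    )
    return cur.lastrowid

def rename_customer(conn, customer_id, new_name, user):
    """Update a row and refresh only the 'updated' pair of audit columns."""
    conn.execute(
        "UPDATE customer SET name = ?, updated_at = CURRENT_TIMESTAMP, updated_by = ? "
        "WHERE customer_id = ?",
        (new_name, user, customer_id),
    )

customer_id = create_customer(conn, "Acme", "alice")
rename_customer(conn, customer_id, "Acme Ltd", "bob")
audit_row = conn.execute(
    "SELECT name, created_at, created_by, updated_at, updated_by "
    "FROM customer WHERE customer_id = ?",
    (customer_id,),
).fetchone()
print(audit_row)
```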

Read on to see how to create these, but definitely read the comments. Joe Celko has a good reason to avoid this style, and there’s another good reason as well: if you update the columns and the updated values are longer than what they replace, you can end up generating a lot of page splits as full pages get broken up to handle those in-page updates.

If you do need something like this, I’m growing more and more fond of an activity log table which is append-only and tracks these operations separately. That removes the page splits, allows you to deal with deletions, and gives a better idea of what happened over time for a third party. It does come at a cost if you frequently need to show the create and last updated dates (or users), but my experience has been that a vast majority of the time, we include those for internal purposes, not to display to users.
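
Here is one way such an append-only activity log might look, sketched with SQLite from Python. The `activity_log` table, its columns, and the helper functions are assumptions for illustration, not a description of any specific production design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE activity_log (
        log_id     INTEGER PRIMARY KEY AUTOINCREMENT,
        table_name TEXT NOT NULL,
        row_id     INTEGER NOT NULL,
        operation  TEXT NOT NULL CHECK (operation IN ('insert', 'update', 'delete')),
        changed_by TEXT NOT NULL,
        changed_at TEXT NOT NULL
    )
    """
)

def log_activity(conn, table_name, row_id, operation, user, changed_at):
    """Append one event; existing log rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO activity_log (table_name, row_id, operation, changed_by, changed_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (table_name, row_id, operation, user, changed_at),
    )

def audit_summary(conn, table_name, row_id):
    """Derive the creation and latest-change details from the first and last events."""
    query = (
        "SELECT operation, changed_by, changed_at FROM activity_log "
        "WHERE table_name = ? AND row_id = ? ORDER BY log_id {} LIMIT 1"
    )
    first = conn.execute(query.format("ASC"), (table_name, row_id)).fetchone()
    last = conn.execute(query.format("DESC"), (table_name, row_id)).fetchone()
    return {
        "created_by": first[1],
        "created_at": first[2],
        "last_operation": last[0],
        "last_changed_by": last[1],
        "last_changed_at": last[2],
    }

log_activity(conn, "customer", 1, "insert", "alice", "2021-09-01T09:00:00")
log_activity(conn, "customer", 1, "update", "bob", "2021-09-02T10:00:00")
log_activity(conn, "customer", 1, "delete", "carol", "2021-09-03T11:00:00")

summary = audit_summary(conn, "customer", 1)
print(summary)
```

The extra query in `audit_summary` is the cost mentioned above: showing created and last-updated details means reading the log instead of the row itself. In exchange, the delete is recorded, which in-row audit columns cannot do.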
