Day: May 12, 2023

The Two File Formats for Polyglot Notebooks

Matt Eland chooses a file extension:

I’ve been talking more and more about Polyglot Notebooks and as people try it out, they tend to ask me one common question: should I create a .dib file or an .ipynb file? What’s the difference anyways?

In this short article I’m going to explore the .dib and .ipynb file formats and explain the difference between the two while answering the question of which one you should choose when creating your own notebooks in Polyglot Notebooks.

Read on for Matt’s thoughts. My tendency is to create them in .ipynb format for additional tooling support and potential cross-product flexibility (assuming you have the right kernels installed on your Jupyter server), though Matt explains his preference for .dib.
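Part of why .ipynb gets broader tooling support is that it is the standard Jupyter notebook JSON format (nbformat), so anything that can parse JSON can inspect it. Below is a minimal sketch that assumes nbformat v4 and a made-up two-cell notebook. It lists each cell's type and source using only Python's standard library, and it does not model any Polyglot Notebooks-specific metadata.

```python
import json

# A tiny, hypothetical notebook in Jupyter's nbformat v4 JSON layout.
NOTEBOOK = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Demo\\n"]},
    {"cell_type": "code", "execution_count": null, "metadata": {},
     "outputs": [], "source": ["1 + 1"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
}
"""


def summarize_cells(notebook_text: str) -> list[tuple[str, str]]:
    """Return (cell_type, source) pairs from an .ipynb JSON document."""
    nb = json.loads(notebook_text)
    summary = []
    for cell in nb["cells"]:
        src = cell["source"]
        # nbformat allows source to be a single string or a list of lines.
        if isinstance(src, list):
            src = "".join(src)
        summary.append((cell["cell_type"], src))
    return summary


if __name__ == "__main__":
    for cell_type, source in summarize_cells(NOTEBOOK):
        print(f"{cell_type}: {source!r}")
```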

Feature Branching and Hotfixes for Azure DevOps

Vytas Suopys covers a bit of source control strategy:

Have you ever deployed a release to production only to find out a bug has escaped your testing process and now users are being severely impacted? In this post, I’ll discuss how to deploy a fix from your development Synapse Workspace into a production Synapse Workspace without adversely affecting ongoing development projects.

This example uses Azure DevOps for CI/CD along with a Synapse extension for Azure DevOps: Synapse Workspace Deployment. In this example, I assume Synapse is already configured for source control with Azure DevOps Git, and that Build and Release pipelines are already defined in Azure DevOps. Instructions on how to apply this can be found in the Azure Synapse documentation for continuous integration and delivery.

The specific example covers Synapse, though the general principle applies no matter what you’re deploying.

Implementing Temporal Tables with Existing Data

Matthew McGiffen gets to one of my problems with temporal tables:

I also referred to Temporal Tables which are available to us from SQL Server 2016 onward.

Temporal tables aren’t just about monitoring change, they also provide really nice methods for being able to query historical data – to see what the values were at a particular point in time, e.g.:

SELECT * FROM dbo.SomeData FOR SYSTEM_TIME AS OF '1900-01-01';
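For anyone who hasn't used this syntax, the rule behind FOR SYSTEM_TIME AS OF is a half-open interval check. A row version qualifies when its period start is at or before the requested time and its period end is after it. Here is a minimal Python sketch of that rule over an in-memory history. The `RowVersion` class and `as_of` helper are hypothetical, and `datetime.max` stands in for the far-future end value (9999-12-31) that SQL Server gives current rows.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RowVersion:
    key: int
    value: str
    valid_from: datetime  # period start, inclusive
    valid_to: datetime    # period end, exclusive; datetime.max marks the current row


def as_of(versions: list[RowVersion], point_in_time: datetime) -> list[RowVersion]:
    """Return versions valid at point_in_time: start <= t AND end > t."""
    return [v for v in versions if v.valid_from <= point_in_time < v.valid_to]


history = [
    RowVersion(1, "draft", datetime(2023, 1, 1), datetime(2023, 3, 1)),
    RowVersion(1, "final", datetime(2023, 3, 1), datetime.max),
]

print([v.value for v in as_of(history, datetime(2023, 2, 15))])  # ['draft']
print([v.value for v in as_of(history, datetime(2023, 3, 1))])   # ['final']
```

Because the end is exclusive, a timestamp exactly on a boundary matches only the newer version, so each point in time sees at most one version per key.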

My big problem with temporal tables is that they only implement system-defined times. That’s fine in a quasi-historical OLTP scenario, where you want to track history but only occasionally make use of it. But if they supported application time, you would have the ability to create something akin to a type-2 slowly changing dimension with just a few extra words. I understand that the tricky part is that application-defined temporal tables lose the nicety of knowing that the latest insert always goes into the main table and drives the prior record into the historical table, but there are some clever ways around this problem as well. It’s just too early in the morning for me to articulate them is all…
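To make the type-2 slowly changing dimension comparison concrete, here is a minimal Python sketch of application-time versioning. This is not a SQL Server feature, and the `Version` class and `upsert_effective` helper are hypothetical. The interesting case is a backdated change. The new version lands between two existing ones instead of becoming current, which is exactly the "latest insert always wins" guarantee that system time provides and application time does not.

```python
from dataclasses import dataclass
from datetime import date

OPEN_END = date.max  # hypothetical marker meaning "still in effect"


@dataclass
class Version:
    key: int
    value: str
    valid_from: date  # inclusive, supplied by the application
    valid_to: date    # exclusive


def upsert_effective(versions: list[Version], key: int, value: str, effective: date) -> None:
    """Record that `key` took `value` starting at `effective` (application time)."""
    rows = sorted((v for v in versions if v.key == key), key=lambda v: v.valid_from)
    if any(v.valid_from == effective for v in rows):
        raise ValueError("a version already starts on that date")
    prior = [v for v in rows if v.valid_from < effective]
    later = [v for v in rows if v.valid_from > effective]
    if prior:
        # Close the version that was in effect just before the new one.
        prior[-1].valid_to = effective
    # A backdated version ends where its successor begins, not at OPEN_END.
    end = later[0].valid_from if later else OPEN_END
    versions.append(Version(key, value, effective, end))


dim: list[Version] = []
upsert_effective(dim, 1, "Bronze", date(2023, 1, 1))
upsert_effective(dim, 1, "Gold", date(2023, 5, 1))
# Late-arriving correction: the customer actually became Silver on March 1.
upsert_effective(dim, 1, "Silver", date(2023, 3, 1))

for v in sorted(dim, key=lambda v: v.valid_from):
    print(v.value, v.valid_from, v.valid_to)
```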

False Alarms in Highly Available Postgres Clusters

Umair Shahid pulls the alarm:

False alarms can be a significant problem in highly available clusters of PostgreSQL. They can cause unnecessary downtime and disruptions that can impact the performance of the nodes. In this blog post, we will explore the causes, prevention, and resolution of false alarms in PostgreSQL clusters.

It’s a good idea to sit back and think about how complex the problem of high availability is, even if the service (SQL Server, Postgres, or whatever) offers capabilities to simplify a lot of it. The trick is that you want your service to fail over if and only if it needs to, but what tells you if it “needs to” is a noisy signal.
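One common way to live with a noisy signal is to debounce it. The sketch below recommends failover only after several consecutive failed health checks, so a single dropped probe does not trigger one. It is a generic Python illustration under that assumption, not how any particular Postgres HA tool decides. The `FailoverDetector` class is hypothetical, and real systems typically also require agreement from multiple observers so that a network problem at one monitor doesn't look like a dead primary.

```python
from dataclasses import dataclass


@dataclass
class FailoverDetector:
    """Hypothetical debouncer: recommend failover only after `threshold`
    consecutive failed health checks."""
    threshold: int = 3
    consecutive_failures: int = 0

    def observe(self, check_passed: bool) -> bool:
        """Record one health-check result; return True if failover is warranted."""
        if check_passed:
            # Any success resets the streak, so a brief blip is treated as noise.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


detector = FailoverDetector(threshold=3)
signal = [True, False, True, False, False, True, False, False, False]
decisions = [detector.observe(ok) for ok in signal]
print(decisions)  # only the final check, the third failure in a row, returns True
```

The `threshold` knob is the trade-off described above. Raising it means fewer spurious failovers, but a genuine outage takes longer to act on.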
