
Curated SQL Posts

Bi-Directional Replication in PostgreSQL

Semab Tariq does a bit of replication:

Bi-directional replication, often referred to as multi-master replication, allows data to be written and read from multiple database nodes. This means that changes made on one node are automatically replicated to the other node, and vice versa. This setup ensures that all nodes have the same data, providing high availability and fault tolerance.

This example covers replicating a single table, and the code itself is rather easy to follow. I imagine things get a bit more challenging when you add some of the more common types of complexity, including what happens when your app tries to insert the same record on both nodes.
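For a sense of what the setup looks like, here is a minimal sketch of two-node, single-table bi-directional logical replication, assuming PostgreSQL 16 or later (which added the origin option on subscriptions). The node names, connection strings, and table are illustrative, not taken from Semab's post:

    -- On both nodes, create the same table:
    CREATE TABLE accounts (id int PRIMARY KEY, balance numeric);

    -- On node A:
    CREATE PUBLICATION pub_a FOR TABLE accounts;

    -- On node B:
    CREATE PUBLICATION pub_b FOR TABLE accounts;

    -- On node A, subscribe to node B. origin = none skips rows that
    -- node B itself received via replication, preventing a loop:
    CREATE SUBSCRIPTION sub_from_b
        CONNECTION 'host=node-b dbname=appdb user=repl'
        PUBLICATION pub_b
        WITH (copy_data = false, origin = none);

    -- On node B, subscribe to node A:
    CREATE SUBSCRIPTION sub_from_a
        CONNECTION 'host=node-a dbname=appdb user=repl'
        PUBLICATION pub_a
        WITH (copy_data = false, origin = none);

Note that this does nothing to resolve the conflict scenario above: if both nodes insert id = 1 at the same time, each subscription hits a primary key violation and stalls until you intervene.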

Tips for Developing Good Tutorials

Adron Hall shares some advice on writing a tutorial (or any kind of technical documentation):

Here’s the deal, tutorial writers: lay out ALL the prerequisites clearly at the beginning. And I mean ALL of them. Don’t assume I have jq installed for your GraphQL tutorial. Don’t assume I’m running the latest version of Python (and for the love of code, specify WHICH Python – 2.x or 3.x?).

And here’s a novel idea: how about actually telling me where to find and install these prerequisites? Give me links, give me version numbers, give me command line instructions. Assume I’m starting from scratch on a fresh machine. Because guess what? Sometimes I am!

I think the set of tips Adron provides is aspirationally solid: there may be some things you can’t actually do, but you should certainly try to do all of them. And I’m certainly not perfect at this myself.

Listing Columns a DAX Query Touches

Chris Webb is making a list:

Some time ago I wrote a post about how you can use the DISCOVER_CALC_DEPENDENCY DMV to get a list of all the tables, columns and relationships referenced by a DAX query. That’s really useful and indeed the new INFO.CALCDEPENDENCY DAX function allows you to do the same thing without all the complexity of calling a DMV. However, this doesn’t quite give you everything you need to solve one important problem: which columns are touched when a DAX query runs? The INFO.CALCDEPENDENCY function gives you a list of all the columns used directly in a DAX query and it also gives you a list of all the columns needed by all the measures used in a DAX query. However, it only gives you a list of the relationships used, and these relationships also rely on columns. In this post I’ll show you how to get a list of all the columns that are touched by a DAX query.

Click through for the script.

Dynamically Start a Child Pipeline in Fabric Data Factory

Andy Leonard continues a series on Fabric Data Factory design patterns:

In an earlier post titled Fabric Data Factory Design Pattern – Basic Parent-Child, I demonstrated one way to build a basic parent-child design pattern in Fabric Data Factory by calling one pipeline (child) from another pipeline (parent). In a later post titled Fabric Data Factory Design Pattern – Parent-Child with Parameters, I modified the parent and child pipelines to demonstrate passing a parameter value from a parent pipeline when calling a child pipeline that contains a parameter.

In this post, I modify a parent pipeline to explore parameterizing which child pipeline will be called by the parent pipeline. In it, we will:

  • Copy the child pipeline id
  • Clone a parent pipeline
  • Add and configure a pipeline variable for the child pipeline id
  • Test the dynamic pipeline id

Read on to see how.

Looping through Column Names in R

Steven Sanderson builds a loop:

Looping through column names in R is a crucial technique for data manipulation, especially for beginners. This article will guide you through various methods to loop through column names in R, providing practical examples and insights to enhance your data analysis skills.

Read on for examples with for loops, the dynamic duo of lapply() and sapply(), and the map() function in the purrr library.

Building an Impact Analysis Process

Marc Lelijveld needs more than the minimum impact analysis:

Imagine you have a semantic model in the Power BI Service (or Fabric, if you will), and you’re about to make a breaking change to this semantic model. How do you inform your end users? How do you tell them about this change? In this blog I will zoom in on the options you have in the interface that will help you reach out to your users, looking at different aspects, from other reports in Power BI to the more complex case of users who connect via Analyze in Excel.

Click through for the use case, why the built-in impact analysis option for Power BI isn’t sufficient, and what you can do to flesh it out.

Tips for Adopting Microsoft Fabric

Paul Turley shares some thoughts:

Hello, friends. I’ve spent the past few months working with several new Fabric customers who were seeking guidance and recommendations for Fabric architecture decisions. What have we learned about using Fabric in enterprise data settings in the past 11 months? This post covers some of the important decision points and Fabric solution design patterns.

Much of the industry’s experience with Microsoft Fabric over the past several months has been at a high level, as organizations were dipping their toes in the pool to test the water. So far, our Data & AI team has assisted around 50 clients with Fabric projects of various sizes. We have also implemented a handful of production-scale projects with enterprise workloads, comparing notes with community leaders and the product teams who develop the product. What lessons have we learned?

Click through for several bits of high-level architectural guidance intended to make that adoption easier.

Working with Parquet Files in Postgres

Craig Kerstiens announces an extension:

Today, we’re excited to release pg_parquet – an open source Postgres extension for working with Parquet files. The extension reads and writes parquet files to local disk or to S3 natively from Postgres. With pg_parquet you’re able to:

  • Export tables or queries from Postgres to Parquet files
  • Ingest data from Parquet files to Postgres
  • Inspect the schema and metadata of existing Parquet files

Code is available at: https://github.com/CrunchyData/pg_parquet/.

Read on for more background on why we built pg_parquet or jump below to get a walkthrough of working with it.

Hey, that’s my job to tell people to read on to learn more!
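To give a flavor of the three capabilities listed above, here is a short sketch of the pg_parquet workflow. The table and paths are illustrative, and while the COPY syntax and the parquet.schema() inspection function follow the announcement and the repo, treat the exact names as assumptions:

    -- Export a table to a Parquet file on local disk:
    COPY orders TO '/tmp/orders.parquet' (FORMAT 'parquet');

    -- Export a query result to S3 instead:
    COPY (SELECT * FROM orders WHERE order_date >= '2024-01-01')
        TO 's3://my-bucket/orders.parquet' (FORMAT 'parquet');

    -- Ingest a Parquet file back into a table:
    COPY orders FROM '/tmp/orders.parquet' (FORMAT 'parquet');

    -- Inspect the schema of an existing Parquet file:
    SELECT * FROM parquet.schema('/tmp/orders.parquet');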

Avoid Storing Files in Databases

Joey D’Antoni explains why you almost never want to use FILESTREAM:

Yesterday, I accidentally walked into a discussion on LinkedIn about the merits of Filestream in SQL Server. If you aren’t familiar with Filestream, consider yourself lucky: it was a feature that was added to SQL Server in the short timeframe when people thought it was a good idea to use databases for file storage, and before the enlightened times when object storage became a thing. I first remember blobs being a thing in Oracle 8i, where at least you had the ability to store them in a tablespace with a larger block size than 8k, and had a dedicated area of the buffer pool with that larger block size that you could dedicate to that blob tablespace.

Joey has the right of things. There are rare exceptions where it could make sense to store files in databases. My best example involves storing ML models that we use in SQL Server ML Services, simply because of how difficult it is to read anything off of disk via ML Services. But that’s a real edge case.
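If you do need files alongside relational data, the usual pattern is the one Joey alludes to: keep the bytes in object storage and persist only a pointer plus metadata in the database. A minimal T-SQL sketch, with entirely illustrative names:

    CREATE TABLE dbo.Documents
    (
        DocumentId   int IDENTITY(1,1) PRIMARY KEY,
        FileName     nvarchar(260)  NOT NULL,
        ContentType  nvarchar(100)  NOT NULL,
        SizeBytes    bigint         NOT NULL,
        StorageUrl   nvarchar(1000) NOT NULL, -- e.g., an S3 or Azure Blob URI
        Sha256Hash   binary(32)     NOT NULL, -- verify integrity on retrieval
        CreatedAtUtc datetime2      NOT NULL DEFAULT SYSUTCDATETIME()
    );

The application uploads the file to object storage and then inserts one row here; the database stays small and backups stay fast.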
