Press "Enter" to skip to content

Curated SQL Posts

API Testing with pytest

Xuan Nguyen Truong writes some tests:

API testing is an essential aspect of software development, ensuring that your application’s endpoints are functioning correctly and reliably. In this guide, we’ll introduce you to implement API testing in Python with Pytest and the Requests library.

I’m a big fan of pytest, as it makes testing in Python so much easier. There’s not a lot of ceremony involved in writing tests and it’s easy to see what’s failing during tests.

Comments closed

Postgres Synchronous Replication Guarantees

Kaarel Moppel has a public service announcement:

At last week’s local Postgres user group meetup here in Estonia, one of the topics was HA and recent Patroni (the most popular cluster manager for Postgres) improvements in supporting quorum commit, which by the way on its own has been possible to use for years. Things went deep quickly and we learned quite a bit of course. Including a good reminder that you shouldn’t build your bank on Patroni’s default synchronous mode 🙂

Anyways, during the hallway track (which sometimes are as valuable as the real ones) got an interesting question – with some 3+ quorum nodes, is Postgres then 100% bulletproof against all kinds failures? Excluding meteorites, rouge DBAs and such of course. One could think so, right? Nope.

Read on to learn what might cause failure in that scenario. Guaranteeing synchronous replication between machines over a network is a surprisingly difficult challenge.

Comments closed

Constraints on Polymorphic Associations in SQL Server

Jared Westover wants a foreign key, but one referencing multiple tables:

Do you like a challenge? If you answered yes, you’re my kind of person. Recently, a developer presented me with a problem: they needed a foreign key reference in one table to associate with multiple other tables. Over the years, I’ve often been asked how to make this situation work. However, achieving this relationship with foreign keys is technically impossible with SQL Server and most mainstream relational database platforms. Since SQL Server restricts foreign keys to referencing a single table, how can we solve this problem?

My immediate answer was “triggers,” which happens to be the solution Jared intentionally omits.

I’d rather go with the multiple association tables approach over multiple indicator types, as the latter requires (n-1) NULLs, where n is the number of indicator types (review types in Jared’s example) and I hate NULL because NULL is the void lying about being a value, sort of like how skim milk is water lying about being milk.

Comments closed

Refreshing a Power BI Semantic Model via Eventstreams

Chris Webb builds a Rube Goldberg device:

Following on from my last post where I showed how to send data from Power Automate to a Fabric Eventstream, in this post I’m going to show how to use it to solve one of my favourite problems: refreshing a Power BI semantic model that uses an Excel workbook stored in OneDrive as a source when that Excel workbook is modified.

Now before I go on I want to be clear that I know this is a ridiculously over-engineered and expensive (in terms of CUs) solution, and that you can do almost the same thing just using Power Automate or in several other different ways – see my colleague Mark Pryce-Maher’s recent videos on using Fabric Open Mirroring with Excel for example. I’m doing this to teach myself Fabric Eventstreams and Activator and see what’s possible with them. Please excuse any mistakes or bad practices.

Click through for the process.

Comments closed

Finding Columns in R with No Data

Steven Sanderson looks for the missing columns:

When working with real-world datasets in R, it’s common to encounter missing values, often represented as NA. These missing values can impact the quality and reliability of your analyses. One important step in data preprocessing is identifying columns that consist entirely of missing values. By detecting these columns, you can decide whether to remove them or take appropriate action based on your specific use case. In this article, we’ll explore how to find columns with all missing values using base R functions.

Click through to see how you can do this. It’s not quite as simple as missing rows (complete_cases()) but it’s also not too much of an ordeal, either.

Comments closed

Comparing Positron to RStudio

Theo Roe performs a product comparison:

Positron is the new beta Data Science IDE from Posit. Though Posit have stressed that maintenance and development of RStudio will continue, I want to use this blog to explore if Positron is worth the switch. I’m coming at this from the R development side but there will of course be some nuances from other languages in use within Positron that require some thought.

Read on for Theo’s perspective. Knowing that it’s using the same underlying framework as Visual Studio Code, I kind of wish this were an extension for VS Code rather than a separate app.

1 Comment

Deployment Parameters in Azure AI Foundry

Tomaz Kastrun continues a series on Azure AI:

Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant’s personality, tell it what it should and shouldn’t answer, and tell it how to format responses. There’s no token limit for this section, but it will be included with every API call, so it counts against the overall token limit

Click through for a description of each part of the deployment parameters section.

Comments closed

Mounding ADF Instances in Microsoft Fabric

Koen Verbeeck has an existing Azure Data Factory:

We recently started using Microsoft Fabric for our cloud data platform. However, we already have quite an estate of Azure data services running in our company, including a huge number of Azure Data Factory (ADF) pipelines. It seems cumbersome to migrate all those pipelines to Microsoft Fabric, especially because some features are not supported yet and ADF is the mature choice at the moment. We like the concept of Microsoft Fabric’s centralization, where everything is managed in one platform. Is there an option to manage ADF in Fabric?

Read on for the answer, but make sure to check out its limitations as well.

Comments closed