Press "Enter" to skip to content

Category: Architecture

Choosing between ORMs vs Raw SQL

Jamal Hansen embraces the power of “and”:

At the beginning of this series, I promised that even if you know how to use an Object Relational Mapper (ORM) to interact with a database, knowing SQL would make you a better developer. Now that we have covered everything from SELECT to parameterized queries, it is time to answer the question that every Python developer eventually asks: when should I use an ORM, and when should I just write SQL?

I think there’s a reasonable split between the two, though the choice of ORM matters a lot as well. The heavier the ORM (nHibernate and Entity Framework, I’m looking at you), the more performance you’re trading off. By contrast, a micro-ORM like Dapper has a much slimmer profile, and if you combine that with judicious use of stored procedures for non-trivial work, the outcome is usually good.

Leave a Comment

Polymorphic Associations in Postgres

Andrei Lepikhov has multiple types:

Planning such a query efficiently is no easy task — and in my experience, this is confirmed by user reports from the 1C world, since PostgreSQL is currently not rich in LEFT JOIN optimisations. At the same time, the properties of this pattern enable the development of various techniques to improve execution efficiency. I’ve managed to implement several straightforward optimisations of this template. But first, let’s understand what polymorphic references actually are, where they come from, and how common they really are. That’s the gap I’m trying to fill with this post.

Click through for the explanation. This isn’t the easiest problem to solve in the relational world, though I do tend to prefer the subclass/superclass solution, myself.

Leave a Comment

Foreign Tables and Materialized Views in PostgreSQL

Richard Yen provides a write-back system for analysts:

I recently wrote a post about WAL log shipping and how a standby built on log shipping is a great way to give data analysts production data without putting the primary at risk. Having access to the production data in this way is great, but it’s read-only. How can we create views of this data for better analytics work? I want to make the case today that Foreign Data Wrappers and Materialized Views can make a great solution – not only in accessing production Postgres data, but also working with other data sources.

Click through for an architectural discussion of how they can work together.

Leave a Comment

Explaining the Fabric Ontology

James Serra takes us through a big word:

For years, most data conversations have started with tables. We ask where the data lives, what columns are available, how the joins work, and whether the data is in a warehouse, lakehouse, semantic model, or some other system. That makes sense, because tables are how most of us have worked with data for decades. But tables are not how the business thinks.

A business thinks in terms of customers, products, orders, shipments, assets, flights, runways, employees, policies, and actions. The problem is not usually a lack of data. The problem is a lack of shared meaning. Organizations often have the same business concept represented multiple ways across teams and systems, creating what I would call semantic drift. Sales may define a customer one way. Finance may define it another way. Operations may have yet another version in a different system with different keys, names, and assumptions. That is exactly where Fabric Ontology becomes important. It is designed to close the gap between physical data structures and business meaning.

Microsoft is a bit late to the ontology game and their current concept of an ontology shows. I can understand where they’re going but they still have a ways to go.

Leave a Comment

Solving Cross-Database References in Database Projects

Andy Brownsword creates a reference:

Large data solutions can span multiple databases. I’ve recently shared how amazing Database Projects can be but objects which reference across database boundaries can throw up warnings. Is the code solid or have you picked the wrong table and column combination? – you won’t know, and the code is fragile.

This is where References come into play. They make our projects aware of other databases so objects can reference them without unnecessary warnings. Plus intellisense is a nice bonus.

There are a few ways to handle these dependencies depending on how your projects are organised and what tooling you’re using, so let’s jump into the problem and check out the different options.

This was, without a doubt, one of the most painful things around working with database projects in Visual Studio. Not only do you need to keep up with your own database, but also the databases you reference. In addition, if the databases you reference happen to reference your database, you’ll end up with circular dependencies that will fail. The “best” solution was to create a stripped-down version of one of the databases that doesn’t include references, and then use that to jump-start the second database, and then use that to create the full version of the original database. So now you’re maintaining three separate database projects (at a minimum—this assumes you have one pair of databases with cross-referenced dependencies). Heaven help you if you have a cross-referencing love triangle.

1 Comment

5 Reasons Analytics Projects Fail

Reza Rad has a new video:

Have you ever wondered why analytics projects fail? There are hundreds of organizations, thousands of BI teams, and countless consulting companies building analytics solutions every year. Yet a large number of those projects never deliver what they promised.

Why does that keep happening?

Read on for a summary of Reza’s decades of experience.

Comments closed

The Pain of Views in PostgreSQL

Radim Marik digs into views:

VIEWs should be the cleanest abstraction SQL, and therefore Postgres, has on offer. I love the concept. The promise of decoupling logical intent from physical storage is perfect on paper. In practice, few things in the database world trigger such a heated debate or carry as much historical baggage. VIEWs mix big promises with false hopes, and the promises rarely survive contact with production.

The appeal is straightforward. Abstract “active customer” once and reuse it everywhere. Every query, report and dashboard uses the same definition. The “active customer” then becomes the foundation of a “customer orders” view, which in turn powers an operational “customer summary” view.

Some of these pain points are PostgreSQL-specific, but others (especially around the performance of nested views) resonates with SQL Server as well.

Comments closed

Cold Storage Data Archival

Brent Ozar answers a question:

At least once a month, I get this question from a client:

We have big data, and we want to save money. We want to move the older data down to some kind of archives that will cost us less.

I ask exactly one followup question:

Are you willing to modify the app that queries the data, or do you want the users to still be able to query the old data in their existing screens and reports?

Brent recommends trying a linked server approach. I’d also toss PolyBase in to the mix, though it’s going to have similar pros and cons to linked servers in this specific scenario. I have an old post on cold storage with PolyBase but the mechanisms haven’t changed much. I do have some sample scripts in my PolyBase talk’s GitHub repo, though fair warning that these are going away soon as I rebuild the talk for 2025.

Comments closed

Defining the Fabric Ontology

Mike Donnelly explains a term:

The short version: Fabric Ontology is the semantic backbone of Microsoft Fabric. It’s the layer that defines what your business data actually means.

If you’ve ever worked in a large organization, you know the problem. One team calls them “Customers,” another calls them “Clients,” and a third calls them “Account Holders.” Without a shared meaning, your analytics become a mess of conflicting vocabularies. An ontology is just a structured way of naming things and describing their relationships so everyone—and every tool—is using the same dictionary.

I think this is all correct, but I think there’s something more to ontologies than that. At least in the Palantir world, the ontology is not just the business definitions and concepts, but it’s also the actions you can take against that data. In other words, you might have Customers, Clients, and Account Holders. You can add a new customer, update the customer details, send a welcome to a new account holder, etc. Each of these actions is baked into the ontology as well. That’s what makes it different from simply defining where the data lives and how we got it in its current shape.

Comments closed

The Power of Database Projects

Andy Brownsword is sold:

If you’ve been using Database Projects for simply maintaining copies of your database objects – like I used to – then you’ve been missing out on the power of its deployments. Initially I was sceptical about how it could effectively perform upgrades but after seeing them land in SSMS last month I wanted to revisit them as a means of actual deployment.

My scepticism was completely misplaced, so if you haven’t touched Database Projects before – or had similar concerns as me – I want to demonstrate 3 features which make them not just good, but amazing for deployments.

Click through for those features. I will say that for straightforward databases, the database project deployment process is pretty good. Where it falls apart is when you have a large number of cross-database dependencies, especially if there are mutual cross-database dependencies: DB1.dbo.sp1 needs DB2.dbo.Table2, and DB2.dbo.view2 references DB1.dbo.Table1. In that case, the workaround is so annoying and essentially comes down to “have three separate database projects, one for DB1, one for DB2, and one for a scalled-down version of DB1/DB2 without the dependencies, and then use that to inject into the other DB.” Which does kind of work, yeah, but now you’re maintaining even more. And once you get to dozens of dependencies and lots of cross-database queries? Yeah, forget about it.

Comments closed