Press "Enter" to skip to content

Category: Architecture

The Muddy World of Hybrid Environments

Deborah Melkin muddies up the waters a bit:

My first reaction to this sentence was: Oh look, he assumes that everyone is in the cloud…

My next reaction was: Oh look, he acknowledged that maybe everyone isn’t in the clouds in his follow-up examples.

Since the cloud was introduced, the assumption was always going to be everyone will moving to the cloud. “You’ll be behind if you don’t learn the cloud,” they said. Then as people either moved or started looking into moving their databases, they realized they just couldn’t – whether it was due to missing features, higher than expected costs, etc. There was always some reason.

Click through for some of the challenges and realities of organizations where certain cloud-first or cloud-only services would be a major challenge, versus other services that are typically easier to deal with.

Leave a Comment

Skills for Cloud-to-On-Prem Migration

Reitse Eskens focuses on a set of skills:

This month, Koen Verbeeck invites the blogging community to write about their thoughts on returning to on-premises. What could be struggles, things we have to re-learn, etcetera.

When I read the invite, it immediately sparked inspiration, because there are increasing rumours around cloud exits. People musing about ‘what if’. Some clients reference these questions, but so far no one has directly asked me one with the intent of moving forward with it.

Click through for Reitse’s thoughts.

Leave a Comment

Choosing between ORMs vs Raw SQL

Jamal Hansen embraces the power of “and”:

At the beginning of this series, I promised that even if you know how to use an Object Relational Mapper (ORM) to interact with a database, knowing SQL would make you a better developer. Now that we have covered everything from SELECT to parameterized queries, it is time to answer the question that every Python developer eventually asks: when should I use an ORM, and when should I just write SQL?

I think there’s a reasonable split between the two, though the choice of ORM matters a lot as well. The heavier the ORM (nHibernate and Entity Framework, I’m looking at you), the more performance you’re trading off. By contrast, a micro-ORM like Dapper has a much slimmer profile, and if you combine that with judicious use of stored procedures for non-trivial work, the outcome is usually good.

Comments closed

Polymorphic Associations in Postgres

Andrei Lepikhov has multiple types:

Planning such a query efficiently is no easy task — and in my experience, this is confirmed by user reports from the 1C world, since PostgreSQL is currently not rich in LEFT JOIN optimisations. At the same time, the properties of this pattern enable the development of various techniques to improve execution efficiency. I’ve managed to implement several straightforward optimisations of this template. But first, let’s understand what polymorphic references actually are, where they come from, and how common they really are. That’s the gap I’m trying to fill with this post.

Click through for the explanation. This isn’t the easiest problem to solve in the relational world, though I do tend to prefer the subclass/superclass solution, myself.

Comments closed

Foreign Tables and Materialized Views in PostgreSQL

Richard Yen provides a write-back system for analysts:

I recently wrote a post about WAL log shipping and how a standby built on log shipping is a great way to give data analysts production data without putting the primary at risk. Having access to the production data in this way is great, but it’s read-only. How can we create views of this data for better analytics work? I want to make the case today that Foreign Data Wrappers and Materialized Views can make a great solution – not only in accessing production Postgres data, but also working with other data sources.

Click through for an architectural discussion of how they can work together.

Comments closed

Explaining the Fabric Ontology

James Serra takes us through a big word:

For years, most data conversations have started with tables. We ask where the data lives, what columns are available, how the joins work, and whether the data is in a warehouse, lakehouse, semantic model, or some other system. That makes sense, because tables are how most of us have worked with data for decades. But tables are not how the business thinks.

A business thinks in terms of customers, products, orders, shipments, assets, flights, runways, employees, policies, and actions. The problem is not usually a lack of data. The problem is a lack of shared meaning. Organizations often have the same business concept represented multiple ways across teams and systems, creating what I would call semantic drift. Sales may define a customer one way. Finance may define it another way. Operations may have yet another version in a different system with different keys, names, and assumptions. That is exactly where Fabric Ontology becomes important. It is designed to close the gap between physical data structures and business meaning.

Microsoft is a bit late to the ontology game and their current concept of an ontology shows. I can understand where they’re going but they still have a ways to go.

Comments closed

Solving Cross-Database References in Database Projects

Andy Brownsword creates a reference:

Large data solutions can span multiple databases. I’ve recently shared how amazing Database Projects can be but objects which reference across database boundaries can throw up warnings. Is the code solid or have you picked the wrong table and column combination? – you won’t know, and the code is fragile.

This is where References come into play. They make our projects aware of other databases so objects can reference them without unnecessary warnings. Plus intellisense is a nice bonus.

There are a few ways to handle these dependencies depending on how your projects are organised and what tooling you’re using, so let’s jump into the problem and check out the different options.

This was, without a doubt, one of the most painful things around working with database projects in Visual Studio. Not only do you need to keep up with your own database, but also the databases you reference. In addition, if the databases you reference happen to reference your database, you’ll end up with circular dependencies that will fail. The “best” solution was to create a stripped-down version of one of the databases that doesn’t include references, and then use that to jump-start the second database, and then use that to create the full version of the original database. So now you’re maintaining three separate database projects (at a minimum—this assumes you have one pair of databases with cross-referenced dependencies). Heaven help you if you have a cross-referencing love triangle.

1 Comment

5 Reasons Analytics Projects Fail

Reza Rad has a new video:

Have you ever wondered why analytics projects fail? There are hundreds of organizations, thousands of BI teams, and countless consulting companies building analytics solutions every year. Yet a large number of those projects never deliver what they promised.

Why does that keep happening?

Read on for a summary of Reza’s decades of experience.

Comments closed

The Pain of Views in PostgreSQL

Radim Marik digs into views:

VIEWs should be the cleanest abstraction SQL, and therefore Postgres, has on offer. I love the concept. The promise of decoupling logical intent from physical storage is perfect on paper. In practice, few things in the database world trigger such a heated debate or carry as much historical baggage. VIEWs mix big promises with false hopes, and the promises rarely survive contact with production.

The appeal is straightforward. Abstract “active customer” once and reuse it everywhere. Every query, report and dashboard uses the same definition. The “active customer” then becomes the foundation of a “customer orders” view, which in turn powers an operational “customer summary” view.

Some of these pain points are PostgreSQL-specific, but others (especially around the performance of nested views) resonates with SQL Server as well.

Comments closed

Cold Storage Data Archival

Brent Ozar answers a question:

At least once a month, I get this question from a client:

We have big data, and we want to save money. We want to move the older data down to some kind of archives that will cost us less.

I ask exactly one followup question:

Are you willing to modify the app that queries the data, or do you want the users to still be able to query the old data in their existing screens and reports?

Brent recommends trying a linked server approach. I’d also toss PolyBase in to the mix, though it’s going to have similar pros and cons to linked servers in this specific scenario. I have an old post on cold storage with PolyBase but the mechanisms haven’t changed much. I do have some sample scripts in my PolyBase talk’s GitHub repo, though fair warning that these are going away soon as I rebuild the talk for 2025.

Comments closed