Press "Enter" to skip to content

Curated SQL Posts

Thoughts on Fabric OneLake

Teo Lachev shares some thoughts:

In a previous post, I shared my overall impression of Fabric. In this post, I’ll continue exploring Fabric, this time sharing my thoughts on OneLake. If you need a quick intro to Fabric OneLake, the Josh Caplan’s “Build 2023: Eliminate data silos with OneLake, the OneDrive for Data” presentation provides a great overview of OneLake, its capabilities, and the vision behind it from a Microsoft perspective. If you prefer a shorter narrative, you can find it in the “Microsoft OneLake in Fabric, the OneDrive for data” post. As always, we are all learning and constructive criticism would be appreciated if I missed or misinterpreted something.

I think some of Teo’s criticism comes from the idea that OneLake should also mean one lakehouse or one data lake, but the abstraction is one level higher than that. I would like to see some of Teo’s ideas make it into GA, though.

Leave a Comment

Source Control and Change Management for Postgres

Ryan Booz relives an older story:

For those of you that don’t know, those ER tools were really expensive (probably still are for the ones that exist) and only a few developers had access to the tool. They didn’t have a great DX either.

Aside from the lack of automation and ability of our developers to be more integrated into the process, there was always the one looming issue that we just couldn’t reconcile.

If Joe left and joined the circus (see, I got you there), we were stuck.

We knew this was a bottleneck for some time and we had tried multiple times to change the process. Our ability to iterate on new feature development went through one person and a set of 15-year-old scripts. It didn’t match our otherwise Agile process of front-end code and data analysis projects.

Read on for Ryan’s thoughts on database change management. Some of the tools mentioned work with multiple database platforms, whereas others are specific to Postgres.

Leave a Comment

Enabling Postgres Auditing

Athar Ishteyaque enables an extension:

The PostgreSQL Audit Extension (or pgaudit) provides detailed session and/or object audit logging via the standard logging facility provided by PostgreSQL.

The goal of a PostgreSQL audit is to provide the tools needed to produce audit logs. These logs are often required to pass certain government, financial, or ISO certification audits.

I am kind of curious what the performance impact of this extension is.

Leave a Comment

A Primer on Regular Expressions

Steven Sanderson provides a quick guide to regular expressions:

Regular expressions, often abbreviated as regex, are powerful tools used in programming to match and manipulate text patterns. While they might seem intimidating at first, regular expressions are incredibly useful for tasks like data validation, text parsing, and pattern matching. In this blog post, we’ll explore regular expressions in the context of R programming, breaking down the concepts step by step and providing practical examples along the way. By the end, you’ll have a solid understanding of regular expressions and be ready to apply them to your own projects.

This is an extremely powerful language which can take years (decades?) to master, especially considering that there are several regular expression syntaxes and they don’t all behave the same way. But still, I’ve found that the more familiar you are with regular expressions, the simpler certain classes of problem become.

Leave a Comment

PyPI and Malicious Code

Steven Vaughan-Nichols gives us the story:

The Python Package Index (PyPI), is the most popular Python programming language software repository. It’s also a mess. Earlier this year, the FortiGuard team discovered zero-day malware in three PyPI packages called “colorslib,” “httpslib,” and “libhttps.”  Before that, 2022 closed with  PyTorch-nightly on Linux being poisoned with a fake dependency. More recently, PyPI had to stop new user registrations and project creations because of a flood of malicious users. PyPI isn’t the only one to notice the user trouble. The Python Software Foundation (PSF) received three subpoenas for PyPI user data. What is going on here!?

Read on to learn more about what’s happening with the most popular Python repository.

Leave a Comment

SQL Anti-Patterns Extended Event in SQL Server 2022

Dennes Torres finds some anti-patterns:

One of the new Extended Event available in SQL Server 2022 is the query_antipattern. This extended event allows to identify anti-patterns on the SQL queries sent to the server.  An anti-pattern in this case is some code that the SQL Server optimizer can’t do a great job optimizing the code (but cannot correct the issue automatically).

This is a very interesting possibility: Including this event in a session allow us to identify potential problems in applications. We can do this in development environments to the the problems earlier in the SDLC (Software Development Life Cycle).  Let’s replicate some examples and check how this works.

Dennes shows two examples and notes that there are five total listed in the Extended Event, but that the documentation is a bit lacking to explain their intent.

Leave a Comment

Showing KQL Queries

Dany Hoter looks at some KQL query plans:

Each visual on the page is going to summarize data from one or more queries and add the summarize part of the query.

If your model contains multiple tables in direct query with relations between them, the connector will generate joins between the tables.

Selecting values in filters will create multiple where conditions.

In order to see the final query and understand the performance implications of each query and the total query load created by a report, you need to use the command “.show queries” in the context of the database.

Click through for Dany’s notes on the topic, including a few tips on what to look for.

Leave a Comment

Listing Topics in Kafka without Zookeeper

The BIg Data in Real World team has a quick one for us:

Kafka uses Zookeeper to manage it’s internal state. So it is not possible to run Kafka without Zookeeper. Even if you don’t have access to Zookeeper in your organization, there is a Zookeeper cluster running which your Kafka cluster connects to.

So, how to list topics and execute other commands if we don’t have access to Zookeeper?

Eventually, this won’t even be a question, as Kafka already has production versions using KRaft, and by Kafka 4.0, there won’t be a Zookeeper to kick around anymore.

Leave a Comment

Data Inconsistency in Postgres HA Clusters

Umair Shahid gives us an overview:

While PostgreSQL is known for its robustness, scalability, and reliability, data inconsistency can occur in PostgreSQL clusters, which can cause issues and impact the overall performance of the system. In this blog, we’ll define data inconsistency in PostgreSQL clusters, discuss the challenges it poses, its causes, and provide some tips on how to prevent and resolve it if it occurs.

Click through for the article.

Leave a Comment