Press "Enter" to skip to content

Month: November 2024

Table Cloning in Snowflake

Kevin Wilkie creates a clone:

In this coding scenario, I am copying everything from TableA and pushing it into a new table called TableB in the same database and schema.

If you check the size of the database before and after you clone a table, it will be the same size – no matter the size of TableA. This is because, at this point in time, TableB exists only as a “pointer” to the data that constitutes TableA. It is not until something changes in one of the tables – say adding a row to TableA, that it stops being a “pointer” and is artificially constituted.

Read on to learn more about how this works.

Comments closed

Handling a Consumer Fetch Request in Kafka

Multiple Confluent employees (who apparently don’t get to have names this time around) wrap up a series:

It’s been a long time coming, but we’ve finally arrived at the fourth and final installment of our blog series. In this series, we’ve been peeling back the layers of Apache Kafka® to get a deeper understanding of how best to interact with the cluster using producer and consumer clients.

Read on for the final part, as well as links to previous parts if you missed them.

Comments closed

Comparing Azure Kubernetes Service and Container Apps

Gaurav Shukla makes a comparison:

Hello Readers!! Welcome to the new blog!! AKS vs ACA, which is best in cloud migration? When migrating an application to the cloud, choosing the right platform is crucial to ensure scalability, cost-effectiveness, and ease of management. Two of the prominent services offered by Azure for running containerized applications are Azure Kubernetes Service (AKS) and Azure Container Apps (ACA). Both are excellent choices, but their use cases, complexity, and operational overhead differ significantly. This blog will provide a detailed comparison of AKS and ACA, helping you decide which is the best approach for your cloud migration.

Read on for an overview of each service and a nice table outlining the differences.

Comments closed

The Challenge of Major Version Upgrades in PostgreSQL

Peter Eisentraut lays out the explanation:

Upgrades between PostgreSQL major versions are famously annoying. You can’t just install the server binaries and restart, because the format of the data directory is incompatible.

Why is that? Why can’t we just keep the data format compatible?

Perhaps surprisingly, the data format is actually mostly compatible, but not completely. There are just a few things missing that are very hard to solve.

Perhaps I’m not as sympathetic as I should be to the core developers, but there are other RDBMS platforms that have a direct path for upgrade from version to version, so it’s hardly insurmountable.

Comments closed

A Mechanism for Change Management for DBAs

Terri Hurley shares one path:

Back in the day when I received an email from a developer who requested a Production Change, I would copy and paste SQL code from an email to SQL Server to run against a production database. I never bothered to save that code. But later when I was asked for a history of database changes, it was awkward to locate all the previous email requests asking for production changes. Fast forward to present day, most businesses have embraced Change Management Separation of Duties in companies of all sizes.

My preferred path to change management is:

  • All changes go into source control. You can either have the people making changes create the actual scripts to run, or generate them from base changes in source control, dealer’s choice.
  • Each changeset goes to a specific environment when it gets PR’d into the appropriate branch—that is, dev, QA, test, pre-prod, prod, post-prod, next-gen-post-dev-unprod, whatever.
  • DBAs are on pull requests, at least after a certain point. PR to dev? Let the team handle that code review. Once you move nearer to prod, at least one DBA performs a code review.
  • After PR gating happens, your CI/CD system automatically deploys the changes, assuming it passes any continuous integration tests.

This takes time to get right, but it’s very much worth the effort. I worked at a company where they built the release management processes on top of Git + Azure DevOps. You can also use tools like Flyway.

Comments closed

Contrasting Power BI and Power BI Report Server

Soheil Bakhshi makes me sad:

Depending on your organisation’s preferences, data governance requirements, and the platforms you intend to use for report deployment you may use either Power BI Desktop, the “standard version”, or Power BI Desktop RS (Report Server). Power BI Desktop has variations tailored to meet specific needs, such as cloud-based analytics or on-premises reporting. While many users might only encounter the standard version, there’s another important variant for specialised scenarios.

You know the meme where the kid asks, “Mom, I want Power BI!” and mom says, “We have Power BI at home.” Power BI Report Server is Power BI at home, in all of the senses of the joke. I’ve been stuck with it at two employers and although it is better than nothing, you quickly find out how annoying it gets when that thing you know how to do in Power BI Online doesn’t work because they never supported it in PBIRS.

Comments closed

Power BI Writeback via Fabric SQL Database

Jon Voge gives us a use case for Fabric SQL Databases:

Until recently, Fabric has allowed us to choose between Lakehouses and Warehouses as a backend. For write-back use cases, neither are ideal.

  • The SQL Endpoint of Lakehouses are Read-Only, making writes from Power Apps impossible.
  • While the SQL Endpoint of Warehouses are write-enabled, they do not support enforced Primary Keys, which are a hard requirement for Power Apps to be able to write directly to a data source.

Jon briefly describes two mechanisms people used and then how you can do this more effectively with a Fabric SQL Database. Based on the article, it seems that you could probably still do the same with an Azure SQL Database, though I suppose handling the managed identity could be an issue.

Comments closed

Data Visualization in Matplotlib

Rajendra Gupta generates some graphics:

Data analysis requires analysts to handle structured, semi-structured, or unstructured data. Small datasets with few rows and columns are easy to understand. However, as the data complexity increases with many interlinked variables, getting data insights from tabular formatted data becomes challenging. According to a recent study from MIT, the human brain processes an entire image in just 13 milliseconds. Therefore, it is helpful to learn Python and visualization together.

How do we use Python to generate plots from the data to analyze patterns, correlations, and trends? What plots are available, and how do we use them with customizations? Let’s explore them in this tip.

There are a few visualization libraries in Python I prefer over matplotlib, and for static graphics, ggplot2 in R has pretty much everything else beat. But matplotlib is essentially the standard, so it’s important to know.

Comments closed

GROUP BY Clause Order and Performance in PostgreSQL

Andrei Lepikhov runs some tests:

PostgreSQL users often employ analytical queries that sort and group data by different rules. Optimising these operators can significantly reduce the time and cost of query execution. In this post, I will discuss one such optimisation: choosing the order of columns in the GROUP BY expression.

Postgres can already reshuffle the list of grouped expressions according to the ORDER BY condition to eliminate additional sorting and save computing resources. We went further and implemented an additional strategy of group-by-clause list permutation in a series of patches (the first attempt and the second one) for discussion with the Postgres community, expecting it to be included in the next version of PostgreSQL core. You can also try it in action in the commercial Postgres Pro Enterprise fork.

From a “this is a 4th generation language” perspective, there should never be a performance difference in the ordering of grouped columns in a GROUP BY clause. The database optimizer should always determine the optimal ordering. Read Andrei’s article to see how much of a challenge that can be in practice.

Comments closed

Building a Gantt Chart

Amy Esselman wants to know if you’re on time or have any slack:

A Gantt chart is used to visually present tasks or events over time. It consists of a horizontal axis of time and horizontal bars representing the duration of each item arranged along the vertical axis.

The visual was popularized in the early 1900s by American engineer and management consultant Henry Gantt, for whom the chart is named. Most commonly, this view is used to see the start date and duration of each task in a larger project to better understand timelines, milestones, and dependencies.

Click through for an overview of the chart, where it can be useful, and some of the limitations of the chart.

Comments closed