Press "Enter" to skip to content

Day: February 11, 2025

Tips for Scaling Apache Kafka

Narendra Lakshmana Gowda tunes a Kafka cluster:

Apache Kafka is known for its ability to process a huge quantity of events in real time. However, to handle millions of events, we need to follow certain best practices while implementing both Kafka producer services and consumer services.

Before start using Kafka in your projects, let’s understand when to use Kafka:

Much of the advice is pretty standard for performance tuning in Kafka, like setting batch size and linger time on the producer or managing consumers in a consumer group.

Leave a Comment

REPLACENULL in SSIS versus DT_DBTIMESTAMP2 Columns

Slava Murygin notes an error:

 Using the “REPLACENULL” functionality frequently in the “Derived Column” component, the “Conditional Split” component, and other places in SSIS where formulas can be applied is common.

However, I recently encountered an issue with the “DT_DBTIMESTAMP2” data type.

The following formula produced an error:

REPLACENULL(TestDt, (DT_DBTIMESTAMP2,7)”1900-01-01 00:00:00.0000000″)

Error: 0xC020902A at Test Transformation, Derived Column [2]: The “Derived Column” failed because truncation occurred, and the truncation row disposition on “Derived Column.Outputs[Derived Column Output].Columns[TestDt]” specifies failure on truncation. A truncation error occurred on the specified object of the specified component.

Read on for an explanation and two alternatives.

Leave a Comment

Azure Elastic Jobs to Run Powershell and T-SQL

Josephine Bush kicks off a job:

I’ve covered how to create Elastic Jobs in the portal (this one is important to read if you aren’t familiar with elastic jobs already), with Terraform, and with Bicep. Now, I’ll cover how to create them and their associated objects with PowerShell. Don’t do this in prod to start. Always test in a lower environment first.

Click through for the process, as well as the script.

Leave a Comment

Improving Power Query CSV File Performance with Data Columns

Chris Webb makes things go faster:

A few weeks ago I replied to a question on reddit where someone was experiencing extremely slow performance when importing data from a CSV file using Power Query. The original poster worked out the cause of the problem and the solution themselves: they saw that removing all date columns from their query made their Power Query query much faster and that using the Date.FromText function and specifying the date format solved the problem. While I couldn’t reproduce the extreme slowness that was reported I was able to reproduce a performance difference between the two approaches and Curt Hagenlocher of the Power Query team confirmed that this was expected behaviour.

Read on for the example and explanation.

Leave a Comment

Scaling with PostgreSQL

Shayon Mukherjee shares some tips about scaling in PostgreSQL:

“Postgres was great when we started but now that our service is being used heavily we are running into a lot of ‘weird’ issues”

This sentiment is frequently echoed by CTOs and senior engineers at high-growth startups when I speak with them.

Scaling PostgreSQL successfully doesn’t always require a full team of DBAs and experts. The beauty of PostgreSQL is that solutions often lie within the database itself – by rethinking your data access patterns from first principles, you can solve many business problems at scale.

In this post, I’ll address some common “weird” issues I’ve encountered and explore solutions that work at scale.

I like some of them, though I’m not a fan of eliminating or delaying foreign key constraints, as those are important for data quality. I’m not knowledgeable enough in PostgreSQL administration to have a strong opinion on these, however.

Leave a Comment