Curated SQL will take the next day or two off. At latest, we’ll be back in the saddle by Monday the 17th, but you might even see a few lovingly hand-crafted blog posts before then, so stay tuned!
Curated SQL Posts
Narendra Lakshmana Gowda tunes a Kafka cluster:
Apache Kafka is known for its ability to process a huge quantity of events in real time. However, to handle millions of events, we need to follow certain best practices while implementing both Kafka producer services and consumer services.
Before you start using Kafka in your projects, let’s understand when to use it:
Much of the advice is pretty standard for performance tuning in Kafka, like setting batch size and linger time on the producer or managing consumers in a consumer group.
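As a rough illustration of those producer-side knobs, here is a sketch of a tuning configuration using Kafka’s standard producer property names. The specific values are illustrative assumptions on my part, not recommendations from the linked article:

```python
# Sketch of common Kafka producer tuning knobs. Values are illustrative
# assumptions only; the right numbers depend on your message sizes and latency budget.
producer_config = {
    "batch.size": 64 * 1024,    # bytes buffered per partition before a send (default is 16 KB)
    "linger.ms": 10,            # wait up to 10 ms to fill a batch before sending
    "compression.type": "lz4",  # compress whole batches to reduce network I/O
    "acks": "all",              # durability vs. latency trade-off
}

def expected_max_batch_delay_ms(config):
    """A full batch is sent immediately; otherwise the producer waits at most linger.ms."""
    return config["linger.ms"]
```

The trade-off is that a larger `batch.size` plus a nonzero `linger.ms` improves throughput at the cost of a small, bounded increase in latency.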
Tien Nguyen Anh shares some advice:
A flaky test is a test that has inconsistent results across multiple test runs, even when the automation script has not been changed. Sometimes it passes, and sometimes it fails, which makes it unreliable and difficult to trust for identifying issues.
Read on to understand a variety of causes and what you can do about them.
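One common cause is a hidden dependency on the environment, such as wall-clock time. This hypothetical example (not from the linked post) shows a time-dependent check that makes any test of it flaky, and the usual fix of injecting the clock:

```python
import datetime

# Flaky: any test of this passes or fails depending on when the suite happens to run.
def is_business_hours_flaky():
    now = datetime.datetime.now()
    return 9 <= now.hour < 17

# Deterministic: the clock is injected, so the test controls it.
def is_business_hours(now):
    return 9 <= now.hour < 17

def test_is_business_hours():
    assert is_business_hours(datetime.datetime(2024, 1, 15, 10, 0))
    assert not is_business_hours(datetime.datetime(2024, 1, 15, 22, 0))
```

The same injection pattern applies to other sources of flakiness, such as random seeds and network calls.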
It is common to use the “REPLACENULL” function frequently in the “Derived Column” component, the “Conditional Split” component, and other places in SSIS where expressions can be applied.
However, I recently encountered an issue with the “DT_DBTIMESTAMP2” data type.
The following formula produced an error:
REPLACENULL(TestDt, (DT_DBTIMESTAMP2,7)"1900-01-01 00:00:00.0000000")
Error: 0xC020902A at Test Transformation, Derived Column [2]: The “Derived Column” failed because truncation occurred, and the truncation row disposition on “Derived Column.Outputs[Derived Column Output].Columns[TestDt]” specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
Read on for an explanation and two alternatives.
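One workaround that is sometimes suggested for this kind of truncation error (a guess on my part; read the post for the author’s actual two alternatives) is to apply the cast to the whole REPLACENULL result rather than to the literal alone, so the expression’s output type matches DT_DBTIMESTAMP2(7):

```
(DT_DBTIMESTAMP2,7)REPLACENULL(TestDt, "1900-01-01 00:00:00.0000000")
```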
Josephine Bush kicks off a job:
I’ve covered how to create Elastic Jobs in the portal (this one is important to read if you aren’t familiar with elastic jobs already), with Terraform, and with Bicep. Now, I’ll cover how to create them and their associated objects with PowerShell. Don’t do this in prod to start. Always test in a lower environment first.
Click through for the process, as well as the script.
Chris Webb makes things go faster:
A few weeks ago I replied to a question on Reddit where someone was experiencing extremely slow performance when importing data from a CSV file using Power Query. The original poster worked out the cause of the problem and the solution themselves: they saw that removing all date columns made their Power Query query much faster, and that using the Date.FromText function and specifying the date format solved the problem. While I couldn’t reproduce the extreme slowness that was reported, I was able to reproduce a performance difference between the two approaches, and Curt Hagenlocher of the Power Query team confirmed that this was expected behaviour.
Read on for the example and explanation.
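The underlying principle extends beyond Power Query: parsing with an explicit, known format skips per-value format inference. A rough Python analogy (not the M code from the post) contrasts trial-and-error parsing with an explicit format string:

```python
from datetime import datetime

raw = ["2024-01-15", "2024-02-29", "2024-12-31"]

def parse_by_trial(s, formats=("%d/%m/%Y", "%m-%d-%Y", "%Y-%m-%d")):
    """Inference-style parsing: try candidate formats until one fits.
    Each failed attempt costs a raised-and-caught exception per value."""
    for fmt in formats:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass
    raise ValueError(f"no format matched {s!r}")

# Explicit format, analogous to telling Date.FromText the format up front:
explicit = [datetime.strptime(s, "%Y-%m-%d") for s in raw]
trial = [parse_by_trial(s) for s in raw]
assert trial == explicit  # same results; the explicit path just does less work
```

Over millions of rows, avoiding the failed attempts per value is where the time goes.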
Shayon Mukherjee shares some tips about scaling in PostgreSQL:
“Postgres was great when we started but now that our service is being used heavily we are running into a lot of ‘weird’ issues”
This sentiment is frequently echoed by CTOs and senior engineers at high-growth startups when I speak with them.
Scaling PostgreSQL successfully doesn’t always require a full team of DBAs and experts. The beauty of PostgreSQL is that solutions often lie within the database itself – by rethinking your data access patterns from first principles, you can solve many business problems at scale.
In this post, I’ll address some common “weird” issues I’ve encountered and explore solutions that work at scale.
I like some of them, though I’m not a fan of eliminating or delaying foreign key constraints, as those are important for data quality. I’m not knowledgeable enough in PostgreSQL administration to have a strong opinion on these, however.
Richard Vale digs into a dataset:
In this post, I would like to draw attention to a very interesting data set collected by Guan, Palma and Wu as part of the replication package for their paper The rise and fall of paper money in Yuan China, 1260-1368. The paper describes inflation, money and prices during the Yuan Dynasty era in China.
First, a little historical background.
Read on for the analysis. H/T R-Bloggers.
I was editing an article the other day that uses the BIT_COUNT function that was added in SQL Server 2022. The solution presented is excellent, and I will try to come back and link to it here when I remember. (It will be linked the other way.) Reading that did two things for me. First, it gave me an idea of how the BIT_COUNT function might actually be used in a useful way. The solution that was presented would only work in SQL Server 2022 (though it will work in earlier compatibility levels, based on the tests I have done).
Read on for what Louis tried out.
In a number of previous blog posts and in my session on load testing Microsoft Fabric, I’ve questioned the metrics app, and one specific point is the timepoint detail. When you click on a graph, you get the option to go to the timepoint detail and read more.
This is all fun and games, but looking at the list of active processes at that specific point in time, you’ll quickly see processes that fall well outside the selected point in time. For me, that rendered the view useless because it obscured the things I wanted to see.
Read on to see the right way to handle this app.