Press "Enter" to skip to content

Author: Kevin Feasel

IBM Acquires Confluent

Confluent has an announcement:

We are excited to announce that Confluent has entered into a definitive agreement to be acquired by IBM. After the transaction is closed (subject to customary closing conditions and regulatory approvals), together, IBM and Confluent will aim to provide a platform that unifies the world’s largest enterprises, unlocking data for cloud/microservices, accelerating time-to-value, and building the real-time data foundation required to scale AI across every organization. 

Whelp. I suppose it was bound to happen at some point, but I definitely can’t say this news pleases me.

Using Haskell for Data Science

Jonathan Carroll has my attention:

I’ve been learning Haskell for a few years now and I am really liking a lot of the features, not least the strong typing and functional approach. I thought it was lacking some of the things I missed from R until I found the dataHaskell (www.datahaskell.org) project.

There have been several attempts recently to enhance R with some strong types, e.g. vapour (vapour.run), typr (github.com), using {rlang}’s checks (josiahparry.com), and even discussions about implementations at the core level, e.g. in September 2025 (stat.ethz.ch) continued in November 2025 (stat.ethz.ch). While these try to bend R towards types, perhaps an all-in solution makes more sense.

In this post I’ll demonstrate some of the features and explain why I think it makes for a good (great?) data science language.

I’ve been a big fan of F# for data science work as well for similar reasons, so it was interesting to read this article on Haskell. H/T R-Bloggers.

Generating Shape-Bound Random Points in SQL Server

Sebastiao Pereira generates some numbers:

Random number generation is vital in computer science, supporting fields like optimization, simulation, robotics, and gaming. The quality, speed, and sometimes security of the generator can directly affect an algorithm’s correctness, performance, and competitiveness. In Python, random number generation is well-supported and widely used. In this article, we will look at how we can use SQL to do this.

Click through for several examples.
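
Not necessarily Sebastiao’s approach, but to give a flavor of what shape-bound generation can look like in T-SQL, here is a minimal rejection-sampling sketch that keeps only points falling inside a circle (the RAND(CHECKSUM(NEWID())) trick produces a fresh pseudo-random value per row):

-- Generate candidate points in the circle's bounding square, then discard the ~21% that land outside it.
DECLARE @cx float = 0.0, @cy float = 0.0, @r float = 10.0;

WITH candidates AS
(
    SELECT TOP (4000)
        @cx - @r + 2.0 * @r * RAND(CHECKSUM(NEWID())) AS x,
        @cy - @r + 2.0 * @r * RAND(CHECKSUM(NEWID())) AS y
    FROM sys.all_objects AS a
        CROSS JOIN sys.all_objects AS b
)
SELECT TOP (1000) x, y
FROM candidates
WHERE SQUARE(x - @cx) + SQUARE(y - @cy) <= SQUARE(@r);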

Accessing Excel Files from OneDrive via Power BI

Kristyna Ferris is happy:

I can’t believe it’s finally here! A way to have Excel live in OneDrive and access it from Power BI nearly live! We can officially shortcut files to our OneLake from both SharePoint and OneDrive! I am super excited about this feature, and I hope you are too. This feature plus User Data Functions allows us to not only have data from Excel in our reports but keep it as fresh as needed. Imagine having budget allocations that you want to adjust right before or during a meeting. Now you can! You can edit a file in Excel and hit one button to see the new numbers in your report. In the past, we relied on 3rd party services or Power Apps licensing to accomplish this sort of experience. Now we can just use Excel, an old data friend.

Kristyna does note that this is in preview, so take it with that caveat in mind and read on to see how it all works.

SQL Managed Instance Memory vs Cores

Kendra Little hits a pain point:

Microsoft recently announced that Azure SQL Managed Instance Next-gen General Purpose (GPv2) is now generally available. GPv2 brings significant storage performance improvements over GPv1, and if you’re using GPv1, you should plan to upgrade.

But GPv2 still has the same memory-to-core ratio problem that makes Managed Instance a rough deal for running SQL Server. SQL Server is engineered to use lots of memory—it’s a rare OLTP or mixed-OLTP workload that doesn’t need significant cache for reliable performance. We’ll have a look at the pricing math.

Read on for Kendra’s detailed thoughts on GPv2 versus GPv1, and on how GPv2 still has its warts.

Troubleshooting a Distributed Availability Group Failure

Jordan Boich digs in:

To give some background on this topology, they have a DAG comprised of two individual AGs. Their Global Primary AG (we’ll call this AG1) has three replicas, while their Forwarder AG (we’ll call this AG2) has two replicas. The replicas in AG1 are all Azure VMs in the same subnet. The replicas in AG2 are likewise all Azure VMs, in a separate subnet of their own.

By the time we got logged in and connected, the Global Primary Replica was online and was able to receive connections. The secondary replicas in the Global Primary AG, however, were unable to communicate with the Global Primary Replica. This is problem 1. The second problem is that several databases on the Forwarder Primary Replica were in crash recovery. This is problem 2. Since problem 2 was happening in the DR environment, it was set aside for the time being.

Read on for the troubleshooting process and solution.

UUIDv4 and UUIDv7 in PostgreSQL 18

Josef Machytka notes a change:

In the past there have been many discussions about using UUID as a primary key in PostgreSQL. For some applications, even a BIGINT column does not have sufficient range: it is a signed 8‑byte integer with range −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807. Although these values look big enough, if we think about web services that collect billions or more records daily, this number becomes less impressive. Simple integer values can also cause value conflicts in distributed systems, in Data Lakehouses when combining data from multiple source databases, etc.

However, the main practical problem with UUIDv4 as a primary key in PostgreSQL was not lack of range, but the complete randomness of the values. This randomness causes frequent B‑tree page splits, a highly fragmented primary key index, and therefore a lot of random disk I/O. There have already been many articles and conference talks describing this problem. What many of these resources did not do, however, was dive deep into the on‑disk structures. That’s what I wanted to explore here.

In fairness to BIGINT, if you collect 10 billion records a day, even the positive half of that range still buys you roughly 920 million days, or about 2.5 million years, of activity before you need to reset the ID. But read on to see what UUIDv7 does for GUIDs.
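
To see the appeal in practice, here is a minimal sketch assuming the uuidv7() generator function that PostgreSQL 18 introduces (the post digs into the on-disk effects):

-- New inserts get time-ordered keys, so they land on the rightmost B-tree pages
-- instead of splitting random pages the way fully random UUIDv4 values do.
CREATE TABLE events
(
    id   uuid DEFAULT uuidv7() PRIMARY KEY,
    body text
);

INSERT INTO events (body) VALUES ('first'), ('second');

-- The leading 48 bits of a v7 UUID are a millisecond Unix timestamp,
-- so ordering by the key roughly follows insertion order.
SELECT id, body FROM events ORDER BY id;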

Parallelizing Python Code

Osheen MacOscar makes a function faster:

The way it is currently written is how any normal for loop will run, where the current iteration must finish before the next one starts. With this code we shouldn’t need to wait for the previous API call; there is no dependency or anything like that. In theory we could run all of the individual player queries at once and the function would be a lot faster.

Read on to see how.

New T-SQL Functions

Tomaz Kastrun has been busy with this year’s Advent of SQL Server 2025 series of blog posts. Catching up, Tomaz first looks at base-64 encoding and decoding:

SQL Server 2025 introduces new T-SQL functions for base64 and varbinary expressions. The first function, BASE64_ENCODE(), returns base64-encoded text, and BASE64_DECODE() is its counterpart.

BASE64_ENCODE converts the value of a varbinary expression into a base64-encoded varchar expression.

BASE64_DECODE converts a base64-encoded varchar expression into the corresponding varbinary expression.

I really like this, by the way. Base-64 encoding is quite useful for web transmissions, so having a place to generate that output easily is nice.
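
Something like this quick round trip (my own sample string, assuming a SQL Server 2025 instance) shows the idea:

DECLARE @raw varbinary(max) = CAST('Curated SQL' AS varbinary(max));

SELECT
    BASE64_ENCODE(@raw) AS encoded,                                          -- Q3VyYXRlZCBTUUw=
    CAST(BASE64_DECODE(BASE64_ENCODE(@raw)) AS varchar(max)) AS round_trip;  -- Curated SQL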

Second up is REGEXP_LIKE():

SQL Server 2025 introduces new T-SQL functions for regular expressions (regex).

Among these regex functions, REGEXP_LIKE() indicates whether a regular expression pattern matches a string or appears within it.
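
As a quick sketch of the predicate form (my own example, again assuming SQL Server 2025):

-- Keep only the rows whose value looks like an email address.
SELECT email
FROM (VALUES ('kevin@example.com'), ('not-an-email')) AS t(email)
WHERE REGEXP_LIKE(email, '^[^@]+@[^@]+\.[^@]+$');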

And third, we have REGEXP_SUBSTR() and REGEXP_REPLACE():

Continuing with the SQL Server 2025 T-SQL functions for regular expressions, this time covering the replace and substring functionality.

Click through for Tomaz’s thoughts on all five of these functions.
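
And a tiny sketch of the other two, with my own sample strings on SQL Server 2025:

SELECT
    REGEXP_SUBSTR('Invoice 4812 paid', '[0-9]+')       AS first_number,   -- 4812
    REGEXP_REPLACE('Invoice 4812 paid', '[0-9]', '#')  AS digits_masked;  -- Invoice #### paid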
