Ram Ghadiyaram looks at three join strategies in Apache Spark:
In this article, we are going to discuss three essential join strategies in Apache Spark.
The DataFrame (or table) join is one of the most commonly used transformations in Apache Spark. With a join, a developer can merge two or more DataFrames on specific (sortable) keys. The syntax of a join is straightforward, but its inner workings are sometimes obscured: under the hood, Spark's internal API considers several join algorithms and selects one. A seemingly basic join can become costly if you do not know what these core algorithms are or which one Spark chooses.
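To see which algorithm Spark picks, you can inspect the physical plan of a join with `explain()`. The sketch below is a minimal, self-contained example; the DataFrame names and sample data are illustrative, not from the article:

```scala
import org.apache.spark.sql.SparkSession

object JoinStrategyDemo {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .appName("JoinStrategyDemo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Two small hypothetical DataFrames keyed by "id"
    val employees = Seq((1, "Alice"), (2, "Bob"), (3, "Carol")).toDF("id", "name")
    val salaries  = Seq((1, 90000), (2, 80000)).toDF("id", "salary")

    // An inner join on the shared key column
    val joined = employees.join(salaries, Seq("id"), "inner")

    // explain() prints the physical plan; the plan names the join
    // strategy Spark selected (e.g. BroadcastHashJoin or SortMergeJoin)
    joined.explain()
    joined.show()

    spark.stop()
  }
}
```

Because both inputs here are tiny, Spark will typically broadcast one side; with larger inputs the plan can change, which is exactly why knowing the underlying strategies matters.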
This is not a comprehensive list, but it does cover three of the more common strategies when dealing with larger datasets.