Press "Enter" to skip to content

Day: November 18, 2024

Updates on the Spark Connect Dotnet Library

Ed Elliott has an update for us:

There have been quite a few changes in the last couple of months, and I just wanted to give a quick update on the current state of the project. In terms of usage, I am starting to hear from people using the library and submitting PRs and requests, so although usage is pretty low (which is expected, given that usage of the Microsoft-supported version wasn't very high), it is growing, which is interesting.

Read on for thoughts on production readiness, support for Spark 4.0, a couple of other updates, and some future plans.


Creating a Numbers Table in Power Query

Koen Verbeeck builds a tally table:

Sometimes you need a numbers table (also called a tally table) when you're working with data. It's a table with one single integer column that contains incremental numbers (0, 1, 2, 3, 4, …). In a database, it's fairly easy to generate using either cross-joins or a generator function (GENERATE_SERIES in T-SQL). Recently, I needed such a table in Power BI Desktop. Suppose you don't have a database as a source (or maybe you cannot change the database); in that case, we need to generate it in Power Query.

Read on to see how.
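Koen's solution is in Power Query M, so click through for that. Purely as a point of comparison, here's a minimal sketch of the same idea in Python with pandas (the column name and row count are arbitrary):

# Not the Power Query approach from the post -- just the same tally-table idea
# in Python/pandas, e.g. if you were staging data before it reaches Power BI.
import pandas as pd

# A single-column numbers (tally) table: 0, 1, 2, ..., 999
numbers = pd.DataFrame({"Number": range(1000)})
print(numbers.head())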


Comparing Numeric Data Types between MySQL and PostgreSQL

Aisha Bukar compares data types:

Numeric data types serve as a foundation in a DBMS. They represent numbers in different forms, which helps in storing numerical data and performing arithmetic operations on it.

When dealing with numeric data types in databases like MySQL and PostgreSQL, it's important to understand key concepts like range, precision (fixed-point numbers, floating-point numbers), and storage. This helps you understand how data is stored, processed, and retrieved.

Read on for quick descriptions of the three primary properties of numeric data types (range, precision, and storage), as well as how the two relational database platforms handle numeric data.
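The precision point generalizes beyond either engine. As a quick illustration (in Python rather than SQL, and not tied to either platform), decimal arithmetic behaves like the exact DECIMAL/NUMERIC types, while binary floating point behaves like FLOAT/DOUBLE PRECISION:

from decimal import Decimal

# Binary floating point (like SQL FLOAT / DOUBLE PRECISION) is approximate
print(0.1 + 0.2)                        # 0.30000000000000004

# Fixed-point decimal (like SQL DECIMAL / NUMERIC) is exact at its declared scale
print(Decimal("0.1") + Decimal("0.2"))  # 0.3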


Tracking Column Sizes on DAX Queries

Chris Webb busts out the calculator:

I had meant to follow up my recent post on how to find the columns touched by a DAX query by writing one on how to use this technique to find the size of these columns in memory, so you can find the total size of the columns that need to be paged into memory when a DAX query runs on a Direct Lake semantic model. Before I could do that, though, my colleague Michael Kovalsky messaged me to say that not only had he taken the query from that first post and incorporated it in Semantic Link Labs, he’d done the work to get column sizes too. All that’s left for me to do, then, is give you some simple examples of how to use it.

Click through for those simple examples, though note that this requires Microsoft Fabric.
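Chris's examples use the new Semantic Link Labs helpers, so click through for those. If you just want a feel for the underlying idea, here is a rough sketch using plain Semantic Link and the DAX INFO functions; the dataset and workspace names are placeholders, and the actual Semantic Link Labs functions Michael added may look quite different:

# A rough sketch, not the Semantic Link Labs helpers from the post: query the
# model's storage metadata via a DAX INFO function from a Fabric notebook.
import sempy.fabric as fabric

# Placeholder names -- substitute your own semantic model and workspace
df = fabric.evaluate_dax(
    dataset="Sales Model",
    workspace="My Workspace",
    dax_string="EVALUATE INFO.STORAGETABLECOLUMNS()",
)

# The result includes per-column storage details (dictionary sizes and so on),
# which you can sort and sum to see what a query would page into memory.
print(df.head())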


Concurrency Control in Oracle vs PostgreSQL

Umair Shahid continues a series on migrating from Oracle to PostgreSQL:

Transitioning from Oracle to PostgreSQL can be a transformative experience for database administrators because of the subtle differences between the two technologies. Understanding how the two handle concurrency differently is critical to managing highly concurrent workloads. 

Concurrency control is essential for maintaining data consistency when multiple users access the database simultaneously. Oracle and PostgreSQL take different approaches to concurrency control: Oracle primarily relies on locking and consistent snapshots, while PostgreSQL utilizes a Multi-Version Concurrency Control (MVCC) system.

This article provides an in-depth look at concurrency control in PostgreSQL from an Oracle perspective.

Read on for that comparison.
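For a hands-on feel of the PostgreSQL side, here is a minimal Python/psycopg2 sketch (the connection string and the accounts table are assumptions): a repeatable-read reader keeps its snapshot and is never blocked while another session updates and commits the same row.

# Minimal MVCC demonstration with psycopg2; DSN and "accounts" table are
# placeholders you would replace with your own.
import psycopg2

DSN = "dbname=test user=postgres"

reader = psycopg2.connect(DSN)
writer = psycopg2.connect(DSN)
reader.set_session(isolation_level="REPEATABLE READ")

with reader.cursor() as rc:
    rc.execute("SELECT balance FROM accounts WHERE id = 1")
    print("reader sees:", rc.fetchone())          # snapshot taken here

    # The writer updates and commits without being blocked by the reader...
    with writer.cursor() as wc:
        wc.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 1")
    writer.commit()

    # ...and the reader still sees its original snapshot until it starts a
    # new transaction: no blocking, no non-repeatable read.
    rc.execute("SELECT balance FROM accounts WHERE id = 1")
    print("reader still sees:", rc.fetchone())

reader.rollback()
reader.close()
writer.close()

Oracle ends up giving readers a consistent view as well, just via undo-based consistent reads rather than row versions, and that implementation difference is the sort of thing the article digs into.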


Missing Data in Microsoft Fabric Real-Time Intelligence Workloads

Greg Low covers a common scenario:

But another area that I see very few handling well is the data that is missing, rather than just the data that is present. There's a huge difference between data that arrived and is odd, and data that just didn't arrive at all.

One tool that's great at working with streams of data is the Real-Time Intelligence workload for Microsoft Fabric. And it's also great at working with data that is missing from those streams.

Greg covers some of the scenarios around missing data, though not much on how to fix them.
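Greg's discussion is in the context of Fabric Real-Time Intelligence; purely to illustrate the general idea of detecting data that never arrived, here is a small pandas sketch with made-up timestamps and an assumed one-minute reporting interval:

# Not the Fabric Real-Time Intelligence approach from the post -- just the
# underlying idea: compare what arrived against what was expected.
import pandas as pd

# Readings that actually arrived (the device is supposed to report every minute)
arrived = pd.DataFrame({
    "ts": pd.to_datetime(["2024-11-18 10:00", "2024-11-18 10:01",
                          "2024-11-18 10:04", "2024-11-18 10:05"]),
    "value": [21.5, 21.6, 21.9, 22.0],
})

# Build the full expected timeline, then take the difference to expose the gaps
expected = pd.date_range("2024-11-18 10:00", "2024-11-18 10:05", freq="1min")
missing = expected.difference(arrived["ts"])
print("missing readings:", list(missing))   # 10:02 and 10:03 never arrived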
