Press "Enter" to skip to content

Day: January 28, 2025

Emitting Data to a Single CSV in Spark

Chen Hirsh wants to consolidate:

To write and read data faster, Spark splits the work between nodes in a cluster, each reading\writing part of the data. That’s why, in the screenshot above, there are 3 CSV files (That’s the files starting with “Part”, with a CSV extension), instead of 1. Note that this can also occur when working with a single node cluster since Spark splits the work into tasks.

This behavior is great if you intend to keep working with the CSV files in Databricks since reading will be faster. But if you want to share this file with someone outside of Databricks, this may be inconvenient.

Read on for two ways of doing this, as well as the price you pay to get it done.

Leave a Comment

Dropping a Role in PostgreSQL

Josephine Bush drops a role:

You can’t just exec DROP ROLE your_role_name; if it’s granted perms or other roles are granted to it. I had to go fishing to find all the grants to revoke them. Note: if you are worried about re-granting later, you can always fiddle with this to output the grants for these perms as a rollback.

Read on for a few scripts to help out with finding what that role owns, revoking rights, and reassigning ownership.

Leave a Comment

Truncating All Tables while Preserving Foreign Keys in T-SQL

Ronald Kraijesteijn builds a script:

When testing a data warehouse, a common challenge is managing large datasets effectively. Often, you need to reset tables to a clean state, ensuring consistent testing environments. The most efficient way to clear a table is using the SQL command TRUNCATE TABLE. However, this command is not straightforward when foreign key constraints are present. In this article, we’ll explore a solution that temporarily disables constraints, allows truncation, and then restores the constraints—keeping your data model intact.

Click through for the script, which saves a record of all of the foreign key constraints, truncates each table, and then re-creates the foreign keys.

Leave a Comment

Automatically Refreshing a Power BI Semantic Model after Dataflow Loads

Reza Rad refreshes a model:

Although this seems to be a simple thing to do, it is not a function that you can turn on or off. If you have a Dataflow that does the ETL and transforms and prepares the data, then to get the most up-to-date data into the report, you will need to refresh the Power BI semantic model after that, only upon successful refresh of both dataflow and semantic model is when you will have the up-to-date data into the report. Fortunately, in Fabric, this is a straightforward setup. In this article and video, I’ll explain how this is possible.

Click through for the video and the blog post. Granted, this feature is in preview, but using it is pretty straightforward.

Leave a Comment

Calculating the Distance between Points via T-SQL

Sebastiao Pereira gets out the measuring tape:

How do you calculate the distance between two different points on a sphere using TSQL in SQL Server? In this article, we look at how this can be done to calculate different points on the globe.

Sebastiao first shows the raw calculation, then uses the GEOGRAPHY data type to simplify the job. Note that these are “as the crow flies” distances and not travel distances, as there’s no information on roads.

Also, these are calculations specific to Earth. Which sounds like the setup for a joke, but it’s really not. If you have a smaller sphere (or oblong spheroid, if you will) and you want to calculate the distance, use the GEOMETRY data type instead.

Leave a Comment

Power BI Embedded and Direct Lake Mode

Hasan Abo Shally announces a new preview feature:

As we step into 2025, we’re excited to announce the preview of Power BI Embedded with Direct Lake Mode, a new feature designed to enhance how developers and Independent Software Vendors (ISVs) provide embedded analytics in their applications. This capability, available in preview starting Q1 2025, leverages the power of Direct Lake Mode to allow an even enhanced performance and experience for embedded analytics.

This probably affects a small audience, though the functionality is good to see.

Leave a Comment

Drawbacks of Version Store in SQL Server

Haripriya Naidu points out that nothing comes for free:

With every feature comes both advantages and disadvantages and it is important to be aware of both. In regards to version store, we had discussed how to understand version store better and how version store is enabled.

Today, we will talk about the trade-offs and implications that come along when version store is enabled.

Click through for the downsides. With storage prices and speeds where they are, the tradeoffs of using the version store for RCSI or Snapshot isolation are almost always worth it. But it’s still good to know what those trade-offs are and how you can end up in a degenerate state.

Leave a Comment