Press "Enter" to skip to content

Day: October 12, 2023

Apache Spark Execution Plan Analysis

Karthik Penikalapati digs into Spark SQL explain plans:

In this blog post, we will explore how the Explain Plan can be your secret weapon for debugging and optimizing Spark applications. We’ll dive into the basics and provide clear examples in Spark Scala to help you understand how to leverage this valuable tool.

All I’m saying is, if some company wants to create SQL Sentry Plan Explorer for Apache Spark, I’d be down with it. That loss of an intuitive and powerful graphical interface for execution plans is definitely a point of friction when working with Apache Spark and Spark SQL.

Comments closed

Apache Kafka 3.6 Released

Satish Duggana announces what’s new in Apache Kafka 3.6:

The ability to migrate Kafka clusters from a ZooKeeper metadata system to a KRaft metadata system is now ready for usage in production environments. See the ZooKeeper to KRaft migration operations documentation for details. Note that support for JBOD is still not available for KRaft clusters, therefore clusters utilizing JBOD can not be migrated. See KIP-858 for details regarding KRaft and JBOD.

Support for Delegation Tokens in KRaft (KAFKA-15219) was completed in 3.6, further reducing the gap of features between ZooKeeper-based Kafka clusters and KRaft. Migration of delegation tokens from ZooKeeper to KRaft is also included in 3.6.

Tiered Storage is an early access feature. It is currently only suitable for testing in non-production environments. See the Early Access Release Notes for more details.

Read on for more details around what’s new in Apache Kafka.

Comments closed

SQL Server Security Updates

Srinivas Kandibanda announces a series of security updates for all supported versions of SQL Server:

The Security Update for SQL Server 2022 RTM CU8 is now available for download at the Microsoft Download Center and Microsoft Update Catalog sites. This package cumulatively includes all previous SQL Server 2022 fixes through CU8, plus it includes the new security fixes detailed in the KB Article.

I linked specifically to the SQL Server 2022 RTM CU8 blog post, but there are security bulletins for all versions of SQL Server going back to 2014. If you’re running SQL Server 2012 or earlier, no updates for you.

Also, the highest-risk CVE items are in SQL Server 2019 and 2022; for 2017 and below, the one security bulletin covers a moderate-severity denial of service attack.

Comments closed

Reasons Your SQL Server Query Performance Fluctuates

Aaron Bertrand starts the count:

Query performance can fluctuate over time, and it is not necessarily due to a change to the query itself (or to the application code that calls it). Users often ask why a query suddenly got slower even though they haven’t published any changes to the application and the underlying data hasn’t changed drastically. This article points out some other reasons – and there are many – that a query might be slower today than it was 10 minutes ago, two weeks ago, or last summer.

Read on for a bulleted list of reasons. Of course, it would be extremely challenging to create a comprehensive list—for example, in the Same Plan section, in addition to there being more data, changes in the statistical distribution of data can cause performance profiles to change over time. But this is a really good starting point.

Comments closed

Augmenting the Gold Layer in Microsoft Fabric with Semantic Link

Nikola Ilic shows off one use case for Semantic Link:

I won’t spend time explaining what Semantic Link is – you can check a wonderful article written by my friend Sandeep Pawar, or refer to the official blog post. Sandeep’s blog post does a great job explaining not just what Semantic Link is, but also what are the possible use cases of this new feature.

Therefore, I will focus on explaining how you can leverage Semantic Link for a specific use case: I call it “Augmenting Gold Layer” (copyrights reserved). And, we will perform this “operation” by using SQL! Yes, you heard me well – we will leverage SparkSQL language to go above and beyond and “transform” the data currently sitting in Power BI datasets.

I will say that, for obvious reasons, this blog supports the Raw/Refined/Curated naming convention rather than Bronze/Silver/Gold, so I’d posit that this should be called the Augmented Curated Layer.

I can also recommend reading the blog post from Sandeep Pawar. It did a really good job of explaining why Semantic Link is worth getting excited about.

Comments closed

Oracle Errors: Snapshot Too Old and LOB Columns

David Fitzjarrell tackles a pair of errors:

One of the few errors taht strikes fear in the heart of a DBA is the dreaded:

ORA-01555 snapshot too old
and
ORA-22924 snapshot too old

Of course there are plenty of blogs instructing the DBA to simply “increase the undo_retention”, and there are cases where this works as expected. However, LOBs can be different as two different mechanisms exist for undo management. A LOB column can be configured to use retention to manage before mages of the data, but that can be confusing as each LOB column MAY have its own retention setting. The DBA_LOBS view reports whether LOG column uses retention or pctversion to manage undo, and the associated setting being used. Let’s -dig into this a bit deeper.

Read on to learn more about how LOB retention works, the types of issues you can run into with it, and how to correct those issues.

Comments closed

Refreshing a Direct Lake Power BI Dataset in Microsoft Fabric

Chris Webb refreshes our memories:

If you’ve heard about the new Direct Lake mode for Power BI datasets in Fabric you’ll know that it gives you the query performance of Import mode (well, almost) without the need to actually import any data. Direct Lake datasets can be refreshed though – in fact, they refresh automatically by default – and if you look at the dataset’s Refresh History you’ll see there’s a Direct Lake section which sometimes shows errors:

Chris goes on to ask and answer the question, what does it mean to refresh a Direct Lake dataset if you’re not actually importing the data into Power BI?

Comments closed

Cache Management and Semantic Link in Fabric Notebooks

Marc Lelijveld warms up the cache:

In the previous blog, I wrote about data temperature as part of Fabric when you’re using Direct Lake storage mode. In that blog, I explained how you can get insights in the temperature of a column, what that temperature means and what the impact of the temperature is on columns that are queried more often.

In this blog, I will continue this story by elaborating on a process called framing and how you can influence data eviction to drop data from memory. Finally, this blog goes into more details on how you could use Semantic Link in Fabric Notebooks to warm up the data for most optimal end-user performance.

The SQL Server analog here is having some automated queries which keep specific pages in the buffer pool, like a warm-up script for an instance with plenty of memory but slow disks.

Comments closed