Press "Enter" to skip to content

Curated SQL Posts

Augmenting the Gold Layer in Microsoft Fabric with Semantic Link

Nikola Ilic shows off one use case for Semantic Link:

I won’t spend time explaining what Semantic Link is – you can check a wonderful article written by my friend Sandeep Pawar, or refer to the official blog post. Sandeep’s blog post does a great job explaining not just what Semantic Link is, but also what are the possible use cases of this new feature.

Therefore, I will focus on explaining how you can leverage Semantic Link for a specific use case: I call it “Augmenting Gold Layer” (copyrights reserved). And, we will perform this “operation” by using SQL! Yes, you heard me well – we will leverage SparkSQL language to go above and beyond and “transform” the data currently sitting in Power BI datasets.

I will say that, for obvious reasons, this blog supports the Raw/Refined/Curated naming convention rather than Bronze/Silver/Gold, so I’d posit that this should be called the Augmented Curated Layer.

I can also recommend reading the blog post from Sandeep Pawar. It did a really good job of explaining why Semantic Link is worth getting excited about.

Comments closed

Oracle Errors: Snapshot Too Old and LOB Columns

David Fitzjarrell tackles a pair of errors:

One of the few errors taht strikes fear in the heart of a DBA is the dreaded:

ORA-01555 snapshot too old
and
ORA-22924 snapshot too old

Of course there are plenty of blogs instructing the DBA to simply “increase the undo_retention”, and there are cases where this works as expected. However, LOBs can be different as two different mechanisms exist for undo management. A LOB column can be configured to use retention to manage before mages of the data, but that can be confusing as each LOB column MAY have its own retention setting. The DBA_LOBS view reports whether LOG column uses retention or pctversion to manage undo, and the associated setting being used. Let’s -dig into this a bit deeper.

Read on to learn more about how LOB retention works, the types of issues you can run into with it, and how to correct those issues.

Comments closed

Refreshing a Direct Lake Power BI Dataset in Microsoft Fabric

Chris Webb refreshes our memories:

If you’ve heard about the new Direct Lake mode for Power BI datasets in Fabric you’ll know that it gives you the query performance of Import mode (well, almost) without the need to actually import any data. Direct Lake datasets can be refreshed though – in fact, they refresh automatically by default – and if you look at the dataset’s Refresh History you’ll see there’s a Direct Lake section which sometimes shows errors:

Chris goes on to ask and answer the question, what does it mean to refresh a Direct Lake dataset if you’re not actually importing the data into Power BI?

Comments closed

Cache Management and Semantic Link in Fabric Notebooks

Marc Lelijveld warms up the cache:

In the previous blog, I wrote about data temperature as part of Fabric when you’re using Direct Lake storage mode. In that blog, I explained how you can get insights in the temperature of a column, what that temperature means and what the impact of the temperature is on columns that are queried more often.

In this blog, I will continue this story by elaborating on a process called framing and how you can influence data eviction to drop data from memory. Finally, this blog goes into more details on how you could use Semantic Link in Fabric Notebooks to warm up the data for most optimal end-user performance.

The SQL Server analog here is having some automated queries which keep specific pages in the buffer pool, like a warm-up script for an instance with plenty of memory but slow disks.

Comments closed

Changing the Style of a Legend in R

Steven Sanderson is a legend:

Before diving into code examples, let’s understand the basics. In R, legends are essential for explaining the meaning of different elements in your plot, such as colors, lines, or shapes. Legends help your audience interpret the data effectively.

In most cases, R’s base plotting system provides you with control over the legend’s size. The key functions we’ll explore are legend() and guides(). We’ll also delve into how to modify legend size in popular plotting packages like ggplot2.

Click through for those demonstrations.

Comments closed

A Simple Chart Gone Wrong

Mike Cisneros debugs a visual:

Looking at the chart, you might think, “For such a simple chart, why am I so confused?” It’s a bar chart with one data series, a title, a subtitle, and more info at the bottom. But the real challenge isn’t the design, layout, or labeling. It’s the assumptions made before creating the chart. For example, the chart uses the acronym “ADS” without explaining it. Does everyone know it means Average Daily Sales? And does “Gap Analysis” make sense to coffee shop owners?

One additional thing that I would point out is that column charts tend to imply time series, which adds even more to the confusion, as the presentation makes it look like you’re seeing the comparison of Mellow Bean versus its competition over seven time periods. Bar charts, meanwhile, tend to imply static data, so the move to a bar chart makes sense.

Comments closed

Two Takes on First Normal Form

Joe Celko defends the honor of First Normal Form:

You do not need a complete understanding of regular expressions or ICD codes to follow this article, so don’t worry too much about it. The reason for posting the simplified regular expression was to scare you. My point was that this regular expression would be a pretty impressive CHECK constraint on this column. Shall we be honest? Despite the fact that we know the best programming practice is to detect an error as soon as possible, do you believe that the original poster wrote such a constraint for the concatenated list of ICD codes?

I’m willing to bet that any such validation is being done in an input tier by some poor lonely program, in an application language. Even more likely, it’s not being done at all.

First Normal Form (1NF) says that this concatenated string is a repeated group, and we need to replace it with a proper relational construct.

In the meantime, I’ve continued my series on database normalization and call First Normal Form overrated:

In this video, we start at the ground floor with 1st Normal Form. We’ll learn what people think it is, what it really is, and why it’s not as great as it’s cracked up to be.

I agree with Joe that his ICD-10 code example is a bad database design. The area in which I don’t agree—and for this, I’m leaning heavily on C.J. Date—is that repeating groups actually violate 1NF. My video covers this in a bit more detail and I also include a quotation from Date’s recent book on database design talking about how 1NF has nothing to do with repeating groups or atomicity, and that 1NF could even include relvars inside of relvars (an example Joe shows 1NF preventing).

Comments closed

Hybrid Failover Rights from SQL Server 2022 to Azure SQL MI

Dani Ljepava explains a new benefit:

Hybrid failover rights is a new benefit that allows you to run a license-free Azure SQL Managed Instance when used as a passive DR replica for your SQL Server 2022 licensed under Software Assurance (SA), or using Pay-as-you-go billing option.

How the Hybrid Failover Rights benefit works

The new Hybrid failover rights licensing benefit is technology agnostic. You can use any technology, such is MI link as the most advanced replication technology using Always On, or perhaps LRS, ADF, transactional replication, backup and restore, or similar to setup replication between SQL Server and Managed Instance. As long as you are using Azure SQL Managed Instance only as a passive replica for your SQL Server 2022, you are eligible to apply the new licensing benefit.

Read on for more details on how you can activate this benefit.

Comments closed

Against Transparent Data Encryption in SQL Server

Andy Yun is not a fan:

Of all of the various data protection options available to us in SQL Server, I argue that Transparent Data Encryption (aka TDE) is worthless Security Theater.

TDE encrypts your data files “at rest.” This means one cannot crack open a hex editor and just start reading your raw data. And did you know that you can use a hex editor and start reading string values right out of your MDF files? Go search on that – that’s a separate tangent you can go explore.

Read on to understand the ways in which Andy finds fault with TDE.

Comments closed