Press "Enter" to skip to content

Curated SQL Posts

What’s New in data_algebra

John Mount has an update on a Python package:

The data algebra is a modern realization of elements of Codd’s 1969 relational model for data wrangling (see also Codd’s 12 rules).

The idea is: most data manipulation tasks can usefully be broken down into a small number of fundamental data transforms plus composition. In Codd’s initial writeup, composition was expressed using standard mathematical operator notation. For “modern” realizations one wants to use a composition notation that is natural for the language you are working in. For Python the natural composition notation is method dispatch.

Click through to see how it works and what’s new in the latest version.

Comments closed

Visio Licensing Changes and Power BI

Chris Webb ties a new Visio announcement to Power BI:

There was an interesting announcement today regarding Visio:

https://www.microsoft.com/en-us/microsoft-365/blog/2021/06/09/bringing-visio-to-microsoft-365-diagramming-for-everyone/

In summary there will soon be a lightweight, web-based version of Visio available to anyone with a Microsoft 365 Business, Office 365 E1/E3/E5, F3, A1, A3 or A5 subscription. Previously Visio was not part of the main M365 plans and was only available as a separate purchase.

So what? As a Power BI user, why should I care? 

Read on for Chris’s answer. If the web-based version of Visio is good, I’m reasonably excited by this prospect.

Comments closed

A Checklist for Database Post-Restoration

Randolph West wants you to keep some things in mind after you restore that database:

Whenever I restore a database — especially one I obtained outside of my regular environment (for example a customer database, a development database, or even a sample database like WideWorldImporters) — there are a few things I like to check to make sure it’s configured for peak performance.

Note that some of this advice may apply only to non-production databases.

Click through for the list.

Comments closed

Defining the Data Fabric

James Serra explains a concept:

Another buzzword that you may have been hearing a lot about lately is Data Fabric. In short, a data fabric is a single environment consisting of a unified architecture with services and technologies running on it that architecture that helps a company manage their data. It enables accessing, ingesting, integrating, and sharing data in a environment where the data can be batched or streamed and be in the cloud or on-prem. The ultimate goal of data fabric is to use all your data to gain better insights into your company and make better business decisions.  If you are thinking this sounds a lot like a modern data warehouse that I posted a video on recently at Modern Data Warehouse explained, well, I would argue it basically is the same thing except a data fabric expands on that architecture.

Read on for James’s explanation.

Comments closed

CROSS and OUTER APPLY

Kenneth Fisher takes us through CROSS versus OUTER APPLY:

I love CROSS APPLY. I also love OUTER APPLY. What’s the difference though? The same difference as with an INNER vs OUTER JOINs. The CROSS APPLY only includes rows where there is a match, while OUTER APPLY includes all rows even if there isn’t a match. I’ve found over time that I have a lot easier time using an example for this rather than trying to explain in any detail. I’m going to use STRING_SPLIT for my example because it’s easy.

Click through for the example.

Comments closed

Workload Management in Azure Synapse Analytics

Reiss McSporran starts a new series on workload management in Azure Synapse Analytics:

So you’ve started using Synapse SQL Pools, you’ve scaled up to improve performance, but your queries aren’t going as fast as you’d hoped. You take a look at resource utilisation and you see that as you scale up, your total resource utilisation per query goes down. But didn’t we scale up to allow our query to use more resource and run quicker? What’s going on here? A little thing called Workload Management.

What is Workload Management? At a very high level, Workload Management allows you to control how your workload uses system resources. The main problem here is defaults!

Click through for the full story. Or at least the full part 1 of the even fuller story.

Comments closed

Query Store Hints

Erin Stellato takes a look at a new Microsoft announcement:

Query hints can be really helpful in some scenarios, but if you do not have direct access to the code (hello third party applications), then you cannot add a query hint to the statement.  Historically, you’ve been able to use a plan guide to apply a hint, and that doesn’t require direct code access, but plan guides are not always intuitive and I find that most people are not familiar with how to use them.

Enter Query Store Hints.  With this feature, you can now add a hint to a specific statement, even if you don’t have access to the code base.  

Read on to see what it does and how you can use it.

Comments closed

Color and Emotion

Cedric Scherer explains some of the psychology behind understanding of color in visuals:

Without any intention, the two variations of my visualization triggered different emotional reactions. While the red chart likely leads you to think “Wow, Berlin summers are quite hot,” the blue version may push you to think of summers as rainy and rather cold.

In general, we should have in mind that different details might spark different emotions and expectations in our viewers. Some of these details will make it easier for them to understand the chart in the manner the designer intended. 

Having experienced (parts of) a Berlin summer, let me confirm that they are not hot when compared to the midwestern or southeastern US.

Comments closed

Caching in R

Bernardo Lares takes us through the lares library’s caching functionality:

If you’ve never heard of cache (/kaSH/) before, Google it and you’ll quickly find that it is “a collection of items of the same type stored in a hidden or inaccessible place”. Basically, you have “something” stored “somewhere” so you can fetch it “sometime” later. If it sounds basic, it (can be) is! This simple technic can come quite handy when you are coding functions that take some time to gather and/or process the data you’re working with. In other words, think of those processes that take some time to run and there’s really no need to re-run it “every time” because the outcome will be exactly the same. Also, you are unnecessarily spending time, computer power, and real energy when you re-process cache-able stuff.

Today I’ll show you how I use cache in R to accelerate results, avoid re-processing, and improve UX for my users using the lareslares library. Let’s see a couple of functions that actually leverage cache usage and how can you start using them.

Read on for a walkthrough of the process.

Comments closed

Setting up a Full-Text Index in SQL Server

Steve Jones walks us through setup for a new full-text index in SQL Server:

A full text index allows you to search a little more freely than standard T-SQL with a LIKE or wildcards. It’s useful for going through large amounts of text, mainly hundreds or thousands of words.

To get started, you need to know a few things. First, this system in modern SQL Server (2008+) is set up on all instances. You don’t enabled FTS like you would for In-Memory OLTP tables or FILESTREAM.

Next, you need a catalog for the FTS indexes, which is a logical container.

Next, a table with data.

Finally, you create the index. In this post, I’ll look at SSMS and the GUI. In another one, I’ll look at the T-SQL itself.

With all that in mind, click through to read Steve’s post and set up your own full-text search process.

Comments closed