Press "Enter" to skip to content

Curated SQL Posts

Color and Emotion

Cedric Scherer explains some of the psychology behind how we perceive color in visuals:

Without any intention, the two variations of my visualization triggered different emotional reactions. While the red chart likely leads you to think “Wow, Berlin summers are quite hot,” the blue version may push you to think of summers as rainy and rather cold.

In general, we should have in mind that different details might spark different emotions and expectations in our viewers. Some of these details will make it easier for them to understand the chart in the manner the designer intended. 

Having experienced (parts of) a Berlin summer, let me confirm that Berlin summers are not hot compared to the midwestern or southeastern US.


Caching in R

Bernardo Lares takes us through the lares library’s caching functionality:

If you’ve never heard of cache (/kaSH/) before, Google it and you’ll quickly find that it is “a collection of items of the same type stored in a hidden or inaccessible place”. Basically, you have “something” stored “somewhere” so you can fetch it “sometime” later. If it sounds basic, it (can be) is! This simple technique can come in quite handy when you are coding functions that take some time to gather and/or process the data you’re working with. In other words, think of those processes that take some time to run and there’s really no need to re-run them “every time” because the outcome will be exactly the same. Also, you are unnecessarily spending time, computer power, and real energy when you re-process cache-able stuff.

Today I’ll show you how I use cache in R to accelerate results, avoid re-processing, and improve UX for my users using the lares library. Let’s see a couple of functions that actually leverage cache usage and how you can start using them.
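
The general shape outside of lares is just “check for a saved result before recomputing it.” Here is a minimal base-R sketch of that idea; the function name and file layout are mine, not the lares API:

# A tiny file-based cache: store a result on disk, reuse it on later calls.
cache_run <- function(key, compute, cache_dir = tempdir()) {
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))   # cache hit: skip the slow work entirely
  }
  result <- compute()       # cache miss: do the slow work once
  saveRDS(result, path)
  result
}

# First call takes ~5 seconds; repeated calls with the same key are instant.
totals <- cache_run("daily_totals", function() {
  Sys.sleep(5)              # stand-in for a slow query or API call
  aggregate(len ~ dose, data = ToothGrowth, FUN = mean)
})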

Read on for a walkthrough of the process.


Setting up a Full-Text Index in SQL Server

Steve Jones walks us through setup for a new full-text index in SQL Server:

A full text index allows you to search a little more freely than standard T-SQL with a LIKE or wildcards. It’s useful for going through large amounts of text, mainly hundreds or thousands of words.

To get started, you need to know a few things. First, this system in modern SQL Server (2008+) is set up on all instances. You don’t enable FTS like you would for In-Memory OLTP tables or FILESTREAM.

Next, you need a catalog for the FTS indexes, which is a logical container.

Next, a table with data.

Finally, you create the index. In this post, I’ll look at SSMS and the GUI. In another one, I’ll look at the T-SQL itself.
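
If you want a preview of where the T-SQL lands (Steve saves the real syntax for his follow-up post), the steps map to statements roughly like these, with hypothetical object names:

-- A catalog to contain the full-text index.
CREATE FULLTEXT CATALOG DocumentsCatalog AS DEFAULT;

-- The index itself, tied to the table's unique key index.
CREATE FULLTEXT INDEX ON dbo.Documents (DocumentBody)
    KEY INDEX PK_Documents
    ON DocumentsCatalog
    WITH CHANGE_TRACKING AUTO;

-- Once populated, you query with CONTAINS instead of LIKE.
SELECT DocumentID
FROM dbo.Documents
WHERE CONTAINS(DocumentBody, N'"change tracking"');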

With all that in mind, click through to read Steve’s post and set up your own full-text search process.


Maximizing Availability Group Performance

Jonathan Kehayias has a few tips for improving performance of your Availability Groups:

Since Microsoft first introduced the Always On Availability Groups (AGs) feature in SQL Server 2012, there’s been a lot of interest in using AGs for both high availability and disaster recovery (HADR), as well as for offloading read-only workloads. The combination of the best features of failover clustering, the simplicity of data movement and synchronization from database mirroring, and the ability to offload read-only workloads to secondaries has given businesses a compelling reason to upgrade to leverage AGs.

But, as the saying goes, there’s no such thing as a free lunch, and there are several performance implications and considerations you must be aware of to have a successful deployment using AGs. This blog post will explore some of the considerations and look at how to plan, architect, and implement an AG with minimal latency and performance impact on the production workload.
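
As a point of reference on the read-only offloading piece, the moving parts look roughly like this; the AG and replica names are hypothetical, a sketch rather than Jonathan’s configuration:

-- Let the secondary accept read-intent connections and advertise a routing URL.
ALTER AVAILABILITY GROUP [SalesAG]
MODIFY REPLICA ON N'SQL02' WITH
    (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY,
                     READ_ONLY_ROUTING_URL = N'TCP://SQL02.contoso.com:1433'));

-- Tell the primary where to send read-intent sessions.
ALTER AVAILABILITY GROUP [SalesAG]
MODIFY REPLICA ON N'SQL01' WITH
    (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQL02', N'SQL01')));

-- Clients opt in by adding ApplicationIntent=ReadOnly to their connection strings.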

Click through for those tips.


Limitations in Power BI Aggregations

Teo Lachev looks at a couple of limitations in Power BI aggregations, as well as workarounds for those limitations:

Power BI aggregations are meant to speed up queries to large DirectQuery tables, as a DBA would create summarized tables to speed up queries to large tables. The most appealing aspect of telling Power BI about these aggregations is that Power BI will automatically redirect the query to the aggregation cache if it determines that its dimensionality matches the dimensionality of the aggregated table, as explained in the documentation. However, there are a couple of limitations worth emphasizing that will prevent this from happening:

Click through for those limitations and what Teo & co did to move forward despite them.


Clarifying Confusion around Power BI Goals

Treb Gatte continues a series on Power BI Goals:

Power BI Goals enables you to present the status of a key outcome that can optionally be tied to data. Treating Power BI Goals as a glorified hierarchy of metrics may lead you to miss a more valuable use of Goals.

Note, Goals do not roll up. The hierarchy is there to provide a context for the goal and subordinate goals. If you need data rollup, you may want to look at alternatives.

Part 4 of our blog series covers the ability to support OKRs (Objectives and Key Results) with Power BI Goals. OKRs are a very powerful mechanism for remote workers to stay in sync and focused on the most important work.

Read the whole thing.


Creating a dacpac for a Dedicated SQL Pool

Kevin Chant shows how to use Azure DevOps to create a dacpac for an Azure Synapse Analytics dedicated SQL pool:

By the end of this post, you will know how to create a dacpac for a dedicated SQL Pool within Azure Pipelines for your CI/CD deployments. Plus, how you can synchronize a database project created in Azure Data Studio with a Git repository in Azure DevOps.

In a previous post I covered how you can create a dacpac for an Azure Synapse Analytics dedicated SQL Pool using Azure Data Studio. In that post I stated that you could create a dacpac for the database project using Azure DevOps.

With this in mind, I will use the same database project that I created in that post.

Click through for the process.


The End of the NFT Bubble(?)

Stephanie Glen has music to my ears:

Non-fungible tokens (NFTs), tradable digital certificates that verify ownership of digital assets using blockchain technology, have dominated headlines in the last several months. The media mania hit a high with the $69 million sale of Beeple’s Everydays: The First 5000 Days. A few months after Beeple’s historic sale at Christie’s auction house, the crypto-art bubble has officially burst.

These sorts of things are a bit too volatile for me to cheer just yet. The blockchain bubble is something I look at and say, this is incredibly dumb. The whole premise of it makes zero sense: you’re wasting resources (and don’t get me started on Chia, the grim reaper for residential SSDs) for nothing. The end product has little to no subjective value—how much would you pay for blockchain outputs?—but burns up resources in the form of energy, increased prices for computer components, and time that could have been spent doing something more productive, like repeatedly turning your computer off and on again: at least there, you gain valuable skills in figuring out how to power down and power up a machine.

I can kinda-sorta get the idea of using blockchain for certain types of auditing trails, but there are still two big problems with it. First is the 51% problem: whoever controls a majority of the compute controls the past, present, and future of the blockchain and can make whatever arbitrary changes are desired. Beyond that, the other problem is, how much better is this than a digest hash of activities written to a WORM drive? Considering how many orders of magnitude less expensive the latter is than the former, there has to be an enormous benefit for it to make any sense. And there’s really not.
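
To put a finer point on the digest-hash alternative: one statement over the ordered activity log yields a value you can append to write-once storage, and any later tampering changes the digest. Table and column names here are hypothetical, a sketch of the idea rather than a full design (STRING_AGG needs SQL Server 2017+):

SELECT HASHBYTES('SHA2_256',
           STRING_AGG(CONVERT(NVARCHAR(MAX),
                              CONCAT(ActivityID, '|', ActivityDetail)), ';')
               WITHIN GROUP (ORDER BY ActivityID)) AS DailyDigest
FROM dbo.ActivityLog;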


The Value of a Working Dev Environment

Tim Mitchell wants to talk about dev environments:

Let’s talk about your development environment.

Specifically, I’d like to chat with you about the virtual space where your data architecture team, software developers, and information curators do their development and testing work. A proper development environment is logically separated from the production environment, and is often further partitioned into different realms for initial development, data or functional validation, and user acceptance testing. For mature enterprise-ready environments, there is also usually a build and deployment process that automates the movement of code from one environment to the next, reducing the chance for human error when moving code through its paces and ultimately into the production environment.

I’d optimistically like to say that Tim is using strawmen here, but I’ve worked in (and sometimes created) pretty much every one of these.
