Curated SQL Posts

SET, SELECT, and Variable Assignment

Chad Callihan obliquely reminds us to create those unique constraints (by way of unique indexes):

Did you know there is more than one way to set a variable in SQL Server? You can actually set a variable without using “SET” at all. Let’s look at an example that shows how setting a variable with SELECT can cause a headache when dealing with identical values.

Click through to see the problem in action. One way around this, if you know you are dealing with duplicates and need a specific one, is to SELECT TOP(1) with an appropriate ORDER BY clause, just as you would if variable assignment weren’t on the table.
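
As a rough illustration of that advice, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for SQL Server. The Users table, its columns, and the helper names are hypothetical, and SQLite's ORDER BY ... LIMIT 1 plays the role that TOP(1) with ORDER BY would play in T-SQL. The point is only that an explicit ordering makes the chosen duplicate predictable.

```python
import sqlite3

# Hypothetical table with duplicate names; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Id INTEGER PRIMARY KEY, DisplayName TEXT)")
conn.executemany(
    "INSERT INTO Users (Id, DisplayName) VALUES (?, ?)",
    [(3, "Chad"), (1, "Chad"), (2, "Chad")],
)


def count_matches(name):
    """Count the rows that share a display name."""
    row = conn.execute(
        "SELECT COUNT(*) FROM Users WHERE DisplayName = ?", (name,)
    ).fetchone()
    return row[0]


def pick_user_id(name):
    """Return the lowest Id for a name, or None when nothing matches.

    The explicit ORDER BY is what makes the result deterministic
    when several rows match.
    """
    row = conn.execute(
        "SELECT Id FROM Users WHERE DisplayName = ? ORDER BY Id LIMIT 1",
        (name,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    print(count_matches("Chad"), pick_user_id("Chad"))
```

Without the ORDER BY, nothing guarantees which of the three matching rows comes back, which is the same trap the variable assignment falls into.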

Locking Mechanisms in Apache Hive

Shobika Selvaraj documents lock types in Apache Hive and what commands acquire which types:

There are two types of shared lock: Shared_read and Shared_write. A Shared_read lock means any other shared_read or shared_write query can run at the same time. A Shared_write lock means any other shared_read can be performed, but no other shared_write lock can be acquired at that time.

With Exclusive locks, no shared_read or shared_write can be performed at the same time.

There are three types of lock state:

   (a) Acquired – transaction initiator holds the lock
   (b) Waiting – transaction initiator is waiting for the lock
   (c) Aborted – the lock has timed out but has not yet been cleaned

I was a bit surprised about inserts being shared read but that’s not a typo in the table—Shobika brings receipts.
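
The compatibility rules in the excerpt can be sketched as a small lookup table. This is a minimal Python illustration of the rules as quoted, not Hive's actual lock manager; the Lock enum, the COMPATIBLE table, and both function names are made up for the example.

```python
from enum import Enum


class Lock(Enum):
    SHARED_READ = "shared_read"
    SHARED_WRITE = "shared_write"
    EXCLUSIVE = "exclusive"


# For each held lock, the lock types that may be acquired alongside it,
# following the rules quoted in the excerpt (illustrative only).
COMPATIBLE = {
    Lock.SHARED_READ: {Lock.SHARED_READ, Lock.SHARED_WRITE},
    Lock.SHARED_WRITE: {Lock.SHARED_READ},
    Lock.EXCLUSIVE: set(),
}


def can_acquire(requested, held):
    """True when the requested lock is compatible with every held lock."""
    return all(requested in COMPATIBLE[h] for h in held)


def lock_state(requested, held):
    """Map a request onto two of the lock states named in the excerpt."""
    return "acquired" if can_acquire(requested, held) else "waiting"


if __name__ == "__main__":
    print(lock_state(Lock.SHARED_WRITE, [Lock.SHARED_READ]))
    print(lock_state(Lock.SHARED_WRITE, [Lock.SHARED_WRITE]))
```

The third state, aborted, depends on timeouts and is left out of the sketch.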

Stacked Bar Charts

Alex Velez takes us through stacked bar charts:

A few years ago, we posted a question on this blog that is as relevant today as it was years ago: “Is there a good use case for a stacked bar chart?” 

Stacked bars are everywhere; you’ve likely seen them in a recent report, a dashboard, or in the media. Despite their prevalence, they are commonly both misused and misunderstood. In this guide, we’ll aim to rectify these mishaps by sharing examples, clarifying when you should (and shouldn’t) use a stacked bar chart, and discussing best practices for stacking bars. 

Read on for plenty of good advice around when to use stacked (either regular stacked bar charts or 100% stacked), horizontal vs vertical, and how to format them when it does make sense to drop one in.

Date-Time Binning in Cosmos DB

Hasan Savran bins some data:

I wrote about the Date_Bucket() function in SQL Server a couple of weeks ago. The Azure Cosmos DB team announced the same functionality under a different name, the DateTimeBin() function. It works exactly the same as the Date_Bucket() function of SQL Server.

The Cosmos DB version of the function has the same number of parameters, but the order is different. All the datetime parameters must be in ISO 8601 format (YYYY-MM-DDThh:mm:ss.fffffffZ).

Read on to see how it works.
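
For a feel of what date-time binning does, here is a minimal Python sketch of the general idea: round a timestamp down to the start of a fixed-width bucket counted from an origin. It is not the Cosmos DB or SQL Server implementation; the datetime_bin name, its parameter order, and the choice of the Unix epoch as the default origin are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed default origin for this sketch: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_bin(value, width, origin=EPOCH):
    """Return the start of the fixed-width bucket that contains value.

    Buckets are counted from origin. Floor division keeps the rounding
    direction consistent for values that fall before the origin.
    """
    if width <= timedelta(0):
        raise ValueError("width must be a positive timedelta")
    buckets = (value - origin) // width  # whole buckets since the origin
    return origin + buckets * width


if __name__ == "__main__":
    reading = datetime(2022, 6, 29, 10, 37, 15, tzinfo=timezone.utc)
    print(datetime_bin(reading, timedelta(minutes=15)).isoformat())
```

Grouping rows by the returned bucket start is the usual next step, for example to count readings per 15-minute window.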

The Print Operator in KQL

Robert Cain continues a series on KQL:

In this post we’ll cover the print operator. This Kusto operator is primarily used as a development tool, to test calculations.

The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

Importantly, this is an operator and not a statement. This is in contrast to languages like T-SQL.

Power BI as an Enterprise Data Warehouse

James Serra follows Betteridge’s Law of Headlines:

With Power BI continuing to get many great new features, including the latest in Datamarts (see my blog Power BI Datamarts), I’m starting to hear customers ask “Can I just build my entire enterprise data warehouse solution in Power BI?” In other words, can I just use Power BI and all its built-in features instead of using Azure Data Lake Gen2, Azure Data Factory (ADF), Azure Synapse, Databricks, etc.? The short answer is “No”.

Read on to understand why Power BI shouldn’t be your data warehouse.

When Parameter Sensitive Plan Optimization Works

Erik Darling ends on a high note:

I’ve used this proc as an example in the past. It’s a great parameter sniffing demo.

Why is it great? Because there’s exactly one value in the Posts table that causes an issue. It causes that issue because someone hated the idea of normalization.

The better thing to do here would be to have separate tables for questions and answers. Because we don’t have those, we end up with a weird scenario.

Read on for an example of PSP at its best.

Data Lakehouse Cleanrooms in Databricks

Matei Zaharia, et al., announce an interesting idea:

We are excited to announce data cleanrooms for the Lakehouse, allowing businesses to easily collaborate with their customers and partners on any cloud in a privacy-safe way. Participants in the data cleanrooms can share and join their existing data, and run complex workloads in any language – Python, R, SQL, Java, and Scala – on the data while maintaining data privacy.

With the demand for external data greater than ever, organizations are looking for ways to securely exchange their data and consume external data to foster data-driven innovations. Historically, organizations have leveraged data sharing solutions to share data with their partners and relied on mutual trust to preserve data privacy. But the organizations relinquish control over the data once it is shared and have little to no visibility into how data is consumed by their partners across various platforms. This exposes potential data misuse and data privacy breaches. With stringent data privacy regulations, it is imperative for organizations to have control and visibility into how their sensitive data is consumed. As a result, organizations need a secure, controlled and private way to collaborate on data, and this is where data cleanrooms come into the picture.

Read on to learn more about how this all works. It’s definitely a lot better than sending off a bunch of CSVs…

Database Audit Specifications Creating Users

Kenneth Fisher asks who audits the auditors:

I love database audits. They are simple, easy to use, effective, not overly resource intensive, and can be turned on and off at need once created. That said, they do have a few gotchas. If you want every user, put public as the principal. And if you don’t, and you put in an AD user, be aware that that user will be created (along with a matching schema) when you create the Database Audit Specification.

Read on for Kenneth’s experience and a way to clean up these potentially-added users.

Finding Key Influencers with Power BI

Gauri Mahajan looks at the key influencers visual in Power BI:

Once the Key Influencers visual is added to the Power BI report, it would look as shown below. The visual would be empty by default. The key areas required to make this visual work are the Analyze section and the Explain By section. The Analyze section is used to point to the variables or attributes that we intend to analyze. The Explain By section is used to point to the variables or attributes that may be influencing the attributes specified in the Analyze section.

I’ve found this visual to be pretty interesting if you have a good dataset.
