Finding High-Cardinality Columns

Constantine Kokkinos shows how you can find the cardinality of each column on a SQL table:

Today I was diving into some extremely wide tables, I wanted to take a quick look at things like “How many unique values does this table have in every column?”.

This can be super useful if you have a spreadsheet of results or a schema without effective normalization and you want to determine which rows are the “most unique” – or have high cardinality.

The Github gist is embedded at the bottom of the page, but I will run you through the code in case you want an explanation of how it works

Click through for the script.

Related Posts

Multi-Pattern Replacement with SQL Server

Hugo Kornelis has a pattern matching problem to solve: The actual use case and the list of patterns that I had to remove are considered a confidential competitive advantage by my client, so I will just make up a list here to illustrate the problem. In this fake requirement, the following patterns must all be […]

Read More

Forensic Accounting: Cohort Analysis

I continue my series on forensic accounting techniques with cohort analysis: In the last post, we focused on high-level aggregates to gain a basic understanding of our data. We saw some suspicious results but couldn’t say much more than “This looks weird” due to our level of aggregation. In this post, I want to dig […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

April 2019
MTWTFSS
« Mar  
1234567
891011121314
15161718192021
22232425262728
2930