Press "Enter" to skip to content

Day: May 15, 2019

Decluttering Visuals

Elizabeth Ricks shows how you can remove extraneous noise from a visual and make it easier for a person to understand what you’re trying to show:

This caught my eye not because of the topic but because of how much time it took me to figure out what information it was trying to convey. What should I do with this? There’s a lot competing for my attention in this chart and distracting me from the data.

Spend a moment examining this graph and take note of which specific elements are challenging. Make a list: what might we eliminate or change to reduce cognitive burden?

I came up with eight specific design changes I would make. How does my list compare with yours?

There is a subjective element to this, but the end result does look better.

Comments closed

Defining TF-IDF

Bruno Stecanella explains the concept behind TF-IDF:

TF-IDF was invented for document search and information retrieval. It works by increasing proportionally to the number of times a word appears in a document, but is offset by the number of documents that contain the word. So, words that are common in every document, such as this, what, and if, rank low even though they may appear many times, since they don’t mean much to that document in particular.

However, if the word Bug appears many times in a document, while not appearing many times in others, it probably means that it’s very relevant. For example, if what we’re doing is trying to find out which topics some NPS responses belong to, the word Bug would probably end up being tied to the topic Reliability, since most responses containing that word would be about that topic.

This makes the technique useful for natural language processing, especially in classification problems.

Comments closed

Puzzling Through Older Problems

Kenneth Fisher shares a couple of interview puzzles:

The year is 2004. You’re taking a tech test as an interview for a SQL development job. They have a page in their application that displays up to 20 rows of information. They need a piece of code that will return the rows from a given page. Oh, and it may not always be 20 rows per page. You need to write a piece of code where they can pass in a page number and page size and get back results. So for example, if the page size is 20 and the page is 3 then you need to return back rows 41 to 60.

The answers aren’t on the page, but then again, that’s the point of a puzzle.

Comments closed

The SSMS Magic 8 Ball

Bert Wagner has fun with SSMS:

As a kid, I found Magic 8 Balls alluring. There is something appealing about a who-knows-how-many-sides die emerging from the depths of a mysterious inky blue fluid to help answers life’s most difficult questions.

I never ended up buying a magic eight ball of my own though, so today I’m going to build and animate one in SQL Server Management Studio.

Now you can finally answer those important life questions without leaving Management Studio.

Comments closed

RANKX Across Multiple Columns

Jason Baldessari continues a series on RANKX in DAX:

What happens when we need to rank using multiple criteria?  In the example below, we are going to look at resellers by name and key in a SalesTerritoryGroup, and we are going to rank them based on the number of items they have sold.

Let’s start with the model.  I did slightly modify Rob’s original 3 table model approach.  I now have 5 tables here, but the model conceptually still works the same.  I have a sales table, a date table,  a resellers table,  and two lookup tables, one for geography and one for sales territory.

Read the whole thing.

Comments closed

Causing Error 666 When Loading Into Columnstore Index

Joe Obbish has moved into Erik Darling’s Internet Basement and has a doozy of a first post there:

I need to find a relatively efficient way to advance the CSILOCATOR because I need to do it over 2 billion times, if my theory is correct about the maximum allowed value. Both updating all of the rows in a delta rowgroup and deleting and reinserting advance the CSILOCATOR. I expected that small batch sizes would work best, and they did. For my table’s schema, the sweet spot for updates is about 275 rows and the sweet spot for delete/inserts is about 550 rows. Delete/inserts appeared to be faster than updates for the purpose of constantly reloading the same rows over and over.

Great post, Brent!

Comments closed

Embedding Notebooks on a Website

Eduardo Pivaral shows how to embed the results of a Jupyter notebook created in Azure Data Studio on a website:

Notebooks are a functionality available in Azure Data Studio, that allows you to create and share documents that may contain text, code, images, and query results. These documents are helpful to be able to share database insights and create runbooks that you can share easily.

Are you new to notebooks? don’t know what are the uses for it? want to know how to create your first notebook? then you can get started in ADS notebooks checking my article for MSSQLTips.com here.

Once you have created your first notebooks and share them among your team, maybe you want to share it on your website or blog for public access.
even when you can share the file for download, you can also embed it on the HTML code.

Be sure to read the comments too. Rendering notebooks is…an imperfect operation.

Comments closed

Recompile Hints and Query Store

Erin Stellato answers two questions relating to recompile hints and whether those queries show up in Query Store:

Last week in our IEPTO2 class I was asked about queries with OPTION (RECOMPILE) and Query Store. Specifically: Do queries that have the OPTION (RECOMPILE) hint go into Query Store, AND do queries in a stored procedure created with the RECOMPILE option go into Query Store? I knew the answer to the first question, and was pretty sure I know the answer to the second one, but I wanted to test to verify. Let’s take a look.

Erin gives you a tl;dr version but I’m going to ask you to read the whole thing anyhow.

Comments closed