Press "Enter" to skip to content

Curated SQL Posts

When The Power BI Work Is Done

Melissa Coates has a great checklist to help you figure out if your Power BI dashboard is done:

Auto time intelligence is enabled by default, and it applies to each individual PBIX file (there’s not a global option). For most datetime columns that exist in the dataset, a hidden date table is created in the model to support time-oriented DAX calculations. This is great functionality for newer users, or if you have a very simple data model. However, if you typically utilize a standard Date table, then you will want to disable the hidden date tables to reduce the file size. (Tip: You can view the hidden date tables if you connect to the PBIX via DAX Studio.)

There are a lot of good things to think about here.

Comments closed

Using The Squint Test

Meagan Longoria gives us the squint test:

While you can definitely perform the Squint Test on your report within Power BI Desktop, I recommend also testing in a browser once the report is deployed to PowerBI.com or to the Power BI Report Server portal since colors and objects may be slightly different there.

The Squint Test is also used in web page design, so web developers have made tools to aid them in this check. While just squinting at the page is perfectly sufficient, using a browser extension or another tool allows you to easily share your findings with others. In the Chrome Browser, there is a free extension called The Squint Test. This extension places an eye icon near the top right of the browser window. Clicking the icon provides a slider that allows you to increase or decrease the amount of blur applied to the page.

Meagan also has an example of applying this test and picks a dashboard where she can make some improvements, so check it out.

Comments closed

The ggplot2 Books

Hadley Wickham has a couple of books which teach a lot about ggplot2.  The first book I’d recommend is his and Garrett Grolemund’s R For Data Science book, which is available for free online:

To map an aesthetic to a variable, associate the name of the aesthetic to the name of the variable inside aes(). ggplot2 will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. ggplot2 will also add a legend that explains which levels correspond to which values.

The colors reveal that many of the unusual points are two-seater cars. These cars don’t seem like hybrids, and are, in fact, sports cars! Sports cars have large engines like SUVs and pickup trucks, but small bodies like midsize and compact cars, which improves their gas mileage. In hindsight, these cars were unlikely to be hybrids since they have large engines.

Wickham also has the source to build his ggplot2 book online.  If you don’t want to build the source, you also have the option of buying the book.

Comments closed

A Layered Grammar Of Graphics

Hadley Wickham describes some of the decisions he made when putting together ggplot2:

In the examples above, we have seen some of the components that make up a plot:
• data and aesthetic mappings,
• geometric objects,
• scales, and
• facet specification.
We have also touched on two other components:
• statistical transformations, and
• the coordinate system.
Together, the data, mappings, statistical transformation, and geometric object form a layer. A plot may have multiple layers, for example, when we overlay a scatterplot with a smoothed line.

This isn’t an article about how to use ggplot2; rather, it’s an article about implementation decisions.  To that end, I think it’s useful to see some of the logic behind ggplot2’s decisions.

Comments closed

The Grammar Of Graphics

Leland Wilkinson has written the book on how we should write systems which visualize data:

This book was written for statisticians, computer scientists, geographers, research and applied scientists, and others interested in visualizing data. It presents a unique foundation for producing almost every quantitative graphic found in scientific journals, newspapers, statistical packages, and data visualization systems. This foundation was designed for a distributed computing environment (Internet, Intranet, client-server), with special attention given to conserving computer code and system resources.

There’s no free copy of this book, and it’s a very expensive textbook. For most people, you’ll get more from derivative works, but if you’ve thought about putting together a graphics library, this is a must-read.

Comments closed

Data Visualization For Social Science

I’ve started reading Kieran Healy’s book, Data Visualization For Social Science.  He has a free draft available online, and it automatically builds nightly so you’re seeing the latest version.  From the preface:

This book is a hands-on introduction to the principles and practice of looking at and presenting data using R and ggplot. R is a powerful, widely used, and freely available programming language for data analysis. You may be interested in exploring ggplot after having used R before, or be entirely new to both R and ggplot and just want to graph your data. I do not assume you have any prior knowledge of R.

After installing the software we need, we begin with an overview of some basic principles of visualization. We focus not just on the aesthetic aspects of good plots, but on how their effectiveness is rooted in the way we perceive properties like length, absolute and relative size, orientation, shape, and color. We then learn how to produce and refine plots using ggplot2, a powerful, versatile, and widely-used visualization library for R (Wickham 2016a). The ggplot2 library implements a “grammar of graphics” (Wilkinson 2005). This approach gives us a coherent way to produce visualizations by expressing relationships between the attributes of data and their graphical representation.

Through a series of worked examples, you will learn how to build plots piece by piece, beginning with scatterplots and summaries of single variables, then moving on to more complex graphics. Topics covered include plotting continuous and categorical variables, layering information on graphics; faceting grouped data to produce effective “small multiple” plots; transforming data to easily produce visual summaries on the graph such as trend lines, linear fits, error ranges, and boxplots; creating maps, and also some alternatives to maps worth considering when presenting country- or state-level data. We will also cover cases where we are not working directly with a dataset, but rather with estimates from a statistical model. From there, we will explore the process of refining plots to accomplish common tasks such as highlighting key features of the data, labeling particular items of interest, annotating plots, and changing their overall appearance. Finally we will examine some strategies for presenting graphical results in different formats, and to different sorts of audiences.

I’m less than halfway through the book so far, but it is quite an approachable look at the ggplot2 library with a bit of discussion on what makes for quality graphics.

Comments closed

Fill And Highlight Countries On A Map

The folks at Sharp Sight Labs show how to create a professional-looking filled map in R, including highlighting specific countries:

Let’s point out a few things.

First, the fill color scale has been carefully crafted to optimally show differences between countries.

Second, we are simultaneously using the highlighting technique to highlight the OPEC countries.

Finally, notice that we’re using the title to “tell a story” about the highlighted data.

All told, there is a lot going on in this example.

It’s a very interesting example of building higher-quality visuals in R.  I also give them kudos for picking five colors which work for people with every kind of Color Vision Deficiency.  H/T R-Bloggers

Comments closed

Customer Retention Analysis With SQL

Luba Belokon walks through some sample customer retention analysis queries written in SQL:

Customer retention curves are essential to any business looking to understand its clients and will go a long way towards explaining other things like sales figures or the impact of marketing initiatives. They are an easy way to visualize a key interaction between customers and the business, which is to say, whether or not customers return — and at what rate — after the first visit.

The first step to building a customer retention curve is to identify those who visited your business during the reference period, what I will call p1. It is important that the length of the period chosen is a reasonable one, and reflects the expected frequency of visits.

Different types of businesses are going to expect their customers to return at different rates:

  • A coffee shop may choose to use an expected frequency of visits of once a week.

  • A supermarket may choose a longer period, perhaps two weeks or one month.

In this case, I think the motivation portion is better than the queries themselves, but the article definitely works as an inspiration for building out good measures of frequency of occurrence.

Comments closed

What’s New With KSQL

Hojjat Jafarpour announces Kafka’s KSQL version 0.3:

Additionally, we have taken the first steps to provide metrics and observability in KSQL. This greatly enhances the operability of KSQL, like in cases where you’re monitoring KSQL capacity or when diagnosing issues. You can now see different metrics for streams, tables, and queries for every KSQL server instance.

For streams and tables, we now have DESCRIBE EXTENDED <stream/table name> statement to show statistics, such as number of messages processed per second, total messages, the time when the last message was received, as well as corresponding failure metrics.

Looks like they’re building it out a piece at a time.

Comments closed