
Author: Kevin Feasel

Remove Chart Clutter

Melissa Yu provides advice on improving your data visualization skills:

Common chart clutter items include:

  • 3-dimensional effects

  • Dark gridlines (use soft gray gridlines or eliminate gridlines when possible)

  • Overuse of bright, bold colors

  • Unnecessary use of all uppercase text (uppercase text is only necessary when calling attention to an element)

Basically, remove every visualization “feature” that Excel 97 gave you…
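To make the advice concrete, here is a minimal matplotlib sketch (the data and styling choices are illustrative, not from Melissa's post) that applies these rules: flat 2-D bars, soft gray gridlines drawn behind the data, a single muted color, and sentence-case labels.

```python
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]   # illustrative data
sales = [120, 135, 128, 160]

fig, ax = plt.subplots()
ax.bar(quarters, sales, color="#6a8caf")   # one muted color, no bold palette
ax.yaxis.grid(True, color="#dddddd")       # soft gray gridlines...
ax.set_axisbelow(True)                     # ...drawn behind the data
ax.spines["top"].set_visible(False)        # remove non-data ink
ax.spines["right"].set_visible(False)
ax.set_title("Quarterly sales")            # sentence case, not ALL CAPS
plt.show()
```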


Power BI Tables Without Data Sources

Chris Webb shows how to create a table in Power BI’s M language without a backing data source:

No data source is needed – this is a way of defining a table value in pure M code. The first parameter of the function takes a list of column names as text values; the second parameter is a list of lists, where each inner list contains the values for one row of the table.

In the last example the columns in the table were of the data type Any (the ABC123 icon in each column header tells you this), which means that they can contain values of any data type including numbers, text, dates or even other tables. Here’s an example of this…
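Chris’s M example isn’t reproduced here, but the shape of the call he describes is something like #table({"ID","Fruit"}, {{1,"apple"},{2,"pear"}}). As a rough pandas analogue (not M code), the same two-parameter idea looks like this: column names as one list, rows as a list of lists.

```python
import pandas as pd

# Analogue of M's #table: first a list of column names,
# then a list of lists, one inner list per row.
df = pd.DataFrame(
    [[1, "apple"], [2, "pear"]],
    columns=["ID", "Fruit"],
)
print(df)
```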

This is a helpful trick.


San Francisco Crime Analysis

Vimal Natarajan shows off some R charts using crime incident data:

By analyzing the plot above, we can arrive at the following insights:

  • The number of crimes declines steadily from midnight, reaches its lowest point in the early morning hours, and then climbs to a peak around 6 PM. This is the same insight we arrived at in my previous analysis, but here we have categorized by police district and still see the same pattern.

  • As seen in the previous plot, Park and Richmond districts have the lowest number of crimes throughout the day.

  • As highlighted in red in the plot above, the maximum number of crimes occurs in the Southern district around 6 PM.

I would prefer to see code here, but the post does give you an idea of what R can do.
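In that spirit, here is a rough sketch of what that district-by-hour analysis might look like in Python. The file and column names are hypothetical (the real SFPD incident extract will differ, and the original analysis was done in R).

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names -- the real SFPD incident
# extract will differ, and the original post used R.
df = pd.read_csv("sf_incidents.csv")
df["Hour"] = pd.to_datetime(df["Time"], format="%H:%M").dt.hour

# Count incidents by district and hour, one panel per district.
counts = df.groupby(["PdDistrict", "Hour"]).size().unstack("PdDistrict")
counts.plot(subplots=True, layout=(4, 3), sharey=True, legend=False)
plt.tight_layout()
plt.show()
```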


Downloading SQL Express 2016

Dave Mason tries out SQL Server Express 2016:

I’m not a fan of the filename “SQLEXPRADV_x64_ENU.exe”. It’s not very descriptive IMO. But if you hover your mouse over the file, there’s a helpful file description tool tip. I’ll probably rename the file anyway.

The download process has changed significantly and I have to admit I’m surprised that I like it so much. I can be set in my ways and averse to change. But once I launched that initial “SQLServer2016-SSEI-Expr.exe” download, everything made sense.

Think back to SQL Server 2012 Express. Remember the “Choose the download you want” dialog? Those file names aren’t very intuitive. I had to Google them every time to make sure I picked the right one. It was slightly better for SQL Server 2014 Express. But still. Yuck!

Sounds like they’ve improved the download experience for Express edition.


Lipwig

Peter Coates shows how to make Hive EXPLAIN plans a lot prettier:

As you probably know, if you prepend the word EXPLAIN to your SQL query and then run it, Hive prints out a text description of the query plan. This lets you explore the effects of such variations as code changes, the use of ANALYZE, turning the cost-based optimizer (CBO) on and off, and so on. It’s an essential tool for optimizing Hive.

The output of EXPLAIN is far from pretty, but fortunately, a simple pipeline of Linux commands can give you a slick graphical rendition like the one below.

I’m going to have to keep this in mind.


LEN Is For Strings

Kenneth Fisher notes that the LEN function can behave oddly on non-string data types:

Which shows you that the FLOAT had to be converted to VARCHAR. You can see the same thing if you try it with various versions of INT or DATE datatypes as well. Like I said earlier, no big deal with INT or even DATE – those come back in a fairly expected format (INTs look exactly the same and DATEs come back as ‘YYYY-MM-DD’). FLOAT and REAL, however, are floating-point types, so they don’t always convert the same way. If you do the conversion deliberately, you get this…
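Kenneth’s T-SQL isn’t reproduced here, but the underlying surprise – that converting a floating-point value to a string rarely gives you back exactly what you typed – isn’t unique to SQL Server. A quick Python analogue of the same pitfall (not the T-SQL from the post):

```python
# Floating-point values don't always round-trip through strings
# the way you'd expect -- the same trap LEN falls into when it
# implicitly converts a FLOAT to VARCHAR.
x = 0.1 + 0.2
print(str(x))        # '0.30000000000000004'
print(len(str(x)))   # 19, not the 3 characters of '0.3'
```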

Understand your data types; otherwise, it might come back to hurt you later.


EventInternal

Ewald Cress continues to dig into scheduling, this time looking at EventInternal:

signalMode adds a twist. The behaviour described for the traffic light corresponds to a signal mode of 0, also known as a manual reset event. Here the event stays signalled irrespective of how many consumers pass through it (=successfully wait on it).

A signal mode of 1, however, turns it into an auto-reset event, where the act of successfully waiting on the event resets it to unsignalled. This is now more akin to a turnstile that only lets one person through after being signalled, e.g. by a scan of a valid transport pass or a button press by a security guard.

Interestingly, an event object is also sometimes known as a latch – that’s something to chew on for SQL Server folks. Don’t get hung up about who or what signals it; that is a separate issue altogether. Just keep in mind that the signal mode is a permanent attribute of the event – you construct it as manual-reset or auto-reset. Full disclosure: there seems to be at least one more SignalMode (2, used by the related SOS_WaitableAddress), but let’s ignore it today.
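The manual-reset/auto-reset distinction maps neatly onto familiar threading primitives. Here’s an illustrative Python sketch (not SQLOS code): threading.Event behaves like signal mode 0, and an auto-reset event in the style of signal mode 1 can be built from a condition variable.

```python
import threading

# Signal mode 0 ("traffic light"): a manual-reset event stays
# signalled until someone explicitly clears it, so any number
# of waiters can pass through.
light = threading.Event()
light.set()
assert light.wait(0) and light.wait(0)   # still signalled after each wait

# Signal mode 1 ("turnstile"): a successful wait consumes the
# signal, so exactly one waiter gets through per set().
class AutoResetEvent:
    def __init__(self):
        self._cond = threading.Condition()
        self._signalled = False

    def set(self):
        with self._cond:
            self._signalled = True
            self._cond.notify()          # wake at most one waiter

    def wait(self):
        with self._cond:
            while not self._signalled:
                self._cond.wait()
            self._signalled = False      # consume the signal
```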

This is part of a great series, and I hope Ewald keeps it up. I’d probably drop a few bucks on a cleaned-up and edited version of his discussion of internals in an 80-page or so e-book.


Lambda And Kappa

Alex Woodie has a story on two competing data architectures:

Jay Kreps, the co-creator of Apache Kafka and CEO of Confluent, was one of the first big data architects to espouse an alternative to the Lambda architecture, which he did with his 2014 O’Reilly story “Questioning the Lambda Architecture.” While Kreps appreciated some aspects of the Lambda architecture—in particular how it deals with reprocessing data—he stated that the downside was just too great.

“The Lambda architecture says I have to have Hadoop and I have to have Storm and I’m going to implement everything in both places and keep them in sync. I think that’s extremely hard to do,” Kreps tells Datanami. “I think one of the biggest things hurting stream processing is the amount of complexity that you have to incur to build something. That makes it slow to build applications that way, hard to roll them out, and hard to make them reliable enough to be a key part of the business.”

I wonder if we’re seeing the next generation of Kimball v Inmon here, or if one will absolutely dominate.


Flood Visualization

David Smith points out an animated flood chart using R:

As more settlements in Texas and France are impacted by severe flooding, this is a good time to thank the hydrologists at the NOAA who forecast river level rises in advance and give residents in affected areas time to move to higher ground. Along with topographic, rainfall, and weather data, monitoring stations maintained by NOAA and the USGS along rivers provide critical real-time information about river levels. NOAA scientists access this data using the dataRetrieval package for R, which they then incorporate into flood prediction models and use to generate animations like this one of the flood of the Delaware in February this year.
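The dataRetrieval package is, as I understand it, a wrapper over the public USGS water-data web services, which you can also hit directly. A hedged Python sketch against the instantaneous-values endpoint – the site number and window are illustrative, and I’m assuming 01463500 is the Delaware River at Trenton gage (parameter 00065 is gage height in feet):

```python
import requests

# Illustrative values: site 01463500 (Delaware River at Trenton,
# as an assumption) and parameter 00065 (gage height, feet).
resp = requests.get(
    "https://waterservices.usgs.gov/nwis/iv/",
    params={
        "format": "json",
        "sites": "01463500",
        "parameterCd": "00065",
        "period": "P7D",         # last seven days of readings
    },
    timeout=30,
)
resp.raise_for_status()
values = resp.json()["value"]["timeSeries"][0]["values"][0]["value"]
for point in values[:5]:
    print(point["dateTime"], point["value"])
```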

Looks like I’ve got a new blog to follow…
