Press "Enter" to skip to content

Day: April 3, 2018

Contrasting Plotly And Seaborn

Natasha Sharma contrasts the Seaborn and Plotly libraries for visualizing data:

It was important to use a library that could provide easy, high-class interactivity. Before embedding the plots into my website code, I tested a few different libraries, like Matplotlib and Seaborn, in order to visualize the results and to see how different they can look. After a few trials, I came across the Plotly library and found it valuable for my project because of its built-in functionality, which gives the user high-class interactivity.

In this post, I am going to compare Seaborn and Plotly using a bar chart and a heatmap. I will be using the breast cancer dataset to visualize these plots. But before jumping into the comparison, the dataset needed preprocessing, such as data cleaning, so I followed those steps first.

In this case, the contrast is mostly lines of code versus visual quality; read on for more.
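To make the comparison concrete, here is a minimal sketch (not Natasha's code) that draws the same bar chart with each library, using the scikit-learn copy of the breast cancer dataset as a stand-in; the column names come from that dataset, and the output file names are arbitrary:

```python
# Minimal comparison of a Seaborn bar chart with its Plotly equivalent.
# The scikit-learn breast cancer dataset stands in for the data used in the post.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["diagnosis"] = pd.Series(data.target).map({0: "malignant", 1: "benign"})

# Seaborn: a static image, styled through matplotlib.
sns.barplot(data=df, x="diagnosis", y="mean radius")
plt.title("Mean radius by diagnosis (Seaborn)")
plt.savefig("seaborn_bar.png")

# Plotly: an interactive HTML file with hover, zoom, and pan out of the box.
means = df.groupby("diagnosis")["mean radius"].mean().reset_index()
fig = go.Figure(go.Bar(x=means["diagnosis"], y=means["mean radius"]))
fig.update_layout(title="Mean radius by diagnosis (Plotly)")
fig.write_html("plotly_bar.html")
```

The Seaborn version produces a static image, while the Plotly version produces an HTML page with hover tooltips, zoom, and pan for free, which is the interactivity trade-off the post describes.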


Visualizing Geo-Spatial Data In R

Carson Sievert shows off the plotly library:

You might be wondering, “What can plotly offer over other interactive mapping packages such as leaflet, mapview, mapedit, etc.?”. One big feature is the linked brushing framework, which works best when linking plotly together with other plotly graphs (i.e., only a subset of brushing features are supported when linking to other crosstalk-compatible htmlwidgets). Another is the ability to leverage the plotly.js API to make efficient updates in shiny apps via plotlyProxy(). Speaking of efficiency, plotly.js keeps on improving the performance of their WebGL-based rendering, so I recommend trying plot_ly() (with toWebGL()) and/or plot_mapbox() if you have lots of graphical elements to render. Also, by having a consistent interface between these various mapping approaches, it’s much quicker and easier to switch from one approach to another when you need to leverage a different set of strengths and weaknesses.

Plotly’s on my list of things I’ll eventually get to one of these days.  H/T R-Bloggers
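The functions Carson names (plot_ly(), toWebGL(), plot_mapbox(), plotlyProxy()) belong to the R package. Purely as a rough Python analogue of the WebGL and Mapbox rendering paths he mentions, a sketch like this shows the same ideas (the coordinates are invented for illustration):

```python
# Rough Python analogue of the WebGL and Mapbox rendering paths described
# for the R package; the coordinates below are made up for illustration.
import plotly.graph_objects as go

lats = [40.71, 34.05, 41.88]
lons = [-74.01, -118.24, -87.63]

# Scattergl renders with WebGL, which scales to far more points than SVG traces.
webgl_fig = go.Figure(go.Scattergl(x=lons, y=lats, mode="markers"))
webgl_fig.write_html("webgl_scatter.html")

# Scattermapbox draws points on a base map; the open-street-map style
# works without a Mapbox access token.
map_fig = go.Figure(go.Scattermapbox(lat=lats, lon=lons, mode="markers"))
map_fig.update_layout(mapbox_style="open-street-map",
                      mapbox_center={"lat": 39.5, "lon": -98.35},
                      mapbox_zoom=3)
map_fig.write_html("mapbox_scatter.html")
```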


Building Custom Widgets In SQL Operations Studio

Prashanth Jayaram shows how to build a custom widget in SQL Operations Studio, using database growth metrics as an example:

In this article, you will learn the following:

  • Introduction to SQL Operations Studio

  • How to run a custom SQL query and view it as a chart

  • How to use default and custom widgets

  • Various panes and options in SSOS

  • The different chart options

  • How to create a custom insight

  • How to define a custom widget, step by step

  • And more…

Building these dashboard widgets is pretty easy, and Prashanth shows it step by step.


Helper Predicates And Multi-Column Filters

Rob Farley has an interesting post on optimizing a lookup when you have separate date and time columns:

Here we see a Seek Predicate that looks for OrderDate values between two values that have been worked out elsewhere in the plan, creating a range in which the right values must exist. This isn’t >= 20110805 00:00 and < 20110806 00:00 (which is what I would’ve made it), it’s something else. The value for the start of this range must be smaller than 20110805 00:00, because it’s >, not >=. All we can really say is that when someone within Microsoft implemented how the QO should respond to this kind of predicate, they gave it enough information to come up with what I call a “helper predicate.”

Now, I would love Microsoft to make more functions sargable, but that particular request was Closed long before they retired Connect.

But maybe what I mean is for them to make more helper predicates.

The problem with helper predicates is that they almost certainly read more rows than you want. But it’s still way better than looking through the whole index.

Read the whole thing.


The Difference Between M And DAX With Cooking

Eugene Meidinger explains the difference between M and DAX as languages using a cooking metaphor:

I like to think of M as this sous chef. It does all the grunt work that we’d like to automate. Let’s say that my boss asks for a utilization report for all of the technicians. What steps am I going to do in M?

  1. Extract the data from the line of business system
  2. Remove extraneous columns
  3. Rename columns
  4. Enrich the services table with a Billable / NonBillable column
  5. Generate a date table

This is all important work, but I would have to do the same work for a variety of reports. Many of the steps tell me nothing about the final product. I would generate a date table for most of my reports, for example.

I think the metaphor holds.
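Purely as an illustration of that sous-chef grunt work, and not Eugene's actual M code, here is roughly what those five steps might look like in pandas; every file, table, and column name below is hypothetical:

```python
# Hypothetical pandas equivalent of the five M prep steps: extract, trim,
# rename, enrich, and build a date table. All names are made up.
import pandas as pd

# 1. Extract the data from the line-of-business system (a CSV export here).
services = pd.read_csv("services_export.csv")

# 2. Remove extraneous columns.
services = services.drop(columns=["InternalNotes", "LegacyId"])

# 3. Rename columns to friendlier names.
services = services.rename(columns={"tech_nm": "Technician", "hrs": "Hours"})

# 4. Enrich the services table with a Billable / NonBillable column.
services["BillableType"] = services["ServiceCode"].apply(
    lambda code: "Billable" if code.startswith("B") else "NonBillable"
)

# 5. Generate a date table covering the reporting period.
date_table = pd.DataFrame({"Date": pd.date_range("2018-01-01", "2018-12-31")})
date_table["Year"] = date_table["Date"].dt.year
date_table["MonthName"] = date_table["Date"].dt.month_name()
```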


Data Cleansing: Hockey Edition

Stacia Varga has a post covering some of the yeoman’s work of data cleansing:

For now, Power BI continues to be my tool of choice for my project. My goals for today’s post are two-fold: 1) to finish my work addressing missing venues in the games table and 2) to investigate the remaining anomalies in the games and scores tables, as I noted in my last post.

To recap, I noted the following data values that warranted further investigation:

  • Total Goals minimum of 0 seems odd because hockey games do not end in ties; I would expect a minimum of 1, so I need to determine why this number is appearing.

  • Total Goals maximum of 29 seems high – it implies that either one team really smoked the opposing team or that both teams scored highly. I’d like to see what those games look like and validate the accuracy.

  • Record Losses minimum of 0 seems odd also – that means at least one team has never had a losing season?

  • Similarly, Record Wins minimum of 0 means one team has never won?

  • Record OT minimum of 0 – I’m not sure how to interpret. I need to look.

  • Score minimum of 0 seems to imply the same thing as Total Goals minimum of 0, which I have already noted seems odd.

This is the kind of stuff that we talk about as taking 80-95% of a data science team’s time. It’s all about finding “weird”-looking values, investigating those values, and determining whether the input data really was correct or if there was an issue.
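Checks like these are easy to script once you know what you are looking for. As a sketch of the idea, and not Stacia's Power BI work, the anomalies above could be surfaced in pandas against a hypothetical games table (all file and column names are made up):

```python
# Hypothetical pandas version of the anomaly checks: surface the games behind
# the suspicious minimums and maximums. All names are made up.
import pandas as pd

# Assumed columns: game_id, home_team, away_team, home_score, away_score, venue.
games = pd.read_csv("games.csv")
games["total_goals"] = games["home_score"] + games["away_score"]

# Games with zero total goals should not exist if games never end 0-0.
zero_goal_games = games[games["total_goals"] == 0]
print(zero_goal_games[["game_id", "home_team", "away_team",
                       "home_score", "away_score"]])

# Inspect the highest-scoring games to validate the 29-goal maximum.
print(games.nlargest(5, "total_goals"))

# Games with a missing venue, from the first goal of the post.
print(games[games["venue"].isna()][["game_id", "home_team", "away_team"]])
```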


Tuning Recommendations In SQL Server 2017

Kendra Little shows that even if you don’t want to use automatic tuning in SQL Server 2017, you can still see the tuning recommendations:

Even though automatic tuning wasn’t enabled, SQL Server picked up on the performance changes. I got a recommendation in sys.dm_db_tuning_recommendations.

  • reason: Average query CPU time changed from 2127.84ms to 66291.9ms
  • state: {"currentValue":"Active","reason":"AutomaticTuningOptionNotEnabled"}

The details also include the query id in question, and the plan_id of the “fast plan”.

It’s nice to check those out for a couple of weeks before turning automatic tuning on; that way, you can get more comfortable with the types of changes the tuning engine recommends, and if you happen to have a system which is terrible for automatic tuning, you can know that before turning the feature on.
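If you want to review those recommendations programmatically before flipping the switch, a small script can poll the DMV on a schedule. This is only a sketch: it assumes the pyodbc package, a placeholder connection string, and my reading of the JSON documents stored in the state and details columns:

```python
# Sketch for reviewing tuning recommendations without enabling automatic tuning.
# Assumes pyodbc is installed; adjust the connection string for your server.
import json
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=YourDatabase;Trusted_Connection=yes;"  # placeholder connection details
)

rows = conn.execute(
    "SELECT reason, score, state, details FROM sys.dm_db_tuning_recommendations;"
).fetchall()

for reason, score, state, details in rows:
    state_doc = json.loads(state)
    details_doc = json.loads(details)
    # planForceDetails is where the query id and plan ids live; .get() keeps
    # the script safe if a key name differs from what is assumed here.
    plan_force = details_doc.get("planForceDetails", {})
    print(f"reason: {reason} (score {score})")
    print(f"state:  {state_doc.get('currentValue')} ({state_doc.get('reason')})")
    print(f"query:  {plan_force.get('queryId')}, "
          f"regressed plan: {plan_force.get('regressedPlanId')}, "
          f"fast plan: {plan_force.get('recommendedPlanId')}")
```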
