Author: Kevin Feasel

Quite often these types of things are not that easy. Yes, it is very tempting to assume that the limited test is good enough and that we did indeed find the exact list of characters (plus we would need to add in the four extra characters: at sign (@), dollar sign ($), number sign (#), and underscore (_)). However, based on my experiences, it seems that more often than not, doing an exhaustive test results in a slightly different answer that invalidates the previous conclusion (which was based on the limited test). So, while it does take more time to do more extensive testing, it seems like we have little choice if we truly want to know how these things actually work.

What that means is, at the very least, we need to get the complete list of characters accepted by SQL Server for non-delimited identifiers to make sure that the totals match the number of code points returned by the searches done in Step 1.

This post is an interesting dive into the oddities of Unicode, but leaves us on a cliffhanger. Also, full-crazy Nicolas Cage beats mullet-wearing Tom Hanks any day.

1 Comment

Contrasting Plotly And Seaborn

Published 2018-04-03 by Kevin Feasel

Natasha Sharma contrasts the Seaborn and Plotly libraries for visualizing data:

It was important to use a library which can provide easy and high-class interactivity. Before embedding the plots into my website code, I tested a few different libraries like Matplotlib and Seaborn in order to visualize the results and to see how different they can look. After few trials, I came across Plotly library and found it valuable for my project because of its inbuilt functionality which gives user a high class interactivity.

In this post, I am going to compare Seaborn and Plotly using – Bar Chart and Heatmap diagram. I will be using Breast cancer dataset to visualize these plots. But before jumping into the comparison, the dataset I used needed preprocessing like data cleaning so, I followed steps.

In this case, the contrast is mostly lines of code versus visual quality; read on for more.

Comments closed

Visualizing Geo-Spatial Data In R

Published 2018-04-03 by Kevin Feasel

Carson Sievert shows off the plotly library:

You might be wondering, “What can plotly offer over other interactive mapping packages such as leaflet, mapview, mapedit, etc?”. One big feature is the linked brushing framework, which works best when linking plotly together with other plotly graphs (i.e., only a subset of brushing features are supported when linking to other crosstalk-compatible htmlwidgets). Another is the ability to leverage the plotly.js API to make efficient updates in shiny apps via plotlyProxy(). Speaking of efficiency, plotly.js keeps on improving the performance of their WebGL-based rendering, so I recommend trying plot_ly() (with toWebGL()) and/or plot_mapbox() if you have lots of graphical elements to render. Also, by having a consistent interface between these various mapping approaches, it’s much quicker and easier to switch from one approach to another when you need to leverage a different set of strengths and weaknesses.

Plotly’s on my list of things I’ll eventually get to one of these days. H/T R-Bloggers

Comments closed

Building Custom Widgets In SQL Operations Studio

Published 2018-04-03 by Kevin Feasel

Prashanth Jayaram shows how to build a custom widget in SQL Operations Studio, using database growth metrics as an example:

In this article, you will learn the following:

Introduction to SQL Operations Studio
How to run a custom SQL query and view it as a chart
How to use default and custom widgets
Various panes and options in SSOS
Explain the different chart options
Create a custom insight
Details to define a custom widget step by step
And more…

Building these dashboard widgets is pretty easy, and Prashanth shows it step by step.

Comments closed

Rebuilding The Master Database In SQL Server On Linux

Published 2018-04-03 by Kevin Feasel

Max Trinidad has a post showing how to fix a busted master database in SQL Server on Linux:

For some reason, I finished doing an “sudo apt update” follow by the “sudo apt upgrade“, and after doing some OS updates “my SQL Server stopped working”. The following message was logged in the log event: (see Image)

Click through to see how to force a rebuild of master.

Comments closed

Helper Predicates And Multi-Column Filters

Published 2018-04-03 by Kevin Feasel

Rob Farley has an interesting post on optimizing a lookup when you have separate date and time columns:

Here we see a Seek Predicate that looks for OrderDate values between two values that have been worked out elsewhere in the plan, but creating a range in which the right values must exist. This isn’t >= 20110805 00:00 and < 20110806 00:00 (which is what I would’ve made it), it’s something else. The value for start of this range must be smaller than 20110805 00:00, because it’s >, not >=. All we can really say is that when someone within Microsoft implemented how the QO should respond to this kind of predicate, they gave it enough information to come up with what I call a “helper predicate.”

Now, I would love Microsoft to make more functions sargable, but that particular request was Closed long before they retired Connect.

But maybe what I mean is for them to make more helper predicates.

The problem with helper predicates is that they almost certainly read more rows than you want. But it’s still way better than looking through the whole index.

Read the whole thing.

Comments closed

The Difference Between M And DAX With Cooking

Published 2018-04-03 by Kevin Feasel

Eugene Meidinger explains the difference between M and DAX as languages using a cooking metaphor:

I like to think of M as this sous chef. It does all the grunt work that we’l like to automate. Let’s say that my boss asks for a utilization report for all of the technicians. What steps am I doing to do in M?

Extract the data from the line of business system

Remove extraneous

Rename columns

Enrich the services table with a Billable / NonBillable column

Generate a date table

This is all important work, but I would have to do the same work for a variety of reports. Many of the steps tell me nothing about the final product. I would generate a date table for most of my reports, for example.

I think the metaphor holds.

Comments closed

Data Cleansing: Hockey Edition

Published 2018-04-03 by Kevin Feasel

Stacia Varga has a post covering some of the yeoman’s work of data cleansing:

For now, Power BI continues to my tool of choice for my project. My goals for today’s post are two-fold: 1) finish my work to address missing venues in the games table and 2) to investigate the remaining anomalies in the games and scores tables as I noted in my last post.

To recap, I noted the following data values that warranted further investigation :

Total Goals minimum of 0 seems odd – because hockey games do not end in ties. I would expect a minimum of 0 so I need to determine why this number is appearing.
Total Goals maximum of 29 seems high – it implies that either one team really smoked the opposing team or that both teams scored highly. I’d like to see what those games look like and validate the accuracy.
Record Losses minimum of 0 seems odd also – that means at least one team has never had a losing season?
Similarly, Record Wins minimum of 0 means one team has never won?
Record OT minimum of 0 – I’m not sure how to interpret. I need to look.
Score minimum of 0 seems to imply the same thing as Total Goals minimum of 0, which I have already noted seems odd.

This is the kind of stuff that we talk about as taking 80-95% of a data science team’s time. It’s all about finding “weird” looking values, investigating those values, and determining whether the input data really was correct or if there was an issue.

Comments closed

Tuning Recommendations In SQL Server 2017

Published 2018-04-03 by Kevin Feasel

Kendra Little shows that even if you don’t want to use automatic tuning in SQL Server 2017, you can still see the tuning recommendations:

Even though automatic tuning wasn’t enabled, SQL Server picked up on the performance changes. I got a recommendation in sys.dm_db_tuning_recommendations.

reason: Average query CPU time changed from 2127.84ms to 66291.9ms

state: {“currentValue”:”Active”,”reason”:”AutomaticTuningOptionNotEnabled”}

The details also include the query id in question, and the plan_id of the “fast plan”.

It’s nice to check those out for a couple of weeks before turning automatic tuning on; that way, you can get more comfortable with the types of changes the tuning engine recommends, and if you happen to have a system which is terrible for automatic tuning, you can know that before turning the feature on.

Comments closed

Using Python Within R

Published 2018-04-02 by Kevin Feasel

David Smith points out new reticulate package:

With reticulate, you can:

Import objects from Python, automatically converted into their equivalent R types. (For example, Pandas data frames become R data.frame objects, and NumPy arrays become R matrix objects.)
Import Python modules, and call their functions from R
Source Python scripts from R
Interactively run Python commands from the R command line
Combine R code and Python code (and output) in R Markdown documents, as shown in the snippet below

The first thing that came to mind when reading this was the implementation of the keras package in R and how it calls out to TensorFlow (written in Python). The ability to make R vs Python an “and” instead of an “or” proposition is quite powerful.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31