Day: May 29, 2018

Taking Screenshots With R

Published 2018-05-29 by Kevin Feasel

Abdul Majed Raja shows us how to take screenshots of webpages using R:

The webshot package provides one simple function, webshot(), that takes a webpage URL as its first argument and saves it under the file name given as its second argument. It is important to note that the filename includes a file extension such as ‘.jpg’, ‘.png’, or ‘.pdf’, which determines how the output file is rendered. Below is the basic structure of how the function goes:

library(webshot)

#webshot(url, filename.extension)
webshot("https://www.listendata.com/", "listendata.png")

If no folder path is specified along with the filename, the file is downloaded in the current working directory which can be checked with getwd().

Now that we understand the basics of the webshot() function, it is time for us to begin with our cases, starting with downloading/converting a webpage as a PDF copy.

This isn’t something I’d expect to do every day, but I could see it being useful as part of a notebook to give the user a sanity check, like if a webpage or data set has a last updated timestamp that you want to check. H/T R-Bloggers

Native Scoring With SQL Server 2017 R Services

Published 2018-05-29 by Kevin Feasel

Tomaz Kastrun gives us an example using native scoring in SQL Server 2017 Machine Learning Services:

Native scoring in SQL Server 2017 comes with a couple of limitations, but also with a lot of benefits. Limitations are:

- currently supports only SQL Server 2017 and the Windows platform
- trained model should not exceed 100 MiB in size
- Native scoring with the PREDICT function supports only the following algorithms from the RevoScaleR library:
  - rxLinMod (linear model as linear regression)
  - rxLogit (logistic regression)
  - rxBTrees (Parallel external memory algorithm for Stochastic Gradient Boosted Decision Trees)
  - rxDTree (External memory algorithm for Classification and Regression Trees)
  - rxDForest (External memory algorithm for Classification and Regression Decision Trees)

Read on for an example. If you’re using one of these methods, then native scoring is extremely fast and a bit more flexible than I originally anticipated. The problem is that you have to use one of those methods.

WVPlots 1.0.0

Published 2018-05-29 by Kevin Feasel

John Mount announces WVPlots 1.0.0:

Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. We are excited to announce that WVPlots is now at version 1.0.0 on CRAN!

The idea is: we sacrifice some of the flexibility and composability inherent to ggplot2 in R for a menu of prescribed presentation solutions. This is a package to produce plots while you are in the middle of another task.

I like this idea: I know the kind of plot I need and just want to throw something together for myself to give me an idea of the underlying data.

Row Width And Snapshot Isolation

Published 2018-05-29 by Kevin Feasel

Kendra Little shows us the impact that row width has on snapshot isolation:

So I went to work to demonstrate row width impact on the version store — when only a tiny bit column is changed in the row.

Here’s how I did the test:

I created two tables, dbo.Narrow and dbo.Wide. They each have a bit column named bitsy, along with some other columns.

I inserted one row in each table, but I put a lot more data into the row in dbo.Wide.

I allowed snapshot isolation on the database

I began a transaction in another session under snapshot isolation and left the transaction open (so version store cleanup wouldn’t kick in while I looked around)

I updated the bit column named bitsy for the single row in each table, thereby generating a row-version in tempdb for each table

The code I ran to test this is here, if you’d like to play around with it.

Read on for the results.

Tic-Tac-Toe In T-SQL

Published 2018-05-29 by Kevin Feasel

Riley Major implements Tic-Tac-Toe in T-SQL:

It turns out there’s a concept called bitmasking which can work a lot like this cardboard cut-out process. (Props to Dylan Beattie for his quick visual demonstration at NDC Minnesota which drove this point home.) First, you represent your game state with a bunch of bits (“OXOOOXXXX” yields “0100011110” for our example above, remembering that we’re padding that last 0 just to make the powers 1-based instead of 0-based) and then you represent your winning state with a bunch of bits (“0000001110” for our example winning state here). Now you use the magic of “bitwise math” to compare the two.

For our use, we want to find out whether our mask exposes the winning three bits. We want to block everything else out. With bits, to check if both items are true, you use “AND” (0 and 0 is 0; 0 and 1 is 0; 1 and 1 is 1). If we apply that “AND” concept to each bit in our game, it will squash out any values which don’t match. If what we have left matches the mask (fills in all of the space we can see through), then we have a match and a win.

The twist in all of this is that the end result doesn’t quite work as expected, but it was interesting watching the process. That said, there’s a good reason why we don’t use T-SQL as a primary language for development…
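
The masking check described in the quote can be sketched in a few lines of Python. This is not Riley's T-SQL; it is a minimal illustration that assumes the same encoding as the excerpt (X is 1, O is 0, plus one padding bit on the right). The names encode, has_win, and WINNING_MASKS are hypothetical helpers invented for this sketch.

```python
# Index triples for the eight winning lines on a board read left to right,
# top to bottom (cells 0..8). This layout is an assumption of the sketch.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def encode(board: str, player: str = "X") -> int:
    """Turn a 9-character board into an integer: 1 where `player` sits, else 0.

    A trailing 0 is appended so the result matches the 10-bit strings
    shown in the excerpt.
    """
    bits = "".join("1" if cell == player else "0" for cell in board) + "0"
    return int(bits, 2)


def line_mask(line: tuple) -> int:
    """Build the bitmask for one winning line, using the same 10-bit layout."""
    return sum(1 << (9 - cell) for cell in line)


WINNING_MASKS = [line_mask(line) for line in LINES]


def has_win(game: int) -> bool:
    """A win exists when ANDing the game with a mask leaves the whole mask."""
    return any((game & mask) == mask for mask in WINNING_MASKS)


game = encode("OXOOOXXXX")        # the board from the excerpt
bottom_row = int("0000001110", 2)  # the winning state from the excerpt
print((game & bottom_row) == bottom_row)
print(has_win(game))
```

Comparing the AND result against the mask itself, rather than against zero, is what makes this a test for all three bits being set instead of any one of them.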

Why You Should Read Gartner Critical Capabilities Reports

Published 2018-05-29 by Kevin Feasel

Jen Underwood explains the value behind Gartner Critical Capabilities reports, specifically the one for analytics and BI platforms:

Notably, the three Magic Quadrant Leaders except Tableau were ranked near the middle in all use cases. MicroStrategy, Birst, Sisense, TIBCO, YellowFin, Salesforce, SAS and a few other players excelled above the rest with high scores on this report. These results are a bit refreshing to see. Gartner Critical Capabilities scores seem to better align with Forrester’s rankings of Analytics and Business Intelligence Platforms and also my own understanding of several top offerings. I admit that I was surprised by these results. I was rarely – if ever – asked about several of the top scoring vendors over the past three years.

Read the whole thing, and then read the report.

Azure Data Factory V2 Pricing

Published 2018-05-29 by Kevin Feasel

Chris Seferlis gives us the details on how Azure Data Factory V2 pricing works:

2. Volume of data moved – this is measured in DMUs (data movement units). This is one you should be aware of as this will default to auto, which is basically using all the DMUs it can use and this is paid for by the hour. Let’s say you specify and use 2 DMUs and it takes an hour to move that data. The other option is you could use 8 DMUs and it takes 15 minutes; the price is going to end up the same. You’re using 4X the DMUs but it’s happening in a quarter of the time.

This is good to look at and do some comparisons since how many DMUs you’re using is where the bulk of your spend is going to be.

There are a few moving parts here, so the calculation is not trivial. But Chris makes good sense of it all.
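
The arithmetic in the DMU example can be checked with a short Python sketch. It assumes data movement is billed purely as DMUs multiplied by elapsed hours, and it ignores any rounding or minimum-duration rules the real bill may apply. The rate below is a made-up placeholder, not a published Azure price, and dmu_hours and copy_cost are hypothetical helper names.

```python
# Placeholder price per DMU-hour; an assumption for illustration only.
HYPOTHETICAL_RATE_PER_DMU_HOUR = 0.25


def dmu_hours(dmus: int, minutes: float) -> float:
    """DMU-hours consumed by a copy that holds `dmus` units for `minutes`."""
    return dmus * minutes / 60


def copy_cost(dmus: int, minutes: float, rate: float) -> float:
    """Cost of the copy under the simple DMUs x hours x rate assumption."""
    return dmu_hours(dmus, minutes) * rate


# The two scenarios from the excerpt: 2 DMUs for an hour vs. 8 DMUs for 15 minutes.
slow = copy_cost(2, 60, HYPOTHETICAL_RATE_PER_DMU_HOUR)
fast = copy_cost(8, 15, HYPOTHETICAL_RATE_PER_DMU_HOUR)

print(dmu_hours(2, 60), dmu_hours(8, 15))
print(slow == fast)
```

Both scenarios consume the same number of DMU-hours, so under this assumption the choice between them comes down to how quickly you need the data rather than cost.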

Power BI Color Palettes

Published 2018-05-29 by Kevin Feasel

Meagan Longoria helps us choose a color palette for Power BI reports:

A color palette is simply a collection of colors applied to the visual elements in your report. What we typically refer to as color is a combination of three main properties: hue (base color on the color wheel), intensity (brightness or gray-ness) and value (lightness or darkness). You can build an engaging and professional looking report with just 6 colors. It’s possible to have fewer colors or more colors, but 6 should cover many common visualization needs. If you are using more than 6 colors, you might want to check that you are optimizing engagement and cognitive load.

Main color – default color on graphs
Color 2 – used when multiple colors are needed in a graph or report
Color 3 – used when multiple colors are needed in a graph or report and Color 2 has already been used
Highlight color – a color used to highlight important data points to make them stand out from other points on the page
Border color – a light color used for borders on tables and KPIs where necessary
Title color – color used for visual titles and axis labels as appropriate

There’s a lot of good advice in here.

Finding Procedure Parameters Which Don’t Match Column Names

Published 2018-05-29 by Kevin Feasel

Shane O’Neill has a process to update procedures to make input parameter names match output column names:

I was asked to standardise stored procedures we use for common support cases before we hand them over to IT Helpdesk.

One of the comments that came back from the Helpdesk while testing was that the parameter names that they had to put values in for didn’t match what they saw in the application.

Luckily for me (or unluckily) the application was a third party developed one and they didn’t bother renaming the columns. So if the column is called create_date in the database then the application is going to show create_date.

However, if I created a parameter called DateCreated or even CreateDate, then they don’t want it.

Shane has a PowerShell script which uses the Find-DbaStoredProcedure command in dbatools; click through to see it in action.
