Using Focus() On Correlations

Kevin Feasel

2016-08-11

R

Simon Jackson explains how to use the focus() function in R to narrow down a data frame of correlation coefficients based on a subset of variables:

focus() works similarly to select() from the dplyr package (which is loaded along with the corrr package). You add the names of the columns you wish to keep in your correlation data frame. Extending select(), focus()will then remove the remaining column variables from the rows. This is whympg does not appear in the rows above. Here’s another example with two variables:

Click through for the entire article.

Power BI DirectQuery

Adam Aspin explains some of the concepts behind DirectQuery:

In this article I will not be looking at Power BI as a data visualization tool. Indeed, I will not be discussing in anything more than a cursory way the data transformation and modelling capabilities of this product. Instead, I would like to focus on using Power BI Desktop to connect directly to certain data sources. So although you need no previous knowledge of Power BI Desktop to understand this article, I will be providing only an extremely succinct glance at the extensive array of the data mashup and modelling techniques that are available. Nor will I be explaining any of the approaches that can be taken to create visuals and dashboards. I will, however, try out a few classic techniques using the Power BI Query Editor and Data Model to see if there are any limits to using DirectQuery when compared to data loaded into the in-memory model. The aim here is to see just how, and in what circumstances, DirectQuery can help you to analyze your data faster and more easily.

This article is lengthy but good.  Check it out.

Biml Notes

Kevin Feasel

2016-08-11

Biml

Bill Fellows is going through Biml Hero training, and he has some notes from day 1:

Topological sorting

This was an in-depth Extension method but as with any good recursive algorithm it was precious few lines of code. Why I care about it is twofold: execution dependencies and as I type that, I realize lineage tracing would also fall under this, and foreign key traversal. For the former, in my world, I find I have the best success when my SSIS packages are tightly focused on a task and I use a master/parent package to handle the coordination and scheduling of sub-package execution. One could use an extension method to discover all the packages that implement an Execute Package Task and then figure out the ordering of dependent tasks. That could save me some documentation headaches.

Sounds like a fun training.

Understanding Bookmakers’ Odds Using R

Kevin Feasel

2016-08-11

R

Andrew Collier looks at odds, vigs, and other bookmaking concepts through the lens of the R programming language:

The house edge is 2.70%. On average a gambler would lose 2.7% of his stake per game. Of course, on any one game he would either win or lose, but this is the long term expectation. Another way of looking at this is to say that the Return To Player (RTP) is 97.3%, which means that on average a gambler would get back 97.3% of his stake on every game.

Below are the results of a simulation of 100 gamblers betting on even numbers. Each starts with an initial capital of 100. The red line represents the average for the cohort. After 1000 games two gamblers have lost all of their money. Of the remaining 98 players, only 24 have made money while the rest have lost some portion of their initial capital.

This is a very interesting article if you’re interested in basic statistics.  13-year-old Onion article of note.

Azure SQL Database Versus SQL Server

Kevin Feasel

2016-08-11

Cloud

Kenneth Fisher learns about differences between Microsoft’s Azure SQL Database and their on-premises (or IaaS) SQL Server:

T-SQL Differences in Azure SQL Database

I used to think this was the real difference between SQL Server and SQL Database. I was wrong. Really wrong. But it’s a good place to start. Now from what I can tell everything in Azure is a moving target. There are constant changes so it’s important to know where the documentation is. In this particular case here it is: Azure SQL Database Transact-SQL differences.

Check it out.  The differences are smaller than in the past, but I expect that there will always be some differences—particularly on the administration side—due to the nature of Azure SQL Database as a PaaS offering.

Power BI Calendar Visualization

Devin Knight continues his Power BI visualization series and looks at a custom calendar visual:

  • Allows you to visualize a data point on each date on the calendar.

    • The darker the color, the higher the value or density of values.
  • If you have multiple rows on the same date they are aggregated together

  • The Calendar Visualization can be used for cross filtering. Meaning you can select a square in the calendar and it will filter other visuals down to the date you picked.

This is an interesting visual.  It’s dense, but not difficult to understand.

Getting Pagination Wrong

Lukas Eder discusses common pagination issues:

If your data source is a SQL database, you might have implemented pagination by using LIMIT .. OFFSET, or OFFSET .. FETCH or some ROWNUM / ROW_NUMBER() filtering (see the jOOQ manual for some syntax comparisons across RDBMS). OFFSET is the right tool to jump to page 317, but remember, no one really wants to jump to that page, and besides, OFFSET just skips a fixed number of rows. If there are new rows in the system between the time page number 316 is displayed to a user and when the user skips to page number 317, the rows will shift, because the offsets will shift. No one wants that either, when they click on “next”.

Instead, you should be using what we refer to as “keyset pagination” (as opposed to “offset pagination”).

He also has a good explanation of the seek method.

I will throw in one jab at Oracle (because hey, it’s been a while since I’ve lobbed a bomb at Oracle on this blog):  it’d really suck to have a system where I legally wasn’t allowed to distribute relevant performance comparison benchmarks.  Fortunately, I tend to work on better data stacks.

Extended Events Audit

Steve Jones creates an audit with Extended Events:

The third part of the invitation was to write this. I covered what I did, and some of what I learned. I’ll add a bit more here.

I certainly was clumsy working with XE, and despite working my way through the course, I realize I have a lot of learning to do in order to become more familiar with how to use XE. While I got a basic session going, depending on when I started it and what I was experimenting with, I sometimes found myself with events that never went away, such as a commit or rollback with no corresponding opening transaction.

This T-SQL Tuesday was a bit broader in scope, so it has been interesting watching people respond.

Trapping HTTP Error Codes In Power BI

Chris Webb shows how to handle specific HTTP error codes when using the Web.Contents() function in M:

This thread on the Power Query forum suggests it’s something to do with lazy evaluation, but I haven’t been able to determine the situations when it does work and when it doesn’t.

Instead, it is possible to handle specific HTTP error codes using the ManualStatusHandling option in Web.Contents()

I guess this beats not being able to handle errors at all, but it seems like a fairly fragile solution if you next want to start handling the entire 500 class of response codes.

Categories

August 2016
MTWTFSS
« Jul Sep »
1234567
891011121314
15161718192021
22232425262728
293031