2018-10-09 – Curated SQL

For each of these images, I am running the predict() function of Keras with the VGG16 model. Because I excluded the last layers of the model, this function will not actually return any class predictions as it would normally do; instead we will get the output of the last layer: block5_pool (MaxPooling2D).

These, we can use as learned features (or abstractions) of the images. Running this part of the code takes several minutes, so I save the output to a RData file (because I samples randomly, the classes you see below might not be the same as in the sample_fruits list above).

Read the whole thing.

Comments closed

Using plm To Analyze Panel Data

Published 2018-10-09 by Kevin Feasel

Michael Grogan shows us how to use the plm package to perform linear regression against panel data:

Types of data

Cross-Sectional: Data collected at one particular point in time

Time Series: Data collected across several time periods

Panel Data: A mixture of both cross-sectional and time series data, i.e. collected at a particular point in time and across several time periods

Fixed Effects: Effects that are independent of random disturbances, e.g. observations independent of time.

Random Effects: Effects that include random disturbances.

Let us see how we can use the plm library in R to account for fixed and random effects. There is a video tutorial link at the end of the post.

Read on for an example.

Comments closed

SSMS On A Diet

Published 2018-10-09 by Kevin Feasel

Brent Ozar is happy that SQL Server Management Studio has dropped a few pounds:

SSMS 17.9 on the left has Database Diagrams at the top.

SSMS 18.0 does not. Database Diagrams are simply gone. Hallelujah! For over a decade, people have repeatedly cursed SSMS as they’ve accidentally clicked on the very top item and tried to expand it. One of the least-used SSMS features had one of the top billings, and generated more swear words than database diagrams.

The good news continues when you right-click on a server, click Properties, and click Processors.

The comments show that not everyone is happy about this, but I do think it’s for the best—the database diagram tool hadn’t been updated in a long time and is missing many features that an ER tool needs. I’d rather use Visio (or a better tool).

Comments closed

Getting A Specific Rank In DAX

Published 2018-10-09 by Kevin Feasel

Marco Russo shows us how to get the Nth element in a list using DAX:

The complexity of the calculation is in the Nth-Product Name Single and Nth-Product Sales Amount Single measures. These two measures are identical. The only difference is the RETURN statement in the last line, which chooses the return value between the NthProduct and NthAmount variables.

Unfortunately, DAX does not offer a universal way to share the code generating tables between different measures. Analysis Services Tabular provides access to DETAILROWS as a workaround, but this feature cannot be defined in a Power BI or Power Pivot data model as of now.

Indeed, the code of the two measures is nearly identical.

Read on for code and explanation.

Comments closed

When Table Variables Have Realistic Estimates, Unrealistic Results May Occur

Published 2018-10-09 by Kevin Feasel

Milos Radivojevic wraps up a series on deferred compilation for table variables by looking at a hack which used to work but no longer does:

With this change, the query is executed very fast, with the appropriate execution plan:

SQL Server Execution Times: CPU time = 31 ms, elapsed time = 197 ms.

However, the LOOP hint does not affect estimations and the optimizer decisions related to them; it just replaces join operators chosen by the optimizer by Nested Loop Joins specified in the hint. SQL Server still expects billions of rows, and therefore the query got more than 2 GB memory grant for sorting data, although only 3.222 rows need to be sorted. The hint helped optimizer to produce a good execution plan (which is great; otherwise this query would take very long and probably will not be finished at all), but high memory grant issue is not solved.

As you might guess, now it’s time for table variables.

This is an interesting article with workarounds and counter-workarounds to solve a nasty estimation problem.

Comments closed

Reading Excel Files In An Office-less World

Published 2018-10-09 by Kevin Feasel

Bill Fellows shows us how to read from an Excel file on a machine without Microsoft Office installed:

A common problem working with Excel data is Excel itself. Working with it programatically requires an installation of Office, and the resulting license cost, and once everything is set, you’re still working with COM objects which present its own set of challenges. If only there was a better way.

Enter, the better way – EPPlus. This is an open source library that wraps the OpenXml library which allows you to simply reference a DLL. No more installation hassles, no more licensing (LGPL) expense, just a simple reference you can package with your solutions.

Let’s look at an example.

Read on for the example. A couple alternatives I like are readxl and XLConnect in R.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31

Day: October 9, 2018

Image Clustering With Keras And R

Using plm To Analyze Panel Data

Types of data

SSMS On A Diet

Getting A Specific Rank In DAX

When Table Variables Have Realistic Estimates, Unrealistic Results May Occur

Reading Excel Files In An Office-less World