
Month: October 2018

Multi-Threaded R With Microsoft R Client

David Parr shows us how to get started with Microsoft R Client and performs some quick benchmarking:

This message will pop up, and it’s worth noting as it’s got some information in it that you might need to think about:

  • It’s worth noting that right now Microsoft R Client is lagging behind the current R version, and is based on version 3.4 of R, not 3.5. This will mean your default package libraries will not be shared between the installations if you are running R 3.5.

  • It’s using a snapshot of CRAN called MRAN to source packages by default. 90% of the time it will operate just as you expect, but because it takes a ‘snapshot’ of packages, newer features and changes that have hit CRAN may not be in the version of the package you are grabbing.

  • RevoScaleR, ggplot2, and dplyr are the packages in play; RevoScaleR will likely already be installed by default with Microsoft R Client, but the other two you will probably have to install yourself.
  • Intel MKL will have scanned your system on install and attempted to work out how many cores your processor has. Here it’s identified 2 on my old Lenovo Yoga. This is where the speed boost will come from.
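
You can check and adjust that MKL thread count from the R console. A minimal sketch, assuming the RevoUtilsMath package that ships alongside MKL in Microsoft R Client:

# RevoUtilsMath controls the Intel MKL thread pool in Microsoft R Client.
library(RevoUtilsMath)
getMKLthreads()   # number of threads MKL is using (2 on the two-core Yoga)
setMKLthreads(2)  # pin the thread count explicitly if needed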

I had an old two-core Lenovo Yoga too, so this article really spoke to me.


Using Biml To Read Excel Files Without Excel

Bill Fellows follows up on his prior post and shows how you can write BimlScript to parse an Excel file without having Microsoft Office installed:

My resources are quite simple: an Excel spreadsheet containing metadata, a driver program, and a package template.

The template is your standard truncate and reload pattern, with the target table specified by a parameter. The client validates data by running processes in parallel, so the existing mainframe process delivers data to the Billing table while ours delivers to a Billing_NEW table. Once they accept the new process, the target table becomes Billing and the NEW table is dropped. I decided the most native SSIS route would be to specify the target table as a parameter. We originally had a boolean parameter indicating whether we were loading the new table or the production one, but that was more logic and overhead than just specifying which table to load. I force their queries to be dirty reads, as some of these queries can be rather messy.
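
As a rough illustration of what each generated package does on every run (the table and column names here are hypothetical, not from the actual client system):

-- Hypothetical sketch of the truncate-and-reload pattern; in the real
-- package, the target table name arrives as an SSIS parameter.
TRUNCATE TABLE dbo.Billing_NEW;

INSERT INTO dbo.Billing_NEW (AccountID, BilledAmount, BilledDate)
SELECT s.AccountID, s.BilledAmount, s.BilledDate
FROM dbo.BillingSource AS s WITH (NOLOCK); -- the forced dirty read mentioned above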

Click through for the script and explanation.


Calculating Effective Rights For A SQL Server Principal

Louis Davidson gives us a procedure which provides effective rights for a principal:

In my blog post Calculating a Security Principal’s Effective Rights, I built a view, named Utilty.EffectiveSecurity, that you could query to fetch a security principal’s rights to objects in a database. In that post I tested the code and showed how it works. Now I have taken this to the extreme and expanded the view to include all of the user’s security, finding all of their rights to all of the things they get rights for.

The list of possible permissions you can fetch can be retrieved from:
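
A query along these lines, using the sys.fn_builtin_permissions catalog function, produces that list (a sketch; the exact query is in the linked post):

SELECT DISTINCT class_desc
FROM sys.fn_builtin_permissions(DEFAULT)
ORDER BY class_desc;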

This returns the following 26 types of things that can have permissions assigned and returned by the sys.fn_my_permissions function:

Read on for the code.


Uninstalling A SQL Server Feature

Marek Masko shows how to uninstall a particular SQL Server feature:

If you are wondering whether it is possible to uninstall a particular SQL Server feature instead of uninstalling the complete instance, then I have good news: it is possible, and it is very easy.

Let’s assume you want to uninstall PolyBase because you want to install it with a newer version of SQL Server (PolyBase can be installed on only one SQL Server instance per computer).
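
From the command line, the same uninstall can be run via SQL Server setup; roughly like this, with the instance name as a placeholder:

setup.exe /Q /ACTION=Uninstall /FEATURES=PolyBase /INSTANCENAME=MSSQLSERVER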

It brings me mild sadness that the feature Marek chose for this demo was PolyBase, but slightly greater joy for the reason.


Help Make dbatools Even Better

Patrick Flynn wants you to improve dbatools:

As part of dbatools’ participation in this event, we are encouraging contributors to assist “the road towards 1.0” by improving the examples available in the comment-based help, which power the new docs site at docs.dbatools.io.

The activity is available to anyone who wants to help and does not require any expertise in PowerShell. Any of the following actions are desirable:

  • Fix typos in examples
  • Fix obvious errors in examples
  • Add examples to illustrate use of all possible parameters
  • Add examples to illustrate use of pipeline support
  • Add examples to illustrate combining multiple dbatools commands
  • Add examples that illustrate use of dbatools commands in new or interesting ways

We are looking for a maximum of 6-8 examples per command.
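
As a sketch, such a contribution might add an entry like this to a command's comment-based help (the instance name is made up):

<#
.EXAMPLE
    PS C:\> Get-DbaDatabase -SqlInstance sql01 | Out-GridView

    Returns all databases on the sql01 instance and sends them to a grid
    view, illustrating pipeline support.
#>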

Patrick also shows you how easy it is to edit the documentation, so check that out and get contributing.


Image Clustering With Keras And R

Shirin Glander shows us how to use R to extract learned features from Keras and cluster those features:

For each of these images, I am running the predict() function of Keras with the VGG16 model. Because I excluded the last layers of the model, this function will not actually return any class predictions as it would normally do; instead we will get the output of the last layer: block5_pool (MaxPooling2D).

These we can use as learned features (or abstractions) of the images. Running this part of the code takes several minutes, so I save the output to an RData file (because I sampled randomly, the classes you see below might not be the same as in the sample_fruits list above).
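
Condensed to its essentials, the feature-extraction step looks something like the following; the img_batch input and the cluster count are assumptions, not Shirin's exact code:

library(keras)

# Load VGG16 without its classification head, so predict() returns the
# block5_pool feature maps instead of class probabilities.
model <- application_vgg16(weights = "imagenet", include_top = FALSE)

# img_batch is assumed to be a preprocessed array of shape (n, 224, 224, 3).
features <- predict(model, img_batch)

# Flatten each image's feature maps into a row vector, then cluster.
feature_matrix <- array_reshape(features, c(nrow(features), -1))
clusters <- kmeans(feature_matrix, centers = 10)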

Read the whole thing.


Using plm To Analyze Panel Data

Michael Grogan shows us how to use the plm package to perform linear regression against panel data:

Types of data

  • Cross-Sectional: Data collected at one particular point in time
  • Time Series: Data collected across several time periods
  • Panel Data: A mixture of both cross-sectional and time series data, i.e. collected at a particular point in time and across several time periods
  • Fixed Effects: Effects that are independent of random disturbances, e.g. observations independent of time
  • Random Effects: Effects that include random disturbances

Let us see how we can use the plm library in R to account for fixed and random effects. There is a video tutorial link at the end of the post.
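
As a quick sketch of both estimators, using the Grunfeld panel data set that ships with plm:

library(plm)

# Grunfeld: investment data for 10 firms observed over 20 years.
data("Grunfeld", package = "plm")

# Fixed effects ("within") estimator
fixed <- plm(inv ~ value + capital, data = Grunfeld,
             index = c("firm", "year"), model = "within")

# Random effects estimator
random <- plm(inv ~ value + capital, data = Grunfeld,
              index = c("firm", "year"), model = "random")

# Hausman test: a significant p-value favors the fixed effects model.
phtest(fixed, random)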

Read on for an example.


SSMS On A Diet

Brent Ozar is happy that SQL Server Management Studio has dropped a few pounds:

SSMS 17.9 on the left has Database Diagrams at the top.

SSMS 18.0 does not. Database Diagrams are simply gone. Hallelujah! For over a decade, people have repeatedly cursed SSMS as they’ve accidentally clicked on the very top item and tried to expand it. One of the least-used SSMS features had one of the top billings, and generated more swear words than database diagrams.

The good news continues when you right-click on a server, click Properties, and click Processors.

The comments show that not everyone is happy about this, but I do think it’s for the best—the database diagram tool hadn’t been updated in a long time and is missing many features that an ER tool needs. I’d rather use Visio (or a better tool).


Getting A Specific Rank In DAX

Marco Russo shows us how to get the Nth element in a list using DAX:

The complexity of the calculation is in the Nth-Product Name Single and Nth-Product Sales Amount Single measures. These two measures are almost identical; the only difference is the RETURN statement in the last line, which chooses the return value between the NthProduct and NthAmount variables.

Unfortunately, DAX does not offer a universal way to share the code generating tables between different measures. Analysis Services Tabular provides access to DETAILROWS as a workaround, but this feature cannot be defined in a Power BI or Power Pivot data model as of now.

Indeed, the code of the two measures is nearly identical.
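
The general shape of such a measure, as a rough sketch rather than Marco's actual code (the table, column, and measure names here are made up):

Nth-Product Sales Amount :=
VAR N = SELECTEDVALUE ( 'N Value'[N], 1 )
VAR RankedProducts =
    ADDCOLUMNS (
        ALLSELECTED ( 'Product'[Product Name] ),
        "@Amount", [Sales Amount]
    )
VAR TopRows = TOPN ( N, RankedProducts, [@Amount], DESC ) // top N products by sales
VAR NthRow = TOPN ( 1, TopRows, [@Amount], ASC )          // lowest of those = the Nth
RETURN
    MAXX ( NthRow, [@Amount] )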

Read on for code and explanation.


When Table Variables Have Realistic Estimates, Unrealistic Results May Occur

Milos Radivojevic wraps up a series on deferred compilation for table variables by looking at a hack which used to work but no longer does:

With this change, the query is executed very fast, with the appropriate execution plan:

SQL Server Execution Times:
CPU time = 31 ms,  elapsed time = 197 ms.

However, the LOOP hint does not affect estimations or the optimizer decisions related to them; it just replaces the join operators chosen by the optimizer with the Nested Loop Joins specified in the hint. SQL Server still expects billions of rows, and therefore the query got a memory grant of more than 2 GB for sorting data, although only 3,222 rows need to be sorted. The hint helped the optimizer produce a good execution plan (which is great; otherwise this query would take a very long time and probably never finish), but the high memory grant issue is not solved.

As you might guess, now it’s time for table variables.
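
For reference, the kind of hint under discussion looks like this (the query and tables are hypothetical):

-- Hypothetical example: OPTION (LOOP JOIN) forces nested loop physical joins
-- but leaves the row estimates (and thus the memory grant) untouched.
SELECT o.OrderID, o.OrderDate
FROM @Orders AS o
INNER JOIN dbo.OrderDetails AS d
    ON d.OrderID = o.OrderID
ORDER BY o.OrderDate
OPTION (LOOP JOIN);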

This is an interesting article with workarounds and counter-workarounds to solve a nasty estimation problem.
