Press "Enter" to skip to content

Day: February 8, 2017

Looping In R

Klodian Dhana explains how to build a for loop in R:

I used a linear mixed effect model and therefore loaded the lme4 library. The loop should work with other regression analyses (e.g., linear regression) if you modify it according to your regression model. If you don't know which part to modify, leave a comment below and I will try to help.

As with other loops, this one calls the variables of interest one by one and, for each of them, extracts and stores the beta, standard error, and p-value. Remember, this code is specific to linear mixed effect models.

Read the whole thing. It's good to keep in mind, though, that vectorized (set-based) operations tend to perform best in R, so save looping for cases in which you can't build a vectorized solution.
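To make the shape of that loop concrete, here is a minimal sketch along the lines Klodian describes, assuming a data frame df with an outcome column outcome, a grouping variable id, and exposure columns x1 through x3; every one of those names is a placeholder for illustration rather than anything taken from his post.

  # Sketch only: df, outcome, id, and x1..x3 are illustrative names,
  # not objects from the original post.
  library(lme4)

  exposures <- c("x1", "x2", "x3")
  results   <- data.frame(variable = exposures, beta = NA, se = NA, p = NA)

  for (i in seq_along(exposures)) {
    f   <- as.formula(paste("outcome ~", exposures[i], "+ (1 | id)"))
    fit <- lmer(f, data = df)
    est <- summary(fit)$coefficients[2, ]   # row for the exposure term
    results$beta[i] <- est["Estimate"]
    results$se[i]   <- est["Std. Error"]
    # lmer() does not report p-values; a normal approximation on the
    # t value is one common shortcut
    results$p[i]    <- 2 * pnorm(abs(est["t value"]), lower.tail = FALSE)
  }

  results

A more vectorized take would be to lapply() over the exposures and bind the resulting rows, which gets at the set-based point above, but the explicit loop mirrors the structure of the post.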

Comments closed

Table Variables And TF2453

Tara Kizer investigates Trace Flag 2453:

I recently saw a server with trace flag 2453 configured. I hadn’t come across this trace flag before, so I did a little research. Microsoft says it allows “a table variable to trigger recompile when enough number of rows are changed”. This can lead to a more efficient execution plan. Trace flag 2453 is available in SP2 or greater for SQL Server 2012, CU3 or greater for SQL Server 2014 and RTM or greater for SQL Server 2016.

I was curious how a query using a table variable performed as compared to the “same” query using:

  • trace flag 2453

  • OPTION (RECOMPILE)

  • a temporary table

Click through for a relative performance comparison.

Comments closed

Mapping Geospatial Data

The folks at Sharp Sight Labs have a great blog post on mapping geospatial data using R:

If you've learned the basics of data visualization in R (namely, ggplot2) and you're interested in geospatial visualization, use this as a small, narrowly defined exercise to practice some intermediate skills.

There are at least three things that you can learn and practice with this visualization:

  1. Learn about color: Part of what makes this visualization compelling are the colors. Notice that in the area surrounding the US, we’re not using pure black, but a dark grey. For the title, we’re not using white, but a medium grey. Also, notice that for the rivers, we’re not using “blue” but a very specific hexadecimal color. These are all deliberate choices. As an exercise, I highly recommend modifying the colors. Play around a bit and see how changing the colors changes the “feel” of the visualization.

  2. Learn to build visualizations in layers: I’ve emphasized this several times recently, but layering is an important principle of data visualization. Notice that we’re layering the river data over the USA country map. As an exercise, you could also layer in the state boundaries between the country map and the rivers. To do this, you can use map_data().

  3. Learn about ‘Spatial’ data: R has several classes for dealing with ‘geospatial’ data, such as ‘SpatialLines‘, ‘SpatialPoints‘, and others. Spatial data is a whole different animal, so you’ll have to learn its structure. This example will give you a little experience dealing with it.

I also like the iterative approach they discuss. You'll almost never get it right the first go-around, but one of the nice things about ggplot2 is that it's designed to be iterative: you layer your changes on, making it a bit easier to fiddle with them to get the visualization just right.
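As a small illustration of the layering idea in point 2, here is a rough ggplot2 skeleton that stacks state boundaries on top of a country outline using map_data(); it omits the rivers layer and the post's specific hexadecimal colors, so treat it as a sketch of the technique rather than their actual code (map_data() needs the maps package installed).

  # Skeleton only: layering a state-boundary layer over a country map.
  library(ggplot2)

  usa    <- map_data("usa")     # country outline (requires the maps package)
  states <- map_data("state")   # state boundaries, drawn as a second layer

  ggplot() +
    geom_polygon(data = usa, aes(x = long, y = lat, group = group),
                 fill = "grey20", colour = NA) +
    geom_polygon(data = states, aes(x = long, y = lat, group = group),
                 fill = NA, colour = "grey40", size = 0.2) +
    coord_quickmap() +
    theme_void() +
    theme(plot.background = element_rect(fill = "grey10"),
          plot.title = element_text(colour = "grey60", hjust = 0.5)) +
    labs(title = "USA")

Swapping those greys for other values is also an easy way to try the color exercise from point 1.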

Comments closed

Building A SQL Server Dockerfile

Andrew Pruski builds a custom Dockerfile with his SQL Server configuration and custom databases:

And there you have it. One newly built SQL container from a custom image running our databases.

Imagine being able to spin up new instances of SQL with a full set of databases ready to go in minutes. This is the main advantage that container technology gives you: no more waiting to install SQL and then restore databases. Your dev or QA person can simply run one script and off they go.

I really think this could be of significant benefit to many companies and we’re only just starting to explore what this can offer.

The best part is that it's quite easy to do, meaning you could set up a guaranteed-clean QA image for each continuous integration deployment and not have to worry that some process oddity failed to clean up after itself and thereby wrecked the automated build process.

Comments closed

Threading And Learning

Jay Robinson has a two-pronged tale:

This reminds me of an old saying: If you’re the smartest person in the room, then you’re in the wrong room.*

Now, this is not a commentary on my current team. I work with some really smart people, and I'm very grateful for that. But while my teammate may be one of the best PHP or Node.js coders I know, that doesn't necessarily translate to expertise with the .NET Framework. The true test is this: no matter how smart they are, if they're not catching my mistakes, then I'm not being held accountable.

There is some good advice here on threading (yes, definitely use the newer threading libraries), but also good advice on surrounding yourself with intelligent people who can catch your mistakes.

Comments closed

Articles On R And SQL Server

Tomas Kastrun links to a series of his published articles on working with SQL Server and R:

In the past couple of months, I have prepared several articles on R and SQL Server that have been published on SQL Server Central.

The idea was to have a couple of articles covering the introduction to R, the basics of R Server, and some practical cases of R with SQL Server.

There’s a nice flow here, building up from the basics to practical marginal improvements.

Comments closed

DBCC Checks For Large Databases

David Alcock gives a couple methods for performing consistency checks against gigantic databases:

One way to achieve this is to split up the consistency checks to cover smaller objects, and native functionality allows us to do just that: we can perform the checks at the table level or, if filegroups are implemented, at the filegroup level using the DBCC CHECKFILEGROUP command.

How to go about this is pretty straightforward: take the list of tables and split them into equal(ish) groups. The groups now form a pool of objects, and within a nightly (or daily) window you perform the check on each object in the pool. This effectively spreads a database consistency check over multiple days; you avoid the impact on production activities but also ensure all objects are checked over time.

Read on for the solution.  I’m also a big fan of Minion CheckDB, which is designed to handle this type of scenario as well.

Comments closed

Custom Power BI Shapes Using R

Koen Verbeeck uses R to create dynamically changing images in Power BI:

You can insert images into Power BI Desktop, but these are static images. If you want them to change dynamically, you need the Image Viewer custom visual. Unfortunately, it doesn't support measures, only columns. Since we want dynamic changes, fixed column values are not going to work. Someone proposed a workaround on the Power BI forums, but it only works if you have a fixed set of attributes to slice on (for example, 4 categories). I want a totally flexible solution (e.g., each month we have a couple of new weeks to slice on), so again, not possible.

The only solution I could think of that would still work: using R visuals.

Read on for the solution.
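To give a sense of what the R-visual route looks like, here is a rough sketch of a script you could drop into a Power BI R visual; the column name image_url and the use of the png and grid packages are my assumptions for illustration, not details from Koen's post.

  # Sketch only: assumes the visual has been given a single column of image
  # URLs named image_url and that the report's slicers leave one row selected.
  library(png)    # readPNG()
  library(grid)   # grid.raster()

  # Power BI passes the fields added to an R visual as a data frame
  # named `dataset`
  url <- as.character(dataset$image_url[1])

  tmp <- tempfile(fileext = ".png")
  download.file(url, tmp, mode = "wb")   # fetch the image for the selection
  grid.raster(readPNG(tmp))              # draw it on the visual's canvas

Note that whether the download step is allowed depends on where the report runs; Power BI Desktop is more permissive than the Power BI service.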

Comments closed