R – Page 68 – Curated SQL

R 3.6.3 Now Available

Published 2020-03-11 by Kevin Feasel

On February 29, R 3.6.3 was released and is now available for Windows, Linux and Mac systems. This update, codenamed “Holding the Windsock”, fixes a few minor bugs and, as a minor update, maintains compatibility with scripts and packages written for prior versions of R 3.6.
February 29 is an auspicious date, because that was the day that R 1.0.0 was released to the world: February 29, 2000. In the video below from the CelebRation2020 conference marking the 20th anniversary of R, core member Peter Dalgaard reflects on the origins of R, and releases R 3.6.3 live on stage (at the 33-minute mark).

I’m holding out for R 4, though then I’ll have to wait to see when SQL Server will officially support it.

Developing Shiny Apps in Databricks

Published 2020-03-11 by Kevin Feasel

Yifan Cao, Hossein Falaki, and Cyrielle Simeone announce something cool:

We are excited to announce that you can now develop and test Shiny applications in Databricks! Inside the RStudio Server hosted on Databricks clusters, you can now import the Shiny package and interactively develop Shiny applications. Once completed, you can publish the Shiny application to an external hosting service, while continuing to leverage Databricks to access data securely and at scale.

That’s really cool. Databricks dashboards are nice for simple stuff, but when you really need visualization power, having Shiny available is great.

Loading Data from CSVs with Inconsistent Quoted Identifiers

Published 2020-03-10 by Kevin Feasel

Dave Mason has some fun with loading data from files:

BCP and OPENROWSET are long-lived SQL Server options for working with data in external files. I’ve blogged about OPENROWSET, including a recent article showing a way to deal with quoted data. One of the shortcomings I’ve never been able to overcome is an inconsistent data file with data fields in some rows enclosed in double quotes, but not all. I’ve never found a way around this limitation.
Let’s demonstrate with BCP. Below is a sample data file I’ll attempt to load into a SQL Server table. Note the data fields highlighted in yellow, which are enclosed in double quotes and contain the field terminator , (comma) character. For reference, the file is also available on Github.

I get unduly frustrated with the implementations of various data loaders around SQL Server and how they handle quoted identifiers differently. And don’t get me started on PolyBase.
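
To make the problem concrete, here is a minimal sketch of the kind of parsing the excerpt is asking for. It uses Python's standard `csv` module rather than BCP or OPENROWSET, and the sample rows are made up for illustration (they are not Dave's file). It is not necessarily the approach the linked post takes; it only shows that a quote-aware reader copes with rows where some fields are quoted and others are not.

```python
import csv
import io

# Hypothetical sample: some fields are wrapped in double quotes (and contain
# the comma field terminator), others are left bare.
sample = (
    "id,name,city\n"
    '1,"Smith, John",Boston\n'
    '2,Jane Doe,"Portland, OR"\n'
    "3,Bob Ray,Austin\n"
)

# csv.reader treats double quotes as optional per field, so mixed quoting
# still yields the same number of columns on every row.
rows = list(csv.reader(io.StringIO(sample)))

for row in rows:
    print(row)
```

One option, under these assumptions, is to pre-process the file this way and write it back out with a delimiter that never appears in the data before handing it to a SQL Server loader.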

Pasting an R Plot into Word

Published 2020-03-03 by Kevin Feasel

Eran Raviv takes us through converting a plot in R to work with Microsoft Word:

In this post you will learn how to properly paste an R plot\chart\image into a Word file. There are a few typical problems that occur when people try to do that. Below you can find a simple, clean and repeatable solution.
When you google how to paste a plot from R into a Word file, you find that there are some solutions, but they are not satisfactory. For example, the highest-ranking Stack Overflow reply suggests using the RStudio button to export your plot in Enhanced Metafile (EMF) format. A couple of things are wrong with it: the first is that you need to start messing around with the device scaling, because the export remembers the port dimensions. The second is that the Word file is often not the final version. For better readability\representation we often convert the Word file to PDF format before sending\publishing. But then you get something funny which you may have seen before, and which ~~drove some people insane~~ consumed much of some people’s time, myself included:

Also check out the linked blog post for additional insights into why this happened.

Converting Odds to Probabilities with R

Published 2020-03-02 by Kevin Feasel

Jonas Christoffer Lindstrom has a new package:

Now you might think that converting decimal odds to probabilities should be easy: you can just use the definition above and take the inverse of the odds to recover the probability. But it is not that simple, since in practice using this simple formula will give you improper probabilities. They will not sum to 1, as they should, but be slightly larger. This gives the bookmakers an edge and the probabilities (which aren’t real probabilities) cannot be considered fair, and so different methods for correcting this exist.

Read on to learn more about the problem and a few solutions. H/T R-Bloggers.
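
As a rough illustration of the issue in the excerpt, the sketch below (in Python, with made-up odds) inverts decimal odds, shows that the results sum to more than 1, and then applies the simplest correction I know of: dividing each value by the total. This proportional normalization is an assumption on my part for the example; the package covers other correction methods, so treat this as a sketch and not as what the package does.

```python
def implied_probabilities(decimal_odds):
    """Invert decimal odds, then rescale so the probabilities sum to 1."""
    raw = [1.0 / o for o in decimal_odds]   # improper: these sum to more than 1
    overround = sum(raw)                    # the excess over 1 is the bookmaker's margin
    fair = [p / overround for p in raw]     # proportional normalization
    return raw, overround, fair


# Hypothetical three-way market (home win, draw, away win).
odds = [2.0, 3.5, 4.0]
raw, overround, fair = implied_probabilities(odds)

print("raw inverse odds:", [round(p, 4) for p in raw])
print("sum of raw values:", round(overround, 4))
print("normalized:", [round(p, 4) for p in fair])
```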

Pivoting Data in R

Published 2020-02-26 by Kevin Feasel

Dave Mason shows how you can pivot SQL Server data using Machine Learning Services and R:

Pivoting data in SQL Server is something that comes up infrequently for me. Whenever the need arises, I have to pause and ask myself “What is it I’m trying to do again?”. Next I go to the documentation for the T-SQL PIVOT syntax (which I’ll never commit to memory) and realize for the umpteenth time the pivoted values have to be hard coded. Then I ponder using dynamic T-SQL because I won’t always know the values to pivot at query design time.
If T-SQL isn’t a good hammer to PIVOT’s nail, perhaps R is. There are different packages that make summarizing and transposing data frames somewhat easy. It can even dynamically pivot unknown values at runtime. But there is a catch, which I’ll get to in a bit.

This excerpt ends on a cliffhanger, so you’ll have to read Dave’s post to learn about the catch.
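
For a sense of what "dynamically pivot unknown values at runtime" means, here is a small sketch in plain Python (not R, and not Dave's code). The `pivot` helper and the sales rows are hypothetical; the point is only that the output columns are discovered from the data instead of being hard-coded as T-SQL's PIVOT requires.

```python
from collections import defaultdict


def pivot(rows, row_key, col_key, value_key):
    """Sum value_key into a row_key x col_key grid; columns come from the data."""
    columns = sorted({r[col_key] for r in rows})             # discovered at runtime
    table = defaultdict(lambda: dict.fromkeys(columns, 0))   # missing cells default to 0
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value_key]
    return columns, dict(table)


# Hypothetical input rows, as they might come back from a query.
sales = [
    {"region": "East", "quarter": "Q1", "amount": 100},
    {"region": "East", "quarter": "Q2", "amount": 150},
    {"region": "West", "quarter": "Q1", "amount": 80},
    {"region": "East", "quarter": "Q1", "amount": 20},
]

columns, table = pivot(sales, "region", "quarter", "amount")
print(columns)
for region, cells in table.items():
    print(region, cells)
```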

20 Years of R

Published 2020-02-24 by Kevin Feasel

Jozef Hajnala has some fun looking at the growth in R over the past 20 years:

It is almost the 29th of February 2020! A day that is very interesting for R, because it marks 20 years from the release of R v1.0.0, the first official public release of the R programming language.

Click through to see how much faster R has become, as well as the ecosystem changes during that time. H/T R-Bloggers

Computing a Z Score with R

Published 2020-02-18 by Kevin Feasel

Anisa Dhana shows us a quick example of how to calculate a Z score with R:

In short, the z-score is a measure that shows how far from the mean (below or above) a specific value (individual) is in a given dataset. In the example below, I am going to measure the z value of body mass index (BMI) in a dataset from NHANES.

Because R is a set-oriented, functional programming language, the answer is quite simple.
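
The calculation itself is short enough to sketch. The version below is in Python and uses a handful of made-up BMI values, not the NHANES data from the post. It subtracts the mean and divides by the sample standard deviation (the n - 1 denominator, which is also what R's `sd()` computes).

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (value - mean) / sample standard deviation for each value."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, n - 1 denominator
    return [(v - m) / s for v in values]


# Hypothetical BMI values for illustration only.
bmi = [22.0, 25.0, 28.0, 31.0, 34.0]
z = z_scores(bmi)

for value, score in zip(bmi, z):
    print(f"BMI {value:.1f} -> z = {score:+.3f}")
```

A value equal to the mean gets a z-score of 0, and the sign tells you whether an individual sits below or above it.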

Matrix Inputs for Shiny Apps

Published 2020-02-18 by Kevin Feasel

Andreas Neudecker shows off a new package:

We have been developing Shiny apps for quite some years now. A problem we stumbled upon multiple times in this process was that there is no easy approach to define matrices in Shiny. So we had to help ourselves with workarounds.

Not anymore. Now you can plop a matrix right onto your Shiny app. And the package is on CRAN.

Publishable Adverse Event Tables in R

Published 2020-02-13 by Kevin Feasel

Inge Christoffer Olsen shows how to clean up tables in R for publication:

The summary of Adverse Events is a nice table just summing up the adverse events in the trial. Note the “[N] n (%)” format, which gives the number of events, the number of patients with events and the percentage of patients with events.

This particular example is about adverse events, but the key concepts in the code apply to many kinds of tables you want to make look a bit nicer. H/T R-Bloggers

Category: R