Category: R

Correlation in easystats

The easystats team announces a new R package:

The easystats project continues to grow with its most recent addition, a package devoted to correlations. Check out its webpage here!

It’s lightweight, easy to use, and allows for the computation of many different kinds of correlations, such as partial correlations, Bayesian correlations, multilevel correlations, polychoric correlations, biweight, percentage bend or Shepherd’s Pi correlations (types of robust correlation), distance correlation (a type of non-linear correlation) and more, also allowing for combinations between them (for instance, Bayesian partial multilevel correlation).

I’d recommend reading the examples on the GitHub repo due to formatting. Looks quite interesting. H/T R-Bloggers.

Using Pre-Trained Sentiment Models with Power BI

Ryan Wade shows us how to use a pre-built sentiment analysis model with Power BI:

As of this writing, there are two pre-trained models available: one for sentiment analysis and another for image classification. This example focuses on sentiment analysis.

Both of these installations are freely available to the on-prem version of SQL Server 2017 and later. For more information on how to install these on your instance, reference this article for SQL Server Machine Learning Services and this article for pre-trained models.

Click through for step-by-step instructions.

R 3.6.3 Now Available

David Smith takes a look at R 3.6.3:

On February 29, R 3.6.3 was released and is now available for Windows, Linux and Mac systems. This update, codenamed “Holding the Windsock”, fixes a few minor bugs and, as a minor update, maintains compatibility with scripts and packages written for prior versions of R 3.6.

February 29 is an auspicious date, because that was the day that R 1.0.0 was released to the world: February 29, 2000. In the video below from the CelebRation2020 conference marking the 20th anniversary of R, core member Peter Dalgaard reflects on the origins of R, and releases R 3.6.3 live on stage (at the 33-minute mark).

I’m holding out for R 4, though then I’ll have to wait to see when SQL Server will officially support it.

Developing Shiny Apps in Databricks

Yifan Cao, Hossein Falaki, and Cyrielle Simeone announce something cool:

We are excited to announce that you can now develop and test Shiny applications in Databricks! Inside the RStudio Server hosted on Databricks clusters, you can now import the Shiny package and interactively develop Shiny applications. Once completed, you can publish the Shiny application to an external hosting service, while continuing to leverage Databricks to access data securely and at scale.

That’s really cool. Databricks dashboards are nice for simple stuff, but when you really need visualization power, having Shiny available is great.

Loading Data from CSVs with Inconsistent Quoted Identifiers

Dave Mason has some fun with loading data from files:

BCP and OPENROWSET are long-lived SQL Server options for working with data in external files. I’ve blogged about OPENROWSET, including a recent article showing a way to deal with quoted data. One of the shortcomings I’ve never been able to overcome is an inconsistent data file with data fields in some rows enclosed in double quotes, but not all. I’ve never found a way around this limitation.

Let’s demonstrate with BCP. Below is a sample data file I’ll attempt to load into a SQL Server table. Note the data fields highlighted in yellow, which are enclosed in double quotes and contain the field terminator character, a comma. For reference, the file is also available on GitHub.

I get unduly frustrated with the implementations of various data loaders around SQL Server and how they handle quoted identifiers differently. And don’t get me started on PolyBase.
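
To make the inconsistent-quoting problem concrete, here is a minimal Python sketch. It is not Dave's approach and does not involve BCP or OPENROWSET; the sample rows and function names are made up for illustration. It shows why splitting on every comma breaks on such a file, while a quote-aware CSV parser treats double quotes as optional on a per-field basis.

```python
import csv
import io

# Hypothetical sample file: only some fields are wrapped in double quotes,
# and the quoted ones contain the field terminator (a comma).
SAMPLE = (
    'Id,Name,City\n'
    '1,"Smith, John",Orlando\n'
    '2,Jane Doe,"Tampa, FL"\n'
    '3,Bob Roe,Miami\n'
)


def naive_split(text):
    """Split on every comma and ignore quotes (the failure mode)."""
    return [line.split(",") for line in text.splitlines()]


def quote_aware_split(text):
    """Let the csv module honor optional double quotes for each field."""
    return list(csv.reader(io.StringIO(text)))


naive_rows = naive_split(SAMPLE)
parsed_rows = quote_aware_split(SAMPLE)

for row in parsed_rows:
    print(row)
```

The naive version yields four fields for the rows with embedded commas, whereas the quote-aware version yields three fields for every row.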

Pasting an R Plot into Word

Eran Raviv takes us through converting a plot in R to work with Microsoft Word:

In this post you will learn how to properly paste an R plot/chart/image into a Word file. There are a few typical problems that occur when people try to do that. Below you can find a simple, clean and repeatable solution.

When you google how to paste a plot from R into a Word file, you find that there are some solutions, but they are not satisfactory. For example, the highest-ranking Stack Overflow reply suggests using the RStudio button to export your plot in Enhanced Metafile (EMF) format. A couple of things are wrong with it. The first is that you need to start messing around with the device scaling, because the export remembers the port dimensions. The second is that the Word file is often not the final version. For better readability/representation we often convert the Word file to PDF format before sending/publishing. But then you get something funny which you may have seen before, and which consumed much of some people’s time, myself included:

Also check out the linked blog post for additional insights into why this happened.

Converting Odds to Probabilities with R

Jonas Christoffer Lindstrom has a new package:

Now you might think that converting decimal odds to probabilities should be easy: you can just use the definition above and take the inverse of the odds to recover the probability. But it is not that simple, since in practice using this simple formula will give you improper probabilities. They will not sum to 1, as they should, but be slightly larger. This gives the bookmakers an edge, and the probabilities (which aren’t real probabilities) cannot be considered fair, so different methods for correcting this exist.

Read on to learn more about the problem and a few solutions. H/T R-Bloggers.
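
As a quick illustration of the problem described in the excerpt, here is a minimal Python sketch (not the R package itself). The odds are made up, and the correction shown is plain proportional normalization, which is only one simple option; I am not claiming it is the method the package uses.

```python
def raw_inverse(decimal_odds):
    """Naive conversion: probability = 1 / decimal odds."""
    return [1.0 / o for o in decimal_odds]


def normalize(decimal_odds):
    """Rescale the naive probabilities so that they sum to 1."""
    raw = raw_inverse(decimal_odds)
    total = sum(raw)
    return [p / total for p in raw]


# Hypothetical decimal odds for a three-outcome event (home, draw, away).
odds = [2.10, 3.40, 3.60]

raw = raw_inverse(odds)
fair = normalize(odds)

print("sum of naive probabilities:", round(sum(raw), 4))   # larger than 1
print("sum after normalization:", round(sum(fair), 4))     # 1.0
```

The amount by which the naive probabilities exceed 1 is the bookmaker's margin; proportional normalization spreads that margin across all outcomes in proportion to their naive probability.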

Pivoting Data in R

Dave Mason shows how you can pivot SQL Server data using Machine Learning Services and R:

Pivoting data in SQL Server is something that comes up infrequently for me. Whenever the need arises, I have to pause and ask myself “What is it I’m trying to do again?”. Next I go to the documentation for the T-SQL PIVOT syntax (which I’ll never commit to memory) and realize for the umpteenth time the pivoted values have to be hard coded. Then I ponder using dynamic T-SQL because I won’t always know the values to pivot at query design time.

If T-SQL isn’t a good hammer to PIVOT’s nail, perhaps R is. There are different packages that make summarizing and transposing data frames somewhat easy. They can even dynamically pivot unknown values at runtime. But there is a catch, which I’ll get to in a bit.

This excerpt ends on a cliffhanger, so you’ll have to read Dave’s post to learn about the catch.
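
For a sense of what "dynamically pivot unknown values at runtime" means, here is a minimal sketch in plain Python rather than R or T-SQL. It is not Dave's code, and the data and the `pivot` helper are hypothetical. The point is that the pivot column values are discovered from the data instead of being hard coded.

```python
from collections import defaultdict

# Hypothetical long-format rows: (row key, pivot column value, measure).
rows = [
    ("Alice", "Q1", 10),
    ("Alice", "Q2", 15),
    ("Bob", "Q1", 7),
    ("Bob", "Q3", 12),
]


def pivot(records):
    """Pivot long-format records into one dict per row key."""
    # Discover the output columns at runtime instead of hard coding them.
    columns = sorted({col for _, col, _ in records})
    # Every row key starts with a zero for each discovered column.
    totals = defaultdict(lambda: dict.fromkeys(columns, 0))
    for key, col, value in records:
        totals[key][col] += value
    return columns, dict(totals)


columns, table = pivot(rows)

print(columns)
for key, values in table.items():
    print(key, values)
```

Adding a row with a new value such as "Q4" would produce a new output column with no code changes, which is exactly what the static T-SQL PIVOT syntax cannot do without dynamic SQL.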