Docker is designed to enclose environments inside an image / a container. What this allows, for example, is to have a Linux machine on a MacBook, or a machine with R 3.3 when your main computer has R 3.5. Also, this means that you can use older versions of a package for a specific task, while still keeping the package on your machine up-to-date.
This way, you can “solve” dependency issues: if you are ever afraid dependencies will break your analysis when packages are updated, build a container that will always have the software versions you desire: be it Linux, R, or any package.
Click through for the details. H/T R-bloggers
This sort of difference, scalar oriented C++ being so much faster than scalar oriented R, is often distorted into “R is slow.” This is just not the case. If we adapt the algorithm to be vectorized we get an R algorithm with performance comparable to the C++ implementation.
Not all algorithms can be vectorized, but this one can, and in an incredibly simple way. The original algorithm itself (xlin_fits_R()) is a bit complicated, but the vectorized version (xlin_fits_V()) is literally derived from the earlier one by crossing out the indices. That is: in this case we can move from working over very many scalars (slow in R) to working over a small number of vectors (fast in R).
This is akin to writing set-based SQL instead of cursor-based SQL: you’re thinking in terms which make it easier for the interpreter (or optimizer, in the case of a database engine) to operate quickly over your inputs. It’s also one of a few reasons why I think learning R makes a lot of sense when you have a SQL background.
While this commit was done in the autumn of 2017, nothing further happened until I decided to make gganimate the center of my useR 2018 keynote, at which point I was forced (by myself) to have some sort of package ready by the summer of 2018.
A fair number of users have shown displeasure in the breaking changes this history has resulted in. Many blog posts have already been written focusing on the old API, and there is code on numerous computers that will no longer work. I understand this frustration, of course, but both David and I agreed that doing it this way was for the best in the end. I’m positive that the new API has already greatly exceeded the mind-share of the old API and given a year the old API will be all but a distant memory…
Read on for information on these breaking changes, and how the changes will make life easier in the long run. And stay for the fireworks. H/T R-bloggers
Last month, I delivered the one-day workshop Practical AI for the Working Software Engineer at the Artificial Intelligence Live conference in Orlando. As the title suggests, the workshop was aimed at developers, but I didn’t assume any particular programming language background. In addition to the lecture slides, the workshop was delivered as a series of Jupyter notebooks. I ran them using Azure Notebooks (which meant the participants had nothing to install and very little to set up), but you can run them in any Jupyter environment you like, as long as it has access to R and Python. You can download the notebooks and slides from this GitHub repository (and feedback is welcome there, too).
Read on for details about those notebooks and to get your own copies.
In the above we have an input (or independent variable) x and an observed outcome (or dependent variable) y_observed (portrayed as points). y_observed is the unobserved ideal value y_ideal (portrayed by the dashed curve) plus independent noise. The modeling goal is to get close to the y_ideal curve using the y_observed observations. Obviously this can be done with a smoothing spline, but let’s use RcppDynProg to find a piecewise linear fit.
To encode this as a dynamic programming problem we need to build a cost matrix that holds, for every consecutive interval of x-values, an estimate of the out-of-sample quality of fit. This is supplied by the function RcppDynProg::lin_costs() (using the PRESS statistic), but let’s take a quick look at the idea.
It’s an interesting package whose purpose is to turn an input data stream into a set of linear functions which approximate the stream. I’m not sure I’ll ever have a chance to use it, but it’s good to know that it’s there if I do ever need it.
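To make the cost-matrix idea concrete, here is a minimal sketch in plain Python rather than R. It is not RcppDynProg’s implementation: the function names are made up for illustration, and instead of the PRESS statistic it uses the in-sample squared error of a least-squares line plus a fixed per-segment penalty (an assumption, needed so the cheapest answer is not “one segment per point”). The dynamic program then picks the cheapest way to cover the points with consecutive intervals.

```python
def line_sse(xs, ys):
    """In-sample sum of squared errors of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # a single point gets a flat line
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def build_cost_matrix(xs, ys, penalty):
    """cost[i][j] = fit error of points i..j (inclusive) as one segment, plus a penalty.

    Assumption: in-sample error plus penalty stands in for an out-of-sample
    estimate such as PRESS.
    """
    n = len(xs)
    cost = [[float("inf")] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            cost[i][j] = line_sse(xs[i:j + 1], ys[i:j + 1]) + penalty
    return cost


def best_partition(cost):
    """Dynamic program over the cost matrix; returns (total_cost, segments)."""
    n = len(cost)
    best = [0.0] + [float("inf")] * n  # best[k]: cheapest cover of the first k points
    last_start = [0] * (n + 1)         # start index of the final segment in that cover
    for k in range(1, n + 1):
        for i in range(k):
            candidate = best[i] + cost[i][k - 1]
            if candidate < best[k]:
                best[k] = candidate
                last_start[k] = i
    # Walk the back-pointers to recover the chosen intervals.
    segments = []
    k = n
    while k > 0:
        i = last_start[k]
        segments.append((i, k - 1))
        k = i
    segments.reverse()
    return best[n], segments


xs = list(range(10))
ys = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]  # noise-free "tent" shape
total, segments = best_partition(build_cost_matrix(xs, ys, penalty=0.5))
print(segments)  # [(0, 4), (5, 9)]
```

On this toy data the two straight sides of the tent are each fit exactly, so the only cost paid is the two segment penalties. With noisy data, the penalty (or a genuine out-of-sample estimate) is what keeps the fit from splitting too finely.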
Two weeks ago I was working on a very interesting case in SQL 2016. I received an email from one of my customers saying that they were having intermittent issues within their app, which was executing some sp_execute_external_script calls against the database.
We also restarted the Launchpad service, but with no luck. The biggest challenge was that sometimes the service responded fine and sometimes it showed the issue that I pasted above (and this was absolutely new to me). From the SQL side we ran an extended events session with all R Services counters, but nothing appeared. From sys.dm_exec_session_wait_stats we just observed that the session was waiting on the SATELLITE_SERVICE_SETUP wait, which indicates that SQL Server was waiting for an answer from the R service itself.
Click through for the solution.
This post came about due to a question on the Microsoft Machine Learning Server forum. The question was if there are any plans by Microsoft to support more than one input dataset (@input_data_1) in sp_execute_external_script. My immediate reaction was that if you want more than one dataset, you can always connect from the script back into the database, and retrieve data.
However, the poster was well aware of that, but due to certain reasons he did not want to do it that way – he wanted to push in the data, fair enough. When I read this, I seemed to remember something from a while ago, where, instead of retrieving data from inside the script, they pushed in the data, serialized it as an output parameter and then used the binary representation as an input parameter (yeah – this sounds confusing, but bear with me). I did some research (read: Googling), and found this StackOverflow question, and answer. So for future questions, and for me to remember, I decided to write a blog post about it.
This has been a point of frustration for me. We can name the one input data set, so I’d really like to see true support for multiple input data sets without the need for hacks.
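The shape of that workaround can be sketched outside SQL Server. The snippet below is plain Python and does not call sp_execute_external_script; the function names and the sample rows are hypothetical, and pickle stands in for whatever serializer the real script would use. It only shows the idea: the second data set travels as a binary blob alongside the one “real” input data set and is rebuilt inside the script.

```python
import pickle


def serialize_dataset(rows):
    """Turn a data set (a list of dict rows) into bytes, as a binary parameter would carry it."""
    return pickle.dumps(rows)


def external_script(input_data_1, extra_dataset_blob):
    """Hypothetical script body: one regular input data set plus one passed in as bytes."""
    # Only unpickle data you produced yourself; pickle is not safe for untrusted input.
    extra = pickle.loads(extra_dataset_blob)
    region_by_id = {row["id"]: row["region"] for row in extra}
    # Join the two data sets on "id".
    return [dict(row, region=region_by_id.get(row["id"])) for row in input_data_1]


orders = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 25.5}]
regions = [{"id": 1, "region": "North"}, {"id": 2, "region": "South"}]

blob = serialize_dataset(regions)       # first step: serialize the second data set
joined = external_script(orders, blob)  # second step: pass the bytes in as a parameter
print(joined)
```

The cost of the trick is visible even here: the script has to know the blob’s format and deserialize it by hand, which is exactly the kind of plumbing that first-class support for multiple inputs would remove.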
I’ve been looking for an easy way to get to learning predictive analysis and forecasting. Prophet provides that path. Prophet is released by Facebook’s Core Data Science Team.
“Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.”
Just to dip my toes into the waters, I tried Prophet’s Quick Start Guide in R.
Let’s forecast the Field Goal Percentage (FG%) of Kyle Kuzma of the Los Angeles Lakers for the next 6 Months.
It’d be critical and important if it were hockey data. Or football data or baseball data or maybe even cricket data (but I don’t understand cricket data and why is that guy still running didn’t he get thrown out or something I don’t get it?).
As far as Prophet goes, it’s a useful library and works well if you’re looking at seasonal time series data.
The part that causes the biggest difficulties (especially for beginners of R) is that you state the name of the function at the beginning and use the assignment operator – as if functions were like any other data type, like vectors, matrices or data frames…
Congratulations! You just encountered one of the big ideas of functional programming: functions are indeed like any other data type, they are not special – or in programming lingo, functions are first-class members.
This is one of the core tenets of functional programming: functions are things you can pass around to other functions; they aren’t special, inviolate pieces of code but are just another thing. Click through for a couple good examples of what you get in a language which supports higher-order functions.
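A minimal sketch of the same idea, written in Python rather than R (the linked post works in R, and the function names here are made up for illustration): a function can be bound to a second name, stored in a collection, passed as an argument, and returned from another function.

```python
def square(x):
    return x * x


# A function is a value like any other: bind it to a second name.
also_square = square


def apply_twice(f, x):
    """Higher-order function: takes a function f and applies it two times."""
    return f(f(x))


def make_adder(n):
    """Builds and returns a new function that adds n to its argument."""
    def add(x):
        return x + n
    return add


add_three = make_adder(3)

# Functions can sit in a tuple and be called in a loop.
results = [f(4) for f in (square, also_square, add_three)]
print(results)                     # [16, 16, 7]
print(apply_twice(square, 3))      # 81
print(apply_twice(add_three, 10))  # 16
```

Assigning a function to a name is the same move R makes when you write a function definition on the right-hand side of the assignment operator.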
R 3.5.2, the latest version of the R language for statistical computation and graphics from the R Foundation, was released today. (This release is codenamed “Eggshell Igloo”, likely in reference to this or this Peanuts cartoon.)
Click through for more detail. I’m probably just going to wait for R 4.0, myself.