As someone very interested in storytelling, I find that ggplot2 is easily my data visualization tool of choice. It is the Swiss Army knife of data visualization. One of my favorite features is the ability to pack a graph chock-full of dimensions, which is incredibly handy during the data exploration phase. However, sometimes I find myself wanting to look at trends without all the noise. Specifically, I often want to scan very dense scatterplots for outliers. ggplot2 is great at this, but once we've isolated the points we want to understand, we can't easily examine all of their dimensions in a static chart.
Enter plotly. The plotly package and its ggplotly() function do an excellent job of taking our high-quality ggplot2 graphs and making them interactive.
Read on for several high-quality interactive visuals.
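A minimal sketch of the ggplot2-to-plotly workflow described above, assuming the built-in mtcars data as a stand-in for a dense scatterplot. The extra text aesthetic is not drawn by ggplot2 itself, but ggplotly() can use it as the hover tooltip, so each point's other dimensions become inspectable without cluttering the static chart. The choice of columns in the tooltip is purely illustrative.

```r
library(ggplot2)
library(plotly)

# Static scatterplot; `text` is an extra aesthetic that ggplot2 ignores when
# drawing but that ggplotly() can show on hover.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl),
                        text = paste("Car:", rownames(mtcars),
                                     "<br>HP:", hp))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# tooltip = "text" limits the hover box to the custom text aesthetic
interactive_p <- ggplotly(p, tooltip = "text")

# print(interactive_p)  # opens the htmlwidget in the RStudio viewer or a browser
```

Restricting the tooltip keeps the hover box short; passing tooltip = "all" (the default) would instead list every mapped aesthetic.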
If you want to work in the above way, we suggest giving our cdata package a try. We chose function names such as unpivot_to_blocks() deliberately: the idea was that by emphasizing the record structure, one might eventually internalize what the transforms are doing. To help get there, we have a lot of documentation and tutorials.
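To make the record-structure idea concrete, here is a small sketch of unpivot_to_blocks() moving one wide row per student into one block of rows per student. It assumes the cdata signature unpivot_to_blocks(data, nameForNewKeyColumn, nameForNewValueColumn, columnsToTakeFrom); the scores data and its column names are hypothetical.

```r
library(cdata)

# Hypothetical wide data: one row per student, one column per subject
scores <- data.frame(id = c(1, 2), math = c(90, 75), art = c(60, 85))

# Each wide row becomes a block of rows: the old column names go into `subject`
# and the cell values go into `score`.
long <- unpivot_to_blocks(scores,
                          nameForNewKeyColumn   = "subject",
                          nameForNewValueColumn = "score",
                          columnsToTakeFrom     = c("math", "art"))
long
```

The naming is the point: the function says what happens to each record (a row is "unpivoted" into a block), rather than just "long" or "wide".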
This is your regular reminder that the Tidyverse is very useful, but it is not the entirety of R.
If your software or research depends on many complex and changing packages, you have no way to establish that your work is correct. This is because establishing the correctness of your work would also require establishing the correctness of all of its dependencies. This is worse than having non-reproducible research, as your work may in fact have been wrong even the first time.
Software with few dependencies, or with only simple ones, can also be wrong, but in that case it is at least possible to check things, or to run down and fix issues.
There are some insightful comments on this post as well, so check those out. This is definitely an area where there are trade-offs, so trying to reason through when to move in which direction is important.
A ggplot with the default theme: you can spot one from a mile away, which is great! And when you do, it's a silent fist bump. But sometimes you want more than the standard theme.
Fonts can breathe new life into your plots, helping to match the theme of your presentation, poster or report. This is always an afterthought for me, and I always need to work out how to do it again, hence the post.
Read on to see how to use each of these packages. H/T R-bloggers
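Before reaching for a dedicated font package, the generic families "sans", "serif" and "mono" already work on R's standard graphics devices. A minimal sketch, assuming only ggplot2 and its built-in mtcars data, that changes the base font for the whole plot and overrides it for the title:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(title = "Fuel economy vs. weight") +
  # base_family sets the font for every text element in the theme
  theme_minimal(base_family = "serif") +
  # individual elements can still override it
  theme(plot.title = element_text(family = "mono", face = "bold"))

# print(p)  # draw on the current device
```

Custom system or web fonts need more setup, which is where packages such as showtext come in; whether those are the packages the linked post covers is not stated here.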
There seems to be a general (false) impression among non-R-core developers that to run tests, R package developers need a test management system such as testthat, and a further false impression that testthat is the only R test management system. This is in fact not true, as R itself has a capable testing facility in “R CMD check” (a command triggering R checks from outside of any given integrated development environment).
By a combination of skimming the R manuals ( https://cran.r-project.org/manuals.html ) and running a few experiments, I came up with a description of how R testing actually works. And I have adapted the available tools to fit my current preferred workflow. This may not be your preferred workflow, but I have my reasons, and I give them below.
Food for thought for any R developer.
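As a sketch of testing without a test framework: R CMD check runs each *.R file in a package's tests/ directory, and any uncaught error fails the check. The file name and the normalize() function below are hypothetical; in a real package the function would come from library(yourpkg) rather than being defined in the test file.

```r
# tests/test_normalize.R  (hypothetical file in a package's tests/ directory)

# Hypothetical function under test; normally loaded from the package itself
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

res <- normalize(c(2, 4, 6))

# stopifnot() raises an error if any condition is FALSE, which fails R CMD check
stopifnot(
  isTRUE(all.equal(res, c(0, 0.5, 1))),
  min(res) == 0,
  max(res) == 1
)

# Checking that bad input errors: tryCatch turns "did it fail?" into a logical
failed <- tryCatch({ normalize("a"); FALSE }, error = function(e) TRUE)
stopifnot(isTRUE(failed))
```

If a matching tests/test_normalize.Rout.save file is present, R CMD check also compares the script's output against it, which gives a simple form of regression testing on printed results.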
The R Core Team announced the release of R 3.5.3 yesterday, and updated binaries for Windows and Linux are now available (with Mac sure to follow soon). This update fixes three minor bugs (in functions including stopifnot), but you might want to upgrade just to avoid the “package built under R 3.5.3” warnings you might get for new CRAN packages in the future.
Click through for more info on this release, including where the name of each R release comes from.
Some poking around in the NSW Transport Open Data portal reveals how many people enter every Sydney train station on a “typical” day in 2016, 2017 and 2018. We could manipulate those numbers in various ways to estimate total unique passengers for FY 2017-18, but I’m going to argue that the value as-is serves as a proxy variable for “station busyness”.
When working with spatial data, it’s important to distinguish an effect you see because it’s genuinely unique or interesting from an effect you see simply because that’s where all of the people are.
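A minimal sketch of why normalizing by a busyness measure matters, using two hypothetical stations with made-up numbers (not from the NSW data): ranked by raw counts one station looks like the hotspot, but ranked by a per-entry rate the other one stands out.

```r
# Hypothetical counts for two stations (illustrative numbers, not real data)
stations <- data.frame(
  name    = c("A", "B"),
  events  = c(50, 20),      # some count of interest observed at each station
  entries = c(10000, 2000)  # daily entries, used as the "busyness" exposure
)

# Normalize by exposure: events per 1,000 entries (A: 5, B: 10)
stations$rate_per_1000 <- 1000 * stations$events / stations$entries

stations[order(-stations$events), ]         # raw counts: A looks like the hotspot
stations[order(-stations$rate_per_1000), ]  # per-entry rate: B stands out instead
```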
You can easily see how arbitrary shapes can be discovered, almost magically, through nearest neighbor search.
The magic happens because methodically meeting and greeting the neighbors discovers more and more of them while following the shape, so the visualization becomes denser and denser there, and sparser and sparser as the traversal approaches the contours of those very shapes. The sparseness around the dense shapes provides the much-needed contrast to reveal hidden shapes.
Read on for a very interesting explanation.
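The density contrast described above can be illustrated with a related nearest-neighbor measure: the distance from each point to its k-th nearest neighbor, which is small inside a shape and large in sparse background. This is a sketch of that general idea, not the linked post's traversal; the ring-plus-noise data is hypothetical and generated on the fly.

```r
set.seed(42)

# Hypothetical data: a noisy ring (the "hidden shape") plus uniform background points
theta <- runif(200, 0, 2 * pi)
ring  <- cbind(cos(theta), sin(theta)) + matrix(rnorm(400, sd = 0.03), ncol = 2)
noise <- matrix(runif(100, -2, 2), ncol = 2)   # 50 background points
pts   <- rbind(ring, noise)

# Distance to the k-th nearest neighbor. Each sorted row starts with the point
# itself at distance 0, so the k-th neighbor sits at position k + 1.
knn_dist <- function(x, k = 5) {
  d <- as.matrix(dist(x))
  apply(d, 1, function(row) sort(row)[k + 1])
}

kd  <- knn_dist(pts, k = 5)
med <- tapply(kd, rep(c("ring", "noise"), c(200, 50)), median)
med  # the ring's typical k-NN distance is much smaller than the background's
```

The full distance matrix is fine for a few hundred points; for larger data a spatial index such as a k-d tree avoids the quadratic cost.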
If there is one thing of general utility lacking in ggplot2, it is probably the ability to annotate data cleanly. Sure, there are tools like geom_label(), but using them requires a fair bit of fiddling to get the best placement, and they are mainly relevant for labeling rather than for longer text. ggrepel has improved immensely on the fiddling part, but the lack of support for longer text annotation, as well as for annotating whole areas, is still an issue.
In order to at least partly address this, ggforce includes a family of geoms under the geom_mark_*() moniker. They all behave equivalently except for how they encircle the given area(s).
There are some really interesting features in the ggforce package, so check them out.
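A minimal sketch of one member of the geom_mark_*() family, assuming the built-in iris data: geom_mark_ellipse() encircles each species' points and attaches a label to the marked area. Swapping in geom_mark_rect() or geom_mark_circle() changes only the enclosing shape; geom_mark_hull() additionally relies on the concaveman package.

```r
library(ggplot2)
library(ggforce)

p <- ggplot(iris, aes(Petal.Length, Petal.Width)) +
  # one ellipse per species, filled lightly and annotated with the species name
  geom_mark_ellipse(aes(fill = Species, label = Species), alpha = 0.1) +
  geom_point() +
  theme(legend.position = "none")  # labels on the plot make the legend redundant

# print(p)
```

A description aesthetic is also available on these geoms for longer annotation text alongside the label.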
The reports follow a common template where the major difference is simply the hashtag. So one way to create a new report is to open the previous one, find/replace the old hashtag with the new one, and save it as a new file.
That works, but what if we could define the hashtag once, then reuse it programmatically anywhere in the document? Enter R Markdown parameters.
The example is small but important.
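A sketch of the parameterized approach, assuming a hypothetical template file, title and hashtags: the hashtag is declared once under params in the YAML header, referenced in the body as params$hashtag, and overridden per report through the params argument of rmarkdown::render(). Rendering needs pandoc, so that step is guarded.

```r
library(rmarkdown)

# Hypothetical report template: the hashtag is declared once in the YAML header
template <- c(
  "---",
  "title: \"Hashtag report\"",
  "params:",
  "  hashtag: \"#rstats\"",
  "---",
  "",
  "This report covers `r params$hashtag`."
)
rmd <- file.path(tempdir(), "hashtag_report.Rmd")
writeLines(template, rmd)

# The declared default can be read back without rendering
yaml_front_matter(rmd)$params$hashtag

# One report per hashtag, each overriding the default (requires pandoc)
if (pandoc_available()) {
  for (tag in c("#rstats", "#useR2019")) {
    render(rmd, params = list(hashtag = tag),
           output_file = paste0("report_", sub("#", "", tag), ".html"),
           output_dir = tempdir(), envir = new.env(), quiet = TRUE)
  }
}
```

Rendering each report in a fresh environment keeps one run's params object from leaking into the next.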