John Mount has some good advice for R users running Spark queries:
For some time we have been teaching R users “when working with wide tables on Spark or on databases: narrow to the columns you really want to work with early in your analysis.” The idea behind the advice is: working with fewer columns makes for quicker queries.
The issue arises because wide tables (200 to 1000 columns) are quite common in big-data analytics projects. Often these are “denormalized marts” that are used to drive many different projects. For any one project only a small subset of the columns may be relevant in a calculation.
Some wonder whether this is really an issue, or whether it is something one can ignore in the hope that the downstream query optimizer fixes the problem. In this note we will show the effect is real.
This is good advice for more than just dealing with R on Spark.
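As a minimal sketch of the technique in sparklyr (the table name wide_mart and its column names are hypothetical), the idea is to select the columns you need before any joins or aggregations, so the rest of the pipeline stays narrow:

    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(master = "local")

    # "wide_mart" stands in for a wide denormalized table (hundreds of columns)
    wide_tbl <- tbl(sc, "wide_mart")

    # Narrow to the columns the analysis actually needs, as early as possible
    narrow_tbl <- wide_tbl %>%
      select(customer_id, order_date, revenue)

    # Downstream steps now move far less data per row
    result <- narrow_tbl %>%
      group_by(customer_id) %>%
      summarise(total_revenue = sum(revenue, na.rm = TRUE)) %>%
      collect()

Because dplyr verbs on a Spark table are lazy, the select() is folded into the Spark SQL that eventually runs, so the unused columns never have to be carried through later stages.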