Flow charts are an important part of a clinical trial report. Making them can be a pain, though. One good way to do it seems to be with the grid and Gmisc packages in R. X and Y coordinates can be designated based on the center of the boxes in normalized device coordinates (proportions of the device space, where 0.5 is the middle), which saves a lot of messing around with corners of boxes and arrows.
A very basic flow chart, based very roughly on the CONSORT version, can be generated as follows…
Click through for sample code and a resulting image. H/T R-bloggers
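To make the normalized device coordinates idea concrete, here is a minimal sketch in Python rather than R. It is not the Gmisc or grid API; the function name and the 800 x 600 device size are made up for illustration, and it assumes the origin sits at the bottom-left of the device, as it does in grid.

```python
def box_corners(cx, cy, width, height, device_w, device_h):
    """Convert a box given by its centre and size in normalized device
    coordinates (0..1 on each axis) into pixel corners.

    Assumption: the origin is the bottom-left corner of the device.
    """
    left = (cx - width / 2) * device_w
    right = (cx + width / 2) * device_w
    bottom = (cy - height / 2) * device_h
    top = (cy + height / 2) * device_h
    return left, bottom, right, top


# A box centred in the middle of a hypothetical 800 x 600 device,
# a quarter as wide and an eighth as tall as the device.
print(box_corners(0.5, 0.5, 0.25, 0.125, 800, 600))
```

Because only the centre and size are specified, moving a box means changing two numbers instead of recomputing four corners, which is the convenience the excerpt describes.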
If you see this painting you will find a profusion of colours with a great equilibrium between different hues, the bold use of complementary colours and the skill expressed in the “chiaroscuro” technique. While I was looking at the painting I started wondering how we moved from this wisdom to the ugly charts you can easily find within today’s corporate reports (find a great sample on the WTF visualization website).
This is where Paletter comes from: bring the Renaissance wisdom and beauty within the plots we produce every day.
PaletteR is a lean R package which lets you draw an optimized palette of colours from any custom image. The package extracts a custom number of representative colours from the image. Let’s try to apply it on the “Vergine con il Bambino, angeli e Santi” before looking into its functional specification.
It’s an interesting package. I’ll have to play around with it.
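For a feel of what "extracting a custom number of representative colours" can mean, here is a rough sketch in Python using a plain k-means loop over RGB pixels. This is a generic technique swapped in for illustration, not the package's actual algorithm, which the excerpt does not describe. The function name and the four sample pixels are hypothetical.

```python
def extract_palette(pixels, k, iterations=10):
    """Return k representative RGB colours from a list of (r, g, b) tuples
    using a basic k-means loop with a deterministic starting point."""
    if k < 1 or len(pixels) < k:
        raise ValueError("need at least k pixels and k >= 1")

    # Deterministic start: take k pixels spread evenly through the list.
    step = len(pixels) // k
    centres = [pixels[i * step] for i in range(k)]

    def distance_sq(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for _ in range(iterations):
        # Assign every pixel to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: distance_sq(p, centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster (keep it if empty).
        centres = [
            tuple(sum(channel) / len(cluster) for channel in zip(*cluster))
            if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return centres


# Hypothetical "image": two reddish and two bluish pixels.
sample = [(250, 0, 0), (255, 10, 0), (0, 0, 250), (10, 0, 255)]
print(extract_palette(sample, 2))
```

A real image would supply many thousands of pixels, and a production palette tool would likely do more than average clusters, but the loop shows the basic idea of reducing an image to a handful of colours.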
The coordinate system can be manipulated by adding one of ggplot’s different coordinate systems. When you are imagining a coordinate system, you are most likely thinking of a Cartesian one. The Cartesian coordinate system combines the x and y dimensions orthogonally and is ggplot’s default (coord_cartesian).
There are also several variations of the familiar Cartesian coordinate system in ggplot, namely coord_fixed, coord_flip and coord_trans. For all of them, the displayed section of the data can be specified by defining the range depicted on the x (xlim =) and y (ylim =) axes. This allows you to “zoom in” or “zoom out” of a plot. It is a great advantage that all manipulations of the coordinate system only alter the depiction of the data but not the data itself.
I tend to avoid polar coordinates, but that’s mostly because I don’t work in a space which benefits from it.
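The point that coordinate limits change the depiction and not the data can be sketched outside ggplot. The Python snippet below is only an analogy with made-up values and a hypothetical `zoom` helper: the window decides what is visible, while any summary is still computed on the full data.

```python
from statistics import mean

# Hypothetical measurements with one large outlier.
values = [1, 2, 3, 4, 100]


def zoom(data, lower, upper):
    """Return the points visible inside the window; the data are untouched."""
    return [v for v in data if lower <= v <= upper]


visible = zoom(values, 0, 10)

# The summary still uses all five values, outlier included,
# even though the outlier falls outside the visible window.
print("visible points:", visible)
print("mean of all data:", mean(values))
print("data unchanged:", len(values))
```

Filtering the data before summarising would give a different mean, which is why zooming through the coordinate system and dropping rows are not interchangeable.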
This results in a row per ride and visualises pretty well in SSMS. If you are familiar with the geography of London you can make out the river Thames toward the centre of the image and Regents Park towards the top left:
This could be overlaid on a shape file of London or a map from another provider such as Google Maps or Mapbox.
However, when you try to load the dataset into Power BI, you find that Power BI does not natively support Geography data types. There is an idea you can vote on here to get them supported: https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/12257955-support-sql-server-geometry-geography-data-types-i
Hit up that idea link if you want to see geography type support within Power BI.
Typical Amazon QuickSight workflow
When you create an analysis, the typical workflow is as follows:
1. Connect to a data source, and then create a new dataset or choose an existing dataset.
2. (Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
3. Create a new analysis.
4. Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
5. (Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
6. (Optional) Add more visuals to the analysis.
7. (Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
8. (Optional) Publish the analysis as a dashboard to share insights with other users.
It’s interesting to see how Amazon is trying to move this functionality from third-party tools (Power BI, Tableau, etc.) and notebooks right into the set of AWS offerings. Contrast this with the way that Microsoft is building in Jupyter with Azure Notebooks.
The reticulate package provides a comprehensive set of tools for interoperability between Python and R. The package includes facilities for:
- Calling Python from R in a variety of ways including R Markdown, sourcing Python scripts, importing Python modules, and using Python interactively within an R session.
- Translation between R and Python objects (for example, between R and Pandas data frames, or between R matrices and NumPy arrays).
- Flexible binding to different versions of Python including virtual environments and Conda environments.
Reticulate embeds a Python session within your R session, enabling seamless, high-performance interoperability.
The more common use of reticulate I’ve seen is running TensorFlow neural networks from R.
In the last map, it was a bit tricky to see the density of the incidents because all the graphed points were sitting on top of each other. In this scenario, we are going to make the data all one color and we are going to set the alpha variable which will make the dots transparent. This helps display the density of points plotted.
Also note, we can re-use the base map created in the first step “p” to plot the new map.
Check it out. This is an introduction to creating choropleths, making it a good start.
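Why transparency reveals density comes down to a little arithmetic. Under standard "over" compositing, n overlapping points that each have opacity alpha combine to an opacity of 1 - (1 - alpha)^n. The Python sketch below assumes the points overlap exactly and share one alpha value. The function name is made up for illustration.

```python
def stacked_opacity(alpha, n):
    """Combined opacity where n points with the same alpha overlap exactly,
    using standard 'over' compositing."""
    return 1 - (1 - alpha) ** n


# How dark a spot gets as more semi-transparent points pile up on it.
for n in (1, 2, 5, 10):
    print(n, round(stacked_opacity(0.2, n), 3))
```

With an alpha of 0.2, a lone point stays faint while a stack of ten is close to 0.9 opaque, so busy areas of the map read as visibly darker than sparse ones.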
In my previous post, I have demonstrated how easy it is to create a bar graph in SQL Server 2017 In-Database Machine Learning using R.
We’re going to build upon that basic graph.
Sometimes data analysis requires us to look at an overview of our data across specific partitions, say a year. For example, we want to see how our product groups fare on a month-to-month basis across the last 4 years.
From a data analytics perspective, there are quite a handful of data points in this requirement: data aggregate (quantity), monthly periods, and year partitions.
One of the approaches to handle such a requirement is by using a facet. Faceting is a way of plotting subsets of data into a matrix of panels based on one or more variables, or facets.
Click through for the example and code. Facets are quite useful, but they run the risk of misleading if you squeeze too many onto the screen. The same line can look quite different with a “tall” facet versus a “wide” facet, and that can change how people interpret your visual.
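Stripped of the plotting, faceting is a grouping step: split the rows by the facet variable, then draw each group in its own panel. Here is a minimal sketch of that split in Python. The rows, product groups, and the `facet_by` helper are all hypothetical and stand in for whatever the linked post queries from SQL Server.

```python
from collections import defaultdict

# Hypothetical rows: (year, month, product_group, quantity)
rows = [
    (2016, 1, "Bikes", 120),
    (2016, 2, "Bikes", 95),
    (2017, 1, "Bikes", 140),
    (2017, 1, "Helmets", 60),
]


def facet_by(rows, key_index):
    """Group rows into panels keyed by the value at key_index."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[key_index]].append(row)
    return dict(panels)


# One panel per year; each panel would get its own small plot.
panels = facet_by(rows, 0)
for year, panel_rows in sorted(panels.items()):
    print(year, panel_rows)
```

Each panel then shares the same axes and geometry, which is what makes side-by-side comparison across years possible.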
To build a Forest Plot, the forestplot package is often used in R. However, I find ggplot2 to have more advantages in making Forest Plots, such as enabling the inclusion of several variables with many categories in a lattice form. You can also use any scale of your choice, such as a log scale. In this post, I will introduce how to plot Risk Ratios and their Confidence Intervals for several conditions.
Click through for the script. You might also want to compare it to the forestplot package to see how these differ.
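Each row of a forest plot needs a point estimate and an interval. As a reminder of where those numbers come from, here is a sketch in Python of the usual risk ratio with a Wald-type 95% confidence interval computed on the log scale. The linked post may derive its values differently. The function name and the counts are made up for illustration.

```python
import math


def risk_ratio_ci(events_exposed, n_exposed, events_control, n_control, z=1.96):
    """Risk ratio with an approximate confidence interval.

    Uses the standard error of log(RR):
        sqrt(1/a - 1/n1 + 1/c - 1/n2)
    where a and c are event counts and n1 and n2 are group sizes.
    Assumes both event counts are greater than zero.
    """
    rr = (events_exposed / n_exposed) / (events_control / n_control)
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 10 of 100 exposed and 5 of 100 controls had the event.
rr, lower, upper = risk_ratio_ci(10, 100, 5, 100)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

Because the interval is symmetric on the log scale and not on the raw scale, plotting risk ratios on a log axis, as the excerpt mentions, keeps the whiskers visually balanced around the point estimate.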
Regularly I have reports that have an extensive number of columns.
Because of the number of columns, reports tend to become inefficient and contain too much information that we don’t always need. Users may want to select certain columns to make the report easier to read.
Hiding and showing columns in SSRS reports using parameters is a solution to make reports more dynamic and easier to use.
At the time of writing of this article, SQL Server Reporting Services did not yet have the possibility to use checkbox parameters. Instead we will be using a multi-value text parameter to show or hide our columns.
Click through to see how to do this.
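The logic behind parameter-driven columns is simple: a column is hidden when its name is not among the values picked in the multi-value parameter. In SSRS that check lives in an expression on each column's visibility setting; the Python sketch below only mirrors the decision and is not SSRS syntax. The column names and the `hidden_columns` helper are hypothetical.

```python
def hidden_columns(all_columns, selected):
    """Map each column name to True when it should be hidden,
    that is, when the user did not pick it in the parameter."""
    chosen = set(selected)
    return {name: name not in chosen for name in all_columns}


# Hypothetical report columns and a user's multi-value selection.
columns = ["OrderDate", "Customer", "Region", "Amount"]
picked = ["Customer", "Amount"]
print(hidden_columns(columns, picked))
```

Defaulting the parameter to every column keeps the report's original layout for users who never touch the selection.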