Press "Enter" to skip to content

Category: R

Modifying HTML Rendering in Shiny

Senthil Thyagarajan gives us an example of using the htmltools package to change the way tables render in Shiny:

In order to build the HTML table, I have used a function table_frame which can be used as a container in DT::renderDataTable. This function basically uses htmltools. For more references on the basics of HTML tables, please refer here
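Here's a rough sketch of that container idea; the header labels and styling below are my own illustration rather than Senthil's actual table_frame:

    library(htmltools)
    library(DT)

    # Build a custom table container: a styled header that DT will render
    # in place of the default one.
    table_frame <- function() {
      withTags(table(
        class = "display",
        thead(
          tr(
            th(style = "background-color: steelblue; color: white;", "Model"),
            th(style = "background-color: steelblue; color: white;", "MPG"),
            th(style = "background-color: steelblue; color: white;", "Cylinders")
          )
        )
      ))
    }

    # Inside a Shiny server function:
    # output$tbl <- DT::renderDataTable({
    #   datatable(
    #     data.frame(Model = rownames(mtcars), MPG = mtcars$mpg, Cylinders = mtcars$cyl),
    #     container = table_frame(),
    #     rownames = FALSE
    #   )
    # })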

In addition to changing the colors, Senthil also shows how to add a couple of buttons which call JavaScript functions. H/T R-bloggers


Forensic Accounting: Cohort Analysis

I continue my series on forensic accounting techniques with cohort analysis:

In the last post, we focused on high-level aggregates to gain a basic understanding of our data. We saw some suspicious results but couldn’t say much more than “This looks weird” due to our level of aggregation. In this post, I want to dig into data at a lower level of detail. My working conception is the cohort, a broad-based comparison of data sliced by some business-relevant or analysis-relevant component.

Those familiar with Kimball-style data warehousing already understand where I’m going with this. In the basic analysis, we essentially look at fact data with a little bit of disaggregation, such as looking at data by year. In this analysis, we introduce dimensions (sort of) and slice our data by dimensions.
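To make that concrete, a cohort-style slice in R might look something like the following; the invoices data frame and its columns are hypothetical stand-ins, not the post's actual data set:

    library(dplyr)
    set.seed(42)

    # Hypothetical invoice-level fact data.
    invoices <- data.frame(
      vendor = sample(c("A", "B", "C"), 500, replace = TRUE),
      year   = sample(2016:2019, 500, replace = TRUE),
      amount = rlnorm(500, meanlog = 7)
    )

    # Slice the fact data by a business-relevant dimension: vendor by year.
    # A vendor whose average invoice amount balloons relative to its peers
    # is a cohort worth investigating.
    invoices |>
      group_by(vendor, year) |>
      summarize(
        n_invoices = n(),
        avg_amount = mean(amount),
        .groups = "drop"
      )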

Click through for some fraud-finding fun.


Tidying Video Game Data

Arvid Kingl has a fun article analyzing data from an open-source video game and applying tidy data principles to it:

You will learn what key principles a tidy data set adheres to, why it is useful to follow them consistently, and how to clean the data you are given. Tidying is also a great way to get to know a new data set.

Finally, in this tutorial you will learn how to write a function that makes your analysis look much cleaner and allows you to execute repetitive elements in your analysis in a very reproducible way. The function will allow you to load the latest version of the data dynamically into a flexible data scheme, which means that large parts of the code will not have to change when new data is added.
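The dynamic-loading idea might look something like this; the URL and column names are hypothetical, not Arvid's actual code:

    library(readr)
    library(tidyr)

    # Load the latest release of the data and tidy it into long format, so
    # columns added in future releases flow through without code changes.
    load_latest <- function(url) {
      read_csv(url, show_col_types = FALSE) |>
        pivot_longer(
          cols = -c(player, match_date),  # everything except the identifiers
          names_to = "metric",
          values_to = "value"
        )
    }

    # games <- load_latest("https://example.com/game-stats/latest.csv")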

Check it out. Bonus point: tidy data is Boyce-Codd Normal Form, which is (potentially) subsequently widened back out to include dimensional information.


Basic Forensic Accounting Techniques

I continue my series on forensic accounting techniques:

Growth analysis focuses on changes in ratios over time. For example, you may plot annual revenue, cost, and net margin by year. Doing this gives you an idea of how the company is doing: if costs are flat but revenue increases, you can assume economies of scale or economies of scope are in play and that’s a great thing. If revenue is going up but costs are increasing faster, that’s not good for the company’s long-term outlook.

For our data set, I’m going to use the following SQL query to retrieve bus counts on the first day of each year. To make the problem easier, I add and remove buses on that day, so we don’t need to look at every day or perform complicated analyses.
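As a sketch of the idea from R (the DSN, dbo.Bus schema, and column names here are my own stand-ins, not the post's actual query):

    library(DBI)
    library(ggplot2)

    con <- dbConnect(odbc::odbc(), dsn = "ForensicAccounting")

    # Count the buses in service on January 1 of each year.
    bus_counts <- dbGetQuery(con, "
      SELECT c.CalendarYear, COUNT(*) AS NumberOfBuses
      FROM dbo.Calendar c
        INNER JOIN dbo.Bus b
          ON b.DateFirstInService <= c.Date
          AND (b.DateRetired IS NULL OR b.DateRetired > c.Date)
      WHERE c.CalendarMonth = 1 AND c.CalendarDayOfMonth = 1
      GROUP BY c.CalendarYear
      ORDER BY c.CalendarYear;
    ")
    dbDisconnect(con)

    # Growth analysis: plot the year-over-year change.
    ggplot(bus_counts, aes(x = CalendarYear, y = NumberOfBuses)) +
      geom_line() +
      geom_point()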

I get into quite a bit in this post, including a quick tour of multicollinearity, which is only my second-favorite of the three linear regression amigos (heteroskedasticity being my favorite and autocorrelation the hanger-on).


Reviewing the Stack Overflow Developer Survey

Michael Toth looks at the recently released 2019 Stack Overflow Developer Survey:

Since 2011, Stack Overflow has been surveying their users each year to answer questions about the technologies they use, their work experience, their compensation, and their satisfaction at work. Given Stack Overflow’s place in the broader programming world, they are able to draw quite the audience for their annual surveys.

This year, nearly 90,000 developers participated in the survey! There’s a lot in this survey, and I recommend reviewing it yourself, but I wanted to surface some of the key findings that I thought were particularly relevant to data professionals here.

Stack Overflow says they will be releasing the underlying data for this survey in the coming weeks, so I hope to return to this for a deeper analysis once that’s made available. For now, let’s get into the results!

Michael’s lede involves R versus Python in terms of salaries, but for me, the top line is that functional programmers make more money. Clojure, F#, Scala, Elixir, and Erlang make the top 10 on the global list, including positions 1, 2, 4, and 5. Within the US, Scala, Clojure, Erlang, Kotlin, F#, and Elixir make the top 10, including positions 1, 2, and 4. H/T R-bloggers


Dates in Base R

Michael Toth explains some of the functionality available in base R (that is, without packages like lubridate) for working with dates:

When working with R date formats, you’re generally going to be trying to accomplish one of two different but related goals:

1. Converting a character string like “Jan 30 1989” to a Date type

2. Getting an R Date object to print in a specific format for a graph or other output

You may need to handle both of these goals in the same analysis, but it’s best to think of them as two separate exercises. Knowing which goal you are trying to accomplish is important because you will need to use different functions to accomplish each of these. Let’s tackle them one at a time.
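Both goals come down to the same strptime-style format codes. A quick base R illustration of my own, using the string from the quote above (and assuming an English locale for month names):

    # Goal 1: convert a character string to a Date.
    d <- as.Date("Jan 30 1989", format = "%b %d %Y")
    d
    #> [1] "1989-01-30"

    # Goal 2: print an existing Date in a specific format; note that
    # format() returns a character string, not a Date.
    format(d, "%B %d, %Y")
    #> [1] "January 30, 1989"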

There are some good insights in the post. H/T R-bloggers


Finding an Unfair Coin with R

Sebastian Sauer works out a coin flip problem:

A stochastic problem, with application to financial theory. Some say it goes back to Warren Buffett. I am obliged to my colleague Norman Markgraf, who pointed it out to me.

Assume there are two coins. One is fair, one is loaded. The loaded coin has a bias of 60-40. Now, the question is: How many coin flips do you need to be “sure enough” (say, 95%) that you found the loaded coin?

Let’s simulate the thing.
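Here is one way to frame the simulation; this decision rule is my own sketch, not necessarily Sebastian's approach:

    set.seed(1234)

    # Decision rule: flip a coin n times and call it loaded when more than
    # 55% of flips land heads -- the midpoint between the fair (50%) and
    # loaded (60%) rates.
    p_correct <- function(n, trials = 10000) {
      heads <- rbinom(trials, size = n, prob = 0.6)  # flips of the loaded coin
      mean(heads / n > 0.55)
    }

    # How many flips before we spot the loaded coin at least 95% of the time?
    sapply(c(100, 150, 200, 250, 300), p_correct)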

It took a few more flips than I had expected, but the number is not outlandish.


Standard and Non-Standard Evaluation in R

John Mount explains Standard Evaluation versus Non-Standard Evaluation in R:

In standard (or value-oriented) evaluation, the code you type in is taken to be variable names, function names, operators, and even numeric literal values. String values or literals need extra marks, such as quotes.
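A quick illustration of the difference (my example, not John's):

    # Standard evaluation: "cyl" is an ordinary string value that we pass
    # around like any other piece of data.
    col <- "cyl"
    head(mtcars[[col]])

    # Non-standard evaluation: the unquoted name cyl is not a variable in
    # scope; dplyr captures the code itself and interprets it against the
    # data frame.
    library(dplyr)
    head(select(mtcars, cyl))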

John walks us through several examples along the way and, by the end, makes clear that he is a major proponent of Standard Evaluation over Non-Standard Evaluation.
