Brian Donovan and Dan Work from the University of Illinois has pointed out that this dataset “contains a large number of errors. For example, there are several trips where the reported meter distances are significantly shorter than the straight-line distance, violating Euclidean geometry“. So, that triggered my interest to add an additional column to this dataset with a straight line distance between two geo-points of pickup and dropoff locations, and that’s where I wanted Wrangling Data Flows to help me.
Read on for Rayis’s demonstration, as well as a long list of observations (positive and negative) about the current state of Wrangling data flows.