Leo at Locke Data looks at a couple of Python packages which implement Tidyverse concepts:
The Dplython README provides some clear examples of how the package can be used. Below is a summary of the common functions:
- select() – used to get specific columns of the data-frame.
- sift() – used to filter out rows based on the value of a variable in that row.
- sample_n() and sample_frac() – used to provide a random sample of rows from the data-frame.
- arrange() – used to sort results.
- mutate() – used to create new columns based on existing columns.
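To make the verbs above concrete, here is a minimal pure-Python sketch of what each one does, chained with `>>` in the pipe style these packages borrow from dplyr. This is not Dplython's implementation or its API: the `Verb` helper, the lambda-based arguments, the `seed` parameter, and the tiny `diamonds` table are all made up for illustration, and rows are plain dictionaries rather than a data-frame.

```python
import random


class Verb:
    """Toy wrapper (hypothetical) so that `rows >> verb` applies the verb to rows."""

    def __init__(self, fn):
        self.fn = fn

    def __rrshift__(self, rows):
        # Lists do not define `>>`, so Python falls back to this reflected method.
        return self.fn(rows)


def select(*cols):
    # Keep only the named columns.
    return Verb(lambda rows: [{c: r[c] for c in cols} for r in rows])


def sift(pred):
    # Keep only the rows for which the predicate is true.
    return Verb(lambda rows: [r for r in rows if pred(r)])


def sample_n(n, seed=None):
    # Return n rows chosen at random, without replacement.
    return Verb(lambda rows: random.Random(seed).sample(rows, n))


def arrange(col):
    # Sort rows in ascending order of one column.
    return Verb(lambda rows: sorted(rows, key=lambda r: r[col]))


def mutate(**new_cols):
    # Add columns computed from the existing values in each row.
    return Verb(
        lambda rows: [{**r, **{k: f(r) for k, f in new_cols.items()}} for r in rows]
    )


# Made-up sample data, for illustration only.
diamonds = [
    {"cut": "Ideal", "carat": 0.2, "price": 300},
    {"cut": "Premium", "carat": 0.4, "price": 900},
    {"cut": "Good", "carat": 0.5, "price": 800},
    {"cut": "Fair", "carat": 1.0, "price": 2500},
]

result = (
    diamonds
    >> sift(lambda r: r["carat"] > 0.3)
    >> mutate(price_per_carat=lambda r: r["price"] / r["carat"])
    >> arrange("price")
    >> select("cut", "price")
)

sampled = diamonds >> sample_n(2, seed=0)
```

Here `result` holds the three rows with `carat` above 0.3, sorted by price and trimmed to the `cut` and `price` columns. In the real package, the README examples refer to columns through an `X` placeholder (along the lines of `sift(X.carat > 4)`) instead of lambdas, so treat the sketch as a guide to what the verbs mean and not to their exact syntax.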
I think the Tidyverse is immediately accessible for data platform professionals, so it’s good to see these concepts making their way to Python as well as R.