Python And The Tidyverse

Leo at Locke Data looks at a couple of Python packages which implement Tidyverse concepts:

The Dplython README provides some clear examples of how the package can be used. Below is a summary of the common functions:

  • select() – used to get specific columns of the data-frame.

  • sift() – used to filter rows, keeping those that satisfy a condition on the value of a variable in that row.

  • sample_n() and sample_frac() – used to provide a random sample of rows from the data-frame.

  • arrange() – used to sort results.

  • mutate() – used to create new columns based on existing columns.
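
To make the verbs above concrete, here is a rough sketch of what each one does, written with plain pandas rather than Dplython itself. This is not Dplython's API; the data frame, its column names, and the variable names are all made up for illustration, and the mapping to pandas calls is an approximation of each verb's intent.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration only.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "carat": [0.3, 1.2, 0.7, 2.0],
    "price": [400, 5200, 2100, 15000],
})

# select(): keep specific columns
selected = df[["name", "price"]]

# sift(): keep rows that satisfy a condition
sifted = df[df["carat"] > 0.5]

# sample_n() / sample_frac(): random sample by row count or by fraction
sampled_n = df.sample(n=2, random_state=0)
sampled_frac = df.sample(frac=0.5, random_state=0)

# arrange(): sort rows (here by price, highest first)
arranged = df.sort_values("price", ascending=False)

# mutate(): add a column computed from existing columns
mutated = df.assign(price_per_carat=df["price"] / df["carat"])
```

Dplython expresses the same steps as a pipeline of verbs, which is the Tidyverse idea the post is about; the pandas version is shown only to pin down what each verb means.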

I think the Tidyverse is immediately accessible for data platform professionals, so it’s good to see these concepts making their way to Python as well as R.

