Getting the Top N Results in a PySpark Notebook

Gilbert Quevauvilliers only needs the top 1:

How to get the TopN rows using Python in Fabric Notebooks

When working with data there are sometimes weird and wonderful requirements which must be created in order to get to the desired solution.

In today’s blog post I had a situation where I wanted to get a single row with the highest duration.

Gilbert takes the PySpark route, specifically the Python functions in the Spark SQL module. You could also write a Spark SQL query that sorts by duration in descending order and uses the LIMIT clause.