Gilbert Quevauvilliers writes a SQL statement:
I come from a T-SQL background, so using SQL makes it easy for me to work with data.
There are multiple ways to use SQL in a PySpark notebook, and when I started using a Python notebook it was not so straightforward.
In this blog post, I will show you how I use SQL code.
As mentioned previously, I am by no means an expert. I typically find a way that works, is fast, and doesn’t consume a lot of capacity. If that works consistently for me, then that is how I go about it.
Click through for the solution, which uses DuckDB. As such, the SQL syntax isn’t T-SQL; it’s closer to the PostgreSQL dialect. But it does do a great job of interacting with Parquet files and Delta tables.