Querying a remote parquet file via HTTP with DuckDB.
The French statistical service (INSEE) has made available its first parquet file on data.gouv.fr in June.
It’s a 470 MB file (from a 1.8 GB CSV) with 16·10⁶ rows, showing for each address in France which polling station it belongs to.
Click through for the code and results. The only thing that surprised me was how fast the query ran against a remote file, unless I’m misunderstanding something. For a local file, I’d expect a heavy aggregation over two columns of a 16-million-row Parquet file to finish in under 2 seconds. H/T R-Bloggers.
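For a rough idea of what such a query looks like, here is a minimal sketch using DuckDB's Python package rather than the linked post's own code. The URL and column name are passed in by the caller and are purely hypothetical; `build_query` and `run_remote_query` are illustrative helper names, not part of DuckDB. One likely reason the remote case is quick: with the `httpfs` extension, DuckDB can use HTTP range requests to fetch the Parquet footer and only the column chunks a query touches, instead of downloading the whole file.

```python
import sys


def build_query(url: str, group_col: str, limit: int = 10) -> str:
    """Build a GROUP BY row count over a remote Parquet file.

    `url` and `group_col` are supplied by the caller; nothing here is
    taken from the actual INSEE dataset. The values are interpolated
    as-is, so this is for trusted, illustrative input only.
    """
    return (
        f"SELECT {group_col}, count(*) AS n "
        f"FROM read_parquet('{url}') "
        f"GROUP BY {group_col} ORDER BY n DESC LIMIT {limit}"
    )


def run_remote_query(url: str, group_col: str):
    """Run the aggregation with DuckDB (requires `pip install duckdb`)."""
    import duckdb  # imported lazily so build_query works without it

    con = duckdb.connect()        # in-memory database
    con.execute("INSTALL httpfs")  # extension for reading over HTTP(S)
    con.execute("LOAD httpfs")
    return con.execute(build_query(url, group_col)).fetchall()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python remote_parquet.py <parquet-url> <column-name>
    for row in run_remote_query(sys.argv[1], sys.argv[2]):
        print(row)
```

Because the query only names one column, only that column's data needs to travel over the network, which is the main difference from pulling down the equivalent CSV.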