Combining Files In C#

Chris Koester shows how to combine a set of CSVs without duplicating their header rows:

The timings in this post came from combining 8 csv files with 13 columns and a combined total of 9.2 million rows.

I first tried combining the files with the PowerShell technique described here. It was painfully slow and took an hour and a half! This is likely because it is deserializing and then serializing every bit of data in the files, which adds a lot of unnecessary overhead.

Next I tried the C# script below using LINQPad. When reading from and writing to a network share, it took 3 minutes and 56 seconds. Much better! Next I tried it on a local SSD drive and it took just 44 seconds.

Read on for the script itself.  The ReadAllLines method works fine as long as the file isn’t larger than your working memory.

Related Posts

Java With Visual Studio Code

Niels Berglund learns about Visual Studio Code and writing Java in VS Code: I mentioned above how Maven is a build automation tool for primarily Java projects. There are other build tools as well, but in this post, I use Maven as it is – which I mentioned above – the de-facto standard for Java-based […]

Read More

LISTAGG In Snowflake DB

Koen Verbeeck continues investigating Snowflake capabilities: Since SQL Server 2017, you have the STRING_AGG function, which has almost the exact same syntax as its Snowflake counterpart. There are two minor differences:– Snowflake has an optional DISTINCT– SQL Server has a default ascending sorting. If you want another sorting, you can specify one in the WITHIN GROUP clause. […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031