Combining Files In C#

Chris Koester shows how to combine a set of CSVs without duplicating their header rows:

The timings in this post came from combining 8 csv files with 13 columns and a combined total of 9.2 million rows.

I first tried combining the files with the PowerShell technique described here. It was painfully slow and took an hour and a half! This is likely because it is deserializing and then serializing every bit of data in the files, which adds a lot of unnecessary overhead.

Next I tried the C# script below using LINQPad. When reading from and writing to a network share, it took 3 minutes and 56 seconds. Much better! Next I tried it on a local SSD drive and it took just 44 seconds.

Read on for the script itself.  The ReadAllLines method works fine as long as the file isn’t larger than your working memory.

Related Posts

Domain, Range, And Codomain

Kevin Sookocheff explains the concepts of domain, range, and codomain: That is, a function relates an input to an output. But, not all input values have to work, and not all output values. For example, you can imagine a function that only works for positive numbers, or a function that only returns natural numbers. To more […]

Read More

Running Cassandra On EC2

Prasad Alle and Provanshu Dey share some tips if you’re running Cassandra on Amazon’s EC2: Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2. Amazon EC2 and Amazon Elastic Block Store […]

Read More


January 2017
« Dec Feb »