Combining Files In C#

Chris Koester shows how to combine a set of CSVs without duplicating their header rows:

The timings in this post came from combining 8 csv files with 13 columns and a combined total of 9.2 million rows.

I first tried combining the files with the PowerShell technique described here. It was painfully slow and took an hour and a half! This is likely because it is deserializing and then serializing every bit of data in the files, which adds a lot of unnecessary overhead.

Next I tried the C# script below using LINQPad. When reading from and writing to a network share, it took 3 minutes and 56 seconds. Much better! Next I tried it on a local SSD drive and it took just 44 seconds.

Read on for the script itself. The File.ReadAllLines method it relies on loads an entire file into memory at once, so it works fine as long as each file fits comfortably in available memory.
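For files too large to hold in memory, a streaming variant is one option. The sketch below is not Chris's script. It is a minimal illustration that swaps in File.ReadLines, which enumerates lines lazily, for ReadAllLines. The CsvCombiner class and its Combine method are hypothetical names, and the code assumes every input file starts with the same single-line header row.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not taken from the linked post.
public static class CsvCombiner
{
    // Concatenates the given CSV files into destinationPath, writing the
    // header row only once. Returns the number of data rows written.
    // Assumption: each file's first line is the same header.
    public static long Combine(IEnumerable<string> sourcePaths, string destinationPath)
    {
        long dataRows = 0;
        bool headerWritten = false;

        using (var writer = new StreamWriter(destinationPath))
        {
            foreach (string path in sourcePaths)
            {
                bool isFirstLine = true;

                // File.ReadLines yields one line at a time, so memory use
                // stays roughly constant regardless of file size.
                foreach (string line in File.ReadLines(path))
                {
                    if (isFirstLine)
                    {
                        isFirstLine = false;
                        if (!headerWritten)
                        {
                            writer.WriteLine(line); // keep the first header seen
                            headerWritten = true;
                        }
                        continue; // skip the header of every later file
                    }

                    writer.WriteLine(line);
                    dataRows++;
                }
            }
        }

        return dataRows;
    }
}
```

In LINQPad or a console app this could be called as CsvCombiner.Combine(Directory.GetFiles(folder, "*.csv"), outputPath), where folder and outputPath are placeholders, and the output file should live outside the source folder so it is not picked up as an input. The trade-off is that lines are rewritten with the platform's default line ending rather than copied byte for byte.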
