
Recently I stumbled over the task to read either CSV or Excel files and transfer data in an independent format to a server.
The existing software part implemented to similar approaches to
- Read the file
- Transfer the file into an intermediate format (a DataTable)
- Transfer the file up to the server
The server took the file, read all data, translate number according to a given number format sent by the client, …
Reviewing the code resulted into a couple of drawbacks: redundant code, performance, use of language depending information on the server side, etc.
As a result I looked for an approach to
- combine CSV and excel load within one flow and
- have data translated on the client side into a format which can be read by the server without knowledge of the client user language (or the date format used for loading files)
Based on a code example for IDataReader (http://blogs.msdn.com/b/anthonybloesch/archive/2013/01/23/bulk-loading-data-with-idatareader-and-sqlbulkcopy.aspx) I had the idea to create a library which streams data independent of the input format.
The result is now on GitHub. A library which results into a simple interface to read data:
IStreamImporter streamImporter = new CsvStreamImporter(streamdata); ((CsvStreamImporter)streamImporter).SetDelimiter(';'); // or ExcelPackage package = new ExcelPackage(); package.Load(excelFile); ExcelStreamImporter streamImporter = new ExcelStreamImporter(package.Workbook.Worksheets.First(), true); streamImporter.SetHeader(true); streamImporter.AddColumnDefinition("Field1", typeof(string)); streamImporter.AddColumnDefinition("Field2", typeof(int)); streamImporter.AddColumnDefinition("Field3", typeof(string)); while (streamImporter.Read()) { string column1 = streamImporter.GetValue(0) as string; int column2 = streamImporter.GetValue(2) as int; string column3 = streamImporter.Getvalue(3) as string; // ... do work }
The current status of the project is some kind of work in progress. Anyhow, NuGet packages are available from https://www.myget.org/F/stream-importer/ (prerelease versions only).
The following is still to be done:
- Documentation
- Examples
- Replace EPPlus Excel library for xlsx files by XmlReader to reduce memory footprint
Anyhow, the library seems to be stable and is used in a production environment.