Fix csv parsing #531

Vindaar · 2021-10-26T20:54:45Z

This is not really related to #530. In addition to that issue, CSV parsing was broken by the laser backend PR, which moved away from the `Fdata` field name; the CSV parser was still using that name.

While changing this I ran into some issues with how to handle the parsing now. Because tensors of mem copyable types are no longer backed by a `seq[T]`, we would always have to copy the parsed `seq[T]` data into a tensor. I thought the better solution would be to walk the file once first to determine the number of lines and columns, allocate an appropriately sized tensor, and then fill it in a second pass using the CSV parser. Ideally we would parse directly using the memfiles interface, but that's a lot more work.
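To illustrate the two-pass idea, here is a minimal sketch in Python (not the Nim code from this PR). The names `count_rows_and_cols` and `read_csv_prealloc` are hypothetical, and the sketch assumes numeric fields, no header row, and no quoted fields containing newlines. The first pass only counts; the second pass fills a buffer that was allocated once.

```python
import csv
import io
from array import array


def count_rows_and_cols(text, sep=","):
    """First pass: count non-empty lines and the columns of the first one.

    Assumption: no quoted field contains a newline or the separator.
    """
    rows = 0
    cols = 0
    for line in io.StringIO(text):
        line = line.rstrip("\r\n")
        if line == "":
            continue  # empty lines are skipped (an assumption of this sketch)
        if rows == 0:
            cols = line.count(sep) + 1
        rows += 1
    return rows, cols


def read_csv_prealloc(text, sep=","):
    """Second pass: parse into a contiguous row-major buffer allocated once."""
    rows, cols = count_rows_and_cols(text, sep)
    data = array("d", [0.0]) * (rows * cols)  # single allocation, no regrowth
    i = 0
    for record in csv.reader(io.StringIO(text), delimiter=sep):
        if not record:
            continue  # csv.reader yields [] for empty lines
        for j, field in enumerate(record):
            data[i * cols + j] = float(field)
        i += 1
    return data, (rows, cols)


data, shape = read_csv_prealloc("1,2,3\n4,5,6\n\n7,8,9\n")
print(shape, list(data))
```

The point of the design is that the buffer never has to grow or be copied after parsing; the cost is reading the input twice.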

From the datamancer CSV parser (which uses memfiles) I know that checking for line count before starting to parse is fast enough (it pays off compared to reallocations).

Otherwise, for mem copyable types, we would have to copy the `seq` we parse into over to a tensor after parsing.

Vindaar · 2021-10-26T20:58:28Z

Ugh, I just realized I forgot to take quoting into account when counting lines. That makes the code a bit more annoying, as we then have to check each character: if we find a quote, disable line counting inside the quoted field, re-enable it at the next quote, etc.
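A rough sketch of that character-by-character check, written in Python for illustration rather than taken from the PR. The function name `count_lines_quoted` is hypothetical, and skipping empty lines is an assumption of this sketch. A doubled quote (`""`) inside a quoted field toggles the state twice, so it leaves the state unchanged.

```python
def count_lines_quoted(data: bytes, quote: bytes = b'"') -> int:
    """Count non-empty lines, ignoring newlines inside quoted fields."""
    q = quote[0]
    in_quote = False      # True while we are inside a quoted field
    has_content = False   # True once the current line has any content
    lines = 0
    for b in data:
        if b == q:
            in_quote = not in_quote  # opening or closing quote
            has_content = True
        elif b == 0x0A and not in_quote:
            # A real line break: only count lines that had content.
            if has_content:
                lines += 1
            has_content = False
        elif b != 0x0D:
            # Any other byte, including a newline inside quotes, is content.
            has_content = True
    if has_content:
        lines += 1  # last line without a trailing newline
    return lines


print(count_lines_quoted(b'a,b\n"x\ny",2\n3,4'))
```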

Vindaar added 5 commits October 25, 2021 20:40

change CSV parser to directly parse into a Tensor

a1522b5

Otherwise we would have to copy the `seq` we parse into for mem copyable types after parsing.

[io] replace CSVParser based line counter by memfiles counter

c34349b

[io] add simple readCsv test & add note to docstring about mratsim#530

42967f6

[io] extend tests with semicolon example, fixup empty line test

33eef5c

[io] remove TODO note

b05e267

Vindaar added 2 commits October 27, 2021 22:05

fix line counting for quoted fields in CSV files

ebb2d6e

[tests] add test case for quoted field in CSV file

fd8ec5a

Vindaar merged commit 649e42b into mratsim:master Oct 27, 2021
