The Post-1.0 Future #129

evanplaice · 2019-05-02T16:01:31Z

tl;dr: 1.0 is intended to be the final release of this library

The original intent of this lib was to provide a previously missing piece in the JS ecosystem: a fast, spec-compliant CSV parser written to work in both browsers and Node. jquery-csv has achieved and exceeded all those goals.

But... the JS ecosystem has gone through dramatic changes since then. ES has advanced as a language (better syntax, ES modules, iterators, etc.), and jquery is no longer popular.

With ESM support finally landing in both browsers and Node, I think it's a good time to start fresh: build a new CSV lib for the next decade that uses all the lessons learned here.

Step 1: Naming

While piggybacking on the popularity of jquery was a great choice back in 2011, that's no longer the case. A new library will need a new name. At a minimum the name shouldn't fall out of popularity over the long-term.

Finding good names on NPM is hard, and there are a LOT of good alternative CSV parsers available today. More competition means standing out and reaching critical mass will be even more important the second time around.

The more I think about it the more I feel like the new library should be placed in its own org.

Step 2: API

API design is hard. I spent literally weeks mulling over how I was going to make this library both very easy to use and capable of being extended for damn near any use case. That process led to a breakthrough that eventually became the hooks API.

A good API is one that is already intuitive and familiar to users. I'm thinking like JSON.parse/stringify but with all the good parts included in this library (ex objects support, configurable delimiters, hooks, etc).

Step 3: Restructuring

There is a lot of room for optimization here. Ideally, the following should reduce the overall size by up to 30%, or more if unused parts can be tree-shaken out.

The main source should be broken up into ES modules. This is a no-brainer: better maintainability, better tree-shake-ability. IIFEs were a horrible but necessary requirement of pre-ES JS.

It should contain one parser implementation. This lib contains 2 parsers, one for arrays, one for objects.

TBH this sucks:

it's a pain to maintain 2 parsers
the 'objects' parser requires 2 passes
a second parser implementation is completely unnecessary

Much like how the onParseValue hook works, it should be possible to wrap that code with an internal hook that does the extra work required to transform the output into objects.
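
To make the single-parser idea concrete, here is a minimal sketch. `parseRows`, `onEntry` and `toObjects` are hypothetical names for illustration, not the jquery-csv API, and the row splitting is deliberately naive (no quoted values). The only point is that object output can be layered onto the array parser as a hook, in one pass.

```javascript
// Hypothetical sketch: one array parser, with object output added via a hook.
// Naive splitting only; quoted values are NOT handled here.
function parseRows(csv, { separator = ',', onEntry = null } = {}) {
  const out = [];
  const lines = csv.split('\n').filter((line) => line.length > 0);
  lines.forEach((line, index) => {
    const row = line.split(separator);
    // The hook receives the parsed array and may transform it,
    // or return undefined to leave the entry out of the output.
    const entry = onEntry === null ? row : onEntry(row, index);
    if (entry !== undefined) out.push(entry);
  });
  return out;
}

// Internal-style hook that turns rows into objects keyed by the first row.
function toObjects() {
  let headers = null;
  return (row) => {
    if (headers === null) {
      headers = row; // the first row supplies the keys and is not emitted
      return undefined;
    }
    return Object.fromEntries(headers.map((key, i) => [key, row[i]]));
  };
}

const asArrays = parseRows('a,b\n1,2\n3,4');
const asObjects = parseRows('a,b\n1,2\n3,4', { onEntry: toObjects() });
```

With this shape there is only one parsing loop to maintain, and the 'objects' output no longer needs a second pass.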

Step 4: KISS

Following the initial goals of this project, keeping the implementation simple and fast is the ideal. Much of what is used here (ex parser/tests) should be able to be copied over almost verbatim.

Since there is no concern for introducing API breaks, it would be a good idea to reverse some of the 'bad decisions' that this lib has accumulated over time:

make lineEndings configurable and drop implicit line ending parsing
wrap both the data and options together for better compat with Node's callback style
make option setting and type checking universal to reduce duplicate code
develop a better solution for loading test fixtures

Step 5: Extensions

Error Recovery

The parser should be able to recover after encountering an error. It has a state machine; if the state becomes dirty, it should keep going until a newline is encountered. Barring extremely rare use cases (ie the data uses a lot of newlines as values) it should pick back up where it left off after one or two bad entries.
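
A rough sketch of what that recovery could look like, assuming a small hand-written state machine with a hypothetical `dirty` state. This is not jquery-csv's actual parser; the separator and quote character are hard-coded to keep it short.

```javascript
// Hypothetical sketch of error recovery; not jquery-csv's actual state machine.
function parseWithRecovery(csv) {
  const rows = [];
  const errors = [];
  // States: fieldStart | unquoted | quoted | quoteInQuoted | dirty
  let state = 'fieldStart';
  let row = [];
  let value = '';
  let line = 1;

  const endField = () => { row.push(value); value = ''; };
  const endRow = () => { endField(); rows.push(row); row = []; };
  const fail = (message) => {
    errors.push({ line, message });
    row = []; // discard the partial entry
    value = '';
    state = 'dirty';
  };

  for (const ch of csv) {
    switch (state) {
      case 'dirty':
        // Skip everything until the next newline, then resume normally.
        if (ch === '\n') state = 'fieldStart';
        break;
      case 'fieldStart':
        if (ch === '"') state = 'quoted';
        else if (ch === ',') endField();
        else if (ch === '\n') endRow();
        else { value += ch; state = 'unquoted'; }
        break;
      case 'unquoted':
        if (ch === ',') { endField(); state = 'fieldStart'; }
        else if (ch === '\n') { endRow(); state = 'fieldStart'; }
        else if (ch === '"') fail('unexpected quote in unquoted value');
        else value += ch;
        break;
      case 'quoted':
        if (ch === '"') state = 'quoteInQuoted';
        else value += ch;
        break;
      case 'quoteInQuoted':
        if (ch === '"') { value += '"'; state = 'quoted'; } // escaped quote
        else if (ch === ',') { endField(); state = 'fieldStart'; }
        else if (ch === '\n') { endRow(); state = 'fieldStart'; }
        else fail('unexpected character after closing quote');
        break;
    }
    if (ch === '\n') line += 1;
  }

  // Flush a final entry that has no trailing newline.
  if (state === 'quoted') fail('unterminated quoted value');
  else if (state !== 'dirty' && (state !== 'fieldStart' || row.length > 0)) endRow();

  return { rows, errors };
}

const recovered = parseWithRecovery('a,b\nbad"x,1\nc,d\n');
```

Here the second line is dropped and reported, and parsing resumes at the third. As noted above, data with newlines inside quoted values could make the parser resume mid-entry, so one or two extra bad entries are possible.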

Iterable Output

How about making the entry points iterable? The intent of jquery-csv was to make the parser object chain-able with jquery's pseudo-monad. In retrospect, that was probably a bad idea, but it would be awesome if the parser object could be wired directly into a functional pipeline via map, filter, reduce.
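
As a sketch of the iterable idea, assuming a hypothetical generator entry point `iterRows` (not an existing API) and two small lazy helpers, since built-in iterator helpers are not assumed to be available. The splitting is naive and does not handle quoted values.

```javascript
// Hypothetical sketch: a generator-based entry point, not an existing API.
// Naive splitting only; quoted values are NOT handled here.
function* iterRows(csv, separator = ',') {
  // split() is eager here; a real implementation would scan incrementally.
  for (const line of csv.split('\n')) {
    if (line.length > 0) yield line.split(separator);
  }
}

// Small lazy helpers so entries flow through one at a time.
function* mapIter(iterable, fn) {
  for (const item of iterable) yield fn(item);
}
function* filterIter(iterable, fn) {
  for (const item of iterable) if (fn(item)) yield item;
}

const stock = 'apple,3\npear,0\nplum,7';
const records = mapIter(iterRows(stock), ([name, qty]) => ({ name, qty: Number(qty) }));
const inStock = filterIter(records, (record) => record.qty > 0);
const totalQty = [...inStock].reduce((sum, record) => sum + record.qty, 0);
```

Because the entry point is just an iterable, it also works with `for...of`, spread and `Array.from` without any adapter.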


coltonehrman · 2019-05-02T16:57:02Z

How would you go about this? Is the library ready to move to version 1 so the process can start now, or are there other things that need to be taken care of before implementing this major release?

coltonehrman · 2019-05-02T16:58:58Z

I know there are still open bug issues, but I feel like trying to fix these in the old version is more pain than it's worth. Once version 1 is on the way, it will be MUCH easier to implement fixes for the bugs that were introduced in the older versions.

evanplaice · 2019-05-02T17:22:00Z

I'm going through the 1.0 requirements and removing all the 'nice to have' stuff

I'm thinking of creating a new org. There's no reason for it to be attached to my name, I'd prefer shared ownership.

As far as bugs/features go, most of the existing bugs are so out-of-date it's not realistic to try to recreate them now, with the exception of these 2:

#58

This is an API breaking fix that should really be added to the new implementation. Currently, objects.onParseEntry() spits out the string version of the entry being parsed. The hook needs to be moved down in the execution order so the output object can be mutated prior to being added to the output.

Likewise, arrays.onParseEntry() should provide an array.

Since this is currently the 'expected behavior' -- and it's API breaking -- I'm fine with leaving it as is in this lib and fixing it in the new one.

#51

The issue is that hooks return false to indicate a no-op. The problem is, if a user provides a hook where the expected return value is false, the parser will skip the value. Since JS has stupid rules for false/undefined comparisons, I think a fix would be to make the default value for hooks (ie no-op) null, then check for that prior to parsing.

This definitely needs to be fixed in the new lib. Despite being a very rare edge case, it breaks expected behavior.
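
A simplified sketch of the #51 problem and the null-default fix as described above. Both functions are hypothetical illustrations, not jquery-csv internals; `castBooleans` stands in for a user hook that legitimately returns false.

```javascript
// Hypothetical illustration of #51; not jquery-csv internals.

// Current-style behavior (simplified): false doubles as the "skip" signal,
// so a hook that legitimately returns false loses its value.
function collectWithFalseSentinel(values, onParseValue) {
  const out = [];
  for (const value of values) {
    const result = onParseValue ? onParseValue(value) : value;
    if (result !== false) out.push(result);
  }
  return out;
}

// Proposed-style behavior: the hook defaults to null and is checked up front,
// so whatever the hook returns (including false) is kept.
function collectWithNullDefault(values, onParseValue = null) {
  const out = [];
  for (const value of values) {
    out.push(onParseValue === null ? value : onParseValue(value));
  }
  return out;
}

// Example user hook: casts the strings 'true'/'false' to booleans.
const castBooleans = (value) =>
  (value === 'true' ? true : value === 'false' ? false : value);

const lossy = collectWithFalseSentinel(['true', 'false', 'x'], castBooleans);
const kept = collectWithNullDefault(['true', 'false', 'x'], castBooleans);
```

In the first version the `false` entry silently disappears from the output; in the second it survives because "no hook" is decided before the hook runs rather than inferred from its return value.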

So to start:

establish a new org
create a roadmap outlining where the work should start
scaffold a lot of the project maintenance cruft (ex tools/scripts/deps/templates)
start issues discussing the API

Update

Adding these tests prior to 1.0 would be ideal, both to verify that these features currently work and because they will be required in the new lib.

Add onPreParse() Test
Add onParseEntry() Test
Add onParseValue() Test

evanplaice · 2019-05-02T17:26:42Z

Any suggestions on a name for the org? I'm thinking CSV-js. That's what the JSON module was called before it became a standard feature in JS.

coltonehrman · 2019-05-02T17:54:46Z

I like that, or perhaps something along the lines of simple-csv that way it can be "marketed" as the small and easy CSV library and not really compete with your bigger libraries like https://csv.js.org/ or https://www.papaparse.com/

evanplaice · 2019-05-04T01:51:51Z

Hmm. The downside is every variation of light/simple-csv is already taken by another lib.

I'm thinking bigger scope. Ie name the org 'Type3', referring to Chomsky Type III parsers (ie regular languages). Then just call the repo 'csv'. Keeps it short/sweet and the org can host other regular parsers, including jquery-csv.

evanplaice · 2019-05-04T01:52:24Z

What do you think?

coltonehrman · 2019-05-06T23:07:02Z

I agree with this. Whatever you think will work best for the library :)

evanplaice · 2019-05-07T20:55:04Z

Awesome. It's created. I couldn't add you as an owner because orgs enable billing even for the free tier but 'member' access should allow just about everything.

I'm going to close this issue so the discussion can continue there.

evanplaice added the discussion label May 2, 2019

evanplaice mentioned this issue May 2, 2019

Added check to allow callback be sent to options argument #128

Merged

evanplaice closed this as completed May 7, 2019
