The Post-1.0 Future #129

Closed
evanplaice opened this issue May 2, 2019 · 9 comments


@evanplaice
Owner

evanplaice commented May 2, 2019

tl;dr: 1.0 is intended to be the final release of this library

The original intent of this lib was to provide a previously missing piece in the JS ecosystem: a fast, spec-compliant CSV parser written to work in both browsers and Node. jquery-csv has achieved and exceeded all those goals.

But... the JS ecosystem has gone through dramatic changes since then. ES has advanced as a language (better syntax, ES modules, iterators), jquery is no longer popular, etc...

With ESM support finally landing in both browsers and Node, I think it's a good time to start fresh: build a new CSV lib for the next decade that uses all the lessons learned here.

Step 1: Naming

While piggybacking on the popularity of jquery was a great choice back in 2011, that's no longer the case. A new library will need a new name. At a minimum the name shouldn't fall out of popularity over the long-term.

Finding good names on NPM is hard, and there are a LOT of good alternative CSV parsers available today. More competition means standing out and reaching critical mass will be even more important the second time around.

The more I think about it the more I feel like the new library should be placed in its own org.

Step 2: API

API design is hard. I spent literally weeks mulling over how I was going to make this library both very easy to use and capable of being extended for damn near any use case. That process led to a breakthrough that eventually became the hooks API.

A good API is one that is already intuitive and familiar to users. I'm thinking like JSON.parse/stringify but with all the good parts included in this library (ex objects support, configurable delimiters, hooks, etc).
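To make the JSON.parse/stringify comparison concrete, here is a minimal sketch of what such an entry point could look like. The `CSV` object and the `delimiter` option name are assumptions made up for illustration, not a settled design, and hooks and objects support are left out to keep it short.

```javascript
// Hypothetical API shape: `CSV` and `delimiter` are illustrative names only.
const CSV = {
  // Parse CSV text into an array of arrays, honoring double-quoted values.
  parse(text, { delimiter = ',' } = {}) {
    const rows = [];
    let row = [];
    let value = '';
    let quoted = false;
    for (let i = 0; i < text.length; i++) {
      const ch = text[i];
      if (quoted) {
        if (ch === '"' && text[i + 1] === '"') { value += '"'; i++; } // "" escape
        else if (ch === '"') quoted = false;
        else value += ch;
      } else if (ch === '"') {
        quoted = true;
      } else if (ch === delimiter) {
        row.push(value);
        value = '';
      } else if (ch === '\n') {
        row.push(value);
        rows.push(row);
        row = [];
        value = '';
      } else if (ch !== '\r') {
        value += ch;
      }
    }
    // Flush the last entry when the input has no trailing newline.
    if (value !== '' || row.length > 0) {
      row.push(value);
      rows.push(row);
    }
    return rows;
  },

  // Serialize an array of arrays, quoting only the values that need it.
  stringify(rows, { delimiter = ',' } = {}) {
    const escape = (v) => {
      const s = String(v);
      const needsQuotes = /["\r\n]/.test(s) || s.includes(delimiter);
      return needsQuotes ? `"${s.replace(/"/g, '""')}"` : s;
    };
    return rows.map((row) => row.map(escape).join(delimiter)).join('\n');
  },
};
```

With this shape, `CSV.parse(CSV.stringify(rows))` round-trips string values the same way `JSON.parse(JSON.stringify(x))` does.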

Step 3: Restructuring

There is a lot of room for optimization here. Ideally, the following should reduce the overall size by up to 30%, or more if unused parts can be tree-shaken out.

The main source should be broken up into ES modules. This is a no-brainer. Better maintainability, better tree-shake-ability. IIFEs were a horrible but necessary requirement of pre-ES JS.

It should contain one parser implementation. This lib contains 2 parsers, one for arrays, one for objects.

TBH this sucks:

  • it's a pain to maintain 2 parsers
  • the 'objects' parser requires 2 passes
  • a second parser implementation is completely unnecessary

Much like how the onParseValue hook works, it should be possible to wrap that code with an internal hook that does the extra work required to transform the output into objects.
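A rough sketch of that idea, under heavy assumptions: `parseArrays`, `parseObjects`, and the `onEntry` option are hypothetical names, and the core parser below is a naive split-based stand-in (no quoting) rather than the real state machine. The point is only that objects mode becomes an internal hook on the single parser, applied entry by entry in one pass.

```javascript
// Stand-in for the single core parser. No quoting support; illustration only.
function parseArrays(text, { delimiter = ',', onEntry = (entry) => entry } = {}) {
  const out = [];
  for (const line of text.split('\n')) {
    if (line === '') continue;
    const result = onEntry(line.split(delimiter));
    if (result !== null) out.push(result); // null means "emit nothing for this entry"
  }
  return out;
}

// Objects mode: the same parser plus an internal hook, so no second pass.
function parseObjects(text, options = {}) {
  let headers = null;
  return parseArrays(text, {
    ...options,
    onEntry(entry) {
      if (headers === null) {
        headers = entry; // first entry supplies the keys
        return null;
      }
      return Object.fromEntries(headers.map((key, i) => [key, entry[i]]));
    },
  });
}
```

For example, `parseObjects('name,age\nAda,36\n')` builds its one object while the entry is being emitted, instead of re-walking the finished array output.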

Step 4: KISS

Following the initial goals of this project, keeping the implementation simple and fast is the ideal. Much of what is used here (ex parser/tests) should be able to be copied over almost verbatim.

Since there is no concern for introducing API breaks, it would be a good idea to reverse some of the 'bad decisions' that this lib has accumulated over time:

  • make lineEndings configurable and drop implicit line ending parsing
  • wrap both the data and options together for better compat with Node's callback style
  • make option setting and type checking universal to reduce duplicate code
  • develop a better solution for loading test fixtures
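As one possible reading of the first and third bullets, options could be resolved and type-checked in a single shared helper, with the line ending as an explicit option. This is only a sketch: `DEFAULTS`, `resolveOptions`, `splitEntries`, and the option names are all hypothetical.

```javascript
// Hypothetical option names and defaults, for illustration only.
const DEFAULTS = { separator: ',', quote: '"', lineEnding: '\n' };

// One place to merge defaults and type-check, shared by every entry point.
function resolveOptions(options = {}) {
  const resolved = { ...DEFAULTS };
  for (const [key, value] of Object.entries(options)) {
    if (!(key in DEFAULTS)) {
      throw new TypeError(`Unknown option: ${key}`);
    }
    if (typeof value !== typeof DEFAULTS[key]) {
      throw new TypeError(`Option ${key} must be a ${typeof DEFAULTS[key]}`);
    }
    resolved[key] = value;
  }
  return resolved;
}

// The line ending is taken from the options instead of being guessed.
function splitEntries(text, options) {
  const { lineEnding } = resolveOptions(options);
  return text.split(lineEnding).filter((line) => line !== '');
}
```

Calling `splitEntries(text, { lineEnding: '\r\n' })` then handles Windows-style input explicitly, and a typo like `{ lineEndings: '\r\n' }` fails loudly instead of being ignored.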

Step 5: Extensions

Error Recovery

The parser should be able to recover after encountering an error. It has a state machine; if the state becomes dirty, it should keep going until a newline is encountered. Barring extremely rare use cases (ie the data uses a lot of newlines as values), it should pick back up where it left off after one or two bad entries.
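The recovery strategy described above could be sketched roughly like this. It is not the existing jquery-csv state machine: the state names, the `parseWithRecovery` function, and the shape of the error records are assumptions, and only `\n` line endings are handled.

```javascript
// Sketch of skip-to-newline recovery; names and error shape are hypothetical.
function parseWithRecovery(text, delimiter = ',') {
  const rows = [];
  const errors = [];
  let row = [];
  let value = '';
  let state = 'start'; // 'start' | 'unquoted' | 'quoted' | 'afterQuote' | 'dirty'
  let line = 1;

  const endValue = () => { row.push(value); value = ''; state = 'start'; };
  const endEntry = () => { endValue(); rows.push(row); row = []; };
  const fail = (message) => { errors.push({ line, message }); state = 'dirty'; };

  for (const ch of text) {
    if (state === 'dirty') {
      // Discard everything up to the next newline, then resume cleanly.
      if (ch === '\n') { row = []; value = ''; state = 'start'; line++; }
    } else if (state === 'quoted') {
      if (ch === '"') state = 'afterQuote';
      else { if (ch === '\n') line++; value += ch; }
    } else if (ch === '\n') {
      endEntry();
      line++;
    } else if (ch === delimiter) {
      endValue();
    } else if (ch === '"') {
      if (state === 'start') state = 'quoted';
      else if (state === 'afterQuote') { value += '"'; state = 'quoted'; } // "" escape
      else fail('quote inside unquoted value');
    } else if (state === 'afterQuote') {
      fail('text after closing quote');
    } else {
      value += ch;
      state = 'unquoted';
    }
  }

  // Flush a final entry that has no trailing newline.
  if (state === 'quoted') fail('unterminated quoted value');
  else if (state !== 'dirty' && (state !== 'start' || row.length > 0)) endEntry();
  return { rows, errors };
}
```

The weak spot is the one already noted: if the bad entry contains quoted newlines, the skip resumes mid-entry and a following entry may be lost as well.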

Iterable Output

How about making the entry points iterable? The intent of jquery-csv was to make the parser object chain-able with jquery's pseudo-monad. In retrospect, that was probably a bad idea, but it would be awesome if the parser object could be wired directly into a functional pipeline via map, filter, reduce.
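One way to picture iterable output is a generator that yields one entry at a time. The `entries` function below is a hypothetical name and uses a naive split (no quoting) in place of the real parser; it only illustrates how the output could feed map/filter/reduce.

```javascript
// Hypothetical iterable entry point; naive splitting stands in for the parser.
function* entries(text, delimiter = ',') {
  let start = 0;
  while (start < text.length) {
    let end = text.indexOf('\n', start);
    if (end === -1) end = text.length;
    if (end > start) yield text.slice(start, end).split(delimiter);
    start = end + 1;
  }
}

const csv = 'item,qty\napple,3\npear,0\nplum,5\n';

// Destructuring consumes the iterator; the rest feeds a functional pipeline.
const [header, ...rows] = entries(csv);
const total = rows
  .map(([item, qty]) => ({ item, qty: Number(qty) }))
  .filter((r) => r.qty > 0)
  .reduce((sum, r) => sum + r.qty, 0);
```

Because the generator is lazy, a `for...of` loop over `entries(csv)` can also stop early without parsing the rest of the input.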

@coltonehrman
Contributor

How would you go about this? Is the library ready to move to version 1 so the process can start now, or are there other things that need to be taken care of before implementing this major release?

@coltonehrman
Contributor

I know there are still open bug issues, but I feel like trying to fix these in the old version is more pain than it's worth. Once version 1 is on the way, it will be MUCH easier to fix the bugs that were introduced in the older versions.

@evanplaice
Owner Author

evanplaice commented May 2, 2019

I'm going through the 1.0 requirements and removing all the 'nice to have' stuff

I'm thinking of creating a new org. There's no reason for it to be attached to my name, I'd prefer shared ownership.


As far as bugs/features: most of the existing bugs are so out-of-date it's not realistic to try to recreate them now, with the exception of these 2:

#58

This is an API breaking fix that should really be added to the new implementation. Currently, objects.onParseEntry() spits out the string version of the entry being parsed. The hook needs to be moved down in the execution order so the output object can be mutated prior to being added to the output.

Likewise, arrays.onParseEntry() should provide an array.

Since this is currently the 'expected behavior' -- and it's API breaking -- I'm fine with leaving it as is in this lib and fixing it in the new one.
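For the new implementation, the reordering could look something like the sketch below, where the hook runs after the entry has been turned into an object. `toObjects` and this `onParseEntry` signature are assumptions for illustration; they are not the current jquery-csv code.

```javascript
// Hypothetical post-parse step: the hook receives the built object, not a string.
function toObjects(entries, { onParseEntry = (entry) => entry } = {}) {
  const [headers, ...rows] = entries;
  return rows.map((row) => {
    const obj = Object.fromEntries(headers.map((key, i) => [key, row[i]]));
    return onParseEntry(obj); // runs last, so the hook can mutate the output
  });
}

// Usage: convert a column to a number before it lands in the output.
const people = toObjects(
  [['name', 'age'], ['Ada', '36'], ['Alan', '41']],
  { onParseEntry: (person) => ({ ...person, age: Number(person.age) }) },
);
```

The arrays variant would follow the same order, passing the parsed array to the hook instead of the raw string.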

#51

The issue is that hooks return false to indicate a no-op. The problem is that if a user provides a hook whose expected return value is false, the parser will skip the value. Since JS has stupid rules for false/undefined comparisons, I think a fix would be to make the default value for hooks (ie no-op) null and check for that prior to parsing.

This definitely needs to be fixed in the new lib. Despite being a very rare edge case, it breaks expected behavior.
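A simplified illustration of the difference, not the library's actual code: `applyHookLegacy` mimics the described behavior where a `false` result is read as "no-op", and `applyHook` is a hypothetical version that uses `null` as the "no hook" default and checks it up front.

```javascript
// Simplified model of the problem: a hook's `false` result is treated as a no-op.
function applyHookLegacy(value, hook) {
  const result = hook ? hook(value) : false;
  return result === false ? value : result; // a legitimate `false` is lost here
}

// Sketch of the fix: `null` means "no hook", and it is checked before calling.
const NO_HOOK = null;

function applyHook(value, hook = NO_HOOK) {
  if (hook === NO_HOOK) return value;
  return hook(value); // whatever the hook returns is kept, including `false`
}

// A hook that converts 'true'/'false' strings to booleans.
const toBoolean = (value) => value === 'true';
const legacyResult = applyHookLegacy('false', toBoolean);
const fixedResult = applyHook('false', toBoolean);
```

Here `legacyResult` ends up as the original string `'false'`, while `fixedResult` is the boolean `false` the hook actually returned.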


So to start:

  • establish a new org
  • create a roadmap outlining where the work should start
  • scaffold a lot of the project maintenance cruft (ex tools/scripts/deps/templates)
  • start issues discussing the API

Update

Adding these tests prior to 1.0 would be ideal, both to verify that these features currently work and because they will be required in the new lib.

  • Add onPreParse() Test
  • Add onParseEntry() Test
  • Add onParseValue() Test

@evanplaice
Owner Author

Any suggestions on a name for the org? I'm thinking CSV-js. That's what the JSON module was called before it became a standard feature in JS.

@coltonehrman
Contributor

I like that, or perhaps something along the lines of simple-csv. That way it can be "marketed" as the small and easy CSV library and not really compete with the bigger libraries like https://csv.js.org/ or https://www.papaparse.com/

@evanplaice
Owner Author

Hmm. The downside is every variation of light/simple-csv is already taken by another lib.

I'm thinking bigger scope. Ie name the org 'Type3', referring to Chomsky Type-3 grammars (ie regular languages). Then just call the repo 'csv'. Keeps it short/sweet and the org can host other regular parsers, including jquery-csv.

@evanplaice
Owner Author

What do you think?

@coltonehrman
Contributor

I agree with this. Whatever you think will work best for the library :)

@evanplaice
Owner Author

Awesome. It's created. I couldn't add you as an owner because orgs enable billing even for the free tier but 'member' access should allow just about everything.

I'm going to close this issue so the discussion can continue there.

2 participants