Request for comment: CRI Version 1 #117
Some challenges with this for node-tap:
Interesting, I had not considered those as similar, but you're right. I'd support merging the "test" and "suite" concepts. It might require some compromises, but I think it's worth giving it a try. Currently:
I have not previously seen testing frameworks that allow nesting within their equivalent of the "test" concept. I see that node-tap does allow this, so node-tap's tests are effectively a hybrid: like a "suite" that can also directly contain assertions. The main issue I see with merging these concepts is that it might affect support for "planning" the execution of a run. Right now, we specify a structure that reporters can read before execution begins, describing how many tests will execute and in which suites they are grouped. @isaacs How does this work with node-tap?

The TAP spec defines a "plan" (spec), which Mocha, QUnit, and Jasmine effectively implement by waiting until all test files have been loaded and seeing which "suites" have been registered, without actually executing the "test" portions yet. At that point the plan is known and execution can start.

I'm generally in favour of removing the structured plan (i.e. the whole suite structure), since as far as I know it is never used in practice by reporters. A stream-based approach seems preferable there. However, an overall count of planned tests seems useful to keep, allowing reporters to track and communicate a sense of progress toward completion.
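To make that concrete, here's a minimal sketch of a reporter that only needs an overall count for progress. The attachProgressReporter name and the testCounts.total field are assumptions for illustration, not settled parts of the draft; only the runStart and testEnd event names come from the spec.

// Hypothetical sketch: a progress-only reporter.
// Assumes runStart carries just an overall count, and testEnd events stream in afterwards.
function attachProgressReporter(runner) {
  let total = 0;
  let finished = 0;
  runner.on('runStart', (runStartEvent) => {
    // Assumed field: an overall count instead of a full suite tree.
    total = runStartEvent.testCounts ? runStartEvent.testCounts.total : 0;
  });
  runner.on('testEnd', () => {
    finished++;
    if (total > 0) {
      console.log('progress: ' + finished + '/' + total);
    }
  });
}

The reporter never needs the suite structure up front; it only needs a number, which keeps the stream-based approach intact.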
ftr, tape also allows nesting tests (groups of assertions), and has no concept of "suites". In tape, you can nest test() calls inside a running test. In other words, tape simply doesn't have a full count of tests until after the full run is completed.
Yeah, tape is sort of in the TAP family of test paradigms (along with ava, to some extent), so most of what's said about node-tap would apply to tape as well.
The tap runner does not load all the test files ahead of time. All tests are run as subprocesses to ensure a clean environment, and their TAP stream is consumed from the child process stdout. A pool of tests is run in parallel, by default up to the count of CPUs on the system. As tests finish, new ones are kicked off, but the output order is always consistent with the order in which the files were specified. Within a given test file (i.e., in the child process), the object returned by require('tap') is the top-level test object.
The top-level tap test object doesn't have to create subtests. This is a perfectly valid test script in node-tap:

const t = require('tap')
t.pass('this is fine')

If they call t.test() to create child tests, the child test callbacks run in order along with the surrounding code. For example, this script:

// all console.error calls will happen in order
const t = require('tap')
console.error('one')
t.test('child test', t => {
console.error('two')
t.pass('okie dokie')
console.error('three')
t.end()
console.error('four')
})
console.error('five')
t.test('second child', t => {
console.error('six')
t.pass('fine')
console.error('seven')
t.end()
console.error('eight')
})
console.error('nine')

will print the console.error lines one through nine in order.
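For reference, the TAP side of that script on stdout would look roughly like the sketch below; this is an approximation (node-tap also emits timing comments and formatting differs a little between versions), and the one-through-nine lines go to stderr:

TAP version 13
# Subtest: child test
    ok 1 - okie dokie
    1..1
ok 1 - child test
# Subtest: second child
    ok 1 - fine
    1..1
ok 2 - second child
1..2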
The runner works by getting a list of all the test files that it needs to run, and then spawning each one as a child process and consuming its TAP output from stdout, as described above.
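As an illustration of that producer/consumer split (not tap's actual runner code), spawning one file and handing its TAP lines to a consumer could look like this; the runTestFile helper and the test/basic.js path are made up for the example:

// Sketch: run a test file in a child process and feed its TAP stream, line by line,
// to whatever consumer (parser, reporter) is interested.
const { spawn } = require('child_process');

function runTestFile(file, onTapLine) {
  return new Promise((resolve) => {
    const child = spawn(process.execPath, [file], { stdio: ['ignore', 'pipe', 'inherit'] });
    child.stdout.setEncoding('utf8');
    let buffered = '';
    child.stdout.on('data', (chunk) => {
      buffered += chunk;
      let newline;
      while ((newline = buffered.indexOf('\n')) !== -1) {
        onTapLine(buffered.slice(0, newline)); // one complete TAP line at a time
        buffered = buffered.slice(newline + 1);
      }
    });
    child.on('close', (code) => resolve(code));
  });
}

// Usage sketch: forward each line to a TAP parser or reporter.
runTestFile('test/basic.js', (line) => console.log(line));

A pool of such children, capped at the CPU count, gives the parallelism described above while each child keeps its own clean process memory space.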
So, really, there is no difference between a "suite" and a "test": we don't often know ahead of time how many tests there will be, tests can have subtests, so either there's no such thing as a "suite" or it's suites all the way down, and the objects that this spec describes as emitting events about one another all live in different process memory spaces. In the default reporter, I do report the number of "suites", but that's just the number of test files, which is an arbitrary distinction to draw. All the communication happens over text streams that speak and parse TAP.
@isaacs Thank you, that helps!

JS API
The CRI spec isn't meant to imply JavaScript. The repo's name and heritage do come from that background, though, that's fair. It's important that we (at least) succeed in the JS space to unify the many non-interoperable test frameworks, runners, and reporters. But, like TAP, I think CRI could be environment agnostic, specifying only a stream of information.

Join forces?

TAP focuses on the transmission protocol for the (virtual) wire; CRI focuses more on how the data gets on and off that same wire. In TAP, the line protocol is concrete and the transmission method is left unspecified. In practice, I believe one generally expects to see separate processes connected through a Unix pipe. When things are not inter-process, though, one could presumably agree unofficially on exposing the runner's writable stream and the reporter's consumer of the readable stream, and plug in something other than stdin/stdout. But this is not standardised. CRI aims to close that gap.

I love the Unix philosophy, and this makes perfect sense for Node.js and other CLI contexts, especially when there's a single test run and a single reporter. So far, I've not had a good sense of how this can be made to work outside that context in a way that is still interoperable. For example:
In the CRI draft we've described what is essentially a callback mechanism or minimal event emitter, with an abstract method that should exist for connecting a reporter (consumer), which then gets called with the described structured data in a language-native fashion.

Once everything is connected, the format of the data is currently a weak point in CRI. Perhaps we should move away from trying to describe the structure in a language-neutral way and just say that it is "text", e.g. JSON-encoded, or even TAP. This would decouple CRI from any particular language and remove any ambiguity about how structures might be represented if there's more than one way to do so in a given language. It'd be up to the reporter to decide how it parses the stream.

My dream for CRI is that any test reporter can be declaratively plugged into any test framework or test orchestrator (BrowserStack, Karma, ..). Thus the many reporters that produce artefact files, upload stuff, pretty-print etc would be re-usable, and HTML reporters would be re-usable as well. The below is JS-specific, but as a concrete outcome for the JS ecosystem, I'd consider it a success if we can:
I haven't considered TAP for this, but I am open to it. I do think we'd most likely still need something within this spec to address the above, but we could certainly leverage TAP. I think that's worth pursuing. We may need to solve one or two issues upstream in that case, but I wouldn't mind helping with that! E.g. regarding nesting of tests, (optional) test duration, and details of passed assertions. We might be able to do that as an extension in an ignored comment. @isaacs, do you see this kind of "extended TAP" as the approach to take? I'm also welcoming any thoughts or feedback on the CRI challenge of connecting consumers to producers.

Misc

Regarding test plans - I didn't realize the test plan "header" could be at the end of a TAP stream. That makes a lot of sense (since TAP 12).

Regarding nested tests - Yep, we need to support this. I consider it a requirement to be able to represent the structure of nested tests.
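On that last point, here's the kind of glue I picture for connecting a consumer (reporter) to a producer. The connect helper and the per-event reporter methods are hypothetical; the on(eventName, callback) surface is only my assumption of what the draft's abstract connection method could look like, with the event names following the draft.

// Hypothetical sketch: declaratively attach any reporter to any compliant producer.
function connect(producer, reporter) {
  for (const eventName of ['runStart', 'suiteStart', 'testStart', 'testEnd', 'suiteEnd', 'runEnd']) {
    producer.on(eventName, (eventData) => {
      // The reporter only needs to understand the event data, not the framework emitting it.
      if (typeof reporter[eventName] === 'function') {
        reporter[eventName](eventData);
      }
    });
  }
}

// Usage sketch (names made up): connect(qunitAdapter, htmlReporter);

The same reporter object could then be handed to a framework adapter, a runner like Karma, or an orchestrator, without either side knowing about the other.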
It seems OK, but it's hard to know without some kind of programmatic verification tooling/testing.
Thanks for all the input. Continuing the conversation at #133. Next I'll see what we can do with the Karma runner, BrowserStack, and in-browser reporting from frameworks that don't yet support TAP there today.
I've turned everything we have so far into a draft specification:
https://github.com/js-reporters/js-reporters/blob/main/spec/cri-draft.adoc
To gain experience, it is currently used in stable releases or production versions of:
Objective
To publish Version 1 of the specification "soon".
This ticket is to track the review round during which we can answer any pending questions or past issues that we want to solve as part of Version 1. I think this area is interesting enough that there is definitely space for lots of ideas and future considerations. But, to deliver a long-term and stable Version 1, I think we need to limit our scope through agreed-upon criteria.
Proposed criteria:
- runStart event, akin to the TAP specification.

These are admittedly quite strict and narrow. Maybe they're feasible, but if not, feel free to suggest changes and widening. It's only meant as a way to start and focus the conversation.
Open questions
- The todo field on the Assertion event data? See "todo property on assertions feels incorrect" #105.
- assertions: maybe change SuiteEnd per "Recommend purging actual/expected values of assertions" #100? Or remove entirely, per "Allow for tests nested within tests" #126?
- Allow Error objects to be used as-is to communicate failed assertions? See "Allow Error objects in TestEnd.error assertions list" #123.
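To make that last question concrete, here is a sketch of the two shapes a reporter might receive for a failed check. The field names loosely echo the draft and issue #123 but should be treated as assumptions, as should the normalise helper.

// Illustration only: two possible representations of one failed check in a testEnd event.

// 1. A structured assertion record, roughly as the draft describes it today (assumed fields):
const structuredFailure = {
  passed: false,
  actual: 41,
  expected: 42,
  message: 'answer should be 42',
  stack: 'Error: answer should be 42\n    at ...'
};

// 2. A plain Error object passed through as-is, as proposed in #123:
const errorFailure = new Error('answer should be 42');

// A reporter that accepts both would need a small normalisation step, for example:
function normalise(failure) {
  if (failure instanceof Error) {
    return { passed: false, message: failure.message, stack: failure.stack };
  }
  return failure;
}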