
CROW.Q1 gives "ii lines are low" when 2 crows are on bus #464

Open · zjb-s opened this issue Apr 20, 2022 · 4 comments
Labels: bug, help wanted


zjb-s commented Apr 20, 2022

It appears that CROW.Q1 works fine with only 1 crow attached, but with 2, things blow up. Not sure if this should be a crow or TT issue - feel free to redirect.

@trentgill (Collaborator) commented:

this is unsurprising, though obviously something we should fix.

Q: do both crows say "ii lines are low", or only one? one of the crows should succeed, while the other loses arbitration (hence prints the unhelpful error). it's possible they both fail, in which case it's a worse bug than I hope.

this is actually a great reproduction case for an arbitration loss; i've always struggled to come up with a reliable "this message always causes bus contention" test case. so thank you!


back to the matter at hand: i'm assuming both of the crows are assigned to the same ii address? if not, then only one of the devices should respond, and this issue shouldn't arise.

the question is: what should the expected behaviour be when you query two devices simultaneously that may or may not respond with the same value? i think the standard i2c behaviour is that one device succeeds, but you can't tell which one. as a result this is an anti-pattern that results in undefined behaviour, and i would suggest the crows be on different ii addresses for the query functions to make any sense.
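something like the following is what i have in mind - a sketch only, leaning on the ii.set_address() call that comes up later in this thread (which at the time of writing lives in the beta firmware), and assuming 2 is a valid alternate follower address:

```lua
-- script loaded on crow A: keep the default follower address (no call needed)
function init()
end

-- script loaded on crow B: move to the second follower address so the
-- leader can query each crow unambiguously
function init()
  ii.set_address(2)
end
```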

of course we should fix the behaviour where things "blow up", but this is not a core bug (i wonder if the i2c spec says anything about multiple devices on the same address?).


finally, a side note for anyone looking into this in the future: the i2c query code is the most 'dangerous' code in crow's main loop. while the C (Hardware) & Lua (REPL) layers always communicate via the 'event loop', the i2c-query mechanism sidesteps the event loop. the reasoning is that otherwise we'd be holding up the bus for some highly-variable amount of time while the query waits for its turn to be processed in the loop - meanwhile the entire i2c network is held up. this could be in the millisecond range when crow is very busy with other tasks.

because the i2c query comes in on an interrupt, the query code can call into Lua while the VM is working on something else, leading to stack corruption and, inevitably, a crash. i'm unsure if this has anything to do with the crash described in this issue, but i wanted to note it as possibly related.

a basic & hacky solution would be to force i2c queries to the front of the event queue, or to add them as a separate event handler in the main loop. this would trade higher query latency for a more resilient query handler.

@trentgill added the bug and help wanted labels on Apr 20, 2022

zjb-s commented Apr 21, 2022

Looks like both crows crash on Q1, and they both double-report the "lines are low" error.

After running ii.set_address(2) on the beta-firmware crow, things work as expected. Running that command on the vanilla 3.0.1 crow gives no error, but also has no effect (which I think @dndrks mentioned in Discord).

My intuition says that with 2 crows on the bus, you might get confusing returns, but things shouldn't crash, and that users should be encouraged to assign addresses in their init functions to avoid confusion. I imagine there's no way to have them auto-allocate addresses?

I hope this is helpful!

poking around, it looks like there's no support for 2 devices on the same bus with the same address, though there are some very cheap i2c multiplexers that handle that type of routing. Not sure where that would fit into the ecosystem if at all.

@trentgill (Collaborator) commented:

thanks for the report. sounds like some time with the logic analyzer is needed to determine when & where exactly the crash occurs. clearly there's an error in the bus arbitration code, as at least one device should survive the collision.

i agree that 2 devices (crow or otherwise) on the same address should not cause a crash. i think stopping this crash is the actionable element of this issue.

multiple devices on the same address is actually quite nice when you have several units that you want to follow directives from a single leader. it's specifically querying that causes the issues, & i think that case should be classified as undefined behaviour (i.e. we make no promises about what will happen).

for the same reason i don't think we should auto-allocate addresses. in order for the separate addresses to be meaningful, the scripter needs to know which is which - as a result it should be an active decision in each script. if you are (for example) trying to run the same script on 2 crows but want them to exist on different addresses, you can use the unique_id function and an if statement in init to dynamically assign addresses in a predictable way (see the sketch below).
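a sketch of that pattern - illustrative only: it assumes unique_id() returns a single value you can compare directly (check its actual return format first), that 1 and 2 are both valid crow follower addresses, and that ii.set_address() is available (beta firmware at the time of writing):

```lua
-- hypothetical id copied from one of the two crows; read it once with
-- unique_id() over the REPL and paste the value here
CROW_A_ID = 0x1234

function init()
  -- the same script runs on both crows; each picks its own address
  if unique_id() == CROW_A_ID then
    ii.set_address(1)
  else
    ii.set_address(2)
  end
end
```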

@zjb-s would avoiding the crash be sufficient to close this issue?


zjb-s commented Apr 21, 2022

I personally don't have any use for two crows following the same directive, but maybe you or someone else does!

For me, yeah, totally - avoiding the crash is fine now that I know it's there. I can't speak for anyone else :)

I do think we should document the address-related functions, though. And making the unique_id function a bit clearer wouldn't hurt.

thanks for considering my perspective on this stuff!

@trentgill added this to the 3.1? 3.5? 4.0? milestone on Feb 3, 2023
@trentgill removed this from the 3.1? 3.5? 4.0? milestone on Feb 3, 2023