Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Healthy connections get destroyed during ice candidate negotiation #590

Open
welldan97 opened this issue Feb 4, 2020 · 11 comments
Open

Comments

@welldan97
Copy link

Looks like this line here sometimes destroys even healthy connections!

this.destroy(makeError('Connection failed.', 'ERR_CONNECTION_FAILURE'))

It happens when connection temporary fails and ICE is still negotiating candidates.

A little intro

  1. Connection in "failed" state stands for:

One or more of the ICE transports on the connection is in the "failed" state.
(https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/connectionState)

But it actually also happen when iceConnectionState is "disconnected".

  1. iceConnectionState in "disconnected" state means:

Checks to ensure that components are still connected failed for at least one component of the RTCPeerConnection. This is a less stringent test than "failed" and may trigger intermittently and resolve just as spontaneously on less reliable networks, or during temporary disconnections. When the problem resolves, the connection may return to the "connected" state.
(https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/iceConnectionState)

And what I've realized right now, is that actually even "failed" iceConnectionState can be recovered. So may be this line could be also somehow improved as well:

this.destroy(makeError('Ice connection failed.', 'ERR_ICE_CONNECTION_FAILURE'))

How to reproduce

So I used chrome & safari browsers. And checked connection from 2 macbooks, one connected to 3G another to wifi. I pass signals(offers/answers) manually via messenger

I've used turn& stun servers from Twilio as configuration

Ok. So first I create Peer with initiator:true, trickle: false, on first mac. I get offer with candidates.

On another mac then I create Peer with initiator: false, trickle: false. And then I pass offer this.peer.signal(offer). If I'll pass answer back to initiator fast enough, connection get created. But I wait for around 15 seconds & then connection get destroyed, because
iceConnectionState switches to "disconnected" or to "failed". And that's what is fishy.
The interesting thing is that if I comment out mentioned lines then I am able to create connection even when it's in "failed" state.

Thoughts

So this makes me think that "failed" or "disconnected" states for iceCandidates and "failed" state for connection itself don't mean too much. It's not a failure yet, I guess, since it can be reestablished so easily, it's more "on hold" type of state.

I'm not really sure how the issue can be fixed, though since I'm pretty new to WebRTC myself. And all these states are confusing. It needs further experimentation

Thank you!

@nazar-pc
Copy link
Collaborator

nazar-pc commented Feb 4, 2020

Can you try to replicate this between Chrome/Chrome Chrome/Firefox Firefox/Firefox?
I have a sneaking suspicion this is one more quirk in Safari implementation, but would be nice to have more data.

@welldan97
Copy link
Author

welldan97 commented Feb 4, 2020

Ok!
So I've tried chrome to chrome connection.
And it behaves almost the same. Also fails if to wait too long. And it can be connected if comment out these destruction lines.

This time I have different errors though.

So for chrome(initator) <-> chrome connection the error is:
[18997b5] destroy (error: Connection failed.)

The state right before the fail:

{
  iceGatheringState: "complete",
  iceConnectionState: "disconnected",
  connectionState: "failed",
  signalingState: "stable"
}

For chrome(initiator) <-> safari the error is:

[4c824a5] destroy (error: Ice connection failed.)

The state before fail:

{
  iceGatheringState: "complete",
  iceConnectionState: "failed",
  connectionState: "connecting",
  signalingState: "stable"
}

I saw both these problems in chrome/safari pair though, as per issue.

@welldan97
Copy link
Author

I've made a fork with a condition for this._connected welldan97@7a8cbe2

This works for me, so far. I bet this would be also better for many cases.

But probably it doesn't cover all the cases good enough: now I think that even once connected it could reach these strange recoverable states.

@nazar-pc
Copy link
Collaborator

nazar-pc commented Feb 4, 2020

The latest information I know is that SDP is only valid for about 30 seconds if the spec didn't change. So you can't really store it and if you wait for too long connection will fail and this is expected.

@t-mullen do you remember why those were added? It is possible there was some kind of quick in one of the browsers that it was necessary to fail the whole connection.

Usually though this will happen automatically and you can retry quickly, so shouldn't be a big issue. But the problem is interesting nonetheless.

Also did you verify that connection actually works, meaning you can send audio, video or other data over it?

@t-mullen
Copy link
Collaborator

t-mullen commented Feb 4, 2020

connectionState == 'failed' implies iceConnectionState == 'failed'. These signal fatal errors in one of the underlying ICE transports. Some parts of your connection may still work (since you can have multiple ICE transports). but the overall peer is broken.

These states are equivalent. The reason we listen for both is due to a bug in Chrome causing the oniceconnectionstatechange event not being fired.

connectionState == 'disconnected' is different and isn't fatal. We don't tear the peer down for this. This state doesn't seem relevant to your issue.

Are you sure the connection is usable when iceConnectionState == 'failed'?

@t-mullen
Copy link
Collaborator

t-mullen commented Feb 4, 2020

ICE failures are sometimes recoverable using an ICE restart, a procedure not implemented in simple-peer yet. See #579

Performing an ICE restart is recommended when iceConnectionState transitions to "failed".

If you can confirm the peer datachannel and any audio/video streams still work after ICE failure, I can look more into this.

@welldan97
Copy link
Author

welldan97 commented Feb 10, 2020

@nazar-pc So today I've made some experiments.

Unfortunately this time 4g Mac could not connect to wifi Mac at all. That was strange.

Then I've tried to connect to my friend in Russia. So the set up was like this:

Initiator: Me(Chrome, Macbook, Portugal)
Opponent: My friend(Chrome, Macbook, Russia)

So we tried several experiments, each time same scenario:

  1. I initiate connection,
  2. I send offer via messenger.
  3. And then he sends me answer back
  4. I wait for X min
  5. and then I establish connection.
  6. try sending messages via chat

These are the results:

X/delay - result:

≈0 min  - success,
≈0.5 min - success,
≈1 min - success,
≈2 min - failure,
≈2.5 min - failure,
≈5 min - failure,

I've tried each delay only once.


Another thing I've tried is waiting before sending the offer.
I've waited for 2 mins after it creation and it was still successful connection.


Thanks for inputs @t-mullen . I'm happy to learn more about WebRTC, and simple-peer
As for now, though, I'm more interested in establishing connection first.

So regarding this, to summarize. According my experiments Peer.js kills connection in around 15 seconds, when the connection officially dies. But if peer.js would not kill a connection,a user could still recover connection if they connect within 1 min. And that's a huge difference when one send signaling manually via messenger!

I see that with signaling server it's a no issue at all. But, I'm building an experimenting project, and I want to see how far the p2p can be pushed with WebRTC. I want to have an ability to establish connection manually - this way one need even less servers.

I think that could be useful to other users as well, may be as opt-in.

What do you think?

@t-mullen
Copy link
Collaborator

t-mullen commented Feb 10, 2020

Having long delays in signalling is not something that can be done reliably. The ICE candidates contained in the signal messages are time-sensitive.

The reason for this isn't due to simple-peer or even WebRTC, but your router. Most routers have timeouts on address mappings, which means the connection information in the server-reflexive ICE candidates expires.

https://en.wikipedia.org/wiki/Network_address_translation
https://www.rfc-editor.org/rfc/pdfrfc/rfc4787.txt.pdf

The only guarantee the spec gives you is this (keep in mind not every router follows the spec).

A NAT UDP mapping timer MUST NOT expire in less than two minutes.

Although it would be nice if we could just connect whenever we wanted with an IP and port, the reality of NATs means that's impossible in the general case.

@fippo
Copy link

fippo commented Feb 15, 2020

Most routers have timeouts on address mappings, which means the connection information in the server-reflexive ICE candidates expires.

WebRTC implementations should periodically ping the STUN server to keep the mapping open, see https://tools.ietf.org/html/rfc5389#section-6. If that isn't the case, please file a bug against the implementation!

@michaelpalumbo
Copy link

Most routers have timeouts on address mappings, which means the connection information in the server-reflexive ICE candidates expires.

WebRTC implementations should periodically ping the STUN server to keep the mapping open, see https://tools.ietf.org/html/rfc5389#section-6. If that isn't the case, please file a bug against the implementation!

I'm also experiencing the issue that @welldan97 has laid out. If I wanted to ping the STUN server in the simple-peer implementation, how would I do so?

@kk-007
Copy link

kk-007 commented Jul 26, 2020

it looks funny but i have the same issue and i solved it by changing internet connection. when i switch to mobile data from a broadband project run well and when i again switch back to broadband it gives that error. my project: https://github.com/kk-007/webrtc-video-call

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants