tjohn327 (Collaborator) commented on Mar 17, 2026
- SCION path-aware networking in magicsock (embedded daemon, bootstrap, path selection)
- SCIONPathInfo in PeerStatus, ReconfigureSCION/SCIONStatus APIs
- scion-status LocalAPI endpoint
- Cross-platform release workflow (Linux deb/rpm/tgz, macOS, Windows)
- Package renamed to tailscale-scion with Conflicts/Replaces
- NOTICE file and BSD-3-Clause license correction
…ling and address quality evaluation
… endpoint discovery
… with preference handling
… with scion mock dependencies
…and improve path registration locking
- Added support for piggybacking SCION service information in the peerapi4 Description field.
- Updated path registration and lookup methods to ensure thread safety with locking.
- Enhanced tests to validate new SCION service extraction logic and path registration behavior.

- Enhanced logic for determining when to send full pings and disco pings based on SCION path characteristics.
- Updated MTU probing logic to account for SCION paths, ensuring proper handling of payload sizes.
- Refined address quality evaluation in tests to reflect the new preference for SCION over direct UDP connections.
- Improved logging to include MTU information when discovering SCION paths.
- Implemented throttled re-discovery for SCION paths to improve responsiveness when paths expire.
- Added cleanup of old SCION path entries outside of critical sections to prevent deadlocks.
- Introduced a constant for assumed per-hop latency when SCION reports LatencyUnset, improving path latency calculations.
- Updated metrics tracking for SCION disco messages to better reflect usage patterns.
…et buffer sizes
- Replaced the use of Listen with OpenRaw to allow setting custom UDP socket buffer sizes.
- Increased the read and write buffer sizes to 7 MB to prevent packet drops at high throughput.
- Wrapped the raw connection with NewCookedConn for enhanced SCION connection management.

…ng methods
- Added logic to ensure SCION paths are pinged during heartbeat even when a low-latency direct path is preferred.
- Updated discoPing method to include SCION pings for peers when available, improving path competition and responsiveness.

…logic
- Introduced mechanisms to detect dead SCION sockets and trigger reconnections based on packet reception time.
- Added constants for read deadlines and reconnection thresholds to enhance socket reliability.
- Enhanced the receiveSCION function to handle read timeouts and errors gracefully without propagating them to WireGuard.
- Implemented path re-discovery for active SCION peers upon reconnection to ensure updated routing.

…on addresses
- Introduced a cached destination address in scionPathInfo to optimize path resolution.
- Updated writeTo and sendSCIONBatch methods to utilize cached destination for improved performance.
- Refactored lastSCIONRecv to use monotonic time for better performance.
- Ensured buildCachedDst is called during path updates to maintain cache consistency.
…ialization
- Added functions for computing the SCION pseudo-header checksum and finishing the checksum for SCION/UDP packets.
- Introduced a pre-serialized header template for fast-path sends to optimize performance by bypassing standard serialization.
- Enhanced the scionConn structure to support fast-path operations, including adjustments to the underlay connection handling.
- Updated tests to validate the correctness of the new checksum computations and fast-path functionality.
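The split between "computing the pseudo-header checksum" and "finishing the checksum" can be illustrated with the generic 16-bit ones' complement sum from RFC 1071. This is a sketch under assumptions: `sum16` and `finishChecksum` are hypothetical names, the actual SCION pseudo-header layout is not reproduced, and the input bytes are the RFC 1071 worked example rather than a real packet.

```go
package main

import "fmt"

// sum16 adds data to a running sum as big-endian 16-bit words, padding a
// trailing odd byte with zero. The uint32 accumulator is sufficient for
// packet-sized inputs.
func sum16(data []byte, sum uint32) uint32 {
	for len(data) >= 2 {
		sum += uint32(data[0])<<8 | uint32(data[1])
		data = data[2:]
	}
	if len(data) == 1 {
		sum += uint32(data[0]) << 8
	}
	return sum
}

// finishChecksum folds the carries back into 16 bits and returns the
// ones' complement of the result.
func finishChecksum(sum uint32) uint16 {
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

func main() {
	// Stand-in bytes: the first half plays the role of a header whose partial
	// sum can be computed once and reused for every packet on the same path.
	header := []byte{0x00, 0x01, 0xf2, 0x03}
	payload := []byte{0xf4, 0xf5, 0xf6, 0xf7}

	partial := sum16(header, 0) // precomputed once
	fmt.Printf("%#x\n", finishChecksum(sum16(payload, partial))) // 0x220d
}
```

Because the sum is associative, caching the partial sum of an even-length, unchanging header gives the same result as summing header and payload together, which is what makes a pre-serialized template attractive on a fast path.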
tailscale#16450)
Adds logic for containerboot to signal that it can't auth, so the operator can reissue a new auth key. This only applies when running with a config file and with a kube state store.

If the operator sees reissue_authkey in a state Secret, it will create a new auth key iff the config has no auth key or its auth key matches the value of reissue_authkey from the state Secret. This is to ensure we don't reissue auth keys in a tight loop if the proxy is slow to start or failing for some other reason. The reissue logic also uses a burstable rate limiter to ensure there's no way a terminally misconfigured or buggy operator can automatically generate new auth keys in a tight loop.

Additional implementation details (ChaosInTheCRD):
- Added `ipn.NotifyInitialHealthState` to ipn watcher, to ensure that `n.Health` is populated when notifies are returned.
- On auth failure, containerboot:
  - Disconnects from control server
  - Sets reissue_authkey marker in state Secret with the failing key
  - Polls config file for new auth key (10 minute timeout)
  - Restarts after receiving new key to apply it
- Modified operator's reissue logic slightly:
  - Deletes old device from tailnet before creating new key
  - Rate limiting: 1 key per 30s with initial burst equal to replica count
  - In-flight tracking (authKeyReissuing map) prevents duplicate API calls across reconcile loops

Updates tailscale#14080
Change-Id: I6982f8e741932a6891f2f48a2936f7f6a455317f
(cherry picked from commit 969927c)
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
Fix three independent flake sources, at least as debugged by Claude, though empirically no longer flaking as it was before:

1. Poll for connection counter data instead of reading immediately. The conncount callback fires asynchronously on received WireGuard traffic, so after counts.Reset() there is no guarantee the counter has been repopulated before checkStats reads it. Use tstest.WaitFor with a 5s timeout to retry until a matching connection appears.
2. Replace the *2 symmetry assumption in global metric assertions. metricSendUDP and friends are AggregateCounters that sum per-conn expvars from both magicsock instances. The old assertion assumed both instances had identical packet counts, which breaks under asymmetric background WireGuard activity (handshake retries, etc). The new assertGlobalMetricsMatchPerConn computes the actual sum of both conns' expvars and compares against the AggregateCounter value.
3. Tolerate physical stats being 0 when user metrics are non-zero. A rebind event replaces the socket mid-measurement, resetting the physical connection counter while user metrics still reflect packets processed before the rebind. Log instead of failing in this case.

Also move counts.Reset() after metric reads and reorder the reset sequence (counts before metrics) to minimize the race window.

Fixes tailscale#13420
Change-Id: I7b090a4dc229a862c1a52161b3f2547ec1d1f23f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
ReadFromUDPAddrPort worked if UDP GRO was unsupported, but we don't actually want attempted usage, nor does any exist today. Future work on tailscale/corp#37679 would have required more complexity in this method, vs clarifying the API intents. Updates tailscale/corp#37679 Signed-off-by: Jordan Whited <jordan@tailscale.com>
…nge (tailscale#18974)
In TestUserspaceEnginePortReconfig, when selecting a port, use a random offset rather than searching in a contiguous range, in case a range is blocked.

Updates tailscale#2855
Signed-off-by: kari-ts <kari@tailscale.com>
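A random starting offset for port probing can be sketched as below. The names `pickPort` and `usable` are hypothetical, and the usability predicate is injected so the example runs without opening sockets; a real test would try to bind the port instead.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickPort probes the n ports in [base, base+n), starting at a random offset
// and wrapping around, and returns the first port that usable accepts.
func pickPort(base, n int, usable func(port int) bool) (int, bool) {
	if n <= 0 {
		return 0, false
	}
	start := rand.Intn(n)
	for i := 0; i < n; i++ {
		port := base + (start+i)%n
		if usable(port) {
			return port, true
		}
	}
	return 0, false
}

func main() {
	// Pretend the first 100 ports of the range are blocked.
	usable := func(port int) bool { return port >= 10100 }
	port, ok := pickPort(10000, 1000, usable)
	fmt.Println(port >= 10100, ok) // true true
}
```

With a random start, repeated runs are unlikely to keep probing the same blocked stretch first.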
After switching from cellular to wifi without ipv6, ForeachInterface still sees rmnet prefixes, so HaveV6 stays true, and magicsock keeps attempting ipv6 connections that either route through cellular or time out for users on wifi without ipv6.

This change adds SetAndroidBindToNetworkFunc, a callback to bind the socket to the selected Android Network object.

Updates tailscale#6152
Signed-off-by: kari-ts <kari@tailscale.com>
Add two small APIs to support out-of-tree projects to exchange custom signaling messages over DERP without requiring disco protocol extensions:
- OnDERPRecv callback on magicsock.Options / wgengine.Config: called for every non-disco DERP packet before the peer map lookup, allowing callers to intercept packets from unknown peers that would otherwise be dropped.
- SendDERPPacketTo method on magicsock.Conn: sends arbitrary bytes to a node key via a DERP region, creating the connection if needed. Thin wrapper around the existing internal sendAddr.

Also allow netstack.Start to accept a nil LocalBackend for use cases that wire up TCP/UDP handlers directly without a full LocalBackend.

Updates tailscale/corp#24454
Change-Id: I99a523ef281625b8c0024a963f5f5bf5d8792c17
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 6.0.0 to 7.0.0. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@b7c566a...bbbca2d) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: 7.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 7.0.0 to 8.0.0. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](actions/download-artifact@37930b1...70fc10c) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-version: 8.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.32.5 to 4.32.6. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@c793b71...0d579ff) --- updated-dependencies: - dependency-name: github/codeql-action dependency-version: 4.32.6 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
…ments
- Introduced scionRecvBatch for efficient batch processing of SCION packets, utilizing a sync.Pool for buffer reuse.
- Added parseSCIONPacket function to extract source address and payload from raw SCION packets, improving packet handling.
- Enhanced receiveSCION method to support batch reading from the underlay socket, optimizing performance during packet reception.
- Updated logic for handling disco packets to leverage the new batch processing capabilities.

…sponding tests
- Enhanced the addNewSCIONPathsForPeer method to initialize scionState for endpoints when initial path discovery fails.
- Implemented logic to register new SCION paths and ensure proper recovery of scionState, allowing for effective disco probing.
- Added a new test, TestScionAddNewPathsRecovery, to verify the correct initialization of scionState and path management during recovery scenarios.
- Improved overall robustness of SCION path handling in the presence of failed initial discoveries.

- Replaced direct use of t.Setenv with envknob.Setenv for setting the TS_SCION_PORT environment variable in tests.
- Added cleanup logic to reset the environment variable after each test, ensuring isolation between test cases.
… constants to make omit scion work
Add SCIONPathInfo struct to ipnstate with path description, active status, health, latency, expiry, and MTU fields. Populate it from endpoint's scionState in populatePeerStatus via a new build-tagged helper method populateSCIONPathsLocked.
…onfig
Add SCIONConfig struct and two methods on *Conn:
- ReconfigureSCION: updates envknobs and triggers reconnection
- SCIONStatus: returns whether SCION is connected and local IA
GET /localapi/v0/scion-status returns SCION connection status and local ISD-AS number. Stub omit file for ts_omit_scion builds.
Close existing SCION connection and set TS_SCION_FORCE_BOOTSTRAP before retrying, so that config changes from the Android UI always trigger a real reconnection attempt.
When SCION connects mid-session (e.g. via ReconfigureSCION from the Android UI), the receive goroutines were never started because receiveFuncs() only included SCION functions if pconnSCION was non-nil at Open() time. Fix: always register receiveSCION and receiveSCIONShim in the receive func list. When pconnSCION is nil, they poll every 5 seconds instead of blocking forever on donec. Once SCION connects, they pick up the new connection and start processing packets.
closeSCIONLocked was closing the socket but leaving pconnSCION pointing at the closed conn. This caused panics when toggling SCION off then on, as retrySCIONConnect saw a non-nil (but closed) connection and returned early, or receive goroutines tried to read from the closed socket.
populateSCIONPathsLocked was returning stale path data from scionState even after SCION was disabled and pconnSCION set to nil. Check pconnSCION first and return empty paths when disconnected.
…nnect
After retrySCIONConnect succeeds:
1. discoverNewSCIONPeers: scans all peers for SCION services and triggers path discovery for those without scionState yet. Fixes the case where SetNetworkMap ran before SCION was available.
2. ReSTUN(scion-connected): triggers endpoint re-advertisement so peers receive our SCION address via Hostinfo update.
pconnSCION was declared in the "no locking required" section of Conn but was read and written from multiple goroutines without synchronization: receiveSCION, sendSCION, and sendSCIONBatch read it on the hot path without locks, while closeSCIONLocked, retrySCIONConnect, reconnectSCION, and ReconfigureSCION wrote it (some under c.mu, some without). This mixed-locking pattern is a data race detectable by the Go race detector, and can cause torn pointer reads on ARM (Android).

Change pconnSCION from *scionConn to atomic.Pointer[scionConn], matching the RebindingUDPConn.pconnAtomic pattern used for pconn4/pconn6. All reads become .Load() (lock-free, safe on all architectures) and all writes become .Store() (can still be coordinated with c.mu for higher-level operations like close-then-reconnect sequences).

SCIONStatus no longer needs c.mu since the atomic load is sufficient for reading the pointer and the immutable localIA field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: tjohn327 <tonyjanugrah@gmail.com>
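The atomic-pointer pattern described here can be sketched in isolation with `sync/atomic` from the standard library. The types `conn` and `transport` and the methods `status` and `disconnect` are hypothetical simplifications, not the magicsock types.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// conn stands in for a connection whose localIA never changes after creation.
type conn struct {
	localIA string
}

// transport holds the current connection behind an atomic pointer so hot-path
// readers never need a mutex.
type transport struct {
	pconn atomic.Pointer[conn]
}

// status is lock-free: one atomic load, then a read of an immutable field.
func (t *transport) status() (connected bool, localIA string) {
	c := t.pconn.Load()
	if c == nil {
		return false, ""
	}
	return true, c.localIA
}

// disconnect clears the pointer and reports whether this call removed a live
// connection, so a second caller sees false and does not close it twice.
func (t *transport) disconnect() bool {
	return t.pconn.Swap(nil) != nil
}

func main() {
	var tr transport
	fmt.Println(tr.status()) // false and an empty IA

	tr.pconn.Store(&conn{localIA: "1-ff00:0:110"})
	fmt.Println(tr.status()) // true 1-ff00:0:110

	fmt.Println(tr.disconnect(), tr.disconnect()) // true false
	fmt.Println(tr.status())                      // false and an empty IA
}
```

Clearing the pointer as part of closing also addresses the stale-pointer problem from the earlier commit, since readers then observe nil rather than a closed connection.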
Run retrySCIONConnect in a goroutine so ReconfigureSCION returns immediately. The bootstrap cascade can block 30-60s on network I/O; making it async prevents blocking the LocalAPI handler and avoids potential ANR on Android.
When connected but shimXPC is nil (new infrastructure, no dispatcher), poll every 30s instead of 5s. shimXPC is immutable per scionConn so frequent polling is wasteful — only a full reconnect creating a new scionConn could add a shim. Reduces wakeups from ~17K/day to ~2.9K/day in the common no-shim case.
* wgengine/magicsock: implement SCION fast-path checksum and header serialization
  - Added functions for computing the SCION pseudo-header checksum and finishing the checksum for SCION/UDP packets.
  - Introduced a pre-serialized header template for fast-path sends to optimize performance by bypassing standard serialization.
  - Enhanced the scionConn structure to support fast-path operations, including adjustments to the underlay connection handling.
  - Updated tests to validate the correctness of the new checksum computations and fast-path functionality.
* wgengine/magicsock: implement SCION batch receive and parsing enhancements
  - Introduced scionRecvBatch for efficient batch processing of SCION packets, utilizing a sync.Pool for buffer reuse.
  - Added parseSCIONPacket function to extract source address and payload from raw SCION packets, improving packet handling.
  - Enhanced receiveSCION method to support batch reading from the underlay socket, optimizing performance during packet reception.
  - Updated logic for handling disco packets to leverage the new batch processing capabilities.
* wgengine/magicsock: enhance SCION underlay support for IPv6
  - Added support for IPv6 in the SCION connection handling, allowing for batch I/O operations with both IPv4 and IPv6.
  - Updated scionListenAddr to allow overriding the listen address via the TS_SCION_LISTEN_ADDR environment variable, supporting IPv6 localhost.
  - Refactored scionConn to use a common interface for underlay connections, improving flexibility for packet handling.
  - Enhanced documentation to clarify the behavior of the listen address and its default settings.
This reverts commit 6bb89ea.
Revert "Add batch read and write support for SCION"
Integrates SCION as an alternative transport path in Tailscale's magicsock, with full support for Android runtime configuration
- Add NOTICE with Tailscale and SCION attribution
- Rename deb/rpm/tgz package to tailscale-scion
- Add Conflicts/Replaces for official tailscale package
- Fix license from MIT to BSD-3-Clause
- Update maintainer, description, homepage for netsys-lab
- Add netsys-lab copyright to SCION-specific source files
Triggered on tag push (v*-scion.*). Builds Linux deb/rpm/tgz (amd64+arm64), macOS tgz (amd64+arm64), Windows zip (amd64). Publishes all artifacts to GitHub Releases.
The dist tool runs on the host (amd64) and cross-compiles internally. Setting GOARCH in the env caused go run to build the dist tool itself for arm64, which can't execute on amd64.
Shallow clone (fetch-depth: 1) caused missing files during package glob. Full checkout ensures all files are available.
gocross was made opt-in on 2025-06-16 but dist.go still forced TS_USE_GOCROSS=1, causing 'no matching files' errors when gocross modified GOOS/GOARCH during go list. Set TS_USE_GOCROSS=0 so the Tailscale Go toolchain is used directly. Also restore deb/rpm targets in the release workflow.
In CI (CI=true), gocross-wrapper.sh enables set -x which writes bash traces to stderr. GoPkg() uses CombinedOutput() merging stdout+stderr, so the real path gets mixed with trace output. Add NOBASHDEBUG=true (existing upstream mechanism) to suppress the traces. Restore TS_USE_GOCROSS=1 to match upstream.
Add documentation for the SCION integration:
- docs/architecture.md: component overview, connection flow, data flow, key design decisions including peerapi4 piggyback mechanism
- README.md: replace upstream README with SCION-specific user guide, env var reference, build instructions

Fix SCION service address format to use bracket notation for IPv6 compatibility. The peerapi4 piggyback format changes from "scion=ISD-AS,hostIP:port" to "scion=ISD-AS,[hostIP]:port" so that IPv6 addresses (which contain colons) don't break the port parser. Backward-compatible parsing for unbracketed format is preserved.
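A parser that accepts both the bracketed and the legacy unbracketed form can be sketched with the standard library. This is an assumption-laden illustration, not the parser from this PR: `parseSCIONService` is a hypothetical name, and the ISD-AS value and port in the example are made up.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strconv"
	"strings"
)

// parseSCIONService splits "scion=ISD-AS,[hostIP]:port" (or the legacy
// "scion=ISD-AS,hostIP:port") into the ISD-AS string and the host address.
func parseSCIONService(desc string) (string, netip.AddrPort, error) {
	const prefix = "scion="
	if !strings.HasPrefix(desc, prefix) {
		return "", netip.AddrPort{}, fmt.Errorf("missing %q prefix in %q", prefix, desc)
	}
	// The ISD-AS itself contains colons, so split on the comma first.
	ia, hostPort, ok := strings.Cut(strings.TrimPrefix(desc, prefix), ",")
	if !ok || ia == "" {
		return "", netip.AddrPort{}, fmt.Errorf("missing ISD-AS or host in %q", desc)
	}
	// net.SplitHostPort accepts "[host]:port" and plain "host:port"; an
	// unbracketed IPv6 host is rejected because its colons are ambiguous.
	host, portStr, err := net.SplitHostPort(hostPort)
	if err != nil {
		return "", netip.AddrPort{}, err
	}
	addr, err := netip.ParseAddr(host)
	if err != nil {
		return "", netip.AddrPort{}, err
	}
	port, err := strconv.ParseUint(portStr, 10, 16)
	if err != nil {
		return "", netip.AddrPort{}, err
	}
	return ia, netip.AddrPortFrom(addr, uint16(port)), nil
}

func main() {
	for _, s := range []string{
		"scion=1-ff00:0:110,[127.0.0.1]:32766", // new bracketed form
		"scion=1-ff00:0:110,127.0.0.1:32766",   // legacy form, still accepted
		"scion=1-ff00:0:110,[::1]:32766",       // IPv6 needs the brackets
		"scion=1-ff00:0:110,::1:32766",         // ambiguous, rejected
	} {
		ia, hp, err := parseSCIONService(s)
		fmt.Println(ia, hp, err)
	}
}
```

Splitting on the comma before looking at host and port matters because the ISD-AS part also contains colons; only the host portion needs the bracket rule.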
These workflows depend on Tailscale-specific infrastructure (self-hosted runners, Azure cigocacher, Slack, FlakeHub, private secrets). Keep only release.yml for our GitHub Releases.
tjohn327 added a commit that referenced this pull request on Apr 21, 2026
… v1.96.5 PR #5 on main was a squash merge of earlier scion-dev work, which broke git's view of scion-dev as an ancestor. This merge uses -X theirs so every scion-dev file wins over the squash commit's flattened version. Since scion-dev contains every commit originally squashed into PR #5 plus the new work (Phase 2 TRC verification, v0.15.0, v1.96.5), no information is lost.