
Merge scion-dev: SCION transport, Android integration APIs, CI/CD#5

Merged
tjohn327 merged 81 commits into main from scion-dev
Mar 17, 2026

Conversation

@tjohn327
Collaborator

  • SCION path-aware networking in magicsock (embedded daemon, bootstrap, path selection)
  • SCIONPathInfo in PeerStatus, ReconfigureSCION/SCIONStatus APIs
  • scion-status LocalAPI endpoint
  • Cross-platform release workflow (Linux deb/rpm/tgz, macOS, Windows)
  • Package renamed to tailscale-scion with Conflicts/Replaces
  • NOTICE file and BSD-3-Clause license correction

tjohn327 and others added 30 commits March 9, 2026 09:15
…and improve path registration locking

- Added support for piggybacking SCION service information in the peerapi4 Description field.
- Updated path registration and lookup methods to ensure thread safety with locking.
- Enhanced tests to validate new SCION service extraction logic and path registration behavior.
- Enhanced logic for determining when to send full pings and disco pings based on SCION path characteristics.
- Updated MTU probing logic to account for SCION paths, ensuring proper handling of payload sizes.
- Refined address quality evaluation in tests to reflect the new preference for SCION over direct UDP connections.
- Improved logging to include MTU information when discovering SCION paths.
- Implemented throttled re-discovery for SCION paths to improve responsiveness when paths expire.
- Added cleanup of old SCION path entries outside of critical sections to prevent deadlocks.
- Introduced a constant for assumed per-hop latency when SCION reports LatencyUnset, improving path latency calculations.
- Updated metrics tracking for SCION disco messages to better reflect usage patterns.
…et buffer sizes

- Replaced the use of Listen with OpenRaw to allow setting custom UDP socket buffer sizes.
- Increased the read and write buffer sizes to 7 MB to prevent packet drops at high throughput.
- Wrapped the raw connection with NewCookedConn for enhanced SCION connection management.
…ng methods

- Added logic to ensure SCION paths are pinged during heartbeat even when a low-latency direct path is preferred.
- Updated discoPing method to include SCION pings for peers when available, improving path competition and responsiveness.
…logic

- Introduced mechanisms to detect dead SCION sockets and trigger reconnections based on packet reception time.
- Added constants for read deadlines and reconnection thresholds to enhance socket reliability.
- Enhanced the receiveSCION function to handle read timeouts and errors gracefully without propagating them to WireGuard.
- Implemented path re-discovery for active SCION peers upon reconnection to ensure updated routing.
…on addresses

- Introduced a cached destination address in scionPathInfo to optimize path resolution.
- Updated writeTo and sendSCIONBatch methods to utilize cached destination for improved performance.
- Refactored lastSCIONRecv to use monotonic time for better performance.
- Ensured buildCachedDst is called during path updates to maintain cache consistency.
…ialization

- Added functions for computing the SCION pseudo-header checksum and finishing the checksum for SCION/UDP packets.
- Introduced a pre-serialized header template for fast-path sends to optimize performance by bypassing standard serialization.
- Enhanced the scionConn structure to support fast-path operations, including adjustments to the underlay connection handling.
- Updated tests to validate the correctness of the new checksum computations and fast-path functionality.
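
The fast-path idea above (sum the constant header bytes once, then only add the per-packet bytes) can be illustrated with the standard one's-complement Internet checksum. This is a sketch under assumptions: `partialSum` and `finishChecksum` are hypothetical names, and the pseudo-header bytes in `main` are placeholders rather than the real SCION pseudo-header layout.

```go
package main

import "fmt"

// partialSum adds b to a running one's-complement accumulator, reading b as
// big-endian 16-bit words. An odd trailing byte is padded with zero, so only
// the final chunk of a message may have odd length. The uint32 accumulator is
// sufficient for packet-sized inputs (well under 128 KiB).
func partialSum(sum uint32, b []byte) uint32 {
	for len(b) >= 2 {
		sum += uint32(b[0])<<8 | uint32(b[1])
		b = b[2:]
	}
	if len(b) == 1 {
		sum += uint32(b[0]) << 8
	}
	return sum
}

// finishChecksum folds the carries back into 16 bits and complements the result.
func finishChecksum(sum uint32) uint16 {
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

func main() {
	// Placeholder bytes standing in for the fields that stay constant per peer.
	pseudoHeader := []byte{0x00, 0x01, 0xf2, 0x03}
	headerSum := partialSum(0, pseudoHeader) // computed once, reused per packet

	payload := []byte{0xf4, 0xf5, 0xf6, 0xf7}
	fmt.Printf("%#04x\n", finishChecksum(partialSum(headerSum, payload)))
}
```

Because one's-complement addition is associative, the cached `headerSum` yields the same result as summing header and payload together on every send.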
tailscale#16450)

Adds logic for containerboot to signal that it can't auth, so the
operator can reissue a new auth key. This only applies when running with
a config file and with a kube state store.

If the operator sees reissue_authkey in a state Secret, it will create a
new auth key iff the config has no auth key or its auth key matches the
value of reissue_authkey from the state Secret. This is to ensure we
don't reissue auth keys in a tight loop if the proxy is slow to start or
failing for some other reason. The reissue logic also uses a burstable
rate limiter to ensure there's no way a terminally misconfigured
or buggy operator can automatically generate new auth keys in a tight loop.

Additional implementation details (ChaosInTheCRD):

- Added `ipn.NotifyInitialHealthState` to the ipn watcher, to ensure that
  `n.Health` is populated when notifications are returned.
- on auth failure, containerboot:
  - Disconnects from control server
  - Sets reissue_authkey marker in state Secret with the failing key
  - Polls config file for new auth key (10 minute timeout)
  - Restarts after receiving new key to apply it

- modified operator's reissue logic slightly:
  - Deletes old device from tailnet before creating new key
  - Rate limiting: 1 key per 30s with initial burst equal to replica count
  - In-flight tracking (authKeyReissuing map) prevents duplicate API calls
    across reconcile loops

Updates tailscale#14080

Change-Id: I6982f8e741932a6891f2f48a2936f7f6a455317f


(cherry picked from commit 969927c)

Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: chaosinthecrd <tom@tmlabs.co.uk>
Fix three independent flake sources, at least as debugged by Claude,
though empirically no longer flaking as it was before:

1. Poll for connection counter data instead of reading immediately.
   The conncount callback fires asynchronously on received WireGuard
   traffic, so after counts.Reset() there is no guarantee the counter
   has been repopulated before checkStats reads it. Use tstest.WaitFor
   with a 5s timeout to retry until a matching connection appears.

2. Replace the *2 symmetry assumption in global metric assertions.
   metricSendUDP and friends are AggregateCounters that sum per-conn
   expvars from both magicsock instances. The old assertion assumed
   both instances had identical packet counts, which breaks under
   asymmetric background WireGuard activity (handshake retries, etc).
   The new assertGlobalMetricsMatchPerConn computes the actual sum of
   both conns' expvars and compares against the AggregateCounter value.

3. Tolerate physical stats being 0 when user metrics are non-zero.
   A rebind event replaces the socket mid-measurement, resetting the
   physical connection counter while user metrics still reflect packets
   processed before the rebind. Log instead of failing in this case.
   Also move counts.Reset() after metric reads and reorder the reset
   sequence (counts before metrics) to minimize the race window.

Fixes tailscale#13420

Change-Id: I7b090a4dc229a862c1a52161b3f2547ec1d1f23f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
ReadFromUDPAddrPort worked if UDP GRO was unsupported, but we don't
actually want attempted usage, nor does any exist today. Future work
on tailscale/corp#37679 would have required more complexity in this
method, vs clarifying the API intents.

Updates tailscale/corp#37679

Signed-off-by: Jordan Whited <jordan@tailscale.com>
…nge (tailscale#18974)

In TestUserspaceEnginePortReconfig, when selecting a port, use a random offset rather than searching a contiguous range, in case part of that range is blocked.

Updates tailscale#2855

Signed-off-by: kari-ts <kari@tailscale.com>
After switching from cellular to wifi without ipv6, ForeachInterface still sees rmnet prefixes, so HaveV6 stays true, and magicsock keeps attempting ipv6 connections that either route through cellular or time out for users on wifi without ipv6.

This:
- Adds SetAndroidBindToNetworkFunc, a callback to bind the socket to the selected Android Network object

Updates tailscale#6152

Signed-off-by: kari-ts <kari@tailscale.com>
Add two small APIs to support out-of-tree projects to exchange custom
signaling messages over DERP without requiring disco protocol
extensions:

- OnDERPRecv callback on magicsock.Options / wgengine.Config: called for
  every non-disco DERP packet before the peer map lookup, allowing callers
  to intercept packets from unknown peers that would otherwise be dropped.

- SendDERPPacketTo method on magicsock.Conn: sends arbitrary bytes to a
  node key via a DERP region, creating the connection if needed. Thin
  wrapper around the existing internal sendAddr.

Also allow netstack.Start to accept a nil LocalBackend for use cases
that wire up TCP/UDP handlers directly without a full LocalBackend.

Updates tailscale/corp#24454

Change-Id: I99a523ef281625b8c0024a963f5f5bf5d8792c17
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 6.0.0 to 7.0.0.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@b7c566a...bbbca2d)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: 7.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 7.0.0 to 8.0.0.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@37930b1...70fc10c)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: 8.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.32.5 to 4.32.6.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@c793b71...0d579ff)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.32.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
…ments

- Introduced scionRecvBatch for efficient batch processing of SCION packets, utilizing a sync.Pool for buffer reuse.
- Added parseSCIONPacket function to extract source address and payload from raw SCION packets, improving packet handling.
- Enhanced receiveSCION method to support batch reading from the underlay socket, optimizing performance during packet reception.
- Updated logic for handling disco packets to leverage the new batch processing capabilities.
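
Buffer reuse through a sync.Pool, as mentioned for scionRecvBatch, might look roughly like this. The struct layout, `batchSize`, `maxPacket`, and the helper names are assumptions for illustration and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	batchSize = 8    // hypothetical number of packets read per batch
	maxPacket = 1500 // hypothetical per-packet buffer size in bytes
)

// recvBatch holds the buffers for one batch read plus the length of each
// packet actually received.
type recvBatch struct {
	bufs [][]byte
	lens []int
}

// batchPool hands out preallocated batches so the receive loop does not
// allocate on every read.
var batchPool = sync.Pool{
	New: func() any {
		b := &recvBatch{bufs: make([][]byte, batchSize), lens: make([]int, batchSize)}
		for i := range b.bufs {
			b.bufs[i] = make([]byte, maxPacket)
		}
		return b
	},
}

func getBatch() *recvBatch { return batchPool.Get().(*recvBatch) }

// putBatch clears the recorded lengths and returns the batch to the pool.
func putBatch(b *recvBatch) {
	for i := range b.lens {
		b.lens[i] = 0
	}
	batchPool.Put(b)
}

func main() {
	b := getBatch()
	b.lens[0] = copy(b.bufs[0], "hello")
	fmt.Printf("%q\n", b.bufs[0][:b.lens[0]])
	putBatch(b)
}
```

Anything that must outlive the batch, such as a payload handed to another goroutine, has to be copied out before `putBatch` is called.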
tjohn327 and others added 28 commits March 14, 2026 22:50
…sponding tests

- Enhanced the addNewSCIONPathsForPeer method to initialize scionState for endpoints when initial path discovery fails.
- Implemented logic to register new SCION paths and ensure proper recovery of scionState, allowing for effective disco probing.
- Added a new test, TestScionAddNewPathsRecovery, to verify the correct initialization of scionState and path management during recovery scenarios.
- Improved overall robustness of SCION path handling in the presence of failed initial discoveries.
- Replaced direct use of t.Setenv with envknob.Setenv for setting the TS_SCION_PORT environment variable in tests.
- Added cleanup logic to reset the environment variable after each test, ensuring isolation between test cases.
Add SCIONPathInfo struct to ipnstate with path description, active
status, health, latency, expiry, and MTU fields. Populate it from
endpoint's scionState in populatePeerStatus via a new build-tagged
helper method populateSCIONPathsLocked.
…onfig

Add SCIONConfig struct and two methods on *Conn:
- ReconfigureSCION: updates envknobs and triggers reconnection
- SCIONStatus: returns whether SCION is connected and local IA
GET /localapi/v0/scion-status returns SCION connection status
and local ISD-AS number. Stub omit file for ts_omit_scion builds.
Close existing SCION connection and set TS_SCION_FORCE_BOOTSTRAP
before retrying, so that config changes from the Android UI
always trigger a real reconnection attempt.
When SCION connects mid-session (e.g. via ReconfigureSCION from the
Android UI), the receive goroutines were never started because
receiveFuncs() only included SCION functions if pconnSCION was non-nil
at Open() time.

Fix: always register receiveSCION and receiveSCIONShim in the receive
func list. When pconnSCION is nil, they poll every 5 seconds instead
of blocking forever on donec. Once SCION connects, they pick up the
new connection and start processing packets.
closeSCIONLocked was closing the socket but leaving pconnSCION
pointing at the closed conn. This caused panics when toggling
SCION off then on, as retrySCIONConnect saw a non-nil (but closed)
connection and returned early, or receive goroutines tried to
read from the closed socket.
populateSCIONPathsLocked was returning stale path data from
scionState even after SCION was disabled and pconnSCION set to nil.
Check pconnSCION first and return empty paths when disconnected.
…nnect

After retrySCIONConnect succeeds:
1. discoverNewSCIONPeers: scans all peers for SCION services and
   triggers path discovery for those without scionState yet. Fixes
   the case where SetNetworkMap ran before SCION was available.
2. ReSTUN(scion-connected): triggers endpoint re-advertisement so
   peers receive our SCION address via Hostinfo update.
pconnSCION was declared in the "no locking required" section of Conn
but was read and written from multiple goroutines without
synchronization: receiveSCION, sendSCION, and sendSCIONBatch read it
on the hot path without locks, while closeSCIONLocked, retrySCIONConnect,
reconnectSCION, and ReconfigureSCION wrote it (some under c.mu, some
without). This mixed-locking pattern is a data race detectable by the
Go race detector, and can cause torn pointer reads on ARM (Android).

Change pconnSCION from *scionConn to atomic.Pointer[scionConn],
matching the RebindingUDPConn.pconnAtomic pattern used for pconn4/pconn6.
All reads become .Load() (lock-free, safe on all architectures) and
all writes become .Store() (can still be coordinated with c.mu for
higher-level operations like close-then-reconnect sequences).

SCIONStatus no longer needs c.mu since the atomic load is sufficient
for reading the pointer and the immutable localIA field.
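
The pattern described above can be sketched with simplified stand-in types. The field name `pconnSCION` and the `localIA` field follow the commit message; everything else (the `conn` type, `status`, `closeSCION`, the example ISD-AS string) is a reduced illustration, not the real magicsock code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// scionConn is a simplified stand-in; localIA is immutable after creation.
type scionConn struct {
	localIA string
}

// conn holds the SCION connection behind an atomic pointer so hot-path
// readers never need a mutex.
type conn struct {
	pconnSCION atomic.Pointer[scionConn]
}

// status loads the pointer once and only reads immutable fields from it.
func (c *conn) status() (connected bool, localIA string) {
	sc := c.pconnSCION.Load()
	if sc == nil {
		return false, ""
	}
	return true, sc.localIA
}

// closeSCION clears the pointer so later readers see "disconnected" rather
// than a closed connection.
func (c *conn) closeSCION() {
	c.pconnSCION.Store(nil)
}

func main() {
	var c conn
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				c.status() // concurrent lock-free reads
			}
		}()
	}
	for j := 0; j < 1000; j++ {
		c.pconnSCION.Store(&scionConn{localIA: "1-ff00:0:110"})
		c.closeSCION()
	}
	wg.Wait()
	fmt.Println(c.status())
}
```

Running a program like this with `go run -race` is a quick way to confirm that the reads and writes no longer race.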

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: tjohn327 <tonyjanugrah@gmail.com>
Run retrySCIONConnect in a goroutine so ReconfigureSCION returns
immediately. The bootstrap cascade can block 30-60s on network I/O;
making it async prevents blocking the LocalAPI handler and avoids
potential ANR on Android.
When connected but shimXPC is nil (new infrastructure, no dispatcher),
poll every 30s instead of 5s. shimXPC is immutable per scionConn so
frequent polling is wasteful — only a full reconnect creating a new
scionConn could add a shim. Reduces wakeups from ~17K/day to ~2.9K/day
in the common no-shim case.
* wgengine/magicsock: implement SCION fast-path checksum and header serialization

- Added functions for computing the SCION pseudo-header checksum and finishing the checksum for SCION/UDP packets.
- Introduced a pre-serialized header template for fast-path sends to optimize performance by bypassing standard serialization.
- Enhanced the scionConn structure to support fast-path operations, including adjustments to the underlay connection handling.
- Updated tests to validate the correctness of the new checksum computations and fast-path functionality.

* wgengine/magicsock: implement SCION batch receive and parsing enhancements

- Introduced scionRecvBatch for efficient batch processing of SCION packets, utilizing a sync.Pool for buffer reuse.
- Added parseSCIONPacket function to extract source address and payload from raw SCION packets, improving packet handling.
- Enhanced receiveSCION method to support batch reading from the underlay socket, optimizing performance during packet reception.
- Updated logic for handling disco packets to leverage the new batch processing capabilities.

* wgengine/magicsock: enhance SCION underlay support for IPv6

- Added support for IPv6 in the SCION connection handling, allowing for batch I/O operations with both IPv4 and IPv6.
- Updated scionListenAddr to allow overriding the listen address via the TS_SCION_LISTEN_ADDR environment variable, supporting IPv6 localhost.
- Refactored scionConn to use a common interface for underlay connections, improving flexibility for packet handling.
- Enhanced documentation to clarify the behavior of the listen address and its default settings.
Revert "Add batch read and write support for SCION"
Integrates SCION as an alternative transport path in Tailscale's magicsock, with full support for Android runtime configuration
- Add NOTICE with Tailscale and SCION attribution
- Rename deb/rpm/tgz package to tailscale-scion
- Add Conflicts/Replaces for official tailscale package
- Fix license from MIT to BSD-3-Clause
- Update maintainer, description, homepage for netsys-lab
- Add netsys-lab copyright to SCION-specific source files
Triggered on tag push (v*-scion.*). Builds Linux deb/rpm/tgz
(amd64+arm64), macOS tgz (amd64+arm64), Windows zip (amd64).
Publishes all artifacts to GitHub Releases.
The dist tool runs on the host (amd64) and cross-compiles
internally. Setting GOARCH in the env caused go run to build
the dist tool itself for arm64, which can't execute on amd64.
Shallow clone (fetch-depth: 1) caused missing files during
package glob. Full checkout ensures all files are available.
gocross was made opt-in on 2025-06-16 but dist.go still forced
TS_USE_GOCROSS=1, causing 'no matching files' errors when gocross
modified GOOS/GOARCH during go list. Set TS_USE_GOCROSS=0 so the
Tailscale Go toolchain is used directly.

Also restore deb/rpm targets in the release workflow.
In CI (CI=true), gocross-wrapper.sh enables set -x which writes
bash traces to stderr. GoPkg() uses CombinedOutput() merging
stdout+stderr, so the real path gets mixed with trace output.
Add NOBASHDEBUG=true (existing upstream mechanism) to suppress
the traces. Restore TS_USE_GOCROSS=1 to match upstream.
Add documentation for the SCION integration:
- docs/architecture.md: component overview, connection flow, data flow,
  key design decisions including peerapi4 piggyback mechanism
- README.md: replace upstream README with SCION-specific user guide,
  env var reference, build instructions

Fix SCION service address format to use bracket notation for IPv6
compatibility. The peerapi4 piggyback format changes from
"scion=ISD-AS,hostIP:port" to "scion=ISD-AS,[hostIP]:port" so that
IPv6 addresses (which contain colons) don't break the port parser.
Backward-compatible parsing for unbracketed format is preserved.
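
A parser that accepts both the bracketed and the legacy unbracketed form could look like the sketch below. The two string formats are taken from the commit message; the function name `parseSCIONService`, the example addresses, and the fallback strategy (split on the last colon) are assumptions and may differ from the real implementation.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strconv"
	"strings"
)

// parseSCIONService parses "scion=ISD-AS,[hostIP]:port" and, for backward
// compatibility, the older "scion=ISD-AS,hostIP:port".
func parseSCIONService(s string) (string, netip.AddrPort, error) {
	if !strings.HasPrefix(s, "scion=") {
		return "", netip.AddrPort{}, fmt.Errorf("missing scion= prefix in %q", s)
	}
	// The ISD-AS itself contains colons, so split on the comma first.
	ia, hostport, ok := strings.Cut(strings.TrimPrefix(s, "scion="), ",")
	if !ok {
		return "", netip.AddrPort{}, fmt.Errorf("missing host part in %q", s)
	}
	host, port, err := net.SplitHostPort(hostport)
	if err != nil {
		// Legacy fallback: treat everything after the last colon as the port.
		i := strings.LastIndexByte(hostport, ':')
		if i < 0 {
			return "", netip.AddrPort{}, fmt.Errorf("missing port in %q", s)
		}
		host, port = hostport[:i], hostport[i+1:]
	}
	addr, err := netip.ParseAddr(host)
	if err != nil {
		return "", netip.AddrPort{}, err
	}
	p, err := strconv.ParseUint(port, 10, 16)
	if err != nil {
		return "", netip.AddrPort{}, err
	}
	return ia, netip.AddrPortFrom(addr, uint16(p)), nil
}

func main() {
	for _, s := range []string{
		"scion=19-ffaa:1:1067,[2001:db8::1]:41641", // new bracketed form
		"scion=19-ffaa:1:1067,192.0.2.7:41641",     // legacy form
	} {
		ia, ap, err := parseSCIONService(s)
		fmt.Println(ia, ap, err)
	}
}
```

Splitting on the comma before looking for the port keeps the colons inside the ISD-AS from confusing the host and port parsing.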
These workflows depend on Tailscale-specific infrastructure (self-hosted
runners, Azure cigocacher, Slack, FlakeHub, private secrets). Keep only
release.yml for our GitHub Releases.
@tjohn327 tjohn327 merged commit 19016b4 into main Mar 17, 2026
tjohn327 added a commit that referenced this pull request Apr 21, 2026
… v1.96.5

PR #5 on main was a squash merge of earlier scion-dev work, which broke
git's view of scion-dev as an ancestor. This merge uses -X theirs so
every scion-dev file wins over the squash commit's flattened version.
Since scion-dev contains every commit originally squashed into PR #5
plus the new work (Phase 2 TRC verification, v0.15.0, v1.96.5), no
information is lost.

8 participants