Skip to content

doc: endhost API design doc#4884

Open
JordiSubira wants to merge 4 commits into
scionproto:masterfrom
JordiSubira:endhost_api_design
Open

doc: endhost API design doc#4884
JordiSubira wants to merge 4 commits into
scionproto:masterfrom
JordiSubira:endhost_api_design

Conversation

@JordiSubira
Copy link
Copy Markdown
Contributor

This PR adds the endhost API design document. It was originally introduced and the content is mostly based on the discussion in #4834. Some aspects are still open to discuss/improve:

[] How should we argue that MTU and Core information, previously returns on AS(·) method are not necessary anymore? In other words, I do not think any application/endhost functionality will miss them, but should we argue what is the case?
[] Following the trend on the discussion issue, I have described pages as self-contained, mostly to favor average clients. Nonetheless, we should have a consensus on this. We can of course, evolve the document after trying things out with the beta implementations we are doing (or going to do).

Copy link
Copy Markdown
Contributor

@tzaeschke tzaeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % some nitpicks

Comment thread doc/dev/design/endhost-api.rst
Comment thread doc/dev/design/endhost-api.rst Outdated
Co-authored-by: Tilmann <tilmann_dev@gmx.de>
@tzaeschke
Copy link
Copy Markdown
Contributor

tzaeschke commented Feb 17, 2026

(Copied from #4834)

Should we specify error codes?
For example, the current scion-sdk appears to produce the following error codes for a HTTP/1.1 ListSegmentsRequest:

ISD-AS = 0 -> error 400: BAD_REQUEST
ISD-AS = unregistered-AS -> error 500: INTERNAL_SERVER_ERROR

I think it could be useful if errors were standardized so clients know how to deal with them.

It may also okay to simply return an empty set instead of throwing an error for unknown-ASes above.
And what about the rest of the API?

@JordiSubira
Copy link
Copy Markdown
Contributor Author

JordiSubira commented Feb 18, 2026

Should we specify error codes?
And what about the rest of the API?

We can specify those errors or response codes for which we have a clear idea at the moment.

For example, the current scion-sdk appears to produce the following error codes for a HTTP/1.1

I guess you mean that Anapaya's beta implmentation of the Endpoint API server returns this code, right? At the end the value of this exercise (creating the document, dicussing) is also to reach an initial consensus about these details. My assumption is that Anapaya's implementation is already very close to the outcome of this PR, i.e., the "official" specification, but some smalls things may change.

It may also okay to simply return an empty set instead of throwing an error for unknown-ASes above.

Yes, I also think an empty set should suffice.

@tzaeschke
Copy link
Copy Markdown
Contributor

Another question: it seems the mtu of the local AS is currently not in the API. Can we add that?
I am using that, for example, when creating SCION path objects for communication within the local AS.
If we don't have it then we have the weird situation that we have MTU information only for remote destinations, but not for local destinations.


The following daemon RPCs are obsoleted by the endhost API:

- ``AS`` -- The local ISD-AS identifier, core flag, and MTU is omitted from the endhost API.
Copy link
Copy Markdown
Contributor

@tzaeschke tzaeschke Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of having MTU information in the API. I think it is useful, for example, when communicating with endhosts in the local AS.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the ISD/AS, I understand this is actually available from querying the underlay without ISD/AS code in the request, which will return Router information with ISD/AS attached.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both UDP/IP underlay and SNAP return this information.

Copy link
Copy Markdown
Contributor

@tzaeschke tzaeschke Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a bit more, I would still find it useful to get the ISD/AS information explicitly. Imagine a local test AS without border routers, or a situation where all border routers are disabled. If that happens, having no access to the ISD/AS info precludes us from communicating with other endhosts in the same AS (they may reasonably expect the correct ISD/AS in the SCION header, but the client has no way of getting it).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I am not sure why the ISD/AS number is given per BR. Has that to do with Secure Access Groups?
This is a bit off-topic now, but for the Private ISD concept, it would be useful to have multiple ISD/AS per BR (one per interface). However, it is not necessary to provide this information in the API, the endhost can get the ISD/AS, respectively interface and border router, from the segment information.

So I am in favor of having the ISD/AS list preferrably in the ListUnderlaysResponse or the UdpUnderlay, so it is available even if no router is available.

// the requester's own ISD-AS.
uint64 src_isd_as = 1;

// The destination ISD-AS the final end-to-end path.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to specify the behavior in case of ISD=0 or ISD=unknown. Do we get an empty set or an error? Which error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would refer to ConnectRPC codes (https://connectrpc.com/docs/protocol#error-codes). For requests that are going to be always invalid like "ISD=0" I would use invalid_argument == 400. For queries that are correct but do not match anything, I would return an empty set only (i.e., no error).

uint32 dispatched_port_end = 2;
}

// The dispatched port range supported by the router. UDP/SCION packets with a destination
Copy link
Copy Markdown
Contributor

@tzaeschke tzaeschke Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest changing the name to something like "direct_ports", "non_forwarded_ports" or "dynamically_assigned_ports". "Dispatch" just means to "send", I think this name is based an history and is not very descriptive.

I would suggest avoiding the term "dispatched port", it is difficult to understand, especially without knowing historical context.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's always some complication changing the name of a field/term. This field was already present in the topology.json file. The topolgy.json file will get removed from endhosts with the new Endhost API, but infrastracture nodes, such as the BRs may continue using it. Thus, changing the field name has some "consistency" implications.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it may. I think the bigger problem is that we either keep the dispatched_ports feature or remove it.
If we keep it, we need to specify it, possibly in an RFC, and my guess it that reviewers won't like the name because it is not very descriptive.
If we want to remove it, we have to change the semantics (currently non-present means everything goes to 30041). If we change the semantics, the least we should do is use a different name (see my comment further down).

So either way, I think we have to rename it. I'd argue better rename it now than later.

// The dispatched port range supported by the router. UDP/SCION packets with a destination
// port within this range will be sent to the same UDP destination port on the destination host.
// For other ports, the router will use the well-known dispatcher port (30041). If not set,
// the entire port range is assumed to be dispatched.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is unclear means for "the entire port range to be dispatched".
I assume "entire range" means "dispatched_ports == NONE"?

I would actually suggest a semantic change that should allow us to phase out this property. How about we change the definition to "Not present -> dispatched_ports == ALL"?

With the topology file API, we cannot do that because old client may require dispatched ports and may not know about the setting. But with the new endhost API, we can assume that every client that uses it understands that not-present means ALL.
We can then gradually phase out old client and eventually, once everybody uses the new API, we can deprecate this field and stop supporting it.

This semantic difference would be communicated clearer if we gave the field a different name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar as the comment above, this may introduce a consistency problem.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar as the comment above, this may introduce a consistency problem.

Not sure how this is inconsistent. Legacy applications would anyway need either a mapping layer or keep accessing using the old control service API. These would still use the empty range as default range.

ListChains
""""""""""

RPC path: ``/scion.endhost.v1.TrustService/ListChains``
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design document is not specifying the return type of this RPC endpoint. My suggestion would be to use something like this:

message ListChainResponse {
    repeated Chains list_chain = 1;
}

message Chains {
    // List of chains that match the subject.
    repeated Chain chains = 1;
    // The corresponding subject.
    Subject subject = 2;
}

message Chain {
    // AS certificate in the chain.
    bytes as_cert = 1;
    // CA certificate in the chain.
    bytes ca_cert = 2;
}

Copy link
Copy Markdown
Contributor

@shitz shitz Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the original proposal (#4834) has this:

message ListChainsResponse {
  // List of chains that match the request.
  repeated Chain chains = 1;
}

message Chain {
  // The subject corresponding to this chain.
  Subject subject = 1;

  // AS certificate in the chain.
  bytes as_cert = 2;

  // CA certificate in the chain.
  bytes ca_cert = 3;
}

I would keep the flattened structure and not introduce a list of chains per subject.

Furthermore, I think it doesn't hurt if we also add pagination support, i.e.,

// Request to fetch certificate chains for a given AS certificate subject.
message ListChainsRequest {
  // The subjects for which the chains should be returned.
  repeated Subject subjects = 1;

  // Point in time at which the AS certificate must still be valid. In seconds
  // since UNIX epoch.
  uint32 at_least_valid_until = 2;

  // Point in time at which the AS certificate must be or must have been
  // valid. In seconds since UNIX epoch.
  uint32 at_least_valid_since = 3;

  // The maximum total number of chains to return.
  // The service may return fewer than this value.
  // If unspecified, the maximum number of segments
  // per page is returned, which is 64.
  int32 page_size = 4;

  // A page token, received from a previous
  // ListChainsRequest call.
  // Provide this to retrieve the subsequent page.
  //
  // When paginating, all other parameters provided in
  // ListChainsRequest must match the call that provided
  // the page token.
  string page_token = 5;
}

and the reponse

message ListChainsResponse {
  // List of chains that match the request.
  repeated Chain chains = 1;
  // The token for the next page of results.
  string next_page_token = 2;
}

Comment on lines +188 to +193
The ``PathService`` provides paginated access to path segments (up, core, down)
that the client can combine to construct end-to-end paths. Returning segments
instead of pre-built paths enables client-side verification against the CP-PKI
and composability with segments from other sources.

RPC path: ``/scion.endhost.v1.PathService/ListSegments``
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change: Rename PathService to SegmentsService, since it returns segments and not paths.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually returns path segments, so to be precise it would be PathSegmentService?
But I don't think it's that bad to have it shortened to PathService. I think either is ok.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this has already been implemented.
Nitpick: SegmentsService uses plural, UnderlayService uses singular. It should probably be SegmentSevice (or PathSegmentService).

@tzaeschke
Copy link
Copy Markdown
Contributor

@shitz @d-s-d @lukedirtwalker We currently have @rohrerj working on an implementation for the scionproto reference implementation. It is close to being finished, so maybe we could push a bit on getting this design finalized?

I'm not sure whether you have more changes from your side?

// When paginating, all other parameters provided in
// ListSegmentsRequest must match the call that provided
// the page token.
string page_token = 4;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to specify what the expected response should be once the last page has been queried. I would suggest to return an empty string as page_token in that case, since querying using empty string would return the first page, and would therefore "close" the loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants