Define /unix #174

Open
wants to merge 2 commits into
base: master

achingbrain
Member

Summary

Adds a protocol note on how to encode paths to Unix domain sockets as strings when those paths may include the delimiting character /.

This allows us to append other tuples to the multiaddr while also ensuring we can round-trip the address to a string and back.

This doesn't affect the binary representation of the multiaddr since everything is length-delimited.

Takes inspiration from #164 and proposes using URI encoding for the segment, the same as the /http-path component.

One difference: if the path is to represent the filesystem root, it must be included in the value portion of the tuple; otherwise it can be omitted.
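As a rough illustration of the percent-encoding approach described in this summary, here is a minimal Python sketch. The helper names `unix_to_multiaddr_str` and `parse_unix_multiaddr_str` are hypothetical and not part of any multiaddr library, and the sketch does not attempt to implement the root-path rule; it simply encodes whatever path it is given.

```python
from urllib.parse import quote, unquote


def unix_to_multiaddr_str(socket_path: str, suffix: str = "") -> str:
    """Percent-encode a socket path so it occupies exactly one segment.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() escape "/" (as %2F) along with other reserved characters.
    return "/unix/" + quote(socket_path, safe="") + suffix


def parse_unix_multiaddr_str(addr: str) -> tuple[str, list[str]]:
    """Return (decoded socket path, remaining segments) for a /unix address."""
    parts = addr.split("/")
    # parts[0] is the empty string before the leading "/".
    if len(parts) < 3 or parts[0] != "" or parts[1] != "unix":
        raise ValueError(f"not a /unix address: {addr!r}")
    return unquote(parts[2]), parts[3:]


addr = unix_to_multiaddr_str("/tmp/app.sock", "/p2p/QmExample")
print(addr)
print(parse_unix_multiaddr_str(addr))
```

Because the encoded path contains no literal /, the trailing tuples can be split off unambiguously and the string round-trips.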

Before Merge

achingbrain added a commit to multiformats/js-multiaddr-matcher that referenced this pull request Nov 4, 2024
Need to resolve how to encode unix paths so peer ids can be appended
to them - multiformats/multiaddr#174
@MarcoPolo
Contributor

Are there any backwards compatibility issues? Is anyone using this multiaddr currently?

@aschmahmann

Is anyone using this multiaddr currently

I'm not sure how heavily it's utilized, but kubo can use those addresses. e.g. https://github.com/ipfs/kubo/blob/4009ad3e5a502518ddc7d48a888707f812ddc629/docs/config.md?plain=1#L219

@aschmahmann

Are there any backwards compatibility issues?

I haven't looked too deeply at usage, but I'd suspect so. The question is what to do about it. There are a bunch of issues, such as #139 and #55, that are basically about protocols (like http or unix) failing when they take unescaped paths. These problems have also been known from the earliest days; see multiformats/go-multiaddr#31.

Maybe the answer is to just break it, or to say that implementations MAY/SHOULD/MUST preferentially try looking for the entire path as a socket and if that fails will try with just the first path segment as a socket.


Not specific to this PR, but between http-path and this, it is perhaps worth biting the bullet on specifying how "path" types should be done in general. Maybe the answer is just to require them all to be escaped, maybe it's something else (e.g. closer to @Stebalien's proposal in multiformats/multiformats#55).

@achingbrain
Member Author

There will be backwards compatibility issues with the string representation of the path due to the escaping, though not with the binary representation.
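To show why the binary form is unaffected, here is a minimal Python sketch of a length-delimited tuple: an unsigned varint protocol code, an unsigned varint length, then the raw path bytes. The code value 400 for unix is taken from the multiaddr protocol table as I understand it and should be treated as an assumption; the helper names are hypothetical.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128-style varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


UNIX_CODE = 400  # assumption: protocol code for "unix" in the multiaddr table


def encode_unix_tuple(path: str) -> bytes:
    """Encode <code><length><path bytes>; the path is stored unescaped."""
    value = path.encode("utf-8")
    return encode_uvarint(UNIX_CODE) + encode_uvarint(len(value)) + value


print(encode_unix_tuple("/tmp/app.sock").hex())
```

The length prefix tells a decoder exactly where the path ends, so a / inside the value needs no escaping in this form.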

I think this is used, but not very commonly. The js stack can also use unix addresses, but currently only as a terminal element; that is, no tuples can follow it.

Arguably the lack of escaping has harmed its use. I know I've tried to use it a few times over the years but have always come up hard against the inability to append anything else to the address, and the various issues asking for clarification speak to a need for this.

or to say that implementations MAY/SHOULD/MUST preferentially try looking for the entire path as a socket and if that fails will try with just the first path segment as a socket

I wonder if this could get exploited by making longer paths with segments that mirror tuples after the unix section? Possibly with .. in them that go somewhere bad?
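A small Python sketch of the ambiguity behind this concern, assuming an unescaped address whose remainder is treated as an absolute path. The function `candidate_socket_paths` is hypothetical, and it generalizes the quoted suggestion by listing every prefix rather than only the whole path and the first segment.

```python
import posixpath


def candidate_socket_paths(addr: str) -> list[str]:
    """List every socket path an unescaped /unix address could mean, longest first."""
    prefix = "/unix/"
    if not addr.startswith(prefix):
        raise ValueError(f"not a /unix address: {addr!r}")
    segments = addr[len(prefix):].split("/")
    # Assumption: the remainder is interpreted as an absolute filesystem path.
    return ["/" + "/".join(segments[:n]) for n in range(len(segments), 0, -1)]


for path in candidate_socket_paths("/unix/tmp/app.sock/p2p/QmExample"):
    print(path)

# ".." segments survive in the string form and resolve elsewhere on disk.
print(posixpath.normpath("/tmp/app.sock/../../etc/evil.sock"))
```

Which candidate wins depends on what exists on the local filesystem, so the same string could resolve differently on different machines.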

Maybe the answer is to just break it

I kind of think so. The original PR was merged in haste, and the question of whether we need to append things to it was left unresolved.

It was flagged as experimental and subject to change so..

proposal in multiformats/multiformats#55

That's interesting: it seems to mandate parsing paths left to right, whereas we recently switched multiaddr-to-uri to parsing right to left to support some forms of multiaddr.

It also doesn't seem to say much about escaping so we'd probably still need to solve the same problem there.

@achingbrain
Member Author

Are there any further thoughts here?

@ntninja
Contributor

ntninja commented Mar 14, 2025

@achingbrain: I just noticed it hasn't been mentioned, so I'd like to bring up my old MultiAddr proposal from 2019 again: #87

It’s mainly about arguments to individual MultiAddr path segments, but it also proposes syntax for encoding paths in MultiAddr segments:

  • /unix/(/dir/file.socket)/http, rather than /unix/dir%2Ffile.socket/http
  • /unix/(/dir\\weird/\(file\).sock)/ws, rather than /unix/dir\weird%2F(file).sock/ws
  • /unix/dir/file.socket, rather than /unix/dir%2Ffile.socket for backward-compatibility

This is IMHO significantly easier to read than using percent-encoded local file system paths.
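For comparison, a minimal Python sketch of how a parenthesized value might be read. The rules are inferred only from the three examples above (a backslash escapes the next character, and the first unescaped ")" ends the value); they are assumptions, not the text of #87, and `read_paren_value` is a hypothetical name.

```python
def read_paren_value(s: str) -> tuple[str, str]:
    """Read a leading "(...)" value; return (decoded value, remaining text)."""
    if not s.startswith("("):
        raise ValueError("value does not start with '('")
    out = []
    i = 1
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            # Assumed rule: a backslash makes the next character literal.
            if i + 1 >= len(s):
                raise ValueError("dangling backslash")
            out.append(s[i + 1])
            i += 2
        elif ch == ")":
            # Assumed rule: the first unescaped ")" closes the value.
            return "".join(out), s[i + 1:]
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated '('")


print(read_paren_value(r"(/dir\\weird/\(file\).sock)/ws"))
```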

@achingbrain
Member Author

Thanks, I didn't see that issue. I like your proposal for using () for tuple values and the flexibility it brings, but I think it's more in the realm of a Multiaddr v2, as you point out. I would certainly support that style of stringification if those sorts of discussions started happening.

To solve this particular issue though, I would prefer to be consistent with other multiaddr tuple types. HTTP has already solved this problem by percent escaping the characters so I think we should use that here too.

@ntninja
Contributor

ntninja commented Mar 22, 2025

@achingbrain: I can see where you’re coming from, but I’d just like to point out that the major benefit, besides nicer style, is the fact that it’s backwards-compatible! (We are talking about MultiAddr v1 here after all. 🙂)

I know most of the MultiFormat people tend to lean on the string-representation => unimportant side, but at least ipfs-cluster and py-ipfs-http-client are using the /unix/dir/file.socket string-representation as part of their (potentially) persistent configuration right now. So changing it to /unix/dir%2Ffile.socket would actually be a backward-incompatible/major-release change for them.

By contrast, /unix/(/dir/file.socket)/… would remain compatible except in the super-obscure edge-case of a socket file path actually starting with /(/

Not saying you shouldn’t go with your proposal, but that is something to thoroughly consider I think!

@dhuseby

dhuseby commented May 7, 2025

Are there any limitations on the character set that can be used in a path? I'm assuming that the characters are all UTF-8 encoded so unicode characters like ↑ (e.g. up arrow, UTF-8: 0xE2 0x86 0x91) are encoded as %25E28691. Is that correct? Or would it be encoded as %E28691?

@achingbrain
Member Author

Are there any limitations on the character set that can be used in a path?

Not really, it's kind of the wild west since most "unix" implementations can trace their lineage prior to UTF-8 and other modern conveniences.

POSIX has a definition of portable file name characters, but realistically it could be UTF-8, ASCII or char[]. I think the only restriction is no NUL (\0).

As per this proposal, ↑ would be encoded as %E2%86%91.

Full details are in RFC 3986, linked to from the proposed changes in the PR.
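The encoding can be checked with Python's standard library, which percent-encodes each UTF-8 byte separately. This is only a quick sketch using `urllib.parse`; it says nothing about how any multiaddr implementation does it.

```python
from urllib.parse import quote, unquote

arrow = "\u2191"  # the up arrow character from the question

# UTF-8 encodes the character as three bytes.
print(arrow.encode("utf-8").hex())  # e28691

# Each byte gets its own %XX escape.
encoded = quote(arrow, safe="")
print(encoded)  # %E2%86%91

# Decoding restores the original character.
print(unquote(encoded) == arrow)  # True

# %25 is an escaped "%", so this decodes to literal text, not the arrow.
print(unquote("%25E28691"))  # %E28691
```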

@achingbrain
Member Author

Refs ipshipyard/roadmaps#16
