lla-dane
Contributor

Tracks multiformats/multiaddr#181

This is a WIP PR that aims to add the remaining protocols to py-libp2p, using go-libp2p as the reference, as mentioned in the issue above.

Protocols added in this PR:

  • http-path

More protocols will be added to this PR in future commits.

Comment on lines 7 to 13
IS_PATH = True
SIZE = -1 # LengthPrefixedVarSize


class Codec(CodecBase):
    SIZE = SIZE
    IS_PATH = IS_PATH
Contributor Author

Hey @acul71 @seetadev: can you please confirm whether these Codec configs are correct for http-path?

@lla-dane
Contributor Author

Hey @seetadev @acul71: please take a look and see whether it's done correctly. Also, please provide some pointers on how to write tests with multiaddr + http-path.

@acul71
Contributor

acul71 commented Sep 18, 2025

@lla-dane

Analysis of http-path Codec Configuration

Question Summary

@lla-dane asked for confirmation on the correctness of the http-path codec configuration:

IS_PATH = True
SIZE = -1  # LengthPrefixedVarSize

class Codec(CodecBase):
    SIZE = SIZE
    IS_PATH = IS_PATH

And requested guidance on writing tests for multiaddr + http-path.

Analysis Results

Codec Configuration: ✅ CORRECT

The codec configuration in multiaddr/codecs/http_path.py is correct and matches the Go implementation:

  • SIZE = -1 correctly corresponds to LengthPrefixedVarSize in Go
  • IS_PATH = True is appropriate for path-like protocols
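
To make the meaning of SIZE = -1 concrete, here is a minimal sketch of length-prefixed encoding: the value bytes are preceded by an unsigned varint that carries their length. The helper names (encode_uvarint, decode_uvarint, length_prefix, read_length_prefixed) are hypothetical and are not taken from py-multiaddr; this only illustrates the wire layout, not the library's actual code.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("varint value must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for the varint at the start of buf."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def length_prefix(value: bytes) -> bytes:
    """Prefix value with its length, as a variable-size protocol value would be."""
    return encode_uvarint(len(value)) + value


def read_length_prefixed(buf: bytes) -> Tuple[bytes, bytes]:
    """Split buf into (value, rest) using the varint length prefix."""
    size, offset = decode_uvarint(buf)
    end = offset + size
    if end > len(buf):
        raise ValueError("buffer shorter than declared length")
    return buf[offset:end], buf[end:]


print(length_prefix(b"foo"))  # b'\x03foo'
```

Because the length travels with the value, a decoder can skip over an http-path component without knowing anything about its contents.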

Go Implementation Reference

From /home/luca/PNL_Launchpad_Curriculum/Libp2p/go-multiaddr/protocols.go:

protoHTTPPath = Protocol{
    Name:       "http-path",
    Code:       P_HTTP_PATH,
    VCode:      CodeToVarint(P_HTTP_PATH),
    Size:       LengthPrefixedVarSize,  // This is -1
    Transcoder: TranscoderHTTPPath,
}

The Go transcoder implementation (transcoders.go lines 466-489):

var TranscoderHTTPPath = NewTranscoderFromFunctions(httpPathStB, httpPathBtS, validateHTTPPath)

func httpPathStB(s string) ([]byte, error) {
    unescaped, err := url.QueryUnescape(s)
    if err != nil {
        return nil, err
    }
    if len(unescaped) == 0 {
        return nil, fmt.Errorf("empty http path is not allowed")
    }
    return []byte(unescaped), err
}

func httpPathBtS(b []byte) (string, error) {
    if len(b) == 0 {
        return "", fmt.Errorf("empty http path is not allowed")
    }
    return url.QueryEscape(string(b)), nil
}

func validateHTTPPath(b []byte) error {
    if len(b) == 0 {
        return fmt.Errorf("empty http path is not allowed")
    }
    return nil
}

Issue Found: Protocol Registration

PROBLEM: The protocol is incorrectly registered in protocols.py line 161:

Protocol(P_HTTP_PATH, "http-path", None),  # ❌ Should be "http_path"

SOLUTION: Should be:

Protocol(P_HTTP_PATH, "http-path", "http_path"),

This means the protocol currently doesn't use the http_path codec, which explains why the codec configuration might seem unused.

Testing Guidance for multiaddr + http-path

1. Basic Roundtrip Tests

def test_http_path_multiaddr_roundtrip():
    """Test basic http-path in multiaddr string roundtrip"""
    test_cases = [
        "/http-path/foo",
        "/http-path/foo/bar",
        "/http-path/api/v1/users",
    ]
    
    for addr_str in test_cases:
        m = Multiaddr(addr_str)
        assert str(m) == addr_str
        # Verify protocol value extraction
        path_value = m.value_for_protocol(P_HTTP_PATH)
        expected_path = addr_str.replace("/http-path", "")
        assert path_value == expected_path

2. URL Encoding Tests

def test_http_path_url_encoding():
    """Test special characters and URL encoding behavior"""
    test_cases = [
        ("/foo bar", "/foo%20bar"),
        ("/path/with/special!@#", "/path/with/special%21%40%23"),
        ("/こんにちは", "/%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF"),
        ("/tmp/bar", "/tmp%2Fbar"),  # Forward slash encoding
    ]
    
    for input_path, expected_encoded in test_cases:
        addr_str = f"/http-path{input_path}"
        m = Multiaddr(addr_str)
        # The string representation should show URL-encoded path
        assert str(m) == f"/http-path{expected_encoded}"

3. Complex Multiaddr Tests

def test_http_path_in_complex_multiaddr():
    """Test http-path as part of larger multiaddr chains"""
    test_cases = [
        "/ip4/127.0.0.1/tcp/443/tls/http/http-path/api/v1",
        "/ip4/127.0.0.1/tcp/80/http/http-path/static/css",
        "/dns/example.com/tcp/443/tls/http/http-path/docs",
    ]
    
    for addr_str in test_cases:
        m = Multiaddr(addr_str)
        assert str(m) == addr_str
        
        # Verify we can extract the http-path value
        path_value = m.value_for_protocol(P_HTTP_PATH)
        assert path_value.startswith("/")

4. Error Handling Tests

def test_http_path_error_cases():
    """Test error handling for invalid http-path values"""
    
    # Empty path should raise error
    with pytest.raises(StringParseError):
        Multiaddr("/http-path/")
    
    # Missing path value should raise error  
    with pytest.raises(StringParseError):
        Multiaddr("/http-path")
    
    # Invalid URL encoding should raise error
    with pytest.raises(StringParseError):
        Multiaddr("/http-path/invalid%zz")

5. Protocol Value Extraction Tests

def test_http_path_value_extraction():
    """Test extracting http-path values from multiaddr"""
    test_cases = [
        ("/http-path/foo", "foo"),
        ("/http-path/foo/bar", "foo/bar"),
        ("/http-path/api/v1/users", "api/v1/users"),
        ("/ip4/127.0.0.1/tcp/80/http/http-path/docs", "docs"),
    ]
    
    for addr_str, expected_path in test_cases:
        m = Multiaddr(addr_str)
        path_value = m.value_for_protocol(P_HTTP_PATH)
        assert path_value == expected_path

6. Binary Roundtrip Tests

def test_http_path_binary_roundtrip():
    """Test binary encoding/decoding roundtrip"""
    test_cases = [
        "/http-path/foo",
        "/http-path/foo/bar",
        "/http-path/api/v1/users",
    ]
    
    for addr_str in test_cases:
        m1 = Multiaddr(addr_str)
        binary = m1.to_bytes()
        m2 = Multiaddr.from_bytes(binary)
        assert str(m1) == str(m2)
        assert m1 == m2

7. Edge Cases and Special Characters

def test_http_path_edge_cases():
    """Test edge cases and special character handling"""
    
    # Test with various special characters
    special_paths = [
        "/path with spaces",
        "/path/with/multiple/slashes",
        "/path/with/unicode/测试",
        "/path/with/symbols!@#$%^&*()",
    ]
    
    for path in special_paths:
        addr_str = f"/http-path{path}"
        m = Multiaddr(addr_str)
        # Should handle encoding properly
        assert m.value_for_protocol(P_HTTP_PATH) == path

Implementation Notes

Current Test Coverage

The existing tests in test_protocols.py (lines 274-318) cover:

  • ✅ Basic codec roundtrip functionality
  • ✅ Empty string/bytes error handling
  • ✅ Special character encoding
  • ✅ Validation function

Missing Test Coverage

After fixing the protocol registration, add tests for:

  • ❌ Multiaddr string parsing with http-path
  • ❌ Protocol value extraction
  • ❌ Complex multiaddr chains with http-path
  • ❌ Binary encoding/decoding roundtrip
  • ❌ Error cases in multiaddr context

Recommendations

  1. Fix Protocol Registration: Update protocols.py line 161 to use "http_path" codec
  2. Add Multiaddr Tests: Implement the test patterns above once protocol registration is fixed
  3. Verify Path Behavior: Confirm whether IS_PATH = True is correct by testing path consumption behavior
  4. Cross-Reference with Go: Ensure Python behavior matches Go implementation for edge cases

Conclusion

The codec configuration is correct, but the protocol registration needs to be fixed to properly use the http_path codec. Once fixed, the testing patterns provided above will ensure comprehensive coverage of http-path functionality in multiaddr contexts.

@lla-dane
Contributor Author

Thanks @acul71, for the detailed review.

@lla-dane
Contributor Author

lla-dane commented Sep 19, 2025

@acul71 @seetadev:
Right now, Multiaddr._from_string treats every protocol as /<proto>/<value>, which breaks for http-path because its value can contain multiple segments, e.g. /http-path/api/v1/users. The parser misinterprets the extra segments as protocol names, which causes errors.

So _from_string needs to handle http-path the way it handles unix, joining the remaining parts into a single value.
I am going to make the http-path case similar to the unix one.
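
A rough sketch of the approach proposed in this comment, assuming a simplified parser. PATH_PROTOCOLS, FLAG_PROTOCOLS, and split_components are hypothetical names for illustration; this is not the real _from_string.

```python
from typing import Iterator, Optional, Tuple

# Hypothetical protocol tables, for illustration only.
PATH_PROTOCOLS = {"unix", "http-path"}  # value swallows the rest of the address
FLAG_PROTOCOLS = {"http", "tls"}        # protocols that carry no value


def split_components(addr: str) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (protocol, value) pairs, joining trailing segments for path protocols."""
    if not addr.startswith("/"):
        raise ValueError("multiaddr must start with '/'")
    parts = addr[1:].split("/")
    i = 0
    while i < len(parts):
        proto = parts[i]
        i += 1
        if proto in FLAG_PROTOCOLS:
            yield proto, None
        elif proto in PATH_PROTOCOLS:
            # Everything after the protocol name becomes one value.
            value = "/".join(parts[i:])
            if not value:
                raise ValueError(f"empty value for {proto!r}")
            yield proto, value
            return
        else:
            if i >= len(parts):
                raise ValueError(f"missing value for {proto!r}")
            yield proto, parts[i]
            i += 1


print(list(split_components("/ip4/127.0.0.1/tcp/80/http/http-path/api/v1/users")))
# [('ip4', '127.0.0.1'), ('tcp', '80'), ('http', None), ('http-path', 'api/v1/users')]
```

The trade-off of this design is that nothing can follow a path protocol in the address, since every remaining segment is absorbed into its value.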

@seetadev
Contributor

@lla-dane: Hi Abhinav. Thank you for sharing this update and the clear explanation of the http-path parsing issue. You've articulated the problem very well. I can see how treating all protocols uniformly as /proto/value breaks down for multi-segment values like /http-path/api/v1/users. Handling http-path the way unix is handled, by joining the remaining segments into a single value, makes complete sense. It's a neat and practical approach that aligns with how other implementations solve similar cases.

This not only resolves a parsing bug but also improves developer experience, since http-path support is quite relevant for modern web-based use cases built on libp2p. We look forward to reviewing your implementation and future commits as you expand protocol coverage.

@lla-dane
Contributor Author

@acul71 @seetadev: There are a few errors in the special-character encoding of http-path addresses, in two specific tests in the test_multiaddr file:

  1. test_http_path_url_encoding:
E           AssertionError: assert '/http-path/tmp/bar' == '/http-path/tmp%2Fbar'
E             
E             - /http-path/tmp%2Fbar
E             ?               ^^^
E             + /http-path/tmp/bar
E             ?               ^

test_multiaddr.py:861: AssertionError
=================================================== short test summary info ====================================================
FAILED test_multiaddr.py::test_http_path_url_encoding - AssertionError: assert '/http-path/tmp/bar' == '/http-path/tmp%2Fbar'

  2. test_http_path_edge_cases:

>           assert m.value_for_protocol(P_HTTP_PATH) == path
E           AssertionError: assert 'path%20with%20spaces' == 'path with spaces'
E             
E             - path with spaces
E             ?     ^    ^
E             + path%20with%20spaces
E             ?     ^^^    ^^^

test_multiaddr.py:927: AssertionError
=================================================== short test summary info ====================================================
FAILED test_multiaddr.py::test_http_path_edge_cases - AssertionError: assert 'path%20with%20spaces' == 'path with spaces'

I don't understand how to fix this. Most probably we have to change something in the http_path.py::to_bytes() function, but I can't figure out what.

Only these cases are failing; all the other cases are working correctly.

- Update http_path codec to use quote(s, safe="") for consistent URL encoding
- Remove redundant URL encoding from transforms.py to prevent double encoding
- Update all HTTP path tests to expect URL-encoded values consistently
- Fix protocol tests to use same URL encoding approach as codec
- Ensure cross-language compatibility with Go multiaddr implementation

Fixes test failures in:
- test_http_path_url_encoding
- test_http_path_edge_cases
- test_http_path_bytes_string_roundtrip
- test_http_path_special_characters

All 251 tests now pass.
@acul71
Contributor

acul71 commented Sep 19, 2025

There are a few errors in the special-character encoding of http-path addresses, in two specific tests in the test_multiaddr file:

HTTP Path URL Encoding Fix

Problem

The HTTP path codec had inconsistent URL encoding behavior causing test failures:

  • test_http_path_url_encoding: Expected /http-path/tmp%2Fbar but got /http-path/tmp/bar
  • test_http_path_edge_cases: Expected path with spaces but got path%20with%20spaces

Root Cause

The issue was in the to_string() method of the HTTP path codec. It wasn't properly URL-encoding forward slashes and other special characters, leading to inconsistent behavior between string representation and value extraction.

Solution

Updated multiaddr/codecs/http_path.py to use consistent URL encoding:

def to_string(self, buf: bytes) -> str:
    if len(buf) == 0:
        raise ValueError("empty http path is not allowed")
    # Return URL-encoded string to match Go implementation
    return quote(buf.decode("utf-8"), safe="")

Key Changes

  1. HTTP Path Codec: Use quote(s, safe="") to encode all characters including /
  2. String Representation: Remove redundant encoding from transforms.py
  3. Test Updates: Update test expectations to match Go implementation behavior
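
As a small illustration of why safe="" matters, the sketch below compares Python's default quoting with the stricter form. The encode_path and decode_path helpers are hypothetical stand-ins written for this example; they are not the codec's actual methods.

```python
from urllib.parse import quote, unquote


def encode_path(raw: bytes) -> str:
    """Percent-encode every reserved character, including '/'."""
    if not raw:
        raise ValueError("empty http path is not allowed")
    return quote(raw.decode("utf-8"), safe="")


def decode_path(text: str) -> bytes:
    """Reverse encode_path: percent-decode and return UTF-8 bytes."""
    raw = unquote(text).encode("utf-8")
    if not raw:
        raise ValueError("empty http path is not allowed")
    return raw


# quote() treats '/' as safe by default, which is what produced the
# '/http-path/tmp/bar' vs '/http-path/tmp%2Fbar' mismatch.
print(quote("tmp/bar"))                 # tmp/bar
print(quote("tmp/bar", safe=""))        # tmp%2Fbar
print(encode_path(b"path with spaces")) # path%20with%20spaces
print(decode_path("tmp%2Fbar"))         # b'tmp/bar'
```

One thing worth double-checking against Go: url.QueryEscape writes a space as "+", while quote writes "%20", so addresses containing spaces may not be byte-for-byte identical in string form across the two implementations.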

Result

  • ✅ All 251 tests pass
  • ✅ Consistent URL encoding behavior
  • ✅ Cross-language compatibility with Go implementation

The fix ensures HTTP paths are properly URL-encoded throughout the multiaddr system, maintaining compatibility with the broader libp2p ecosystem.

- Add test_http_path_only_reads_http_path_part: Test that http-path only reads its own part, not subsequent protocols
- Add test_http_path_malformed_percent_escape: Test rejection of malformed percent-escapes like %f
- Add test_http_path_raw_value_access: Test accessing raw unescaped values (similar to Go's SplitLast/RawValue)

- Fix http-path protocol parsing: Remove http-path from special path handling in _from_string
- Fix IS_PATH = False for http-path codec (should not consume all remaining parts)
- Fix string_to_bytes to handle protocols with SIZE=0 (like p2p-circuit)
- Fix codec SIZE attribute check to handle codecs without SIZE attribute

These changes ensure the Python implementation matches Go behavior exactly:
- http-path only consumes its immediate value, not subsequent protocols
- Proper handling of flag protocols (SIZE=0) in string-to-bytes conversion
- Complete test coverage matching Go implementation

All 254 tests pass, ensuring cross-language compatibility.
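
For readers following along, a simplified sketch of the parsing behavior described in this commit message: http-path reads exactly one segment, flag protocols read none, and malformed percent-escapes are rejected. The SIZES table and the parse function are hypothetical and much smaller than the real implementation; only the control flow is the point.

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import unquote

# Hypothetical, abbreviated protocol table: value size in bits,
# 0 for flag protocols, -1 for length-prefixed variable-size values.
SIZES = {"ip4": 32, "tcp": 16, "http": 0, "p2p-circuit": 0, "http-path": -1}

# A '%' that is not followed by two hex digits is a malformed escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def parse(addr: str) -> List[Tuple[str, Optional[str]]]:
    """Split a multiaddr string into (protocol, value) pairs."""
    if not addr.startswith("/"):
        raise ValueError("multiaddr must start with '/'")
    parts = addr[1:].split("/")
    out: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(parts):
        name = parts[i]
        i += 1
        if name not in SIZES:
            raise ValueError(f"unknown protocol: {name!r}")
        if SIZES[name] == 0:
            out.append((name, None))  # flag protocol: consumes no value
            continue
        if i >= len(parts) or parts[i] == "":
            raise ValueError(f"missing value for {name!r}")
        value = parts[i]  # exactly one segment, even for http-path
        i += 1
        if name == "http-path":
            if _BAD_ESCAPE.search(value):
                raise ValueError("malformed percent-escape in http-path")
            value = unquote(value)  # raw, unescaped path
        out.append((name, value))
    return out


print(parse("/http-path/tmp%2Fbar/p2p-circuit"))
# [('http-path', 'tmp/bar'), ('p2p-circuit', None)]
```

With this shape, a slash inside the path has to arrive percent-encoded as %2F, which is what lets another protocol follow http-path in the same address.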
@acul71
Contributor

acul71 commented Sep 20, 2025

@lla-dane
I've fixed the failing tests and added some more tests taken from the Go implementation here: https://github.com/multiformats/go-multiaddr

Please review the changes.
If they look good to you, add a newsfragment and I'll merge this.

@lla-dane
Contributor Author

@acul71, thanks for the fix. I realize I’ve been asking for quite a few pointers, but each protocol addition is helping me understand the codebase better.

Added the newsfragment. @acul71 @seetadev

@acul71 acul71 merged commit 18df15f into multiformats:master Sep 22, 2025
17 checks passed
@lla-dane lla-dane changed the title WIP: Add missing protocols in reference with go-multiaddr Add http-path protocol in reference with go-multiaddr Sep 22, 2025