vshakitskiy commented Sep 6, 2025

After running the Autobahn test suite against ewe's WebSocket implementation and successfully passing all cases, here are some general improvements:

  • Add strict WebSocket protocol compliance validation
  • Add control frame fragmentation validation; control frames must not be fragmented
  • Add close frame validation; validate close codes and UTF-8 payload
  • Fix partial frame handling for TCP fragmentation; incomplete frames return NeedMoreData instead of InvalidFrame
  • Fix masked frame parsing for incomplete frames
  • Add UTF-8 validation for complete text frames and assembled fragmented messages
  • Extract RSV bits in frame header parsing instead of ignoring them as a reserved field
  • Optimize data unmasking performance by replacing recursive byte-by-byte XOR with a bulk XOR operation using a repeating mask key
  • Defer decompression to the aggregation phase to correctly handle fragmented compressed messages per the permessage-deflate extension, instead of inflating per-frame
  • Rewrite aggregate_frames to accumulate payloads in a list and concatenate once at the end, rather than using incremental bit array appends
  • Fix apply_inflate to correctly call compression.inflate instead of compression.deflate (see issue #6, "apply_inflate calling wrong function")
  • Add decode_many_frames_result function with proper error propagation (decode_many_frames itself is left untouched)
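
For illustration, the close frame rule in the list above (validate the close code and the UTF-8 payload) could be sketched roughly as follows. This is not the code from this PR: `is_valid_close_code` and `validate_close_payload` are hypothetical names, and accepting 1012 to 1014 is an assumption based on the IANA close code registry rather than on RFC 6455 itself.

```gleam
import gleam/bit_array

/// Hypothetical helper: True when `code` is allowed on the wire in a close frame.
pub fn is_valid_close_code(code: Int) -> Bool {
  case code {
    // Reserved codes that must never be sent in a close frame.
    1004 | 1005 | 1006 -> False
    // Protocol-defined codes (assumption: 1012-1014 are accepted as well).
    _ if code >= 1000 && code <= 1014 -> True
    // Ranges reserved for libraries, frameworks and applications.
    _ if code >= 3000 && code <= 4999 -> True
    _ -> False
  }
}

/// Hypothetical helper: checks the payload of a close frame.
pub fn validate_close_payload(payload: BitArray) -> Result(Nil, Nil) {
  case payload {
    // An empty close payload is allowed.
    <<>> -> Ok(Nil)
    // Otherwise: a 2-byte big-endian status code followed by a UTF-8 reason.
    <<code:size(16), reason:bytes>> ->
      case is_valid_close_code(code), bit_array.is_utf8(reason) {
        True, True -> Ok(Nil)
        _, _ -> Error(Nil)
      }
    // A 1-byte payload cannot hold a status code.
    _ -> Error(Nil)
  }
}
```

With this sketch, `validate_close_payload(<<1000:size(16), "bye":utf8>>)` is accepted, while a payload carrying code 1005 or a reason that is not valid UTF-8 is rejected.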

Owner

rawhat left a comment

awesome, thanks for all this work!

// Complete standalone frame
[Complete(frame), ..rest], None -> {
  case frame {
    Data(CompressedTextFrame(data)) -> {

Owner

so this is an interesting one... and is making me think a bit more about the types we have here

with the addition of these to DataFrame and the panics above, i wonder if we should have some intermediate type that's only used within the contexts of these parsing functions. if this type is on the public DataFrame type, now any consumer matching on it has to handle this, when it's not really valid outside of this library's handling of the bits

even if it's just a wrapper only used in here around the compress-able frames, that seems maybe fairly innocuous? does that make sense?

Author

I think we can add an InternalDataFrame type that is used only within the parsing functions, and keep the public DataFrame with only TextFrame and BinaryFrame. What do you think?

Owner

that definitely sounds better to me, i think!

Author

After playing with the code a little, it seems my initial proposal with the InternalDataFrame type would not be that easy to implement. With the current public functions and types there would be constant mapping from internal types to public ones, and the codebase would become overcomplicated. There are a few options I can see here: we can figure out a different solution that removes CompressedTextFrame & CompressedBinaryFrame, but right now I have no idea how to do that 😅... Or we can keep these types public for now and hide them from consumers in a later release.

Author

I think using Gleam's opaque for DataFrame could also be a clean way forward. We keep all four variants in the type for internal parsing/aggregation, but make it opaque to hide constructors from users. Then, add public constructors just for TextFrame and BinaryFrame, and a match_data_frame function that takes callbacks for text/binary payloads and includes a bool is_compressed flag. This lets users handle incomplete compressed frames without direct variant matching, avoids new internal types or mapping boilerplate, and ensures aggregation always delivers decompressed public variants. What do you think about this one?
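
A rough sketch of what that could look like, assuming each variant simply wraps a BitArray payload. The type and function names follow the comment above, but the signatures and the `payload_size` consumer are assumptions for illustration, not code taken from the PR:

```gleam
import gleam/bit_array

// All four variants stay in the type for parsing and aggregation, but
// `opaque` hides the constructors from code outside this module.
pub opaque type DataFrame {
  TextFrame(payload: BitArray)
  BinaryFrame(payload: BitArray)
  CompressedTextFrame(payload: BitArray)
  CompressedBinaryFrame(payload: BitArray)
}

// Public constructors exist only for the uncompressed variants.
pub fn text_frame(payload: BitArray) -> DataFrame {
  TextFrame(payload)
}

pub fn binary_frame(payload: BitArray) -> DataFrame {
  BinaryFrame(payload)
}

// Consumers inspect a frame through callbacks; the Bool is the
// is_compressed flag (assumed callback shape).
pub fn match_data_frame(
  frame: DataFrame,
  on_text: fn(BitArray, Bool) -> a,
  on_binary: fn(BitArray, Bool) -> a,
) -> a {
  case frame {
    TextFrame(payload) -> on_text(payload, False)
    CompressedTextFrame(payload) -> on_text(payload, True)
    BinaryFrame(payload) -> on_binary(payload, False)
    CompressedBinaryFrame(payload) -> on_binary(payload, True)
  }
}

// Hypothetical consumer: it only ever sees a payload and a flag.
pub fn payload_size(frame: DataFrame) -> Int {
  match_data_frame(
    frame,
    fn(payload, _is_compressed) { bit_array.byte_size(payload) },
    fn(payload, _is_compressed) { bit_array.byte_size(payload) },
  )
}
```

Because the constructors are hidden, code outside the module cannot build or pattern match on the compressed variants directly; it can only go through `text_frame`, `binary_frame` and `match_data_frame`.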

Author

vshakitskiy commented Sep 8, 2025

I implemented my proposed idea with an opaque DataFrame type. Here is the example of how ewe would convert the gramps DataFrame type to its own type with the new changes. As you can see, the parsing types are not exposed to the consumer. aggregate_frames eliminates any possibility that the user receives CompressedTextFrame or CompressedBinaryFrame, so after aggregation we can drop the is_compressed flags in the match_data_frame callbacks. If a consumer wants to construct a DataFrame, they can use the public text_frame or binary_frame constructors, so there is no way for a consumer to construct CompressedTextFrame or CompressedBinaryFrame.

If you think this is not the right approach, i can revert the opaque change.

vshakitskiy requested a review from rawhat September 8, 2025 10:45