Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a class for field types #605

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

osa1
Copy link
Member

@osa1 osa1 commented Apr 10, 2022

This replaces the old int-based field type values with a FieldType class.
FieldType holds the base type (int32, string, enum, etc.) and the properties
repeated, optional, packed, and group.

The new classes FieldType and FieldBaseType are made public to allow
implementing new serialization formats as libraries.

FieldBaseType enum provides much better interface for both external
serializers and also within the protobuf.dart library:

  • Case analyses on field types are now checked by the compiler for
    exhaustiveness. As a result, all of the default branches of switch
    statements on field types are now gone.

  • We no longer need to manually mask the bits for optional, repeated etc.
    properties.

  • A lot of invalid states made unrepresentable: we don't need assertions about
    multiple base type bits being set. (e.g. 9da84ae)

    Note that previously for map fields we would set both the map and message
    bits. The code is refactored to treat maps the same as messages when needed.

This PR should does not change interface to the generated code so
protoc_plugin does not need any changes. We also do not remove any of the
existing public symbols, only add new ones (FieldType and FieldBaseType).
So this PR should be backwards compatible with both protoc_plugin-generated
code and the user-written code.

@osa1 osa1 requested review from mraleph and sigurdm April 10, 2022 20:14
@osa1 osa1 force-pushed the refactor_field_type branch from 6f6631e to d8ad30b Compare April 10, 2022 20:16
@osa1
Copy link
Member Author

osa1 commented Apr 10, 2022

This change should be backwards compatible and should not require any changes in protoc_plugin, but currently building test protos fails with:

Unhandled exception:
InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
#0      CodedBufferReader.readTag (package:protobuf/src/protobuf/coded_buffer_reader.dart:145:7)
#1      _mergeFromCodedBufferReader (package:protobuf/src/protobuf/coded_buffer.dart:33:21)
#2      GeneratedMessage.mergeFromCodedBufferReader (package:protobuf/src/protobuf/generated_message.dart:181:5)
#3      CodedBufferReader.readMessage (package:protobuf/src/protobuf/coded_buffer_reader.dart:103:13)
#4      _mergeFromCodedBufferReader (package:protobuf/src/protobuf/coded_buffer.dart:191:17)
#5      GeneratedMessage.mergeFromCodedBufferReader (package:protobuf/src/protobuf/generated_message.dart:181:5)
#6      CodedBufferReader.readMessage (package:protobuf/src/protobuf/coded_buffer_reader.dart:103:13)
#7      _mergeFromCodedBufferReader (package:protobuf/src/protobuf/coded_buffer.dart:199:17)
#8      GeneratedMessage.mergeFromCodedBufferReader (package:protobuf/src/protobuf/generated_message.dart:181:5)
#9      CodedBufferReader.readMessage (package:protobuf/src/protobuf/coded_buffer_reader.dart:103:13)
#10     _mergeFromCodedBufferReader (package:protobuf/src/protobuf/coded_buffer.dart:191:17)
#11     GeneratedMessage.mergeFromBuffer (package:protobuf/src/protobuf/generated_message.dart:196:5)
#12     new CodeGeneratorRequest.fromBuffer (package:protoc_plugin/src/generated/plugin.pb.dart:207:17)
#13     CodeGenerator.generate.<anonymous closure> (package:protoc_plugin/src/code_generator.dart:103:42)
<asynchronous suspension>
--dart_out: protoc-gen-dart: Plugin failed with status code 255.

@osa1
Copy link
Member Author

osa1 commented Apr 11, 2022

I'm working on fixing the issue described above. It seems like for the same buffer, at some point, we read a repeated int32 in master branch, but in this branch at the same point we read a single int32 (not repeated). To test this I added prints to _mergeFromCodedBufferReader to print type of the values being read. When compiling the same .proto file, in master branch, it goes like this:

...
message
repeated message
repeated int32
...

In this branch when compiling the same file:

...
message
repeated message
int32
Unhandled exception:
InvalidProtocolBufferException: _lastTag=0
#0      CodedBufferReader.readTag (package:protobuf/src/protobuf/coded_buffer_reader.dart:145:7)
#1      _mergeFromCodedBufferReader (package:protobuf/src/protobuf/coded_buffer.dart:33:21)
#2      GeneratedMessage.mergeFromCodedBufferReader (package:protobuf/src/protobuf/generated_message.dart:181:5)
#3      CodedBufferReader.readMessage (package:protobuf/src/protobuf/coded_buffer_reader.dart:103:13)
#4      _mergeFromCodedBufferReader (package:protobuf/src/protobuf/coded_buffer.dart:224:17)
#5      GeneratedMessage.mergeFromCodedBufferReader (package:protobuf/src/protobuf/generated_message.dart:181:5)
#6      CodedBufferReader.readMessage (package:protobuf/src/protobuf/coded_buffer_reader.dart:103:13)
#7      _mergeFromCodedBufferReader (package:protobuf/src/protobuf/coded_buffer.dart:233:17)
#8      GeneratedMessage.mergeFromCodedBufferReader (package:protobuf/src/protobuf/generated_message.dart:181:5)
#9      CodedBufferReader.readMessage (package:protobuf/src/protobuf/coded_buffer_reader.dart:103:13)
#10     _mergeFromCodedBufferReader (package:protobuf/src/protobuf/coded_buffer.dart:224:17)
#11     GeneratedMessage.mergeFromBuffer (package:protobuf/src/protobuf/generated_message.dart:196:5)
#12     new CodeGeneratorRequest.fromBuffer (package:protoc_plugin/src/generated/plugin.pb.dart:207:17)
#13     CodeGenerator.generate.<anonymous closure> (package:protoc_plugin/src/code_generator.dart:103:42)
<asynchronous suspension>
--dart_out: protoc-gen-dart: Plugin failed with status code 255.

I checked the changes in _mergeFromCodedBufferReader a few times and they look right. I wonder what the source for this buffer is. If it's coming from the library (no idea how that would work..) then perhaps the issue is in the writer?

@osa1
Copy link
Member Author

osa1 commented Apr 11, 2022

Ah, so by default assertions are not enabled and I was trying to run the plugin with assertions disabled. When I enable assertions with the patch below I get more helpful error messages. Should be easier to debug now.

diff --git a/protoc_plugin/bin/protoc-gen-dart b/protoc_plugin/bin/protoc-gen-dart
index b1f8169..86b62aa 100755
--- a/protoc_plugin/bin/protoc-gen-dart
+++ b/protoc_plugin/bin/protoc-gen-dart
@@ -1,3 +1,3 @@
 #!/bin/bash
 BINDIR=$(dirname "$0")
-dart "$BINDIR/protoc_plugin.dart" -c "$@"
+dart --enable-asserts "$BINDIR/protoc_plugin.dart" -c "$@"

@osa1
Copy link
Member Author

osa1 commented Apr 11, 2022

Blocked by #608. Not really blocked, when the assertion reported in #608 is commented out this PR passes the tests with assertions enabled.

@osa1 osa1 force-pushed the refactor_field_type branch from e3bf4b5 to 08dd5be Compare April 11, 2022 12:39
@osa1 osa1 marked this pull request as ready for review April 11, 2022 12:41
@osa1 osa1 marked this pull request as draft April 11, 2022 12:50
@osa1 osa1 marked this pull request as ready for review April 11, 2022 12:56
Copy link
Member

@mraleph mraleph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how the code becomes simpler to reason about, however I am somewhat concerned about performance implications. Have you tried running benchmarks for perf? It introduces additional memory reads where previously there were none.

protobuf/lib/src/protobuf/builder_info.dart Outdated Show resolved Hide resolved
protobuf/lib/src/protobuf/builder_info.dart Outdated Show resolved Hide resolved
isRepeated = false,
isRequired = false;

const FieldType.MAP()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to align names with Dart style guide, e.g. MAP should probably be map etc.

Even better pattern could be:

const FieldType._({
    required this.baseType,
    this.isGroup = false,
    this.isPacked = false,
    this.isRepeated = false,
    this.isRequired = false,
});

static const map = FieldType._(FieldBaseType.map); 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to align names with Dart style guide

We should check this automatically somehow, but I don't know if it's possible, and if it is how. I'll figure it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected that this is caught by https://dart-lang.github.io/linter/lints/non_constant_identifier_names.html ?

It part of core and recommended lint lists. I am surprised that this is not detected. /cc @pq

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

 const FieldType.MAP()

should definitely get flagged by non_constant_identifier_names.

Lints are a little subtle in some IDEs so maybe it was just overlooked.

Here's how it should look in IntelliJ for example:

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that we disable this lint at the beginning of this file.

Most of the identifiers in the files are used by the generated code and have cryptic names and I guess we disable the lints for those. For now I will fix naming of the identifiers for user-written code manually.

@osa1
Copy link
Member Author

osa1 commented Apr 19, 2022

I am somewhat concerned about performance implications. Have you tried running benchmarks for perf?

This will definitely make things slower, but I'm hoping that it will either be negligible amount or we will find something else to improve perf back to the original numbers.

Having view types (aka. newtypes in other languages) would help here as we would be able to use the same int representation for this type and avoid extra reads.

I will learn how to run internal benchmarks this week and update.

This API will be used externally to implement new serialization formats, for example the protobuf text format (#125). I'm currently working on implementing #125 using this change as a PoC.

@osa1
Copy link
Member Author

osa1 commented Apr 27, 2022

Finally had time to benchmark these changes using the benchmarks in protobuf package.

VM:

Before After Diff
FromBinary 5,068 μs 5,124 μs +56 μs, +1.1%
ToBinary 5,791 μs 6,196 μs +405 μs, +6.9%
ToJson 22,920 μs 22,891 μs -29 μs, -0.1%
FromJson 18,664 μs 18,584 μs -80 μs, -0.4%
HashCode 1,922 μs 1,921 μs -1 μs, +0.0%

AOT:

Before After Diff
FromBinary 6,026 μs 6,211 μs +185 μs, +3.0%
ToBinary 10,602 μs 10,691 μs +89 μs, +0.8%
ToJson 30,545 μs 30,555 μs +10 μs, +0.0%
FromJson 22,789 μs 22,792 μs +3 μs, +0.0%
HashCode 5,031 μs 5,011 μs -20 μs, -0.3%

JS:

Before After Diff
FromBinary 20,937 μs 20,731 μs -206 μs, -0.9%
ToBinary 16,537 μs 17,699 μs +1,162 μs, +7.9%
ToJson 33,483 μs 33,633 μs +150 μs, +0.4%
FromJson 24,426 μs 25,062 μs +636 μs, +2.6%
HashCode 7,017 μs 7,223 μs +206 μs, +2.9%

It seems like ToBinary benchmark is affected the most. HashCode benchmark uses field types but interestingly it's only regressed in JS.

One way to improve representation of FieldType instances would be to combine these fields into a single int, like how we do in master branch currently:

  final FieldBaseType baseType;
  final bool isGroup;
  final bool isPacked;
  final bool isRepeated;
  final bool isRequired;

Then we provide getters to map the bits to bools or FieldBaseType. I will try this next.

@osa1
Copy link
Member Author

osa1 commented Apr 27, 2022

One way to improve representation of FieldType instances would be to combine these fields into a single int, like how we do in master branch currently:

I tried this in another branch: 04a5110.

Unfortunately it made things slower. I suspect this is because enum.index is not allowed in const context, so a lot of const code became non-const. Example:

// Before
const FieldType.REQUIRED_MESSAGE()
    : baseType = FieldBaseType.message,
      isGroup = false,
      isPacked = false,
      isRepeated = false,
      isRequired = true;

// After
FieldType.required_message()
    : _fieldTypePacked = FieldBaseType.message.index | _REQUIRED_BIT;

@osa1
Copy link
Member Author

osa1 commented May 22, 2022

Next thing to try here is to keep using an int for field types, but document that it shouldn't be used directly. Provide functions for getting packed, repeated etc. bits, and mapping an int for a field type to the FieldBaseType enum. It won't be as nice, but at least we'll have a "field type" enum and we'll be able to switch on it with exhaustiveness checks.

@osa1
Copy link
Member Author

osa1 commented Jan 26, 2023

I just realized today that the "view types" (renamed as "inline classes") proposal was merged a while ago: https://github.com/dart-lang/language/blob/master/accepted/future-releases/inline-classes/feature-specification.md

Currently none of the backends implement it, but they soon will, and we will be able to use an inline class for the field type type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants