| 1 | +# Legacy Syntax Editions |
| 2 | + |
| 3 | +**Author:** [@mkruskal-google](https://github.com/mkruskal-google) |
| 4 | + |
| 5 | +**Approved:** 2023-09-08 |
| 6 | + |
| 7 | +Should proto2/proto3 be treated as editions? |
| 8 | + |
| 9 | +## Background |
| 10 | + |
| 11 | +[Edition Zero Features](edition-zero-features.md) lays out our plan for edition |
| 12 | +2023, which will unify proto2 and proto3. Since early in the design process, |
| 13 | +we've discussed the possibility of making proto2 and proto3 "special" editions, |
| 14 | +but never laid out what exactly it would look like or determined if it was |
| 15 | +necessary. |
| 16 | + |
| 17 | +We recently redesigned editions to be represented as enums |
| 18 | +([Edition Naming](edition-naming.md)), and also how edition defaults are |
| 19 | +propagated to generators and runtimes |
| 20 | +([Editions: Life of a FeatureSet](editions-life-of-a-featureset.md)). With these |
| 21 | +changes, there could be an opportunity to special-case proto2 and proto3 in a |
| 22 | +beneficial way. |
| 23 | + |
| 24 | +## Problem Description |
| 25 | + |
| 26 | +The original plan was to keep editions and syntax orthogonal, but done naively |
| 27 | +that means supporting two very different codebases. This carries serious |
| 28 | +maintenance costs, especially for test coverage. We could expect sub-optimal |
| 29 | +test coverage of editions initially, gradually shifting to poor coverage of |
| 30 | +syntax later. Since we need to support both syntax and editions long-term, this |
| 31 | +isn't ideal. |
| 32 | + |
| 33 | +In the implementation of editions in C++, we decided to unify a lot of the |
| 34 | +infrastructure to avoid this issue. We define global feature sets for proto2 and |
| 35 | +proto3, and try to use those internally instead of checking syntax directly. |
| 36 | +Pushing the syntax/editions branch earlier in the stack gives us a lot of |
| 37 | +indirect test coverage for editions much earlier. |
| 38 | + |
| 39 | +A separate issue is how Prototiller will support the conversion of syntax to |
| 40 | +edition 2023. For features it knows about, we can hardcode defaults into the |
| 41 | +transforms. However, third-party feature owners will have no way of signaling |
| 42 | +what the old proto2/proto3 behavior was, so Prototiller won't be able to provide |
| 43 | +any transformations by default. They'd need to provide custom Prototiller |
| 44 | +transforms hardcoding all of their features. |
| 45 | + |
| 46 | +## Recommended Solution |
| 47 | + |
| 48 | +We recommend adding two new special editions to our current set: |
| 49 | + |
| 50 | +``` |
| 51 | +enum Edition { |
| 52 | +  EDITION_UNKNOWN = 0; |
| 53 | +  EDITION_PROTO2 = 998; |
| 54 | +  EDITION_PROTO3 = 999; |
| 55 | +  EDITION_2023 = 1000; |
| 56 | +} |
| 57 | +``` |
| 58 | + |
| 59 | +These will be treated the same as any other edition, except in our parser which |
| 60 | +will reject `edition = "proto2"` and `edition = "proto3"` in proto files. The |
| 61 | +real benefit here is that this allows features to specify what their |
| 62 | +proto2/proto3 defaults are, making it easier for Prototiller to handle |
| 63 | +migration. It also allows generators and runtimes to unify their internals more |
| 64 | +completely, treating proto2/proto3 files exactly the same as editions. |
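
To make the effect of the two extra enum values concrete, here is a minimal Python sketch. It assumes that a feature's default for a given edition is taken from the latest entry at or before that edition, and that the parser keeps an allow-list of edition strings. The names `FIELD_PRESENCE_DEFAULTS`, `resolve_default`, and `parse_edition_statement` are hypothetical and exist only for illustration; the default values shown are illustrative as well.

```python
import bisect
from enum import IntEnum


class Edition(IntEnum):
    """Mirrors the enum values proposed above."""
    EDITION_UNKNOWN = 0
    EDITION_PROTO2 = 998
    EDITION_PROTO3 = 999
    EDITION_2023 = 1000


# Hypothetical per-feature defaults, sorted by edition. With the legacy
# editions in the enum, a feature owner can state proto2/proto3 behavior here.
FIELD_PRESENCE_DEFAULTS = [
    (Edition.EDITION_PROTO2, "EXPLICIT"),
    (Edition.EDITION_PROTO3, "IMPLICIT"),
    (Edition.EDITION_2023, "EXPLICIT"),
]


def resolve_default(defaults, edition):
    """Return the value of the latest entry at or before `edition`."""
    keys = [entry_edition for entry_edition, _ in defaults]
    index = bisect.bisect_right(keys, edition) - 1
    if index < 0:
        raise ValueError(f"no default specified for {edition.name}")
    return defaults[index][1]


def parse_edition_statement(value):
    """Parse the string in `edition = "..."`, rejecting legacy editions."""
    # Assumption: only real editions are spellable in a .proto file.
    parseable = {"2023": Edition.EDITION_2023}
    if value not in parseable:
        raise ValueError(f'unsupported edition "{value}"')
    return parseable[value]
```

Because the legacy values sort below `EDITION_2023`, the same lookup serves proto2, proto3, and real editions without a syntax check.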
| 65 | + |
| 66 | +### Serialized Descriptors |
| 67 | + |
| 68 | +As we now know, there are a lot of serialized `descriptor.proto` descriptor sets |
| 69 | +out there that need to continue working for O(months). In order to avoid |
| 70 | +blocking edition zero for that long, we may need fallbacks in protoc for the |
| 71 | +case where feature resolution *fails*. If the file is proto2/proto3, failure |
| 72 | +should result in a fallback to the existing hardcoded defaults. We can remove |
| 73 | +these later once we're willing to break stale `descriptor.proto` snapshots that |
| 74 | +predate the changes in this doc. |
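
The fallback described above could look roughly like the following Python sketch. `FeatureResolutionError`, `HARDCODED_LEGACY_DEFAULTS`, and `resolve_features` are hypothetical names, and the table holds only two example features; this is not protoc's actual implementation.

```python
class FeatureResolutionError(Exception):
    """Hypothetical error raised when feature resolution fails."""


# Hypothetical stand-in for the existing hardcoded proto2/proto3 behavior.
HARDCODED_LEGACY_DEFAULTS = {
    "proto2": {"field_presence": "EXPLICIT", "enum_type": "CLOSED"},
    "proto3": {"field_presence": "IMPLICIT", "enum_type": "OPEN"},
}


def resolve_features(syntax, resolver):
    """Run `resolver`; on failure, fall back only for proto2/proto3 files."""
    try:
        return resolver(syntax)
    except FeatureResolutionError:
        if syntax in HARDCODED_LEGACY_DEFAULTS:
            # Stale descriptor.proto snapshot: use the old hardcoded defaults.
            return dict(HARDCODED_LEGACY_DEFAULTS[syntax])
        # Editions files have no legacy behavior to fall back to.
        raise
```

Keeping the fallback in a single function makes it easy to delete once stale snapshots no longer need to be supported.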
| 75 | + |
| 76 | +### Bootstrapping |
| 77 | + |
| 78 | +In order to get feature resolution running in proto2 and proto3, we need to be |
| 79 | +able to support bootstrapped protos. For these builds, we can't use any |
| 80 | +reflection without deadlocking, which means feature defaults can't be compiled |
| 81 | +during runtime. We would have had to solve this problem anyway when it came time |
| 82 | +to migrate these protos to editions, but this proposal forces our hand early. |
| 83 | +Luckily, "Editions: Life of a FeatureSet" already set us up for this scenario, |
| 84 | +and we have Blaze rules for embedding these defaults into code. For C++ |
| 85 | +specifically, this will need to be checked in alongside the other bootstrapped |
| 86 | +protos. Other languages will be able to do this more dynamically via genrules. |
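
As a rough illustration of embedding defaults into code at build time, the Python sketch below emits a source file with the defaults baked in as a literal, so nothing has to be computed through reflection at runtime. The function name `generate_embedded_defaults` and the JSON encoding are assumptions made for this example; they are not the format the Blaze rules use.

```python
import json


def generate_embedded_defaults(defaults):
    """Return source text for a module with `defaults` embedded as a literal."""
    payload = json.dumps(defaults, sort_keys=True)
    return (
        "# Generated file, do not edit.\n"
        "import json\n"
        # repr() yields a valid string literal holding the JSON payload.
        f"EMBEDDED_DEFAULTS = json.loads({payload!r})\n"
    )
```

A build step would write this text to a file that is either checked in (as for C++ above) or produced on the fly by a genrule.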
| 87 | + |
| 88 | +### Feature Inference |
| 89 | + |
| 90 | +While we can calculate defaults using the same logic as in editions, actually |
| 91 | +inferring "features" from proto2/proto3 needs some custom code. For example: |
| 92 | + |
| 93 | +* The `required` keyword sets `LEGACY_REQUIRED` presence |
| 94 | +* The `optional` keyword in proto3 sets `EXPLICIT` presence |
| 95 | +* The `group` keyword implies `DELIMITED` encoding |
| 96 | +* The `packed` option flips between `PACKED` and `EXPANDED` encoding |
| 97 | + |
| 98 | +This logic needs to be written in code, and will need to be duplicated in every |
| 99 | +language we support. Any language-specific feature transformations will also |
| 100 | +need to be included in that language. To make this as portable as possible, we |
| 101 | +will define functions like: |
| 102 | + |
| 103 | +Each type of descriptor will have its own set of transformations that should be |
| 104 | +applied to its features for legacy editions. |
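
The inference rules listed above can be sketched for field descriptors as follows. This is a minimal Python illustration, not the signature the design defines: `LegacyField` and `infer_legacy_field_features` are hypothetical names, and the sketch assumes the `packed` field option is what toggles between `PACKED` and `EXPANDED`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LegacyField:
    """Hypothetical stand-in for a field descriptor in a proto2/proto3 file."""
    label: str = "optional"          # "optional", "required", or "repeated"
    type: str = "int32"              # e.g. "int32", "string", "group"
    proto3_optional: bool = False    # True for `optional` fields in proto3
    packed: Optional[bool] = None    # None means the option was not set


def infer_legacy_field_features(fld, syntax, defaults):
    """Overlay features implied by legacy keywords/options onto `defaults`."""
    features = dict(defaults)
    if fld.label == "required":
        features["field_presence"] = "LEGACY_REQUIRED"
    if syntax == "proto3" and fld.proto3_optional:
        features["field_presence"] = "EXPLICIT"
    if fld.type == "group":
        features["message_encoding"] = "DELIMITED"
    if fld.packed is not None:
        # Assumption: an explicit `packed` option overrides the default.
        features["repeated_field_encoding"] = (
            "PACKED" if fld.packed else "EXPANDED"
        )
    return features
```

Messages, enums, and other descriptor types would each get an analogous function carrying their own rules.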
| 105 | + |
| 106 | +#### Pros |
| 107 | + |
| 108 | +* Makes it clearer that proto2/proto3 are "like" editions |
| 109 | + |
| 110 | +* Gives Prototiller a little more information in the transformation from |
| 111 | + proto2/proto3 to editions (not necessarily 2023) |
| 112 | + |
| 113 | +* Allows proto2/proto3 defaults to be specified in a single location |
| 114 | + |
| 115 | +* Makes unification of syntax/edition code easier to implement in runtimes |
| 116 | + |
| 117 | +* Allows cross-language proto2/proto3 testing with the conformance framework |
| 118 | + mentioned in "Editions: Life of a FeatureSet" |
| 119 | + |
| 120 | +#### Cons |
| 121 | + |
| 122 | +* Adds special-case legacy editions, which may be somewhat confusing |
| 123 | + |
| 124 | +* We will need to port feature inference logic across all languages. This is |
| 125 | + arguably cheaper than maintaining branched proto2/proto3 code in all |
| 126 | + languages though |
| 127 | + |
| 128 | +## Considered Alternatives |
| 129 | + |
| 130 | +### Do Nothing |
| 131 | + |
| 132 | +If we do nothing, there will be no built-in unification of syntax and editions. |
| 133 | +Runtimes could choose any point to split the logic. |
| 134 | + |
| 135 | +#### Pros |
| 136 | + |
| 137 | +* Requires no changes to editions code |
| 138 | + |
| 139 | +#### Cons |
| 140 | + |
| 141 | +* Likely results in lower test coverage |
| 142 | +* May hide issues until we start rolling out edition 2023 |
| 143 | +* Prototiller would have to hardcode proto2/proto3 defaults of features it |
| 144 | + knows, and couldn't even try to migrate runtimes it doesn't know about |