encoding/xml: add flag for stricter XML char parsing #69503
Comments
I’d prefer if strict validation were on by default, at least eventually if not initially. Would this be possible, or would it break the Go 1 stability guarantee? |
Making encoding/xml do strict validation by default would be OK with the Go 1 compatibility guarantee. Adding the GODEBUG would give people an easy way to back off. That said: why would strict validation be a good default? Who would that help in practice? |
XML with malformed characters is ill-formed, so the Principle of Least Astonishment suggests that it should be rejected. |
If we were starting from scratch, I would certainly agree. But we aren't. Making this change will most likely break some existing working code, which in an ecosystem that stresses backward compatibility is astonishing in a different way. |
Would it be better to create a new XML parsing entry point and mark the old entry points as deprecated? There are a ton of problems in |
I think that we should have a plan for encoding/xml/v2. It's clear that there are many problems with encoding/xml. But that is a big undertaking. In the meantime, we need to decide about this proposal, which is not about v2. |
Fair! You have much more experience than I do when it comes to the Go ecosystem, so I think it would be better for you to make this decision. Other options include adding an extra flag field or setter method. |
This proposal has been added to the active column of the proposals project |
There are many ways in which It's also not clear that a GODEBUG is the right way to control this. GODEBUG is very coarse-grained. We already have decoder configuration in the So here's a counter-proposal: we add a |
@aclements I don't think I understand your
This doesn't really make sense to me. If So, ISTM the straightforward way to do this is to have a bitfield To me, the
I am against this. I believe the error message should report the spec-violation. We parse customer XML data and we pass the error message reported by That is the target audience of "set To be helpful to us, that is to make it more discoverable that we need to set |
The zero value of Decoder is already not usable. You have to call NewDecoder. So in this case we don't have the constraint that new fields need to default to 0. But I definitely see your argument that What I'm trying to avoid is having One thing I like about Orthogonally, this doesn't have to be a bit field. It could be a set of bool fields. |
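The two shapes mentioned above, a bit field versus a set of bool fields, can be compared with a minimal sketch. All names here (`strictFlags`, `strictChars`, `strictEntities`, `strictOptions`) are hypothetical and chosen only for illustration; they are not the API proposed or accepted in this issue.

```go
package main

import "fmt"

// strictFlags is a hypothetical bit set of optional strictness checks.
type strictFlags uint

const (
	strictChars    strictFlags = 1 << iota // hypothetical: validate character ranges
	strictEntities                         // hypothetical: validate entity references
)

// has reports whether every bit in flag is set in f.
func (f strictFlags) has(flag strictFlags) bool { return f&flag == flag }

// strictOptions is the alternative shape: one bool field per check.
type strictOptions struct {
	Chars    bool
	Entities bool
}

func main() {
	// Bit field: checks are combined with bitwise OR.
	flags := strictChars | strictEntities
	fmt.Println("bit field, chars enabled:", flags.has(strictChars))

	// Bool fields: each check is named at the use site.
	opts := strictOptions{Chars: true}
	fmt.Println("bool fields, entities enabled:", opts.Entities)
}
```

Either shape lets new checks be added later without changing the meaning of existing ones; the bit field is more compact, while the bool fields show up individually in documentation.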
Stepping back, I think the first question is, why is stricter XML char parsing actually necessary? I don't think that's been answered in this issue. |
@aclements Programs might assume that |
@DemiMarie Thanks, but that kind of pushes the problem back one level. Why do those programs need to validate the XML? I'm not trying to be obnoxious, we're just trying to understand the need. |
@ianlancetaylor You aren’t being obnoxious 🙂. The general reason would be that they pass the XML, or data extracted from the XML, to something that would be safe if |
Thanks @DemiMarie , that's reasonably convincing. The tension between my proposed @ianlancetaylor suggested that we effectively deprecate the The default returned by That leaves the question of how |
@aclements I would like an easy way to get the strictest (read: most standards-compliant) parsing a given version of Go supports, without having to change the source code of the program. In the long term, I recommend differential fuzzing of |
If we do tie these new flags to |
What I mean is that as a program author, I want to get the strictest parsing the given version of Go supports, regardless of |
I see. If I understand you correctly, you want some way to ask for the strictest version of XML parsing, such that if some new Go release adds a new strictness flag, you get that turned on by default. One approach would be to add a new function |
@ianlancetaylor You understand correctly. |
People often say they want the strictest setting possible, but then when we add a new check and their program stops processing inputs it used to accept, they often change their mind. It's unclear we should do the "be as strict as possible including rejecting new things tomorrow" mode. What if the "Strictness" field is called Check, as in
and the |
It might be worth including #25755 in this proposal |
A comprehensive list of topics (namespace, procint, unicode,...) could be considered as stricter XML 1.0. As mentioned above, stricter is not always well accepted. The implementation of XML 1.1 using the foreseen prolog would introduce a version adhering to a published standard. The XML 1.1 prolog is only usable for full XML documents. It implies that partial documents would go through the current behaviour. It should avoid the permanent concern of breaking code. Quoting the XML 1.1 introduction seems fair to feed this discussion.
|
With a
This seems like a good simplification. |
Have all remaining concerns about this proposal been addressed? The proposal is to add a
We will also define a new In the future, additional fields may be added to the |
Based on the discussion above, this proposal seems like a likely accept. The proposal is to add a
We will also define a new In the future, additional fields may be added to the |
Drive-by comment: perhaps it would be nicer to define a new type for the value |
Good point. We can certainly give it a name. |
No change in consensus, so accepted. 🎉 The proposal is to add a
The
We will also define a new In the future, additional fields may be added to the |
In the approved comments, I suppose that "Entities" should be read instead of "Entites". While looking in the implementation, defining xmlcheck as a comma-separated list does not totally respect the existing definition (https://pkg.go.dev/runtime#hdr-Environment_Variables) which says
For instance, |
wouldn't |
At least I was consistent in the comments 😅. Thanks. Fixed in place.
Oh, that's a good point. I'll bring this detail back to proposal review. |
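The format concern raised above can be illustrated with a minimal sketch. The runtime documentation describes GODEBUG as a comma-separated list of name=val pairs, so a setting whose own value contains commas is split apart. `parseGodebug` is a hypothetical, simplified helper and not the runtime's actual parser; the values `chars` and `entities` are placeholders for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseGodebug splits a GODEBUG-style string into name=val pairs.
// Simplified illustration only; not the runtime's real parsing code.
func parseGodebug(s string) map[string]string {
	settings := make(map[string]string)
	for _, field := range strings.Split(s, ",") {
		if field == "" {
			continue
		}
		// A field without "=" yields the whole field as name and an empty value.
		name, val, _ := strings.Cut(field, "=")
		settings[name] = val
	}
	return settings
}

func main() {
	// The comma intended to separate list items inside the value
	// is read as a separator between settings.
	for name, val := range parseGodebug("xmlcheck=chars,entities") {
		fmt.Printf("name=%q val=%q\n", name, val)
	}
}
```

Under this reading the second list item becomes a separate, valueless setting, which is why a comma-separated value does not fit the existing GODEBUG definition.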
Using
Another option would be to have a separate
I think separate flags rather than a single list also makes it more clear that this adds to any flags set by the application, rather than overriding all of them. |
Let's go with |
Currently on tip, an entity as The wording of |
From the discussion above it seems like the intention is for I understand the argument for backward-compatibility and don't wish to argue for any change to the accepted compromises, but I would like to advocate for changing the docstring of the package so that it explicitly states that this is not a fully-conforming XML 1.0 parser, describes the various ways it diverges from the specification -- at least, the ones we already know about -- and mentions that the level of conformance is partially configurable but that 100% conformance is not currently available. My motivation here is related to the "principle of least astonishment" discussed earlier: if this package claims that it's a conforming XML 1.0 parser then it's likely to be used in ways that rely on that being true, potentially causing general functionality or security problems for depending applications. Hopefully an explicit note in the docs will cause developers to look a little more closely to make sure that the current implementation is suitable for their needs. |
That is my thought too. Given that making |
Change https://go.dev/cl/662235 mentions this issue: |
Proposal Details
Background
As reported by @DemiMarie
The encoding/xml package does not properly validate that the characters within comments, processing instructions, or directives are properly within the CharData range as defined by the XML specification.
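A minimal sketch of the behavior described above, assuming the Go releases current when this issue was filed. `decodeAll` and `isXMLChar` are hypothetical helpers written for this example; `isXMLChar` encodes the Char production of XML 1.0, which is the range the report says is not enforced inside comments.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// isXMLChar reports whether r is allowed by the XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
func isXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// decodeAll reads every token from doc and returns the first error
// other than io.EOF, or nil if the whole input was tokenized.
func decodeAll(doc string) error {
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		_, err := d.Token()
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	fmt.Println("U+0001 is an XML Char:", isXMLChar(0x01))
	// U+0001 in character data is rejected by the decoder.
	fmt.Println("in char data:", decodeAll("<a>\x01</a>"))
	// The same character inside a comment is the case this issue reports.
	fmt.Println("in comment:  ", decodeAll("<a><!-- \x01 --></a>"))
}
```

Comparing the last two lines of output on a given Go version shows whether comments receive the same character validation as character data.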
Proposal
Add a godebug flag, xmlvalidatechars=1, which enables stricter validation of characters within comments, processing instructions, and directives. It is my understanding that changing XML behavior can sometimes lead to unexpected behavior/breaking changes, but I have tested what would happen if this flag were enabled by default internally and ran into zero issues.