Tags: bluekeyes/go-gitdiff
Add String() methods to parsed types (#48)

This enables clients to move back and forth between parsed objects and text patches. The generated patches are semantically equal to the parsed object and should re-parse to the same object, but may not be byte-for-byte identical to the original input.

In my testing, formatted text patches are usually identical to the input, but there may be cases where this is not true. Binary patches always differ. This is because Go's 'compress/flate' package ends streams with an empty block instead of adding the end-of-stream flag to the last non-empty block, like Git's C implementation. Since the streams will always be different for this reason, I chose to also enable default compression (the test patches I generated with Git used no compression).

The main tests for this feature involve parsing, formatting, and then re-parsing a patch to make sure we get equal objects.

Formatting is handled by a new internal formatter type, which allows writing all data to the same stream. This isn't exposed publicly right now, but will be useful if there's a need for more flexible formatting functions in the future, like formatting to a user-provided io.Writer.
Return preamble when a patch has no files (#46)

While empty patches with only a header were parsable, the parser discarded the preamble content. This meant callers had to handle this case specially. Now, if we reach the end of the input without finding a file, Parse() returns the full content of the patch as the preamble.
Follow git logic when parsing patch identities (#44)

When GitHub creates patches for Dependabot PRs, it generates a "From:" line that is not valid according to RFC 5322: the address spec contains unquoted special characters (the "[bot]" in "dependabot[bot]"). While the 'net/mail' parser makes some exceptions to the spec, this is not one of them, so parsing these patch headers fails.

Git's 'mailinfo' command avoids this by only implementing the unquoting part of RFC 5322 and then applying a heuristic to separate the string into name and email values that seem reasonable.

This commit does two things:

1. Reimplements ParsePatchIdentity to follow Git's logic, so that it can accept a wider range of inputs, including quoted strings. Strings accepted by the previous implementation parse in the same way with one exception: inputs that contain whitespace inside the angle brackets for an email address now use the email address as the name and drop any separate name component.

2. When parsing mail-formatted patches, use ParsePatchIdentity to parse the "From:" line instead of the 'net/mail' function.
Accept empty emails in ParsePatchIdentity (#42)

Git is actually more lenient here than I thought. As long as the identity contains the "<>" delimiters, Git will allow an empty email, so we should accept the same thing. I also discovered that an identity with only an email set will use the email as the name, so I've implemented that behavior as well.
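The lenient rules described in this commit can be sketched roughly as follows. This is a simplified illustration, not the library's ParsePatchIdentity: splitIdentity is a hypothetical helper, it does no RFC 5322 unquoting, and it does not reproduce the special handling of whitespace inside the angle brackets. It only demonstrates three points: the "<>" delimiters are required, the email may be empty, and an identity with only an email uses the email as the name.

```go
package main

import (
	"fmt"
	"strings"
)

// splitIdentity is a hypothetical, simplified helper. It is NOT the
// go-gitdiff implementation: there is no unquoting and no handling of
// whitespace inside the angle brackets.
func splitIdentity(s string) (name, email string, ok bool) {
	open := strings.IndexByte(s, '<')
	end := strings.LastIndexByte(s, '>')
	if open < 0 || end < open {
		// Assumption: the "<>" delimiters must be present.
		return "", "", false
	}

	name = strings.TrimSpace(s[:open])
	email = strings.TrimSpace(s[open+1 : end]) // may be empty

	// An identity with only an email uses the email as the name.
	if name == "" {
		name = email
	}
	return name, email, true
}

func main() {
	inputs := []string{
		"dependabot[bot] <support@github.com>",
		"Jane Doe <>",
		"<jane@example.com>",
		"no delimiters here",
	}
	for _, in := range inputs {
		name, email, ok := splitIdentity(in)
		fmt.Printf("%q -> name=%q email=%q ok=%v\n", in, name, email, ok)
	}
}
```

Because the helper never validates the address against RFC 5322, unquoted special characters such as "[bot]" pass through unchanged.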
Fix parsing of mode lines with trailing space (#38)

If a patch is passed through a system that converts line endings to '\r\n', mode lines end up with trailing whitespace that confuses strconv.ParseInt. In Git, this is avoided by using strtoul() for parsing, which stops at the first non-digit character.

Changing line endings in patch content itself will cause other problems so it is best to avoid this transform, but if it does happen, it shouldn't cause a parse error.
Fix EOF error for some files without final newline (#27)

If a file was an exact multiple of 1024 bytes (the size of an internal buffer) and was missing a final newline, the LineReaderAt implementation would drop the last line, leading to an unexpected EOF error on apply.

In addition to fixing the bug, slightly change the behavior of ReadLineAt to reflect how it is actually used:

1. Clarify that the return value n includes all lines instead of only lines with a final newline. This was already true except in the case of the bug fixed by this commit.

2. Only return io.EOF if fewer lines are read than requested. The previous implementation also returned io.EOF if the last line was missing a final newline, but this was confusing and didn't really serve a purpose.

This is technically a breaking change for external implementations but an implementation that exactly followed the "spec" was already broken in certain edge cases.