Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PointerBuf::parse accepts Into<String> but does not include the String in the Err #96

Open
chanced opened this issue Nov 7, 2024 · 8 comments

Comments

@chanced
Copy link
Owner

chanced commented Nov 7, 2024

We likely take ownership of the input to PointerBuf::parse but do not include it in resulting ParseError.

PointerBuf::parse:

pub fn parse(s: impl Into<String>) -> Result<Self, ParseError> {
    let s = s.into();
    validate(&s)?;
    Ok(Self(s))
}

ParseError:

pub enum ParseError {
    NoLeadingBackslash,
    InvalidEncoding {
        offset: usize,
        source: InvalidEncodingError,
    },
}
@chanced chanced changed the title PointerBuf::parse likely takes ownership of the input Into<String> but does not include it in the Err PointerBuf::parse likely takes ownership of input but does not return the String in the Err Nov 7, 2024
@chanced chanced changed the title PointerBuf::parse likely takes ownership of input but does not return the String in the Err PointerBuf::parse likely takes ownership of input but does not return it in the Err Nov 7, 2024
@chanced chanced changed the title PointerBuf::parse likely takes ownership of input but does not return it in the Err PointerBuf::parse accepts Into<String> but does not include the String in the Err Nov 7, 2024
@chanced
Copy link
Owner Author

chanced commented Nov 7, 2024

@asmello referencing my comment in our conversation over on the error pull request #93 (comment)

@chanced
Copy link
Owner Author

chanced commented Nov 7, 2024

This is bleed-over from our discussion in #93 but this is what I had in mind for Report:

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Subject {
    String {
        value: String,
        offset: usize,
        len: usize,
    },
    Pointer {
        value: PointerBuf,
        position: usize,
    },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Report<E> {
    pub subject: Subject,
    pub error: E,
}

pub trait IntoReport: Sized {
    type Value;
    fn into_report(self, value: Self::Value) -> Report<Self>;
}

impl IntoReport for crate::ParseError {
    type Value = String;
    fn into_report(self, value: Self::Value) -> Report<Self> {
        let offset = self.complete_offset();
        let len = match &self {
            Self::NoLeadingBackslash => 1,
            Self::InvalidEncoding { offset, .. } => {
                if *offset < value.len() - 1 { 2 } else { 1 }
            }
        };
        Report {
            subject: Subject::String { value, offset, len },
            error: self,
        }
    }
}

The reason I'm mentioning it here is perhaps we could just return Report<ParseError> from PointerBuf::parse? It is a bit strange, naming wise, though.

Edit

remove Span and Location from IntoReport

@asmello
Copy link
Collaborator

asmello commented Nov 10, 2024

Oh no, I don't think it's worth generalising it this much.

This is how I envision the ideal API:

pub fn parse<'s>(s: impl Into<Cow<'s, str>>) -> Result<Self, ParseError<'s>> {
    let s = s.into();
    validate(s.as_ref())?;
    Ok(Self(s.into_owned()))
}

pub enum ParseError<'s> {
    NoLeadingBackslash,
    InvalidEncoding(InvalidEncodingError<'s>),
}

#[derive(Error, Diagnostic, Debug)]
#[error("json pointer is malformed due to invalid encoding ('~' not followed by '0' or '1')")]
pub struct InvalidEncodingError<'s> {
    #[label]
    span: SourceSpan,
    #[source_code]
    source: Cow<'s, str>
}

impl<'s> InvalidEncodingError<'s> {
    pub fn origin(&self) -> Cow<'s, str> { ... } // this is how you recover the original string
    pub fn offender(&self) -> &str { ... } // the slice where we found the issue
}

My suggestion is to start writing down errors like this explicitly, and then any patterns worth exploiting will be easy to identify. We may actually be better served by a macro. Trying to design some generic types upfront to cover all the different subtypes and nesting cases will be difficult, and I suspect overlap is more restricted than we imagine (consider all the variations of source being string, pointer, owned, borrowed, and nesting enums etc.).

@asmello
Copy link
Collaborator

asmello commented Nov 10, 2024

There's no way around breaks to achieve that API. Here's a rough plan for transitioning a bit more gradually:

  1. We move the current errors to a default-on feature flag, and add the new ones under a different (non-default) one. We mark the current errors as deprecated, which should produce some warnings we can use to encourage migrating to the new flag.
  2. Next release we swap which flag is default-on. That is, the old errors are accessible, but no longer default.
  3. Next release we remove the old errors entirely (and their flag).
  4. Then we remove the other flag.

2 transitory releases should be plenty for people to migrate, especially since we're in 0.x still. I'll make a point of migrating json-patch (probably our biggest user), but I think using mini-crater is overkill.

@chanced
Copy link
Owner Author

chanced commented Nov 12, 2024

I'm sorry, I've been distracted and haven't been able to devote enough energy to this (or finish my impl).

Oh no, I don't think it's worth generalising it this much.

So there are a few reasons I opted to make the error generic and be an additional layer that a user opts into. I'm going to consolidate reasons here from the other thread we have pertaining to this.

First, Assign and Resolve both have associated errors and I didn't see a whole lot of value breaking their logic. I also don't think traversal should allocate on either path. Both operations would potentially benefit from reporting though.

Second, one of my use cases for jsonptr with json schema is for validation of formatting. Failure in that case would still be on the happy path and so I'd like to avoid allocating. This one isn't huge, I can just replicate the parsing logic into my schema crate.

Third, there are aspects of having opt-in composition to extend out the verbosity of the error that I really like. In doing so, only users that need the functionality pay for it. On the other hand, I don't really care for this API (and its the best I've come up with since my previous attempt failed):

let report = Pointer::parse("invalid~encoding")
    .report_err("invalid~encoding")
    .unwrap_err();
assert_eq!(report.subject.as_str(), "invalid~encoding");

// alternatively
let err = Pointer::parse("invalid~encoding").unwrap_err();
let report = Report::new("invalid~encoding", err);
assert_eq!(report.subject.as_str(), "invalid~encoding");

Now, having said all that, pretty much every instance where you've voiced concern or dislike of a change I've proposed and implemented, I've regretted not listening. I also do not like that ParseBufError cannot be matched on directly - that change alone will likely break a lot. I did implement From<ParseBufError> for ParseError to reduce some breaks - PointerBuf::parse(s)? should still work for fns returning Result<_, ParseError>.

a rough plan for transitioning a bit more gradually: ...

That's a really solid strategy. I've been really reluctant to pump out so many breaks, especially around errors.

I was going to use mini-crater to try try an determine if the offset on errors is actually being used. If not, we could safely remove it. It likely isn't worth the time, especially if we use this strategy.

@asmello
Copy link
Collaborator

asmello commented Nov 12, 2024

I'm sorry, I've been distracted and haven't been able to devote enough energy to this (or finish my impl).

No worries at all, I was in the same situation very recently.

First, Assign and Resolve both have associated errors and I didn't see a whole lot of value breaking their logic. I also don't think traversal should allocate on either path. Both operations would potentially benefit from reporting though.

Second, one of my use cases for jsonptr with json schema is for validation of formatting. Failure in that case would still be on the happy path and so I'd like to avoid allocating. This one isn't huge, I can just replicate the parsing logic into my schema crate.

That's reasonable. Not all errors indicate a failure, and the unhappy path is not always uncommon. Makes sense to avoid allocations when that's the case.

Third, there are aspects of having opt-in composition to extend out the verbosity of the error that I really like. In doing so, only users that need the functionality pay for it. On the other hand, I don't really care for this API (and its the best I've come up with since my previous attempt failed):

Ok, I think I get your intent better now. But I suspect this is better managed by feature flags than a custom trait system. miette was already designed to work with existing errors by extending them, so we don't necessarily need structural changes to our errors to enable its use. thiserror is also just a convenience. Instead of rolling out our own Report type and IntoReport trait, we can just add impl Diagnostic for our types, and gate it behind a miette feature flag. Or we could gate the attribute macros themselves, which may be less work. Regardless, we don't need to hold SourceSpan or any foreign values in the error types themselves. We just need to keep something that can hold the "source code", and something that specifies the span where the error occurred - the conversion to miette-specific types is handled by the Diagnostic trait implementation as needed.

The biggest thing convincing me right now of moving in this direction is that, going from your example, we'd presumably implement impl IntoReport for SomeError regardless, so it's not scaling any better with the generic wrapper type than simply using the original type + impl Diagnostic. I suppose users could then get some miette-like error enhancements even if they don't want the miette dependency, but at that point we'd be essentially duplicating miette code.

I did implement From for ParseError to reduce some breaks - PointerBuf::parse(s)? should still work for fns returning Result<_, ParseError>

I don't think that works unfortunately, as From is not transitive, and ?? doesn't do a double conversion (instead it expects a Result<Result<_,_>,_>). Only types T that implement From<T> for FooError directly work with ? in fn foo() -> FooError...

@chanced
Copy link
Owner Author

chanced commented Nov 12, 2024

I wasn't sure if there were (or will be) competitors to miette and so rather than creating a hard dependency (even through a feature flag), I figured having an in-house type that implements Diagnostic. That way, if they want to use bespoke logic or a different crate, they still have all the material to do so and for those that want miette, enabling it would just enable the Diagnostic trait implementation. I'll start on that as soon as I finish writing this so you can see it and be able to appraise it in its complete form.

so it's not scaling any better with the generic wrapper type than simply using the original type + impl Diagnostic

Well, sort of, right? We have 3 - 4 errors (ParseError, assign::Error, resolve::Error, and potentially ParseBufError) that need the source string or pointer. If ParseBufError stays, we can implement Diagnostic for it but 3 would still need the source and the errors for resolve and assign probably shouldn't have the pointer attached under most conditions.

I don't think that works unfortunately, as From is not transitive, and ?? doesn't do a double conversion (instead it expects a Result<Result<,>,_>). Only types T that implement From for FooError directly work with ? in fn foo() -> FooError...

It does, since ParseBufError is just just 1 into away from ParseError:

fn example() -> Result<(), ParseError> {
    PointerBuf::parse("")?; // Result<_, ParseBufError>;
    Ok(())
}

@asmello
Copy link
Collaborator

asmello commented Nov 12, 2024

Sorry, crossed the wires slightly in my last comment, to clarify the feature flag would be to avoid the cost of compiling with miette as a dependency. At runtime you'd still only pay the "enhancement" (effectively, additional allocations + conversions) if you actually use the errors within the miette ecosystem (i.e. with wrapping).

I wasn't sure if there were (or will be) competitors to miette and so rather than creating a hard dependency (even through a feature flag), I figured having an in-house type that implements Diagnostic. That way, if they want to use bespoke logic or a different crate, they still have all the material to do so and for those that want miette, enabling it would just enable the Diagnostic trait implementation. I'll start on that as soon as I finish writing this so you can see it and be able to appraise it in its complete form.

Yeah, it's a good idea to keep miette out of the public API, but trait impls are generally fine because they're additive (esp. under a FF). That is, we're still in control of the public error types, so if miette stops being maintained and/or we decide to migrate to something else, our types are still going to be around and we can just deprecate the feature without changing our API itself. We can think of this similarly to how crates generally handle serde compatibility.

Well, sort of, right? We have 3 - 4 errors (ParseError, assign::Error, resolve::Error, and potentially ParseBufError) that need the source string or pointer. If ParseBufError stays, we can implement Diagnostic for it but 3 would still need the source and the errors for resolve and assign probably shouldn't have the pointer attached under most conditions.

Ok, interesting, why should we avoid having the pointer attached (even as a reference, I assume) for assign and resolve errors? Is it because of the lifetimes? I don't think that's too bad. Adds a bit of complexity but feels worth the trouble. Or maybe I'm just not understanding the use case well.

It does, since ParseBufError is just just 1 into away from ParseError:

Oooh, ok, didn't get that this was the ? you were referring to. In that case, nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants