problematic "normalization" of paths #399
Comments
You can pass your own encoding function to Hackney via a setting. Hackney has different goals from other "libs". We can of course relax the way the URLs are encoded or provide an option. I'm not sure this feature should not be the default, though, since a lot of the current user base is used to it. Let's see what can be done in the next major release.
Here's an example URL that had a problem:
The fundamental issue is that it is not possible to know whether a URL passed into hackney is already encoded. You can guess, and be right nearly all the time, but there will be rare occasions where the guess will be wrong; there is simply no way around this. Hackney should either expect unencoded URLs and encode them, or expect encoded URLs and leave them alone (the latter is my preference for a library). But it should definitely not encode some character sequences and leave others alone. That's just wrong. Are you sure your users are depending on this? Or are they mostly just getting lucky?
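To make the ambiguity concrete, here is a minimal Python sketch. It is not hackney code, and the path `/discount/100%2A` is a made-up example. It uses only `quote` and `unquote` from the standard library's `urllib.parse` to show that one input string has two different correct wire forms, depending on what the caller meant.

```python
from urllib.parse import quote, unquote

# The same input string, handed to an HTTP client by two different callers.
raw = "/discount/100%2A"

# Caller A passes UNENCODED input and means the literal text "100%2A".
# The correct wire form escapes the "%" itself.
as_unencoded = quote(raw, safe="/")

# Caller B passes ALREADY ENCODED input and means "100*".
# The correct wire form is the string unchanged.
as_encoded = raw

print(as_unencoded)           # /discount/100%252A
print(unquote(as_unencoded))  # /discount/100%2A  (what caller A wanted)
print(unquote(as_encoded))    # /discount/100*    (what caller B wanted)
```

Nothing in `raw` itself says which caller produced it, so a library that inspects the string can only guess.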
@shribe you read the algorithm wrong. What it does is treat an unencoded URI as the normal case and encode it following RFC 2732, while still doing its best to detect the case where someone passes a partially encoded URI. There is nothing wrong in doing that. This is a choice made by Hackney (and not only Hackney) that you apparently dislike. It has actually helped in a lot of cases. Also, considering who the users of hackney are, it's unlikely that they are just lucky. But anyway, what will be done for the current 1.x is adding an option to not encode the URI. I will ask around about what the default should be in the coming major release.
I will leave the ticket open until the actions above are done.
I don't believe I am misreading it. You can have a URL where hackney will literally encode some "%" characters but not others. That's just flat-out wrong. (And, in my opinion, a good example of why trying to "detect" whether a URL needs encoding is a bad idea for a library.)
I should also have said that, as far as I am concerned, an option to not attempt to encode the URI solves it.
@shribe yeah, I heard your opinion. The plan is:
@deadtrickster thanks for the links, I was looking at them. I will check them again.
@shribe just so I'm not misunderstood, I'm taking your suggestion into consideration. But it needs to be done step by step.
I understand that it is the way it is right now, and that you have to be careful about changing the behavior.
Here's my fix as a partial first step: #720
Hackney should NOT attempt to encode provided URLs, at least not without being asked to. At this level the library should assume that the URL is already properly encoded. The reason for this is that attempting to mimic the "convenience" of browsers introduces all sorts of ambiguities. (That functionality could be exposed via utility functions for clients that need it.)
For just one example, from hackney_url:
Eh? So, it is possible that in a single URL some "%" characters will get encoded and some will not??? What if calling code depends on hackney encoding the URL, but then one day is handed an unencoded URL with the literal characters "%2A" for instance? So hackney will not encode the "%", and the URL will eventually be mangled to contain "*" where "%2A" was intended.
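The failure mode described above can be reproduced with a small Python sketch. This is a hypothetical heuristic written for illustration, not the code from hackney_url: `guess_encode` and the sample path are invented names. It assumes the rule is "a `%` followed by two hex digits is an existing escape, anything else gets encoded".

```python
import re
from urllib.parse import quote, unquote

# Hypothetical heuristic, NOT hackney's implementation: "%XX" is assumed
# to be an existing escape and is left alone; everything else is encoded.
_ESCAPE = re.compile(r"(%[0-9A-Fa-f]{2})")

def guess_encode(path: str) -> str:
    parts = _ESCAPE.split(path)
    # With one capture group, odd indexes hold the matched "%XX" escapes.
    return "".join(
        part if i % 2 else quote(part, safe="/")
        for i, part in enumerate(parts)
    )

# Unencoded input: the caller wants the literal text "50% off" and "%2A".
wire = guess_encode("/promo/50% off/%2A")
print(wire)           # /promo/50%25%20off/%2A
print(unquote(wire))  # /promo/50% off/*
```

The first `%` is escaped to `%25` while the second is passed through, so the server decodes `*` where the caller meant the three characters `%2A`.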
Also, in the attempt to clean up for poorly-coded clients, it encodes "*" as "%2a", which is unnecessary and actually breaks some URLs on some servers (us.rd.yahoo.com uses "*" to demarcate a URL to syndicated content, and does not accept "%2a").
An HTTP client library, in contrast to a browser, should do the minimal parsing to separate out the scheme, host, port, user & password then pass the entire raw path[?query][#fragment] through without mangling it.
(Likewise, it should not second-guess the client and try to encode the host.)
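As a rough sketch of what "minimal parsing" could look like, the Python below uses the standard library's `urlsplit` to pull out the connection details and then reassembles the request target from the raw components without encoding anything. `split_for_request` and the example URL are hypothetical, and this is not how hackney is structured. One caveat: `urlsplit`'s `hostname` attribute lowercases the host, which is the only normalization applied here.

```python
from urllib.parse import urlsplit

def split_for_request(url: str):
    """Return connection details plus the untouched request target."""
    parts = urlsplit(url)
    # Reassemble path[?query][#fragment] from the raw pieces; no encoding,
    # no decoding, no case changes to existing escapes.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    if parts.fragment:
        target += "#" + parts.fragment
    conn = (parts.scheme, parts.username, parts.password,
            parts.hostname, parts.port)
    return conn, target

conn, target = split_for_request(
    "http://user:pw@example.com:8080/a*b/%2Ac?q=%25#top"
)
print(conn)    # ('http', 'user', 'pw', 'example.com', 8080)
print(target)  # /a*b/%2Ac?q=%25#top
```

Both the literal `*` and the existing `%2A` and `%25` escapes come out exactly as they went in, which is the behavior argued for above.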