Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

":" in URLs should not be encoded #29

Closed
rcarmo opened this issue May 22, 2016 · 11 comments
Closed

":" in URLs should not be encoded #29

rcarmo opened this issue May 22, 2016 · 11 comments
Labels

Comments

@rcarmo
Copy link

rcarmo commented May 22, 2016

Following up from #27, custom URL schema processing requires the colon to not be URL-encoded.

@ikirudennis
Copy link
Member

I've drawn up a subclass of Textile which looks like it will function for your needs: rawlink_textile.py Give it a try, and let me know if you need any further assistance.

@rcarmo
Copy link
Author

rcarmo commented May 22, 2016

Thanks, I will. But it does seem like an awful lot of code (and an extra dependency on six). I wonder why the links are getting encoded in the first place...

On 22 May 2016, at 16:19, Dennis Burke [email protected] wrote:

I've drawn up a subclass of Textile which looks like it will function for your needs: rawlink_textile.py Give it a try, and let me know if you need any further assistance.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub

@ikirudennis
Copy link
Member

Textile already requires six, so it's not an extra dependency.

Encoding the URIs is just the right thing to do: https://en.wikipedia.org/wiki/Percent-encoding#Types_of_URI_characters I understand your use-case for it, but it seems that anything that processes URIs should be able to process the percent-encoded URI without issue. If it seems like the new version of textile broke your system, it's because textile was previously incorrect.

@rcarmo
Copy link
Author

rcarmo commented May 23, 2016

I'm sorry, but that just doesn't make sense.

That article pertains to encoding payload data, not markup.

I don't have that issue with Markdown or reStructured Text, and Textile worked that way on Python for at least eight years (and before that on PHP - my site was a PHPWiki back in 2001).

I also don't remember having that issue with the Java ports (which I used in Clojure when I did a test implementation a couple of years back), so as far as I'm concerned not encoding the schema in URLs is the expected behavior.

On 23 May 2016, at 01:16, Dennis Burke [email protected] wrote:

Textile already requires six, so it's not an extra dependency.

Encoding the URIs is just the right thing to do: https://en.wikipedia.org/wiki/Percent-encoding#Types_of_URI_characters I understand your use-case for it, but it seems that anything that processes URIs should be able to process the percent-encoded URI without issue. If it seems like the new version of textile broke your system, it's because textile was previously incorrect.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub

@ikirudennis
Copy link
Member

The way you have your URIs structured, they are not seen as having schemes (technically what you have are paths, which should be percent-encoded). And that's all due to the RFCs governing the structure of URIs. The part of the gist which I've modified has to do with urllib's handling and parsing of the URI. For instance:

>>> from six.moves import urllib
>>> urllib.parse.urlsplit('Radar:1234')
SplitResult(scheme='', netloc='', path='Radar:1234', query='', fragment='')
>>> urllib.parse.urlsplit('Radar:/1234')
SplitResult(scheme='radar', netloc='', path='/1234', query='', fragment='')
>>> urllib.parse.urlsplit('Radar://1234')
SplitResult(scheme='radar', netloc='1234', path='', query='', fragment='')

I can't speak to how implementations of textile in other languages may have treated this situation in the past. My goal here is to have python-textile adhere as closely as possible to both the spec and php-textile in their current forms. (And unfortunately, we were so far behind the times that this update seems like a very drastic change.) For what it's worth, I've filed an issue on the textile-spec (textile/textile-spec#3) repo asking for some clarity on the proper handling of a situation like this.

Please let me know if that gist resolves your issue. You could also put your schema transformation in that encode_url function instead, if you wanted to eliminate a step.

@rcarmo
Copy link
Author

rcarmo commented May 23, 2016

Sorry, but I still think this is a breaking change that needs to be fixed. Looking at the RedCloth ragel grammar (which is the closest I can find to a machine-readable formal definition of the Textile language, lacking other PEG or EBNF alternatives), the definition for a URL is pretty clear:

https://github.com/jgarber/redcloth/blob/master/ragel/redcloth_common.rl#L111 https://github.com/jgarber/redcloth/blob/master/ragel/redcloth_common.rl#L111

On 23 May 2016, at 14:23, Dennis Burke [email protected] wrote:

The way you have your URIs structured, they are not seen as having schemes (technically what you have are paths, which should be percent-encoded). And that's all due to the RFCs governing the structure of URIs. The part of the gist which I've modified has to do with urllib's handling and parsing of the URI. For instance:

from six.moves import urllib
urllib.parse.urlsplit('Radar:1234')
SplitResult(scheme='', netloc='', path='Radar:1234', query='', fragment='')
urllib.parse.urlsplit('Radar:/1234')
SplitResult(scheme='radar', netloc='', path='/1234', query='', fragment='')
urllib.parse.urlsplit('Radar://1234')
SplitResult(scheme='radar', netloc='1234', path='', query='', fragment='')
I can't speak to how implementations of textile in other languages may have treated this situation in the past. My goal here is to have python-textile adhere as closely as possible to both the spec and php-textile in their current forms. (And unfortunately, we were so far behind the times that this update seems like a very drastic change.) For what it's worth, I've filed an issue on the textile-spec (textile/textile-spec#3 textile/textile-spec#3) repo asking for some clarity on the proper handling of a situation like this.

Please let me know if that gist resolves your issue. You could also put your schema transformation in that encode_url function instead, if you wanted to eliminate a step.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub #29 (comment)

@ikirudennis
Copy link
Member

...Still wondering if that gist works for you.

@ikirudennis
Copy link
Member

Any update on whether the gist works for you?

@rcarmo
Copy link
Author

rcarmo commented Jun 6, 2016

I stand by my previous statement that this is a breaking change that needs to be addressed. I haven't tried the gist because I need this to work without patching, and as such I reverted to the previous version. Sorry.

@ikirudennis
Copy link
Member

The reason I keep asking is to confirm whether or not the gist actually solves your problem without causing other errors for you. Whether you choose to use it in production is totally up to you. (For what it's worth, the gist functions as a drop-in fix, not exactly a patch. You might be able to get away with import rawlink_textile as textile and be on your way.) As it is, if I make that change to the project, too many other things change and get out of alignment with php-textile. We're sort of locked in a case of XKCD 1172: https://xkcd.com/1172/

I'm not going to make a change to this until we get a straight answer from textile-spec regarding a way forward. Sorry.

@rcarmo
Copy link
Author

rcarmo commented Jun 6, 2016

I can wait (and simply vendor the old version). Nevertheless, so far, the research I did into RedCloth (including the "new" version) supports my case in terms of historical precedent (that particular bit of the grammar I linked to goes back ages).

With some luck, the textile-spec folk will consider defining a formal grammar...

On 06 Jun 2016, at 18:08, Dennis Burke [email protected] wrote:

The reason I keep asking is to confirm whether or not the gist actually solves your problem without causing other errors for you. Whether you choose to use it in production is totally up to you. (For what it's worth, the gist functions as a drop-in fix, not exactly a patch. You might be able to get away with import rawlink_textile as textile and be on your way.) As it is, if I make that change to the project, too many other things change and get out of alignment with php-textile. We're sort of locked in a case of XKCD 1172: https://xkcd.com/1172/

I'm not going to make a change to this until we get a straight answer from textile-spec regarding a way forward. Sorry.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants