Backticks `x` as an alias for Q('x') #100

Open
DSLituiev opened this issue Jan 12, 2017 · 5 comments


@DSLituiev

This is a suggestion to implement backticks as an alias for quoting Q('...'). E.g.:

Q('x/y')  
  ==
`x/y` 

Rationale:

  1. Tradition: R syntax allows addressing fields as:
     data$`x/y`
  2. User convenience: fewer characters to type, and no need to mix two different kinds of quotation marks.

@njsmith
Member

njsmith commented Jan 12, 2017

Interesting idea. Thinking about the tradeoffs, there are two downsides I can see:

(1) Backticks look very similar to single-quotes. In Python in general the BDFL has pronounced that backticks won't be assigned any meaning, because of this usability problem ("syntax shouldn't look like grit on Tim's monitor"). I guess this is also somewhat of an advantage for us b/c it means that they won't be assigned any other meaning.

(2) Patsy currently relies on Python's tokenizer. Because Python doesn't use backticks as a quoting marker, the Python tokenizer crashes if fed backticks:

In [6]: list(patsy.tokens.python_tokenize("foo + `baz`"))
---------------------------------------------------------------------------
PatsyError                                Traceback (most recent call last)
<ipython-input-6-024d1474cf98> in <module>()
----> 1 list(patsy.tokens.python_tokenize("foo + `baz`"))

/home/njs/.user-python3.5-64bit/lib/python3.5/site-packages/patsy/tokens.py in python_tokenize(code)
     37                 raise PatsyError("error tokenizing input "
     38                                  "(maybe an unclosed string?)",
---> 39                                  origin)
     40             if pytype == tokenize.COMMENT:
     41                 raise PatsyError("comments are not allowed", origin)

PatsyError: error tokenizing input (maybe an unclosed string?)
    foo + `baz`
         ^

So the only way to implement this would be to fork our own copy of the tokenizer, and then make sure to keep it up to date with each Python release. (Actually, we would need multiple forks - at least one for python 2 and one for python 3, maybe more.) Unfortunately I don't see any way to really make this viable :-(

@DSLituiev
Author

DSLituiev commented Jan 12, 2017

How about putting a thin layer on it:

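import re

def replace_backticks(x):
    # Nothing to do if the formula contains no backticks.
    if "`" not in x:
        return x
    # Match a pair of backticks and capture the text between them.
    pttrn = re.compile("`([^`]*)`")
    def repl(m):
        # Rewrite `name` as Q('name').
        return "Q('" + m.group(1) + "')"
    return pttrn.sub(repl, x)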


testlist = ["a ~ `50%`",
            "t + `x/2` = `y` + `z`",
            "`x%z` ~ `a.z`",
            "a` ~ 12",
            "y~x-1"]
for x in testlist:
    result = replace_backticks(x)
    print("="*20)
    print(x)
    print(result)

Returns:

====================
a ~ `50%`
a ~ Q('50%')
====================
t + `x/2` = `y` + `z`
t + Q('x/2') = Q('y') + Q('z')
====================
`x%z` ~ `a.z`
Q('x%z') ~ Q('a.z')
====================
a` ~ 12
a` ~ 12
====================
y~x-1
y~x-1

Note that example 4 is broken (it has an unmatched backtick), so it is passed through unchanged.

@njsmith
Member

njsmith commented Jan 12, 2017

Other broken cases include things like the (odd but currently valid)

Q("foo`bar")

I guess this isn't too bad because backticks are very rarely used, but... I dunno. I really like that we use a real parser with fully defined behavior.
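
To make this failure mode concrete, here is a minimal sketch. It assumes the regex substitution from the previous comment is applied as-is, and it combines the Q("foo`bar") case with an ordinary backticked term. The formula string is made up for illustration.

```python
import re

def replace_backticks(x):
    # Same regex substitution as the thin layer proposed above.
    return re.sub("`([^`]*)`", lambda m: "Q('" + m.group(1) + "')", x)

# A backtick inside a string literal gets paired with the next backtick
# in the formula, even though the two are unrelated.
formula = 'Q("foo`bar") + `x`'
mangled = replace_backticks(formula)
print(formula)
print(mangled)
```

The regex has no notion of string literals, so it treats the backtick inside the quoted name as an opening delimiter. The substitution then spans the end of the Q(...) call and leaves a stray backtick behind.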

@njsmith
Member

njsmith commented Jan 12, 2017

I guess the other option would be some sort of fancy error-recovery support, where if lexing crashes we detect this case (the first unparsed character is backtick) and recover. Sounds messy but potentially doable...

@DSLituiev
Author

Here is a version that also handles backticks within Q(''):

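import re

# Matches a Q('...') or Q("...") call whose quoted argument contains a backtick.
# Whitespace between the parentheses and the quotes is allowed.
_BACKTICK_IN_Q = re.compile(r"Q\(\s*['\"].*`.*['\"]\s*\)")
# Matches a pair of backticks and captures the text between them.
_BACKTICK_PAIR = re.compile(r"`([^`]*)`")

def _check_backticks_within_Q_(x):
    # True if some Q(...) call in the formula already contains a backtick.
    return _BACKTICK_IN_Q.search(x) is not None

def _replace_backticks_(m):
    # Rewrite `name` as Q('name').
    return "Q('" + m.group(1) + "')"

def replace_backticks(x):
    if "`" not in x:
        return x
    elif _check_backticks_within_Q_(x):
        # Conservative: leave the whole formula untouched in this case.
        return x
    return _BACKTICK_PAIR.sub(_replace_backticks_, x)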


testlist = ["a ~ `50%`",
            "t + `x/2` = `y` + `z`",
            "`x%z` ~ `x!#%^`",
            "y~x-1",
            "y ~ Q('x`')",
            "y ~ Q('`x`')",
            "w ~ Q( ' x`!#%^' ) + Q('r1`')",
            'w ~ Q( " x`!#%^" )']

for x in testlist:
    result = replace_backticks(x)
    print("="*20)
    print(x)
    print(result)

Output:

====================
a ~ `50%`
a ~ Q('50%')
====================
t + `x/2` = `y` + `z`
t + Q('x/2') = Q('y') + Q('z')
====================
`x%z` ~ `x!#%^`
Q('x%z') ~ Q('x!#%^')
====================
y~x-1
y~x-1
====================
y ~ Q('x`')
y ~ Q('x`')
====================
y ~ Q('`x`')
y ~ Q('`x`')
====================
w ~ Q( ' x`!#%^' ) + Q('r1`')
w ~ Q( ' x`!#%^' ) + Q('r1`')
====================
w ~ Q( " x`!#%^" )
w ~ Q( " x`!#%^" )
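
One limitation of the version above is that the check is all-or-nothing: a formula such as "`a` ~ Q('x`')" is returned unchanged, because a single Q(...) containing a backtick disables the rewrite everywhere. A possible alternative, sketched here only as an illustration, is a small hand-written scanner that copies string literals verbatim and rewrites backticks only outside them. The function name `replace_backticks_outside_strings` is hypothetical and not part of patsy. The sketch assumes plain single- or double-quoted strings with backslash escapes; it does not handle triple-quoted strings or string prefixes.

```python
def replace_backticks_outside_strings(formula):
    """Hypothetical helper: rewrite `name` as Q('name'), skipping string literals."""
    out = []
    i, n = 0, len(formula)
    while i < n:
        ch = formula[i]
        if ch in "'\"":
            # Copy a quoted string verbatim, honouring backslash escapes.
            j = i + 1
            while j < n and formula[j] != ch:
                j += 2 if formula[j] == "\\" else 1
            j = min(j + 1, n)  # include the closing quote if there is one
            out.append(formula[i:j])
            i = j
        elif ch == "`":
            # Find the matching closing backtick; fail loudly if it is missing.
            end = formula.find("`", i + 1)
            if end == -1:
                raise ValueError("unmatched backtick at position %d" % i)
            # repr() produces a correctly quoted Python string literal.
            out.append("Q(" + repr(formula[i + 1:end]) + ")")
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

for f in ["a ~ `50%`", "`a` ~ Q('x`')", 'Q("foo`bar") + `x`']:
    print(f, "->", replace_backticks_outside_strings(f))
```

Unlike the regex versions, this sketch raises an error on an unmatched backtick such as "a` ~ 12" instead of passing it through. It is still a pre-processing layer in front of the tokenizer, so it does not address the concern about relying on a real parser with fully defined behavior.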
