Skip to content

Change GPL dependencies to MIT-compatible counterparts#90

Open
danielpunkass wants to merge 29 commits intothisisparker:mainfrom
danielpunkass:no-gpl
Open

Change GPL dependencies to MIT-compatible counterparts#90
danielpunkass wants to merge 29 commits intothisisparker:mainfrom
danielpunkass:no-gpl

Conversation

@danielpunkass
Copy link
Copy Markdown

The Unidecode and html2text dependencies are GPL2 and GPL3 licensed, so they are not compatible with the MIT licensing of this project as a whole. See #61

@danielpunkass danielpunkass marked this pull request as draft January 7, 2023 19:25
@danielpunkass
Copy link
Copy Markdown
Author

I think this is ready for review now. The needs of xword-dl for both Unidecode and html2text seem light-weight enough that I suspect the replacement MIT-licensed libraries will fit the bill.

@danielpunkass danielpunkass marked this pull request as ready for review January 7, 2023 23:22
@thisisparker
Copy link
Copy Markdown
Owner

Thank you for the PR! Holding on this just for the moment because I might have to make some changes to how I'm using html2text anyway for #89. I'll keep you posted!

@danielpunkass
Copy link
Copy Markdown
Author

Cool, no rush!

@afontenot
Copy link
Copy Markdown
Contributor

afontenot commented Jun 2, 2025

@danielpunkass @thisisparker I think it would be really nice to see some version of this merged. I've created a pull request against the merging branch with what I think is an improvement. Rather than reimplementing the entire anyascii function, we can just pass the codepoints of the string that need replacement.

Those familiar with Unicode might be surprised you can do this. It's actually because anyascii itself iterates by codepoints, so this approach at least doesn't introduce any issues that don't already exist with that package. That doesn't mean that there aren't issues, e.g. anyascii("🏳️‍⚧️") returns :flag_white::transgender_symbol: which is not really ideal (and implies that we can't eliminate the demojize function with anyascii).

Because anyascii caches lookups on a module level and iterates by
codepoint anyway, it's perfectly efficient to filter out latin-1
codepoints and pass the rest through anyascii.
@danielpunkass
Copy link
Copy Markdown
Author

Thanks for the attention to the PR @afontenot ! I will have to get my head back around some of the rationale for the way I approached things but I will review and revise the PR branch soon based on your reviews.

@danielpunkass
Copy link
Copy Markdown
Author

I've updated the branch with most of the tweaks from @afontenot, and merged latest main to make the PR merge easier if/when accepted.

@danielpunkass
Copy link
Copy Markdown
Author

There's something wrong with branch that is causing one of the tests to fail:

⭐ Run Main Test New York Times rebus special chars
[Status check on supported outlets/Install xword-dl and test existing outlets]   🐳  docker exec cmd=[bash -e /var/run/act/workflow/27] user= workdir=
| Traceback (most recent call last):
|   File "/opt/hostedtoolcache/Python/3.11.12/arm64/bin/xword-dl", line 8, in <module>
|     sys.exit(main())
|              ^^^^^^
|   File "/opt/hostedtoolcache/Python/3.11.12/arm64/lib/python3.11/site-packages/xword_dl/xword_dl.py", line 292, in main
|     puzzle, filename = by_keyword(args.source, **options)
|                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/opt/hostedtoolcache/Python/3.11.12/arm64/lib/python3.11/site-packages/xword_dl/xword_dl.py", line 49, in by_keyword
|     puzzle = dl.download(puzzle_url)
|              ^^^^^^^^^^^^^^^^^^^^^^^
|   File "/opt/hostedtoolcache/Python/3.11.12/arm64/lib/python3.11/site-packages/xword_dl/downloader/basedownloader.py", line 85, in download
|     puzzle = self.parse_xword(xword_data)
|              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/opt/hostedtoolcache/Python/3.11.12/arm64/lib/python3.11/site-packages/xword_dl/downloader/newyorktimesdownloader.py", line 176, in parse_xword
|     puzzle.rebus()
|   File "/opt/hostedtoolcache/Python/3.11.12/arm64/lib/python3.11/site-packages/puz.py", line 308, in rebus
|     return self.helpers.setdefault('rebus', Rebus(self))
|                                             ^^^^^^^^^^^
|   File "/opt/hostedtoolcache/Python/3.11.12/arm64/lib/python3.11/site-packages/puz.py", line 559, in __init__
|     for item in parse_dict(solutions_str).items()
|                 ^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/opt/hostedtoolcache/Python/3.11.12/arm64/lib/python3.11/site-packages/puz.py", line 764, in parse_dict
|     return dict(p.split(':') for p in s.split(';') if ':' in p)
|            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| ValueError: dictionary update sequence element #0 has length 4; 2 is required

I'll figure out what's going on. Probably some dependency disparity.

@afontenot
Copy link
Copy Markdown
Contributor

There's something wrong with branch that is causing one of the tests to fail:

...

I'll figure out what's going on. Probably some dependency disparity.

Looks to me like a discrepancy between how anyascii and unidecode are handling the NYT puzzle in question. xword-dl is latinizing the rebus answers, and it looks like probably anyascii is returning a :shortcode: for one of the puzzle clues here. The library that's supposed to save the puzzle file, puzpy, is creating a dict by splitting on : and this is almost certainly a bug - it ought to be using .partition(). Unfortunately I can't see what clue is causing the issue because I don't have a NYT subscription.

requirements.txt Outdated
pyyaml==6.0.2
xmltodict==0.14.2
lxml==5.4.0
lxml==5.3.1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you reverted this to the pre-main version by mistake: cb4ad4b

Copy link
Copy Markdown
Author

@danielpunkass danielpunkass Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string in question that's tripping up puz.py is:

 0:Q:clubs:; 1:A:spades:; 2:2:hearts:; 3:K:clubs:; 4:10:hearts:; 5:10:clubs:; 6:A:hearts:; 7:7:diamonds:; 8:Q:hearts:; 9:A:clubs:;10:10:spades:;11:K:hearts:;12:J:hearts:;

Maybe the third colon in each is a problem.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the unidecode stripped the colons.

Got latinize: Q:clubs:
Got unidecode: Qclubs

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, I see now your new latinize is introducing the colons in place of the non-ascii characters:

Got raw: Q♣

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to your opinion about what we should do in this example. The "Qclubs" translation wasn't great either, but it avoided the coincidental introduction of colons to the otherwise colon-free answer.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for all the thoughtful feedback!

Something else to consider about the colons situation is whether we should strip all colons from the result of anyascii or check first to make sure there aren't intentional colons in the text.

Yes, the NYT supports the notion of multiple correct answers, and for example in this puzzle there is a pretty exhaustive list of "correct" answers for each of these squares. For example:

      [
        "Q\u2663",
        "QUEEN",
        "QUEENOFCLUBS",
        "Q",
        "CLUB",
        "C",
        "QUEEN/CLUB",
        "CLUB/QUEEN",
        "Q/C",
        "C/Q",
        "QC",
        "CQ"
      ],

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, for my concern about there being colons in the source text, that's easily addressable since you've already got it going one code point at a time. We can simply not pass the character to anyascii if it happens to be a single colon character, and strip any colons from the anyascii result. I think I'll work on that later and amend the PR with that as at least a short-term fix to replicate unidecode behavior.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaand ... no need, because the c < 0xFF test already includes the colon character. OK, easy-peasy.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I made one more push with a change to strip all colons from anyascii, at least as a stop-gap fix to hopefully get the tests passing. I think the PR could be considered as-is even while @thisisparker may want to mull the other questions about how to better handle non-ASCII answer text.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are maybe some consistency concerns with unconditionally stripping the colons from anyascii, because elsewhere in the code emojis are converted to text with emoji.demojize, which uses this approach. So e.g. emoji.demojize("🏳️‍⚧️") returns :transgender_flag:. If we change the behavior of anyascii, it will no longer match this.

I do agree it's worth waiting at this point for Parker to weigh in on how best to handle representing rebus text.

@thisisparker
Copy link
Copy Markdown
Owner

Looking at the rebus thing in particular: I think we should not be applying any transformation to grid fill, and rebus entries count as grid fill. (I say that even though I am the one that introduced that transformation in #172.) I think for NYT the best behavior would probably be to test the valid answers and return the first one that doesn't break .puz encoding — and even in a future scenario where we can count on UTF-8, I don't think we should pass emoji in that field.

This is all kind of no-win, because entering rebuses is iffy in most circumstances and this is a lossier operation than I like to do on these, but I don't have any better ideas.

@danielpunkass
Copy link
Copy Markdown
Author

@thisisparker Thanks for chiming in. How are you feeling about the PR on the whole, as far as considering taking the leap and merging into main so that the code base is in fact compliant with its license?

The handling of the non-ascii characters might be something to handle separately, if at all. The state of this PR as it stands should pass all tests and behave comparably to the main branch. It would be great to have the MIT license locked in, especially because keeping GPL in the code base might encourage somebody to add more GPL code in the future.

@afontenot
Copy link
Copy Markdown
Contributor

FWIW I don't think there's a license compliance issue with the code as is. MIT licensed software is allowed to use a GPL licensed dependency. For example, if I want to extract some part of xword-dl and use it in my proprietary crossword software, I can do that (so long as I comply with the terms of the MIT license). The issue with having a GPL dependency is that there's no way for people to convey a functional version of xword-dl under the terms of the MIT license. Shipping a fully functional version of xword-dl would require including the GPL dependency, meaning that the resulting combined work would be covered by the GPL.

So it would be accurate to say that the xword-dl code is MIT licensed, but distributions of the software (e.g. by Debian or Arch Linux) happen under the terms of the GPL because they are tainted by the dependency.

GNU / FSF cover this in some detail here: https://www.gnu.org/licenses/gpl-faq.html#WhatDoesCompatMean

@danielpunkass
Copy link
Copy Markdown
Author

That is indeed a fair distinction, but I think that generally people who are browsing projects and see a permissive license like MIT, they assume they can use the project as-is under those terms. Right now the project can only be used as a whole under the GPL license, so the license tag is misleading.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants