detect logout URLs automatically #13

Open
kmike opened this issue Mar 17, 2016 · 4 comments

Comments

@kmike
Contributor

kmike commented Mar 17, 2016

We should prevent logging out caused by following logout links.

@lopuhin
Contributor

lopuhin commented Mar 18, 2016

There is some logout detection here

any(self.auth_cookies.get(name) and m.value == ''
but it is not quite accurate. I first thought it failed for phpbb3, where on logout one of the cookies is set to 1 rather than to an empty string, but in fact it works there. If we detect a logout, we try to log in again and avoid the link that caused it in the future.
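The check quoted above can be illustrated with a minimal sketch. It assumes cookies are available as plain name-to-value dicts; the function name `looks_like_logout` and both parameters are hypothetical and are not taken from the project code.

```python
def looks_like_logout(auth_cookies, response_cookies):
    """Return True if a known auth cookie was reset to an empty string.

    auth_cookies: dict of cookie name -> value captured at login (assumed shape).
    response_cookies: dict of cookie name -> value set by the latest response.
    """
    return any(
        auth_cookies.get(name) and value == ''
        for name, value in response_cookies.items()
    )


auth = {'sid': 'abc123'}
print(looks_like_logout(auth, {'sid': ''}))        # auth cookie cleared
print(looks_like_logout(auth, {'sid': 'xyz789'}))  # auth cookie rotated, not cleared
print(looks_like_logout(auth, {'theme': ''}))      # unrelated cookie cleared
```

A check of this shape only reacts to an auth cookie being emptied; a site that signals logout by writing some other non-empty value would need an additional rule.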

There are two more problems with current logout mechanics:

  • the logout can happen due to cookie expiration rather than due to following a logout link, but we will still mark the link as a logout link and avoid revisiting it. This can be solved by requiring at least two logouts before marking a link as leading to logout.
  • the logout link could be unique each time (for example ?action=logout&sid=<session id here>). In this case we would be logging out periodically, which will slow the crawl (we will also be retrying all requests that were in flight or scheduled in twisted when the logout happened). I don't know how to solve this apart from using some ML.
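The "at least two logouts" idea from the first point could look roughly like the sketch below. The class `LogoutLinkTracker`, its method names, and the default threshold are hypothetical and only illustrate the counting; they are not part of the existing code.

```python
from collections import Counter


class LogoutLinkTracker:
    """Mark a URL as a logout link only after repeated logouts follow it."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # logouts required before a URL is avoided
        self._counts = Counter()    # URL -> number of logouts observed after it

    def record_logout(self, url):
        # Call when a logout is detected right after requesting `url`.
        self._counts[url] += 1

    def is_logout_link(self, url):
        # Counter returns 0 for URLs that were never recorded.
        return self._counts[url] >= self.threshold


tracker = LogoutLinkTracker()
tracker.record_logout('http://example.com/profile')  # could be cookie expiration
print(tracker.is_logout_link('http://example.com/profile'))
tracker.record_logout('http://example.com/profile')
print(tracker.is_logout_link('http://example.com/profile'))
```

Because it matches exact URLs, this counting does nothing for the second point: a logout link that carries a fresh session id is a different URL every time and never reaches the threshold.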

Did you have something more specific in mind here @kmike ?

@kmike
Contributor Author

kmike commented Mar 18, 2016

Nothing more specific; I was thinking about how to detect logout links before following them. Maybe even just 'logout' in the URL or link text covers 99% of cases.

lopuhin added a commit that referenced this issue Apr 14, 2016
For now, links with "logout" in url and "logout" or "log out" in text
will be skipped. This detection is off when autologin is disabled.
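As a rough illustration of the rule in the commit message, here is a minimal sketch. It assumes the rule reads as "logout" in the URL and either "logout" or "log out" in the link text, and that matching is case-insensitive; the function name `is_probable_logout_link` is hypothetical and the referenced commit may implement this differently.

```python
def is_probable_logout_link(url, link_text):
    """Heuristic: 'logout' in the URL and 'logout' or 'log out' in the text."""
    text = link_text.lower()  # case-insensitive matching is an assumption
    return 'logout' in url.lower() and ('logout' in text or 'log out' in text)


print(is_probable_logout_link('http://example.com/ucp.php?mode=logout', 'Log out'))
print(is_probable_logout_link('http://example.com/ucp.php?mode=logout', 'Exit'))
print(is_probable_logout_link('http://example.com/blog/how-to-log-out', 'Log out'))
```

Requiring both the URL and the text to match keeps ordinary pages that merely mention logging out crawlable, at the cost of missing logout links with unusual text.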
@lopuhin
Contributor

lopuhin commented Apr 14, 2016

Hm, just thought that it would be even better if this support was off when the site is skipped and autologin is effectively disabled, even though AUTOLOGIN_ENABLED is True.

@lopuhin
Contributor

lopuhin commented Apr 21, 2016

> Hm, just thought that it would be even better if this support was off when the site is skipped and autologin is effectively disabled, even though AUTOLOGIN_ENABLED is True.

Done in 794526f
