
antibot: how ? #2

Open
dalf opened this issue Jul 6, 2019 · 4 comments

dalf changed the title from "hapoxy ?" to "antibot: how ?" on Jul 6, 2019
unixfox (Member) commented Jul 6, 2019

From my experience, apart from my project, all these resources are not really effective: when running a Searx instance that uses Google, you have to identify bots very quickly (before a bot makes more than 2 requests), because if you fail to do so, Google will quickly block the instance.
Moreover, restricting the number of requests per second gives normal users a bad experience, because they can't quickly switch between categories or browse through the next pages of results.

unixfox (Member) commented Jul 9, 2019

I found some tools that can be used to defend against bots:

  • Tempesta FW: A firewall that supports sticky cookies, a method that filters out bots that don't support cookies. The downside is that this software is really complicated to install if you aren't using Debian 9.
  • testcookie, an NGINX module: Works similarly to Tempesta's sticky cookie, but is no longer maintained and relies solely on nginx (probably an older version of nginx).
  • net-Shield: A reverse proxy that acts as an anti-DDoS layer for HTTP(S) requests. Details about how it works are here; in summary, it uses the FireHOL blocklist.
  • Nginx L7 DDoS Protection: Uses the same method as Tempesta and testcookie to protect against bots. Relies on nginx but is easy to install.
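Tempesta FW, testcookie and Nginx L7 DDoS Protection all rely on a sticky cookie. A minimal sketch of the idea in Python, assuming a hypothetical secret `SECRET` and a hypothetical cookie name `antibot` (neither comes from those projects, and this is not their code): the cookie value is an HMAC bound to the client IP and a timestamp, and any request without a valid cookie is redirected back to itself with a fresh cookie, so a client that ignores cookies never gets through.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional, Tuple

SECRET = b"change-me"    # hypothetical secret, load it from configuration
COOKIE_NAME = "antibot"  # hypothetical cookie name
COOKIE_TTL = 3600        # seconds a cookie stays valid (assumption)


def _mac(client_ip: str, ts: int) -> str:
    # HMAC over "<ip>|<timestamp>" so a cookie cannot be reused from another IP.
    return hmac.new(SECRET, f"{client_ip}|{ts}".encode(), hashlib.sha256).hexdigest()


def make_cookie(client_ip: str, now: Optional[int] = None) -> str:
    ts = int(time.time()) if now is None else now
    return f"{ts}.{_mac(client_ip, ts)}"


def cookie_is_valid(client_ip: str, value: str, now: Optional[int] = None) -> bool:
    try:
        ts_str, mac = value.split(".", 1)
        ts = int(ts_str)
    except ValueError:
        return False  # malformed cookie
    current = int(time.time()) if now is None else now
    if ts > current or current - ts > COOKIE_TTL:
        return False  # issued in the future or expired
    return hmac.compare_digest(mac, _mac(client_ip, ts))


def handle(client_ip: str, path: str, cookies: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers); 200 means the request may go on to searx."""
    value = cookies.get(COOKIE_NAME)
    if value is not None and cookie_is_valid(client_ip, value):
        return 200, {}
    # Challenge: set the cookie and redirect to the same path. A browser comes
    # back with the cookie; a client without cookie support keeps looping here.
    return 302, {
        "Set-Cookie": f"{COOKIE_NAME}={make_cookie(client_ip)}; Path=/; HttpOnly",
        "Location": path,
    }
```

A browser pays one extra redirect on its first request only. The sketch filters only clients that do not handle cookies; a bot that keeps a cookie jar passes unchallenged.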

dalf (Contributor, Author) commented Jul 12, 2019

Thank you!
Some feedback:

Tempesta FW

Use cases:

net-Shield

https://github.com/fnzv/net-Shield/blob/master/shield.go:
It blocks IPs listed on http://iplists.firehol.org/ (see https://github.com/firehol/blocklist-ipsets).
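To make the blocklist idea concrete, here is a minimal lookup sketch in Python. It assumes the plain-text format of the FireHOL lists (one IP address or CIDR network per line, `#` starting a comment). It is not net-Shield's code, which, judging from the rules it installs, matches against an ipset in the kernel instead.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_blocklist(text: str) -> List[Network]:
    """Parse a FireHOL-style list: one IP or CIDR per line, '#' comments."""
    networks: List[Network] = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            # strict=False tolerates entries such as 192.0.2.1/24 with host bits set;
            # a bare address becomes a /32 (or /128) network.
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_blocked(ip: str, networks: Iterable[Network]) -> bool:
    # Linear scan for clarity; an IPv4 address is never "in" an IPv6 network.
    address = ipaddress.ip_address(ip)
    return any(address in net for net in networks)
```

For lists with tens of thousands of entries, a prefix tree or the kernel's ipset is far faster than this linear scan.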

It sets up some iptables rules:

  • iptables -A INPUT -m set --match-set ratelimit src -m hashlimit --hashlimit 25/sec --hashlimit-name ratelimithash -j DROP
  • iptables -A INPUT -m set --match-set block src -j DROP
  • iptables -A INPUT -p tcp ! --syn -m state --state NEW -j DROP
  • iptables -A INPUT -p tcp --tcp-flags ALL ALL -j DROP
  • iptables -A INPUT -p icmp -m icmp --icmp-type timestamp-request -j DROP && iptables -A INPUT -p icmp -m limit --limit 1/second -j ACCEPT
  • etc...
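The `hashlimit` match in the first rule is a token-bucket rate limiter that can keep a separate bucket per key, such as the source address (selected with `--hashlimit-mode`). The same mechanism applied per client IP at the HTTP layer could look like the sketch below. The rate and burst values are illustrative assumptions; the burst is what lets a real user click quickly through categories or result pages, the concern raised at the top of the thread, before the limit kicks in.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """One token bucket per key (e.g. client IP): `rate` tokens/s, up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = float(burst)
        self.clock = clock
        # key -> (tokens left, time of last update). Idle keys are never
        # evicted here; a real deployment would need to expire them.
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill according to the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

For example, `TokenBucketLimiter(rate=1.0, burst=10)` lets a client send 10 requests at once, then one per second.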

Not sure it will help.

Nginx L7 DDoS Protection

Way too many dependencies: https://github.com/theraw/The-World-Is-Yours/blob/master/install

Some of them:

ModSecurity could be interesting, not sure:

Note

I think one way is TLS fingerprinting: whatever HTTP headers are sent, the cipher suites offered depend on the client.
Look at https://browserleaks.com/ssl: you will get different cipher suites with curl and Firefox, even if you use "copy URL as Curl command".

Of course, this can be tweaked, so it is only an additional safety net.
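One common way to reduce the cipher suites and other ClientHello fields to a comparable fingerprint is JA3; it is not mentioned in this thread and is named here only as an example of the technique. A sketch, assuming the ClientHello has already been parsed into integer values (parsing the TLS handshake itself is out of scope): the fields are written in decimal, values joined with `-`, fields joined with `,`, GREASE values dropped, and the resulting string is hashed with MD5.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    # GREASE values (RFC 8701) are 0x0A0A, 0x1A1A, ..., 0xFAFA.
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int], point_formats: Iterable[int]) -> str:
    # Field order: TLS version, cipher suites, extensions, curves, point formats.
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])


def ja3_hash(version: int, ciphers: Iterable[int], extensions: Iterable[int],
             curves: Iterable[int], point_formats: Iterable[int]) -> str:
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode()).hexdigest()
```

For example, `ja3_string(771, [0x1A1A, 4865, 4866], [0, 23], [29, 23], [0])` returns `"771,4865-4866,0-23,29-23,0"`: the GREASE cipher is dropped so that randomized values do not change the fingerprint.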

This is actually how Caddy detects MITM: https://github.com/caddyserver/caddy/blob/master/caddyhttp/httpserver/mitm.go

Basically, there is a problem if the request claims to come from Firefox (according to its User-Agent) but its TLS handshake doesn't look like Firefox's.
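That consistency check can be sketched as follows. This is not Caddy's implementation: the `REQUIRED_CIPHERS` table is a hypothetical placeholder (a real one would have to be built from handshakes observed for each browser and version), and the User-Agent parsing is deliberately naive.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical placeholder signatures: cipher suite codes each browser family
# is assumed to always offer. These are not real, maintained fingerprints.
REQUIRED_CIPHERS: Dict[str, Set[int]] = {
    "Firefox": {0x1301, 0x1303},
    "Chrome": {0x1301, 0x1302},
}


def browser_family(user_agent: str) -> Optional[str]:
    # Naive: look for "Firefox/" or "Chrome/" in the User-Agent header.
    for family in ("Firefox", "Chrome"):
        if family + "/" in user_agent:
            return family
    return None


def looks_suspicious(user_agent: str, offered_ciphers: Iterable[int]) -> bool:
    """True if the User-Agent claims a known browser whose signature is missing."""
    family = browser_family(user_agent)
    if family is None:
        return False  # unknown client: no opinion either way
    return not REQUIRED_CIPHERS[family] <= set(offered_ciphers)
```

As the reply below points out, such a check also flags users who change their User-Agent for privacy reasons.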

About Caddy, see:

About Nginx, see:


Still on Nginx, it is possible to execute Lua code: https://github.com/openresty/lua-nginx-module#name
Not sure whether it could help one way or another.


Another link:
Protocol for bypassing challenge pages using RSA blind signed tokens draft-protocol-challenge-bypass-00
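The title refers to RSA blind signatures. The blinding algebra alone can be shown with textbook RSA on toy numbers; this sketch is not the draft's message format, skips hashing and padding, and uses a key far too small to be secure.

```python
import math
import secrets

# Textbook toy RSA key: far too small to be secure, for illustration only.
P, Q = 61, 53
N = P * Q                          # public modulus (3233)
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (2753), needs Python 3.8+


def random_blinding_factor() -> int:
    # Client picks r in [2, N-1] coprime with N.
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


def blind(token: int, r: int) -> int:
    # Client: m' = m * r^e mod N hides the token from the server.
    return (token * pow(r, E, N)) % N


def sign_blinded(blinded: int) -> int:
    # Server (e.g. after the user solves a challenge): s' = m'^d mod N.
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    # Client: s = s' * r^-1 mod N, which equals m^d mod N.
    return (blind_sig * pow(r, -1, N)) % N


def verify(token: int, sig: int) -> bool:
    # Anyone holding the public key: s^e mod N must give back the token.
    return pow(sig, E, N) == token % N
```

With `token = 65`, `unblind(sign_blinded(blind(65, r)), r)` equals `pow(65, D, N)` for any valid `r`, so `verify` accepts it, yet the server only ever saw the blinded value. That unlinkability is the property blind-signed tokens are meant to provide.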

unixfox (Member) commented Jul 12, 2019

The issue with TLS fingerprinting is that it would require implementing a check for every browser that supports TLS 1.2 and TLS 1.3, which is a pretty long task. My searx instance is used by a wide variety of browsers, from some weird Chinese browsers to Google Chrome. Personally, I think it would be overkill.
Moreover, some users change their user agent for privacy reasons, so TLS fingerprinting would block their access.

On my antibot-proxy project, two days ago I deployed a sticky cookie protection similar to the one Tempesta FW uses, and it reduced the number of bots reaching my searx instance by 70%! I'm not sure how long that will last, but I was right: the bad bots that do ranking manipulation on the public instances are really badly coded.

I have loads of ideas for blocking bots, and I plan to completely refactor my project after my vacation so that it is usable by the vast majority of public searx instance owners.
