Blocking fetcher thread

Hi @jnioche!

Thanks again for all your work! Let me describe our fetcher thread issue.

## Summary
Our cluster has 6 worker nodes. We fetch more than 3 million URLs per day with our topology. It is deployed on 16 worker slots and uses 16 fetchers, one per worker slot.
## OkClient.HttpProtocol
The worst issue was spotted with the OkClient.HttpProtocol. Sometimes, one of the worker nodes climbs to 100% CPU usage. For example, worker 5 in this case:

![image](https://user-images.githubusercontent.com/7784455/188117872-c09ada78-7bdc-45f5-8dcf-39d0e7d95061.png)

On the StormCrawler dashboard, we can see the fetcher count increase up to 50 (our fetcher limit):

![image](https://user-images.githubusercontent.com/7784455/188117950-9820146d-81b4-40d1-a059-fcbf30294763.png)

Worse, in another case, all the topologies are impacted:

![image](https://user-images.githubusercontent.com/7784455/188118026-67d43208-1884-4174-a020-04552c090028.png)

All fetchers are impacted, and the topology runs slowly. The only way to fix the problem is to kill and redeploy the topology. During the kill phase, the log confirms that some threads are blocked:

> 2022-05-30 06:37:06.557 O.A.S.D.W.WORKER SHUTDOWNHOOK-SHUTDOWNFUNC [INFO] SHUTTING DOWN EXECUTORS
...
> 2022-05-30 06:37:07.028 O.A.S.E.EXECUTORSHUTDOWN SHUTDOWNHOOK-SHUTDOWNFUNC [INFO] SHUTTING DOWN EXECUTOR FETCHER:[30, 30]
> 2022-05-30 06:37:07.077 C.D.S.B.FETCHERBOLT THREAD-21-FETCHER-EXECUTOR[30, 30] [ERROR] INTERRUPTED EXCEPTION CAUGHT IN EXECUTE METHOD
> 2022-05-30 06:37:07.077 C.D.S.B.FETCHERBOLT THREAD-21-FETCHER-EXECUTOR[30, 30] [ERROR] INTERRUPTED EXCEPTION CAUGHT IN EXECUTE METHOD
> 2022-05-30 06:37:07.077 C.D.S.B.FETCHERBOLT THREAD-21-FETCHER-EXECUTOR[30, 30] [ERROR] INTERRUPTED EXCEPTION CAUGHT IN EXECUTE METHOD
> 2022-05-30 06:37:07.077 C.D.S.B.FETCHERBOLT THREAD-21-FETCHER-EXECUTOR[30, 30] [ERROR] INTERRUPTED EXCEPTION CAUGHT IN EXECUTE METHOD
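
To find out where the blocked threads are stuck before killing the topology, a thread dump of the affected worker JVM would help (for example with `jstack <pid>`). As a minimal sketch of the same idea in code, the class below lists the state and top stack frames of every live thread whose name contains a given fragment. It only sees the JVM it runs in, so it would have to be called from inside the worker. The class name `StuckThreadReport`, the method `describeThreads`, and the name fragment passed in are hypothetical and not part of StormCrawler.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of StormCrawler: reports the state and the
// top stack frames of live threads in the current JVM, filtered by name.
final class StuckThreadReport {

    /** Returns one entry per live thread whose name contains nameFragment. */
    static List<String> describeThreads(String nameFragment, int maxFrames) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false, false: skip locked monitor / synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().contains(nameFragment)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append(info.getThreadName())
              .append(" state=")
              .append(info.getThreadState());
            StackTraceElement[] stack = info.getStackTrace();
            for (int i = 0; i < Math.min(maxFrames, stack.length); i++) {
                sb.append("\n    at ").append(stack[i]);
            }
            report.add(sb.toString());
        }
        return report;
    }

    public static void main(String[] args) {
        // The fragment is an assumption; pass whatever the fetcher threads
        // are actually called in your worker's thread dump.
        String fragment = args.length > 0 ? args[0] : "main";
        describeThreads(fragment, 5).forEach(System.out::println);
    }
}
```

Threads that stay in the same state with the same top frames across several dumps taken a few minutes apart are the likely candidates for the blocked fetchers.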

## HttpClient.HttpProtocol
We tried changing the protocol to fix this issue. The CPU has never reached 100% again. But periodically, some fetcher threads are not released.

![image](https://user-images.githubusercontent.com/7784455/188118614-718c2ebd-860c-4529-b674-31b349574487.png)


After some days, the number of these “zombie” threads increases. We often redeploy the topology (for functional updates), and obviously a new deployment resets the thread count.

For now, the issue is less critical than the OkClient one, but we are trying to understand it. Do you have any ideas, or have you seen a similar case?


