This repository has been archived by the owner on Apr 17, 2018. It is now read-only.
While running webindex on EC2, I noticed that the link parsing done by the load task is very CPU intensive. This is usually the bottleneck for loading data when running one load task per node.
For example, on a 20-node m3.xlarge EC2 cluster with 20 load tasks running, the maximum load rate is around 1000 pages/sec. As load on the system increases from having more data (caused by compactions, etc.), the additional work takes more CPU and causes the load rate to drop.
We could create a standalone test to measure the performance of this code. I suspect it's slow, but it may not be. It's hard to tell on a cluster with lots of other things going on.
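A rough sketch of what such a standalone test might look like, assuming the parser can be called as a plain function on a page string. Everything here is hypothetical: `LinkParseBench`, `countLinks`, and the synthetic pages are stand-ins, not webindex code. The idea would be to swap the real parsing call in for `countLinks` and feed it real page data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

public class LinkParseBench {

  // Hypothetical stand-in for the real link parser: counts href attributes.
  static int countLinks(String page) {
    int count = 0;
    int idx = page.indexOf("href=");
    while (idx >= 0) {
      count++;
      idx = page.indexOf("href=", idx + 5);
    }
    return count;
  }

  // Builds synthetic pages so the test does not depend on cluster data.
  static List<String> syntheticPages(int numPages, int linksPerPage) {
    List<String> pages = new ArrayList<>();
    for (int p = 0; p < numPages; p++) {
      StringBuilder sb = new StringBuilder("<html><body>");
      for (int l = 0; l < linksPerPage; l++) {
        sb.append("<a href=\"http://example").append(l % 7)
          .append(".com/p").append(p).append("\">x</a>");
      }
      pages.add(sb.append("</body></html>").toString());
    }
    return pages;
  }

  // Runs the parser over all pages `rounds` times on one thread and
  // returns the observed rate in pages per second.
  static double pagesPerSecond(List<String> pages, ToIntFunction<String> parser, int rounds) {
    long sink = 0;
    // One untimed pass as a crude JIT warm-up.
    for (String page : pages) {
      sink += parser.applyAsInt(page);
    }
    long start = System.nanoTime();
    for (int r = 0; r < rounds; r++) {
      for (String page : pages) {
        sink += parser.applyAsInt(page);
      }
    }
    long elapsed = Math.max(1, System.nanoTime() - start);
    // Use the accumulated result so the JIT cannot discard the parsing work.
    if (sink < 0) {
      throw new IllegalStateException("unexpected negative link count");
    }
    return (double) pages.size() * rounds / (elapsed / 1e9);
  }

  public static void main(String[] args) {
    List<String> pages = syntheticPages(1000, 50);
    double rate = pagesPerSecond(pages, LinkParseBench::countLinks, 20);
    System.out.printf("%.0f pages/sec on one thread%n", rate);
  }
}
```

Running this on a single m3.xlarge node with nothing else going on would give a per-thread number to compare against the cluster-wide rate. A hand-rolled timing loop like this is only a first approximation; a proper benchmark harness would give more trustworthy numbers.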
Talked w/ @mikewalch offline; he mentioned that the load task filters out pages that only have links within the domain. He thinks a lot of pages may be filtered, so the task could be spending much of its time on this.
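To check how many pages fall into that category, something like the following could be run over a sample of pages. This is only a sketch under assumptions: `DomainFilterCheck` and its methods are hypothetical names, and comparing URL hosts is a simplification of whatever notion of "domain" the load task actually uses.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class DomainFilterCheck {

  // Returns the lower-cased host of a URL, or "" if it cannot be parsed.
  static String host(String url) {
    try {
      String h = URI.create(url).getHost();
      return h == null ? "" : h.toLowerCase();
    } catch (IllegalArgumentException e) {
      return "";
    }
  }

  // True when every link points at the page's own host.
  // Assumption: a page with no links at all also counts as filtered.
  static boolean onlyInternalLinks(String pageUrl, List<String> links) {
    String pageHost = host(pageUrl);
    for (String link : links) {
      if (!host(link).equals(pageHost)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> internal = Arrays.asList("http://a.com/x", "http://a.com/y");
    List<String> mixed = Arrays.asList("http://a.com/x", "http://b.org/z");
    System.out.println(onlyInternalLinks("http://a.com/", internal));
    System.out.println(onlyInternalLinks("http://a.com/", mixed));
  }
}
```

Counting how often `onlyInternalLinks` returns true over a sample would show what fraction of the parsing work goes into pages that are then thrown away.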