We need deduplication to save storage when the same job is crawled repeatedly, based on a dynamically created index of the previous crawl's contents.
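
A minimal sketch of what this could look like, assuming the previous crawl's pages are stored on disk and the index is a simple URL-key to content-hash map rebuilt at the start of each run (all names and the storage layout below are hypothetical, not an existing API):

```python
# Sketch: dedupe a repeated crawl against an index built from the previous crawl.
import hashlib
from pathlib import Path


def content_hash(payload: bytes) -> str:
    """Hash the page body so identical content dedupes regardless of URL."""
    return hashlib.sha256(payload).hexdigest()


def build_index(previous_crawl_dir: Path) -> dict[str, str]:
    """Dynamically build a url-key -> content-hash index from the stored previous crawl."""
    index: dict[str, str] = {}
    for page_file in previous_crawl_dir.glob("*.html"):
        index[page_file.stem] = content_hash(page_file.read_bytes())
    return index


def should_store(url_key: str, payload: bytes, index: dict[str, str]) -> bool:
    """Store only pages that are new or whose content changed since the last crawl."""
    return index.get(url_key) != content_hash(payload)
```

With this, the new crawl only writes pages where `should_store` returns True, so unchanged pages cost no additional storage; the index itself is transient and rebuilt per run rather than persisted.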