Commit 27e1940

adds a url filter
1 parent f71ea4f commit 27e1940

1 file changed

Lines changed: 5 additions & 0 deletions

lib/arachnid.rb

@@ -23,6 +23,7 @@ def initialize(urls, options = {})
   def crawl(options = {})
     threads = options[:threads] || 1
     max_urls = options[:max_urls]
+    filter = options[:filter]

     @hydra = Typhoeus::Hydra.new(:max_concurrency => threads)
     @global_visited = BloomFilter::Native.new(:size => 1000000, :hashes => 5, :seed => 1, :bucket => 8, :raise => false)
@@ -40,6 +41,10 @@ def crawl(options = {})
           break
         end

+        if filter
+          next unless filter.call(q)
+        end
+
         @global_visited.insert(q)
         puts "Processing link: #{q}" if @debug
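
The change reads an optional `:filter` from the `crawl` options and skips any queued URL `q` for which `filter.call(q)` is falsy, before the URL is added to `@global_visited`. The sketch below illustrates that contract without loading the gem. It assumes `q` is a URL string; `ALLOWED_HOST`, `URL_FILTER`, and `apply_filter` are hypothetical names introduced only for this illustration and do not appear in the commit.

```ruby
require 'uri'

# Assumed host for illustration; not taken from the commit.
ALLOWED_HOST = 'example.com'

# Hypothetical filter: accept only http(s) URLs on ALLOWED_HOST that do not
# point at common binary assets. Any object responding to #call with one
# argument would do, since the commit only invokes filter.call(q).
URL_FILTER = lambda do |url|
  begin
    uri = URI.parse(url)
    uri.is_a?(URI::HTTP) &&            # URI::HTTPS is a subclass of URI::HTTP
      uri.host == ALLOWED_HOST &&
      uri.path !~ /\.(png|jpe?g|gif|pdf)\z/i
  rescue URI::InvalidURIError
    false                              # unparseable strings are rejected
  end
end

# Stand-in for the crawl loop: mirrors the guard added in the diff,
# collecting the URLs that would go on to be processed.
def apply_filter(queue, filter = nil)
  processed = []
  queue.each do |q|
    if filter
      next unless filter.call(q)
    end
    processed << q
  end
  processed
end

urls = [
  'https://example.com/about',
  'https://example.com/logo.png',
  'http://other.org/',
  'not a url'
]
puts apply_filter(urls, URL_FILTER)
```

With the crawler itself, the same lambda would presumably be passed as `crawl(:filter => URL_FILTER)`. When no filter is given, the guard is skipped and every queued URL is processed as before.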
