Hi there,
MASH works well for very similar sequences but might not be the best choice over long evolutionary distances. I saw that blastp is the alternative, but it's currently implemented to run one sequence at a time in a loop. BLAST+ commands don't scale well past 2-4 threads, so rather than just raising the thread count it would be better to parallelize the function itself and run several queries concurrently. Alternatively, this could be done with GNU Parallel or by switching to DIAMOND; both should be easy to drop in and they scale pretty well. Another option could be to use MMseqs2.
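To make the idea concrete, here is a rough sketch of what I have in mind: write all queries to one FASTA file and call the search tool once with a thread count, instead of looping over sequences. The function names (`write_fasta`, `build_search_command`, `run_search`) are made up for illustration and are not from this codebase, and I'm assuming the database was already built with `makeblastdb` or `diamond makedb`.

```python
import shutil
import subprocess
from pathlib import Path


def write_fasta(records, path):
    """Write (identifier, sequence) pairs to a single FASTA file."""
    with open(path, "w") as handle:
        for identifier, sequence in records:
            handle.write(f">{identifier}\n{sequence}\n")
    return Path(path)


def build_search_command(tool, query, db, out, threads=4):
    """Return the argument list for one batched protein search.

    Hypothetical helper: only blastp and DIAMOND are covered here,
    both writing tabular output (format 6).
    """
    if tool == "blastp":
        return ["blastp", "-query", str(query), "-db", str(db),
                "-out", str(out), "-outfmt", "6",
                "-num_threads", str(threads)]
    if tool == "diamond":
        return ["diamond", "blastp", "--query", str(query), "--db", str(db),
                "--out", str(out), "--outfmt", "6",
                "--threads", str(threads)]
    raise ValueError(f"unsupported tool: {tool}")


def run_search(tool, query, db, out, threads=4):
    """Run the search once for all queries; the tool must be on PATH."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    subprocess.run(build_search_command(tool, query, db, out, threads),
                   check=True)
    return Path(out)
```

Since both tools can write the same tabular format, the downstream parsing could probably stay as it is, but I haven't checked that against the current code.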
What do you think?