Difference between abundance-based and distance-based greedy clustering (AGC vs. DGC)
AGC (abundance-based greedy clustering) assigns a new sequence to the most abundant centroid when several centroids exist within the given similarity threshold (e.g. 97%). DGC (distance-based greedy clustering) assigns a new sequence to the closest (most similar) centroid when several centroids exist within that threshold. The two methods differ only when more than one centroid qualifies; with a single candidate they make the same assignment.
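The difference between the two rules can be illustrated with a minimal Python sketch. This is not VSEARCH code: the `Hit` type, the `pick_centroid` function, the example values and the tie-breaking order are all assumptions made for illustration only.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    """A candidate centroid for one query sequence (hypothetical type)."""
    centroid: str
    abundance: int    # abundance of the centroid sequence
    identity: float   # similarity between query and centroid, 0.0 to 1.0


def pick_centroid(hits: Sequence[Hit], threshold: float = 0.97,
                  mode: str = "dgc") -> Optional[Hit]:
    """Choose a centroid among the hits that meet the similarity threshold.

    mode="agc": prefer the most abundant centroid.
    mode="dgc": prefer the most similar centroid.
    Ties are broken by the other criterion (an assumption of this sketch).
    Returns None when no centroid qualifies, in which case the query
    would start a new cluster.
    """
    candidates = [h for h in hits if h.identity >= threshold]
    if not candidates:
        return None
    if mode == "agc":
        return max(candidates, key=lambda h: (h.abundance, h.identity))
    if mode == "dgc":
        return max(candidates, key=lambda h: (h.identity, h.abundance))
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up example: c3 is the most abundant but falls below the threshold.
hits = [
    Hit("c1", abundance=500, identity=0.975),
    Hit("c2", abundance=20, identity=0.990),
    Hit("c3", abundance=900, identity=0.950),
]

agc_choice = pick_centroid(hits, mode="agc")
dgc_choice = pick_centroid(hits, mode="dgc")
print(agc_choice.centroid, dgc_choice.centroid)
```

With these example values, AGC selects `c1` (the most abundant centroid within the threshold) while DGC selects `c2` (the most similar one).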
For more details about AGC vs. DGC please see this paper by Schloss.
AGC can be turned on in VSEARCH by specifying the --sizeorder option, while DGC is the default. However, AGC only works when the --maxaccepts option is specified with an argument larger than 1. VSEARCH uses heuristics to find the approximately most similar sequences first and then considers a number of them in detail (as many as specified with --maxaccepts). Among those accepted sequences, the most abundant centroid is chosen if --sizeorder is turned on. Because the search is heuristic, the algorithm cannot guarantee that it makes the optimal choice.
The --sizeorder option only works with the clustering commands (--cluster_fast, --cluster_smallmem and --cluster_size), and no other command.
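As a rough illustration of how these options combine, the following Python sketch assembles a VSEARCH argument list and enforces the two constraints described above. The helper `build_vsearch_args`, its parameters and the file names are hypothetical; only the option names (--cluster_fast, --cluster_smallmem, --cluster_size, --id, --centroids, --maxaccepts, --sizeorder) are VSEARCH options. The sketch builds the command but does not run it.

```python
from typing import List

# The only commands that accept --sizeorder, according to the text above.
CLUSTER_COMMANDS = ("--cluster_fast", "--cluster_smallmem", "--cluster_size")


def build_vsearch_args(command: str, fasta: str, centroids_out: str,
                       identity: float = 0.97, agc: bool = False,
                       maxaccepts: int = 1) -> List[str]:
    """Build an argument list for a VSEARCH clustering run (hypothetical helper).

    agc=False gives DGC, the default behaviour.
    agc=True adds --sizeorder and requires maxaccepts > 1.
    """
    if command not in CLUSTER_COMMANDS:
        raise ValueError("--sizeorder only applies to the clustering commands")
    if agc and maxaccepts <= 1:
        raise ValueError("AGC requires --maxaccepts larger than 1")

    args = ["vsearch", command, fasta,
            "--id", str(identity),
            "--centroids", centroids_out,
            "--maxaccepts", str(maxaccepts)]
    if agc:
        args.append("--sizeorder")
    return args


# File names are placeholders for illustration.
agc_args = build_vsearch_args("--cluster_size", "reads.fasta",
                              "centroids.fasta", agc=True, maxaccepts=8)
print(" ".join(agc_args))
```

The resulting list could be passed to `subprocess.run` on a system where VSEARCH is installed. Raising an error when `agc=True` is combined with `maxaccepts=1` is a design choice of this sketch, intended to catch the case in which --sizeorder would have no effect.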