We assess the similarity between LLM outputs and the input graph content using character-level (e.g., Levenshtein distance) and word-level (e.g., TF-IDF) metrics. For PostgreSQL anomalies, DBAIOps (DeepSeek-R1 671B) obtains Levenshtein and TF-IDF similarity scores of 0.10 and 0.37, respectively, indicating that the outputs are not merely copied from the knowledge graph.

| Method | PostgreSQL difflib | PostgreSQL Levenshtein | PostgreSQL Jaccard | PostgreSQL TF-IDF | Oracle difflib | Oracle Levenshtein | Oracle Jaccard | Oracle TF-IDF |
|---|---|---|---|---|---|---|---|---|
| DBAIOps (DeepSeek-R1 32B) | 0.04 | 0.07 | 0.20 | 0.30 | 0.04 | 0.05 | 0.15 | 0.39 |
| DBAIOps (DeepSeek-R1 671B) | 0.04 | 0.10 | 0.24 | 0.37 | 0.05 | 0.09 | 0.29 | 0.50 |
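
To make the four similarity measures in the table concrete, the following is a minimal Python sketch of how such scores could be computed for one pair of texts. It is an illustration under stated assumptions, not the evaluation code used for the reported numbers: the normalization of Levenshtein distance by the longer string length, the regex word tokenizer, the treatment of Jaccard as a word-set metric, and the smoothed IDF fitted on only the two compared texts are all assumptions, and the function names and sample strings are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric word tokens.
    return re.findall(r"\w+", text.lower())


def difflib_similarity(a, b):
    # Character-level ratio from the standard-library SequenceMatcher.
    return SequenceMatcher(None, a, b).ratio()


def levenshtein_distance(a, b):
    # Classic dynamic-programming edit distance, keeping one previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    # Assumed normalization: 1 - distance / length of the longer string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def jaccard_similarity(a, b):
    # Overlap of the two word sets divided by their union.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


def tfidf_cosine_similarity(a, b):
    # Assumed setup: TF-IDF fitted on just these two texts, with smoothed IDF
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by cosine similarity.
    docs = [Counter(tokenize(a)), Counter(tokenize(b))]
    vocab = set(docs[0]) | set(docs[1])
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1 for t in vocab}
    vecs = [{t: d[t] * idf[t] for t in d} for d in docs]
    dot = sum(vecs[0][t] * vecs[1].get(t, 0.0) for t in vecs[0])
    norms = [math.sqrt(sum(w * w for w in v.values())) for v in vecs]
    if norms[0] == 0 or norms[1] == 0:
        return 0.0
    return dot / (norms[0] * norms[1])


if __name__ == "__main__":
    # Hypothetical texts standing in for graph content and an LLM diagnosis.
    graph_text = "High CPU usage may be caused by missing indexes on large tables."
    llm_output = "The anomaly is likely due to a sequential scan; consider adding an index."
    for name, fn in [
        ("difflib", difflib_similarity),
        ("Levenshtein", levenshtein_similarity),
        ("Jaccard", jaccard_similarity),
        ("TF-IDF", tfidf_cosine_similarity),
    ]:
        print(f"{name}: {fn(graph_text, llm_output):.2f}")
```

All four functions return a value in [0, 1], where 1 means the two texts are identical under that measure, so low scores such as those in the table correspond to outputs that share little surface text with the graph content.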