Code for the Text Engineering seminar (see the seminar schedule)

| Package | Contents | Resources/Dependencies | Literature |
|---|---|---|---|
| basic | Corpus, linear search, term-document matrix | Shakespeare | IIR ch. 1 |
| boole | Inverted index, list intersection, preprocessing, positional index, PositionalIntersect | | IIR ch. 1 + 2 |
| ranked | Ranked retrieval: term weighting, vector space model | | IIR ch. 6 + 7 |
| evaluation | Evaluation: precision, recall, F-measure | | IIR ch. 8 |
| lucene | Lucene: Indexer and Searcher | lucene-core, lucene-queryparser, lucene-analyzers-common (5.1.0) | Lucene in Action |
| web | Crawler, WebDocument | commons-io, nekohtml, jrobotx | IIR ch. 19 + 20 |
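
The inverted index and list intersection covered in the boole package can be illustrated with a minimal sketch. This is not the seminar's actual code: the class name `InvertedIndexSketch`, its methods, and the naive tokenization (lower-casing and splitting on non-alphanumeric characters) are assumptions made for this example only. The intersection follows the merge-style algorithm over sorted postings lists described in IIR ch. 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration class, not part of the seminar code. */
public class InvertedIndexSketch {

    // Maps each term to the sorted ids of the documents that contain it.
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    /** Indexes the documents; a document's id is its position in the list. */
    public InvertedIndexSketch(List<String> documents) {
        for (int docId = 0; docId < documents.size(); docId++) {
            // Assumed preprocessing: lower-case, split on non-alphanumeric runs.
            String text = documents.get(docId).toLowerCase(Locale.ROOT);
            for (String token : text.split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
                }
            }
        }
    }

    /** Returns the sorted postings list for a term (empty if the term is unknown). */
    public List<Integer> postingsFor(String term) {
        SortedSet<Integer> p = postings.get(term.toLowerCase(Locale.ROOT));
        return p == null ? new ArrayList<>() : new ArrayList<>(p);
    }

    /** Intersects two sorted postings lists in one linear pass. */
    public static List<Integer> intersect(List<Integer> p1, List<Integer> p2) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < p1.size() && j < p2.size()) {
            int cmp = Integer.compare(p1.get(i), p2.get(j));
            if (cmp == 0) {          // same document id in both lists
                result.add(p1.get(i));
                i++;
                j++;
            } else if (cmp < 0) {    // advance the list with the smaller id
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    /** Answers the Boolean query "t1 AND t2". */
    public List<Integer> and(String t1, String t2) {
        return intersect(postingsFor(t1), postingsFor(t2));
    }
}
```

For the three toy documents "Brutus killed Caesar", "Caesar was ambitious", and "Brutus and Calpurnia", the query `and("brutus", "caesar")` returns `[0]`, because only the first document contains both terms.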

| Package | Contents | Resources/Dependencies | Literature |
|---|---|---|---|
| document | Document, Topics, TermIndex, FeatureVector | | |
| corpus | Corpus, DB, DocumentIndex, Crawler | db4o, crawler (see package ir.web) | |
| classification | TextClassifier, Naive Bayes | | IIR ch. 13 |
| classification.lucene | Text categorization with Lucene, indexing and search | lucene-classification (5.1.0) | IIR ch. 13-14, Lucene in Action |
| classification.weka | Adapter for Weka classifiers, indexing and search | weka-dev (3.7.12) | IIR ch. 13-15, Data Mining |