posextract offers grammatical information extraction methods designed for the analysis of historical and contemporary textual corpora. It traverses the syntactic dependency relations between parts-of-speech and returns sequences of words that share a grammatical relationship. See our article for more. You can also download posextract for pypi with pip.
extract_triples
to extract subject-verb-object (SVO) and subject-verb-adjective complement (SVA) triplesextract_adj_noun_pairs
to extract adjective-noun pairsextract_subj_verb_pairs
to extract subject-verb pairs
Required Paramters:
input
can be the name of a csv file or an input stringoutput
name of the output file
Optional Paramters:
--data_column
specify the column to extract triples from.--id_column
specify a unique ID field if csv file is given.--file-delimiter
specify comma, pipe, or tab. Default is comma.--post-combine-adj
combine triples (adjective predicate with object)--add-auxiliary
extract future and past tense triples.--prep-phrase
extract the . Default set to false.--no-compound-noun
Extract just the subject or object (e.g. "Indian Government" is extracted as just "Government").--lemma
specify whether to lemmatize parts-of-speech. Default is non-lemmatized.--verbose
print
Extract grammatical triples.
from posextract import grammatical_triples
triples = grammatical_triples.extract(['Landlords may exercise oppression.', 'The soldiers were ill.'])
for triple in triples:
print(triple)
# Output: Landlords exercise oppression, soldiers were ill
Extract grammatical triples using different options from default:
from posextract.util import TripleExtractorOptions
triples = grammatical_triples.extract(sent, TripleExtractorOptions(prep_phrase = True))
Or extract adjectives and the nouns they modify.
from posextract import adj_noun_pairs
adj_noun = adj_noun_pairs.extract()
Or extract subjects and their verbs.
from posextract import subj_verb_pairs
subj_verb = subj_verb_pairs.extract()
posextract can extract grammatical triples from text:
python -m posextract.extract_triples "Landlords may exercise oppression." output.csv
# Output: Landlords exercise oppression
posextract can extract SVO/SVA relationships separately or it can combine the adjective as part of a SVO triple:
python -m posextract.extract_triples "The soldiers were terminally ill." output.csv --post-combine-adj
# Output: soldiers were terminally, soldiers were ill
python -m posextract.extract_triples "The soldiers were terminally ill." output.csv --post-combine-adj
# Output: soldiers were terminally ill
If provided a .csv file:
python -m posextract.extract_triples --data_column sentence --id_column sentence_id input.csv output.csv
... see our Wiki: