tbouttaz edited this page Apr 24, 2012 · 6 revisions

The different steps involved in the generation of text are:

  1. Build the Semantic Graph Transformer (SGT) representing a list of RDF statements (AutomaticGenerator.buildGraphFromSesameRepo())
  2. Create a new BrowsingGenerator from the SGT, sesameReader and ontologyReader
  3. Call BrowsingGenerator.getSurfaceText(). This creates a ContentPlanner, which calls lexicalise() on each SGNode of the SGT using the Language Specification files (see the Lexicon section). It returns a list of DependencyTreeTransformer objects, each representing one sentence of the text.
  4. Create a SurfaceRealiser and realise each DependencyTreeTransformer using the SimpleNLG package. This returns a list of AnchorString objects that together form the FeedbackText.

Therefore we have 2 different types of graphs:

  1. A Semantic Graph that corresponds to the RDF statements. The predicate of each statement is represented by an edge in the graph. Each subject and object is represented by a node.
  2. A Dependency Tree graph that corresponds to the actual words that together form the generated text. This graph is generated by translating each element of the Semantic Graph using its corresponding Language Specification file.
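
To make the first graph type concrete, here is a minimal sketch of how a list of RDF statements could be turned into a graph with one node per subject or object and one labelled edge per predicate. All type and method names below (SemanticGraphSketch, Triple, Edge, fromTriples) are hypothetical and chosen for illustration; they are not the service's SGT or SGNode classes, whose actual structure is not shown on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical types, not the NLG service's SGT/SGNode classes.
final class SemanticGraphSketch {

    /** One RDF statement: subject, predicate, object. */
    record Triple(String subject, String predicate, String object) {}

    /** A directed edge labelled with the predicate of a statement. */
    record Edge(String source, String label, String target) {}

    // Maps each node to its outgoing edges; insertion order is preserved.
    private final Map<String, List<Edge>> outgoing = new LinkedHashMap<>();

    /** Adds a statement: subject and object become nodes, the predicate becomes an edge. */
    void add(Triple t) {
        outgoing.computeIfAbsent(t.subject(), k -> new ArrayList<>())
                .add(new Edge(t.subject(), t.predicate(), t.object()));
        // Register the object as a node even if it has no outgoing edges.
        outgoing.computeIfAbsent(t.object(), k -> new ArrayList<>());
    }

    /** Builds a graph from a list of statements. */
    static SemanticGraphSketch fromTriples(List<Triple> triples) {
        SemanticGraphSketch graph = new SemanticGraphSketch();
        triples.forEach(graph::add);
        return graph;
    }

    /** Number of distinct subjects and objects seen so far. */
    int nodeCount() {
        return outgoing.size();
    }

    /** Outgoing edges of a node, or an empty list for an unknown node. */
    List<Edge> edgesFrom(String node) {
        return outgoing.getOrDefault(node, List.of());
    }
}
```

In this sketch, two statements about the same paper (a hasTitle and a hasAuthor statement) yield three nodes and two edges leaving the paper node.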

Lexicon

The text is produced by converting the triples contained in an RDF repository using the appropriate language specification files. These language specifications are XML files describing how to render the text corresponding to a property (e.g. syntactic category, source node, target node, verb tense). For example, given the statement Paper’s ID ; hasTitle ; "Paper’s title", the language specification file for the hasTitle property specifies that this information must be rendered as: The title of this paper is "Paper’s title".

However, for statements that link two resources together (e.g. Paper’s ID ; hasAuthor ; Person’s ID), we need a way of defining how to refer to the object of the triple, as we can’t just use the related resource’s ID. This is defined in the Class Language Specification files, which are XML files defining which properties must be used to uniquely identify an instance of a class. For example, the specification file of the Person class defines that the properties firstName and surname must be used together to uniquely identify an instance of that class. (These files are also used to generate the titles of the textual descriptions.)
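
The idea of referring to a resource through its identifying properties can be sketched as follows. This is a simplified illustration, not the service's implementation: ClassSpec and referTo are hypothetical names, the class specification is modelled as a plain list of property names rather than an XML file, and falling back to the resource ID when a property is missing is an assumption made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: ClassSpec and referTo are hypothetical names, not the service's API.
final class ReferringExpressionSketch {

    /** Which properties together identify an instance of a class. */
    record ClassSpec(String className, List<String> identifyingProperties) {}

    /**
     * Builds the text used to refer to a resource by joining the values of its
     * identifying properties, in the order the spec lists them.
     * Assumption: if any identifying property is missing, fall back to the resource ID.
     */
    static String referTo(String resourceId, ClassSpec spec, Map<String, String> properties) {
        if (!properties.keySet().containsAll(spec.identifyingProperties())) {
            return resourceId;
        }
        return spec.identifyingProperties().stream()
                .map(properties::get)
                .collect(Collectors.joining(" "));
    }
}
```

With a Person spec listing firstName and surname, a resource whose properties are firstName = "Ada" and surname = "Lovelace" would be referred to as "Ada Lovelace" rather than by its ID.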

Both kinds of language specification (for properties and for classes) can be generated with the Language Specification Creator, a standalone application that reads every class and property defined in an ontology file and assists the user in creating the language specifications. The path to the language specification files must be defined in the nlgService.properties file.

Aggregation

When generating the description of a resource, the same property may occur many times on that resource. For example, a project can have 20 different members (i.e. 20 hasMember properties). Instead of listing every value individually (e.g. “the project’s members are Thomas, Ed, Kate…”), the NLG service can aggregate similar properties (e.g. “the project’s members are 20 members”) and let the user expand the aggregation by following an anchor. To set the maximum number of similar properties that are displayed before they are aggregated, call TextTypesGenerator.setAggregationThreshold(int).
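
The thresholding behaviour described above can be sketched as below. This is an assumption-laden illustration rather than the service's code: AggregationSketch and describe are hypothetical names, the output is a plain string (the real service produces anchored text that the user can expand), and the choice to aggregate only when the number of values strictly exceeds the threshold is one reading of "maximum number allowed to be displayed".

```java
import java.util.List;

// Illustrative only: hypothetical helper, not TextTypesGenerator's implementation.
final class AggregationSketch {

    /**
     * Lists the values individually while their count does not exceed the threshold;
     * otherwise replaces them with a count (the aggregated form).
     * Assumption: aggregation starts when values.size() > threshold.
     */
    static String describe(String subject, String noun, List<String> values, int threshold) {
        String listed = values.size() > threshold
                ? values.size() + " " + noun          // aggregated form
                : String.join(", ", values);          // individual listing
        return "the " + subject + "'s " + noun + " are " + listed;
    }
}
```

For a project with members Thomas, Ed and Kate, a threshold of 5 gives "the project's members are Thomas, Ed, Kate", while a threshold of 2 gives "the project's members are 3 members".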

Key Java Classes

For a more in-depth analysis of the key Java classes used by the NLG service, see this page.
