gromgull edited this page Apr 26, 2013 · 9 revisions

Between version 3.4.0 and 4.0, a backwards incompatible change was made in RDFLib to how datatyped literals are handled.

First of all, this all looks very complicated, but rest assured, the changes are actually quite subtle and you are unlikely to notice unless you do something specialised.

Hopefully, the changes will not affect very many users of RDFLib, but this page collects the details of what was changed and has a list of any changes required in code using RDFLib.

Why were backwards incompatible changes introduced?

By far the biggest problem was that, in the pre-4.0 handling of datatyped literals, __hash__ and __eq__ were not consistent, i.e.

>>> from rdflib import Literal, XSD
>>> Literal(2.5) == Literal("2.50", datatype=XSD.float)
True
>>> hash(Literal(2.5)) == hash(Literal("2.50", datatype=XSD.float))
False

This is very bad, and would lead to literals not working properly with data structures such as sets and dicts, which rely on equal objects having equal hashes.

Also, the old way tried to support equality and comparisons between typed Literals and python objects directly, which was convenient in some cases, but inconsistent and confusing in others.

What is new?

All comparison methods for literals have been reworked to be in line with the SPARQL 1.1 spec, which in turn builds on XPath and XML Schema. The nitty-gritty details:

  • Node equality according to __eq__ / == is done according to the SPARQL sameTerm function, which refers to Section 6.5.1 of the RDF Abstract Syntax. __hash__ is naturally consistent with __eq__.

  • A new method Node.eq does comparison according to: SPARQL RDF-Term equal (=) - i.e. value-based comparison, as defined in the RDF Abstract Syntax 6.5.2

  • Relative comparisons (the >, <, >=, <= operators / __lt__, __gt__, __ge__, __le__ methods), and therefore the sort order of Nodes, follow SPARQL ORDER BY and the <, > operators. Sorting is also done in value space, so all numerically typed literals sort by numeric value; other literals are sorted by language tag, or by datatype URI. Nodes in general are sorted as None, BNode, Variable, URIRef, Literal.

  • Datatyped literals are optionally normalised at creation time, i.e. if a lexical form corresponds to a valid value in the value-space for a datatype, this value is again serialised to a string and this serialisation is used as the lexical form. Easier explained through an example:

>>> Literal("0000001", datatype=XSD.integer)
Literal("1", datatype=XSD.integer)
>>> Literal("0.00000", datatype=XSD.double)
Literal("0.0", datatype=XSD.double)

The flag is either set globally as rdflib.NORMALIZE_LITERALS or as a keyword argument to Literal.__new__. Normalization is enabled by default.

  • Only semi-related: the Literal class also defines operators for arithmetic, +, -, /, *, ~, ... . These now return Literals, rather than whatever Python feels like, allowing us to do:
age = graph.value(bob, myschema.age)
graph.set((bob, myschema.age, age + 1))  # Graph.set takes a single (s, p, o) triple

What do I have to watch out for?

Most things now work in a fairly sane and sensible way. If you do not have existing stores, intermediate stored sorted lists, or anything else that depends on literal hashes, you should be good to go.

Literals no longer compare equal across data-types with ==

i.e.

>>> Literal(2, datatype=XSD.int) == Literal(2, datatype=XSD.float)
False

But a new method eq on all Nodes has been introduced, which does semantic equality checking, i.e.:

>>> Literal(2, datatype=XSD.int).eq(Literal(2, datatype=XSD.float))
True

The eq method is still limited to datatypes that map to the same value space, i.e. all numeric types map to numbers and will compare, xsd:string and plain literals both map to strings and compare fine, but:

>>> Literal(2, datatype=XSD.int).eq(Literal('2'))
False

Your literals will be normalised according to datatype

If you care about the exact lexical representation of a literal, and not just the value, either set rdflib.NORMALIZE_LITERALS to False before creating your literal, or pass normalize=False to the Literal constructor.

Ordering of literals and nodes has changed

Comparing literals with <, >, <=, >= now works the same way as in SPARQL filter expressions.

Greater-than/less-than ordering comparisons are also done in value space when compatible datatypes are used. Literals with incompatible datatypes are ordered by datatype URI, or by language tag. For other nodes the ordering is None < BNode < Variable < URIRef < Literal.

Any ordering comparison with an object that is not an rdflib Node returns NotImplemented. In Python 2.x, Python then makes up some stable order; in Python 3 this is an error (a TypeError).

Custom mapping of datatypes to python objects

You can add new mappings of datatype URIs to python objects using the rdflib.term.bind method. This also allows you to specify a constructor for constructing objects from the lexical string representation, and a serialisation method for generating a lexical string representation from an object.

If you encounter any other problems with the changes, please extend this list and/or open an issue!