Introduction

This document defines a strict serialization of an RDF graph. The output octet sequence is always the same for a given graph.

Key words

The key words ‘MUST,’ ‘MUST NOT,’ ‘REQUIRED,’ ‘SHALL,’ ‘SHALL NOT,’ ‘SHOULD,’ ‘SHOULD NOT,’ ‘RECOMMENDED,’ ‘NOT RECOMMENDED,’ ‘MAY,’ and ‘OPTIONAL’ in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Representation

The graph is represented as a Unicode character string encoded with UTF-16LE [RFC2781].

Integers

When appending an integer to a Unicode string:

  1. Encode the integer into its binary representation as an array of 8-bit-long bytes in descending order of significance (big endian).
  2. Skip all zero bytes at the beginning of the array.
  3. For each remaining byte in the array, append a character to the string whose code point value is equal to the value of the byte.
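
The three steps above can be sketched in Python as follows. This is a minimal illustration, not a normative implementation: the function name append_integer is hypothetical, and the sketch assumes non-negative integers only. Read literally, step 2 means that the integer 0 contributes no characters at all, and the sketch follows that reading.

```python
def append_integer(string: str, value: int) -> str:
    """Append a non-negative integer to a string, one code point per byte.

    Hypothetical helper; the name is not defined by this document.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # Steps 1 and 2: the minimal big-endian byte array, which by construction
    # has no leading zero bytes (and is empty for the value 0).
    length = (value.bit_length() + 7) // 8
    data = value.to_bytes(length, "big")
    # Step 3: one character per byte, code point equal to the byte value.
    return string + "".join(chr(byte) for byte in data)
```

Under these assumptions, appending 256 adds the two code points U+0001 and U+0000, in that order.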

RDF statements

An RDF statement is represented as a tuple of four strings: a subject IRI, a predicate IRI, a datatype IRI and a literal value.

Object IRIs are represented as literal values with an empty datatype IRI.

Language-tagged literals and literals with no explicit datatype IRI are given the type http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral. A U+0040 COMMERCIAL AT is appended to the value, followed by the language tag, if any. [RDF-LITERAL]
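
As an illustration, the mapping from the three kinds of RDF objects to four-string tuples might look like the Python sketch below. The function names (iri_object, typed_literal, plain_literal) and the choice of a Python tuple are assumptions made for this example only; the document itself prescribes only the four strings.

```python
PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


def iri_object(subject: str, predicate: str, iri: str) -> tuple:
    # An object IRI is stored as a literal value with an empty datatype IRI.
    return (subject, predicate, "", iri)


def typed_literal(subject: str, predicate: str, datatype: str, value: str) -> tuple:
    # A literal with an explicit datatype IRI is stored unchanged.
    return (subject, predicate, datatype, value)


def plain_literal(subject: str, predicate: str, value: str, language: str = "") -> tuple:
    # "@" is always appended; the language tag follows only when there is one.
    return (subject, predicate, PLAIN_LITERAL, value + "@" + language)
```

With these hypothetical helpers, a literal "hello" tagged "en" is stored with the value "hello@en", and an untagged, untyped "hello" with the value "hello@".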

Serialization algorithm

The graph serialization algorithm is as follows. Implementations are encouraged to optimize it, provided that the output is unchanged.

  1. Let output be an empty Unicode string.
  2. Let terms be an empty list of Unicode strings.
  3. For each RDF statement in the RDF graph:
    1. Insert the subject IRI into terms.
    2. Insert the predicate IRI into terms.
    3. Insert the datatype IRI into terms.
    4. Insert the literal value into terms.
  4. Sort terms in ascending order by code points.
  5. Remove all duplicates in terms.
  6. Iterate terms and for each string:
    1. Duplicate all U+0000 code points in the string.
    2. Append the string to output.
    3. Append a U+0000 NULL to output.
    4. Append a U+FFF0 code point to output.
  7. Let idx be an integer.
  8. Let subject be an empty Unicode string.
  9. Let predicate be an empty Unicode string.
  10. Let datatype be an empty Unicode string.
  11. Let list be a list of all RDF statements in the RDF graph.
  12. Execute a multilevel sort on list, in ascending order by code points, first on subject IRIs, then on predicate IRIs, then on datatype IRIs and finally on literal values.
  13. Remove all duplicates in list.
  14. Iterate list and for each statement:
    1. If subject is not equal to the subject IRI:
      1. Set subject to the subject IRI.
      2. Set predicate to the empty string.
      3. Set datatype to the empty string.
      4. Set idx to 0.
      5. Iterate terms and for each string:
        1. If subject is equal to the string, stop iterating.
        2. Increase idx by 1.
      6. Append a U+FFF1 code point to output.
      7. Append idx to output.
    2. If predicate is not equal to the predicate IRI:
      1. Set predicate to the predicate IRI.
      2. Set datatype to the empty string.
      3. Set idx to 0.
      4. Iterate terms and for each string:
        1. If predicate is equal to the string, stop iterating.
        2. Increase idx by 1.
      5. Append a U+FFF2 code point to output.
      6. Append idx to output.
    3. If datatype is not equal to the datatype IRI:
      1. Set datatype to the datatype IRI.
      2. Set idx to 0.
      3. Iterate terms and for each string:
        1. If datatype is equal to the string, stop iterating.
        2. Increase idx by 1.
      4. Append a U+FFF3 code point to output.
      5. Append idx to output.
    4. Append a U+FFF4 code point to output.
    5. Set idx to 0.
    6. Iterate terms and for each string:
      1. If the literal value is equal to the string, stop iterating.
      2. Increase idx by 1.
    7. Append idx to output.
  15. Append a U+FFF5 code point to output.
  16. Return output.
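
The algorithm above can be sketched compactly in Python. This is a non-normative illustration under several assumptions: statements are given as (subject, predicate, datatype, value) tuples of strings, the names serialize and append_integer are hypothetical, and the linear searches for idx are replaced by a dictionary lookup, which yields the same index because terms is sorted and free of duplicates. Python compares strings by code point, which matches the required sort order.

```python
def append_integer(output: str, value: int) -> str:
    # Big-endian bytes without leading zero bytes, one code point per byte.
    length = (value.bit_length() + 7) // 8
    return output + "".join(chr(b) for b in value.to_bytes(length, "big"))


def serialize(statements) -> str:
    # Steps 11-13: the statements, sorted on all four fields, without duplicates.
    ordered = sorted({tuple(statement) for statement in statements})
    # Steps 2-5: the sorted, duplicate-free list of all terms.
    terms = sorted({term for statement in ordered for term in statement})
    index = {term: position for position, term in enumerate(terms)}
    # Steps 1 and 6: emit each term with U+0000 doubled, then U+0000 and U+FFF0.
    output = ""
    for term in terms:
        output += term.replace("\u0000", "\u0000\u0000") + "\u0000\ufff0"
    # Steps 7-14: emit term indices, repeating a field only when it changes.
    subject = predicate = datatype = ""
    for s, p, d, v in ordered:
        if subject != s:
            subject, predicate, datatype = s, "", ""
            output = append_integer(output + "\ufff1", index[s])
        if predicate != p:
            predicate, datatype = p, ""
            output = append_integer(output + "\ufff2", index[p])
        if datatype != d:
            datatype = d
            output = append_integer(output + "\ufff3", index[d])
        output = append_integer(output + "\ufff4", index[v])
    # Steps 15-16: terminate and return.
    return output + "\ufff5"
```

The returned string can then be turned into octets with serialize(statements).encode("utf-16-le"), in line with the Representation section. Because both the terms and the statements are sorted and deduplicated before anything is emitted, the order in which statements are supplied does not affect the result.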

Media type

The media type is application/prs.inumi.rdg-graph.

Security considerations

None.

IANA considerations

To be written.