Introduction

This document defines a strict serialization of an RDF graph. The output octet sequence is always the same for a given graph.

Key words

The key words ‘MUST,’ ‘MUST NOT,’ ‘REQUIRED,’ ‘SHALL,’ ‘SHALL NOT,’ ‘SHOULD,’ ‘SHOULD NOT,’ ‘RECOMMENDED,’ ‘NOT RECOMMENDED,’ ‘MAY,’ and ‘OPTIONAL’ in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Representation

The graph representation encodes three distinct elements, in the order given, as a sequence of octets:

  1. the list of node IRIs,
  2. the list of literals,
  3. the list of statements (the graph).

Literals and IRIs longer than 65535 characters are not supported.

Language-tagged literals

All plain literals (literal values without a datatype IRI) MUST be converted to their corresponding typed literals of datatype http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral prior to the encoding of the graph. [RDF-LITERAL]
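
The conversion can be illustrated with a short sketch. It assumes the rdf:PlainLiteral lexical form, in which the text is followed by "@" and then the language tag (empty when there is none). The function name and the (lexical form, datatype IRI) return shape are hypothetical and not part of this document; language tag normalization is not handled.

```python
RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


def to_plain_literal(text, language=None):
    """Return a (lexical form, datatype IRI) pair for a plain literal.

    Assumption: the lexical form is the text, an "@" sign, and the
    language tag, which is empty when the literal has no language.
    """
    lexical = f"{text}@{language or ''}"
    return lexical, RDF_PLAIN_LITERAL


print(to_plain_literal("Hello", "en")[0])  # Hello@en
print(to_plain_literal("Hello")[0])        # Hello@
```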

Integers

All integers are unsigned and are encoded as sequences of octets, with the least significant octet encoded first and the most significant octet encoded last (i.e. little endian).

Encoding an integer with width w means to encode the integer using exactly w octets.
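
As an illustration, a minimal sketch of this integer encoding follows. The function name encode_uint is hypothetical; the sketch relies on Python's built-in int.to_bytes, which fails if the value does not fit in the requested width.

```python
def encode_uint(value: int, width: int) -> bytes:
    """Encode an unsigned integer in exactly `width` octets, little endian."""
    if value < 0:
        raise ValueError("integers are unsigned")
    # int.to_bytes raises OverflowError when value needs more than `width` octets.
    return value.to_bytes(width, "little")


print(encode_uint(258, 2).hex())  # 0201
print(encode_uint(258, 4).hex())  # 02010000
```

The value 258 is 0x0102, so its least significant octet 0x02 comes first; with width 4 the result is padded with zero octets at the most significant end.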

List of node IRIs

An IRI is a tuple of 5 (five) elements:

  • scheme: a character string,
  • authority: a character string,
  • path: a list of character strings,
  • query: a character string,
  • fragment: a character string.

The list of node IRIs is encoded as follows:

  1. Let output be the output octet sequence.
  2. Let list be a list of all distinct IRIs within the graph.
  3. Let base be an empty IRI.
  4. Let rref be an empty IRI.
  5. Sort list in ascending order by code point.
  6. For each iri in list:
    1. If base.scheme is not equal to iri.scheme:
      1. Set rref to iri.
      Otherwise, if base.authority is not equal to iri.authority:
      1. Set rref to iri.
      2. Set rref.scheme to an empty string.
      Otherwise, if base.path is not equal to iri.path:
      1. Let i be an index into a list.
      2. Set rref.scheme to an empty string.
      3. Set rref.authority to an empty string.
      4. Set rref.path to an empty list.
      5. Set i to 0 (the first element index).
      6. While i is less than the length of iri.path:
        1. If i is less than the length of base.path:
          1. If iri.path[i] does not equal base.path[i]:
            1. Append iri.path[i] to rref.path.
            2. Increase i by 1 (one).
            3. Break out of the loop.
          Otherwise, append iri.path[i] to rref.path.
        2. Increase i by 1 (one).
      7. Let j be a copy of i.
      8. While j is less than the length of base.path:
        1. Prepend .. to rref.path.
        2. Increase j by 1 (one).
      9. While i is less than the length of iri.path:
        1. Append iri.path[i] to rref.path.
        2. Increase i by 1 (one).
      10. If rref.path is longer than iri.path, set rref.path to iri.path.
      11. Set rref.query to iri.query.
      12. Set rref.fragment to iri.fragment.
      Otherwise, if base.query is not equal to iri.query:
      1. Set rref to iri.
      2. Set rref.scheme to an empty string.
      3. Set rref.authority to an empty string.
      4. Set rref.path to an empty list.
      Otherwise:
      1. Set rref to iri.
      2. Set rref.scheme to an empty string.
      3. Set rref.authority to an empty string.
      4. Set rref.path to an empty list.
      5. Set rref.query to an empty string.
    2. Set base to iri.
    3. Let l be the length, in characters, of rref.
    4. If l is greater than or equal to 65535, raise an error.
    5. Encode l with width 2 and append the result to output.
    6. Encode rref using the UTF-8 encoding [RFC2277] and append the result to output.
  7. Append 0x00 to output.
  8. Append 0x00 to output.
  9. Return output.
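
The framing part of the steps above (the length prefix, the UTF-8 payload and the two terminating octets) can be sketched as follows. This is only a partial illustration: it assumes the relative references have already been computed and serialized to strings, which is not shown here, and the function name encode_iri_list is hypothetical.

```python
def encode_iri_list(relative_refs: list[str]) -> bytes:
    """Frame already-computed relative references as length-prefixed UTF-8."""
    output = bytearray()
    for rref in relative_refs:
        length = len(rref)  # length in characters, not in octets
        if length >= 65535:
            raise ValueError("IRI reference is too long")
        output += length.to_bytes(2, "little")  # width 2, little endian
        output += rref.encode("utf-8")
    output += b"\x00\x00"  # end of the list
    return bytes(output)


print(encode_iri_list(["http://a/", "b"]).hex(" "))
```

Note that the prefix counts characters, so for references containing non-ASCII characters the number of octets that follow is larger than the prefix value.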

List of literals

The list of literals is encoded as follows:

  1. Let output be the output octet sequence.
  2. Let list be a list of all distinct literals within the graph.
  3. Sort list in ascending order by code point.
  4. For each value in list:
    1. Let l be the length, in characters, of value.
    2. If l is greater than or equal to 65535, raise an error.
    3. Encode l with width 2 and append the result to output.
    4. Encode value using the UTF-16LE encoding [RFC2781] and append the result to output.
  5. Append 0x01 to output.
  6. Append 0x00 to output.
  7. Append 0xFF to output.
  8. Append 0xDF to output.
  9. Return output.
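
The steps above can be sketched as follows, assuming each literal is available as a single character string; how the value and its datatype are combined is not covered here. The function name encode_literal_list is hypothetical. Python compares strings by code point, so the built-in sort matches the required order.

```python
def encode_literal_list(literals: list[str]) -> bytes:
    """Encode distinct literals as length-prefixed UTF-16LE strings."""
    output = bytearray()
    for value in sorted(set(literals)):  # distinct values, code point order
        length = len(value)  # length in characters, not in octets
        if length >= 65535:
            raise ValueError("literal is too long")
        output += length.to_bytes(2, "little")  # width 2, little endian
        output += value.encode("utf-16-le")
    output += b"\x01\x00\xff\xdf"  # end of the list
    return bytes(output)


print(encode_literal_list(["b", "a", "a"]).hex(" "))
# 01 00 61 00 01 00 62 00 01 00 ff df
```

The four terminating octets read as a length of 1 followed by the UTF-16LE code unit 0xDFFF, an unpaired low surrogate. Such a unit does not occur in well-formed UTF-16, which appears to be what lets a decoder tell the terminator apart from a one-character literal.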

List of statements

The list of statements is encoded as a sequence of references to the elements of the two lists encoded before this one. The first element is referenced by the value 0 (zero), the second by 1, the third by 2 and so on.

The width of an integer referencing an element depends on the number of elements in a given list. The width MUST be a power of 2 (two); it is the smallest such number of octets sufficient to encode a reference to the last element of the list.
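
A minimal sketch of the width computation follows. The function name reference_width is hypothetical, and the result for an empty list (width 1) is an assumption, since an empty list has no last element.

```python
def reference_width(count: int) -> int:
    """Return the reference width, in octets, for a list of `count` elements."""
    last_index = max(count - 1, 0)  # references start at 0 (zero)
    width = 1
    # Double the width until the last index fits; this keeps it a power of 2.
    while last_index >= 1 << (8 * width):
        width *= 2
    return width


print(reference_width(256))    # 1
print(reference_width(257))    # 2
print(reference_width(65537))  # 4
```

A list of 256 elements has a last index of 255, which still fits in one octet; three octets are never used, because the width jumps from 2 to 4.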

A statement is a tuple of 5 (five) elements:

  • s: subject IRI,
  • p: predicate IRI,
  • o: object IRI,
  • d: datatype IRI,
  • v: literal value.

The list of statements is encoded as follows:

  1. Let output be the output octet sequence.
  2. Let list be the list of all statements in the encoded graph.
  3. Let wu be the width of a reference to an element of the IRIs list iris.
  4. Let wl be the width of a reference to an element of the literals list vals.
  5. Let ps be an empty IRI.
  6. Let pp be an empty IRI.
  7. Let po be an empty IRI.
  8. Let pd be an empty IRI.
  9. Sort list by v in ascending order by code point.
  10. Sort list by d in ascending order by code point.
  11. Sort list by o in ascending order by code point.
  12. Sort list by p in ascending order by code point.
  13. Sort list by s in ascending order by code point.
  14. For each triple in list:
    1. If triple.s is not equal to ps:
      1. Set ps to triple.s.
      2. Set pp to an empty IRI.
      3. Set po to an empty IRI.
      4. Set pd to an empty IRI.
      5. Append 0xAA to output.
      6. Encode a reference to ps in iris with width wu and append the result to output.
    2. If triple.p is not equal to pp:
      1. Set pp to triple.p.
      2. Set po to an empty IRI.
      3. Set pd to an empty IRI.
      4. Append 0xB3 to output.
      5. Encode a reference to pp in iris with width wu and append the result to output.
    3. If triple.o is not equal to po:
      1. Set po to triple.o.
      2. Append 0x96 to output.
      3. Encode a reference to po in iris with width wu and append the result to output.
    4. Otherwise, if triple.d is not equal to pd:
      1. Set pd to triple.d.
      2. Append 0x55 to output.
      3. Encode a reference to pd in iris with width wu and append the result to output.
      4. Encode a reference to triple.v in vals with width wl and append the result to output.
  15. Return output.
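
The five successive sorts in steps 9 to 13 yield a single ordering by s, then p, o, d and v only if each sort is stable, that is, if it preserves the relative order of elements that compare equal. The sketch below illustrates this with statements represented as tuples of strings; that representation, the use of an empty string for an absent element, and the function name sort_statements are assumptions made for the example. Python's list.sort is stable.

```python
from operator import itemgetter

# Each statement is an (s, p, o, d, v) tuple; "" marks an absent element.
statements = [
    ("s2", "p1", "o1", "", ""),
    ("s1", "p2", "", "d1", "v2"),
    ("s1", "p1", "o2", "", ""),
    ("s1", "p2", "", "d1", "v1"),
    ("s1", "p1", "o1", "", ""),
]


def sort_statements(items):
    """Apply the sorts of steps 9 to 13: by v, then d, o, p and finally s."""
    ordered = list(items)
    for field in (4, 3, 2, 1, 0):  # v, d, o, p, s
        ordered.sort(key=itemgetter(field))  # stable, compares by code point
    return ordered


# The result matches one lexicographic sort over the whole tuple.
print(sort_statements(statements) == sorted(statements))  # True
```

With an unstable sorting algorithm, a later pass could reorder statements that an earlier pass had already placed, and the output would no longer be the same for a given graph.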

Internet Media Type

The media type is application/prs.inumi.rdg-graph.

There are no parameters.