Introduction

When building a database system designed for very long-term use, there arises a need for permanent identifiers for its objects. Usage of provisional identifiers will eventually lead to a situation where a name ceases to refer to the object it has named, effectively making references incorrect until said identifiers are updated. This implies the need for constant database monitoring and maintenance. Users of the system would also need to be informed of any changes made.

This specification defines abstract data objects called ‘resource descriptors’ and a URI scheme for naming them. It also defines an interface for interacting with these objects. Anything else is outside the scope of this document, including mapping said interface to a communication protocol.

Resource descriptors contain knowledge about a topic designated by the URI. They are abstract in the sense that the content of their representation differs depending on the current time and the contacted host. A resource descriptor is neither a specific object nor a network location. You may think of the URI as a precise search term supplied in a query and of the resource descriptor as an answer to that query.

Foreword

The key words ‘MUST,’ ‘MUST NOT,’ ‘REQUIRED,’ ‘SHALL,’ ‘SHALL NOT,’ ‘SHOULD,’ ‘SHOULD NOT,’ ‘RECOMMENDED,’ ‘NOT RECOMMENDED,’ ‘MAY,’ and ‘OPTIONAL’ in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Descriptor identifier

Resource descriptors are identified by the rd URI scheme defined herein.

Syntax

The syntax of these URIs is defined by the following rd-URI ABNF rule. It follows the generic URI syntax defined in RFC3986. The reg-name, segment-nz, query and fragment rules are imported from that document.

rd-URI   = "rd://" rd-auth [ rd-path [ "?" query ] [ "#" fragment ] ]
rd-auth  = reg-name / RDGN
RDGN     = 1*RDGN-blk
RDGN-blk = 24b32-char
b32-char = ALPHA / "2" / "3" / "4" / "5" / "6" / "7"
rd-path  = 1*( "/" segment-nz )

Scheme

The rd in the scheme stands for ‘resource descriptor.’ It is expected that these identifiers are stored in large amounts. A two-letter abbreviation was chosen in order to save space and to shorten the computation time of URI comparisons.

Authority

The authority component contains either the canonical Resource-descriptor Graph Number (RDGN) or a registered name for ease of human input.

Resource-descriptor Graph Number (RDGN)

The canonical authority is a Resource-descriptor Graph Number (RDGN). It is a randomly-generated unsigned integer, which identifies a graph (collection) of closely-related resource descriptors.

The maximum value of an RDGN is determined by a variable called its length. The unit of RDGN length is a 120-bit block.

The initial length is 1 block. The length is increased in steps of blocks, i.e. by 120 bits.

At least one bit of the last block MUST be set. An RDGN with all bits cleared is invalid.

This block size ensures that both base64 and base32 encodings of the binary representation produce strings without any padding. It also leaves one free octet in a 16-octet buffer for use by software, where a last-block marker or the number of remaining blocks could be stored.

The textual representation of an RDGN is constructed by representing the number as a sequence of octets in ascending order of octet significance. The resulting sequence of octets (its length is a multiple of 15) is then encoded into text using some octet-to-text encoding. Within the URI, RDGNs are encoded with the base32 encoding. [RFC4648]

Note: One block (15 octets) produces 24 base32 characters.

Note: base64 cannot be used because URI authorities are case-insensitive.
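
The encoding described above can be sketched in Python as follows. This is an illustration only, not a normative implementation: the function names are assumptions made for this example, and the number of blocks is derived from the value so that the last block always has a bit set.

```python
import base64

BLOCK_BITS = 120    # one RDGN block
BLOCK_OCTETS = 15   # 120 bits


def rdgn_to_text(value: int) -> str:
    """Encode an RDGN as base32 text, least significant octet first."""
    if value <= 0:
        raise ValueError("an RDGN with all bits cleared is invalid")
    # Use the smallest number of blocks that can hold the value.
    blocks = (value.bit_length() + BLOCK_BITS - 1) // BLOCK_BITS
    octets = value.to_bytes(blocks * BLOCK_OCTETS, "little")
    return base64.b32encode(octets).decode("ascii")


def text_to_rdgn(text: str) -> int:
    """Decode base32 text (any letter case) back into the RDGN integer."""
    if len(text) == 0 or len(text) % 24 != 0:
        raise ValueError("RDGN text must be a non-empty multiple of 24 characters")
    octets = base64.b32decode(text.upper())
    return int.from_bytes(octets, "little")
```

Because 15 octets are exactly 24 groups of 5 bits, the base32 output never contains padding characters.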

RDGN registry

This technology was created for use in a Kueea Network. [KUEEA] It is a peer-to-peer network, in which a node may advertise that it wishes to take on a given network role.

Users generate RDGNs locally on their node when a need for one arises. The node then tries to register the number with a registry. If the number is not marked as in use, it is registered.

A user-controlled node generates the number, not the registry, in order to ensure the number has really been randomly generated. If a remote node controlled the generation of numbers, it could present numbers which only appear to be random.

Taking on the role of an RDGN registry means that the node will be contacted by other nodes in order to determine the allocation state of an RDGN. The role of a registry is thus to keep a list of allocated numbers. The minimum amount of information on a given RDGN a registry needs to store is a boolean value indicating whether the number has been allocated or not.

There SHOULD be multiple nodes functioning as an RDGN registry, because a given node MAY resign from its role at any time. Registries SHOULD forward a registration request to other known registries as part of the registration process.

The protocol for such a registry is out of scope of this document.

Registries also keep track of the current RDGN length. The length is increased independently by each registry. The length is extended by one block when 1% of the value space becomes allocated. In other words, the second block is added after 2^120/100 numbers have been collected, the third after 2^240/100 numbers, etc.
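
As a small worked example of this arithmetic, the sketch below computes the number of allocated RDGNs at which a registry would extend the length. The function name is an assumption made for illustration, and integer division is used for rounding, which the text does not specify.

```python
BLOCK_BITS = 120  # one RDGN block


def extension_threshold(blocks: int) -> int:
    """Allocated-number count at which a length of `blocks` is extended.

    This is 1% of the value space of `blocks` blocks, rounded down
    (the rounding is an assumption of this sketch).
    """
    if blocks < 1:
        raise ValueError("the length is at least one block")
    return 2 ** (BLOCK_BITS * blocks) // 100
```

For a length of one block the threshold is roughly 1.3 × 10^34 allocated numbers.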

It is an error if a node wishes to register a number that is too short. The length of a new RDGN MUST be greater than or equal to the current length.

Empty blocks

The RDGN EEEEEEEEEEEEEEEEEEEEEEEEAAAAAAAAAAAAAAAAAAAAAAAA is treated the same as EEEEEEEEEEEEEEEEEEEEEEEE. In other words, all empty blocks are stripped from the end.
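
A minimal sketch of this stripping, operating on the base32 text, is shown below. The function name is hypothetical, and upper-casing the input is an assumption based on URI authorities being case-insensitive. An empty block is 24 ‘A’ characters, because ‘A’ encodes the value zero in base32.

```python
EMPTY_BLOCK = "A" * 24  # 120 zero bits in base32


def strip_empty_blocks(rdgn_text: str) -> str:
    """Remove all empty blocks from the end of a textual RDGN."""
    text = rdgn_text.upper()
    if len(text) == 0 or len(text) % 24 != 0:
        raise ValueError("RDGN text must be a non-empty multiple of 24 characters")
    # The text is block-aligned, so the last 24 characters are always one block.
    while text.endswith(EMPTY_BLOCK):
        text = text[:-24]
    if not text:
        raise ValueError("an RDGN with all bits cleared is invalid")
    return text
```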

Anonymous Graph

RDGN 1745936836749459630212825467061601310 (ANONANONANONANONANONANON in base32) is reserved for the Anonymous Graph.

The Anonymous Graph SHOULD be used in examples.

Implementations MAY define special processing for the Anonymous Graph.

Registered name

The authority component is treated as a registered name if, and only if, the length of the authority component is not a multiple of 24 characters or it contains a character not matched by the b32-char rule.
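
This test can be sketched with a regular expression. The function name is an assumption made for illustration.

```python
import re

# One or more blocks of exactly 24 base32 characters (b32-char, any letter case).
_RDGN_RE = re.compile(r"(?:[A-Za-z2-7]{24})+")


def is_rdgn(authority: str) -> bool:
    """Return True if the authority is an RDGN, False if it is a registered name."""
    return _RDGN_RE.fullmatch(authority) is not None
```

Note that under this rule a registered name consisting solely of base32 characters and having a length that is a multiple of 24 is treated as an RDGN.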

These names are only for ease of human input. They are never stored.

DNS names [RFC1035] are the only domain of registered names defined.

This document may be updated in the future in order to define additional domains of registered names, although it is believed that such a need will never arise. The authors believe that no other system for storing human-readable, registered names than DNS is required, because any such system would ultimately have exactly the same problems as those identified with DNS. Any human-readable naming system requires a global registry, which must be centrally managed in order to solve disputes over names. Most problems with the public DNS are not with the database itself, but are rather problems with the management (such as lack of trust in it) or with the protocol used for transferring the data over a network.

In order to resolve a domain name to an RDGN:

  1. Issue a query for TXT records associated with the domain name.
  2. Iterate TXT records in the answer section in any order.
  3. Test RDATA for a complete match with the rd-DNS rule.
  4. Extract the values on the first match and return success.
  5. If no matches were found, return failure.

rd-DNS   = %s"RDGN " <RDGN> "@" <URI> ; URI from RFC 3986

The URI is an endpoint URI that the domain owner wishes resolvers to contact first in order to access the graph’s data. It is assumed the domain owner publishes their data there.
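
Steps 2 to 5 of the resolution procedure can be sketched as below. The DNS query itself is left out; the function (a hypothetical name) receives the TXT record contents as strings. The URI part is checked only for a leading scheme, which is a simplified stand-in for the full URI rule of RFC 3986.

```python
import re
from typing import Iterable, Optional, Tuple

# rd-DNS: case-sensitive "RDGN ", an RDGN, "@", then a URI (simplified check).
_RD_DNS_RE = re.compile(
    r"RDGN ((?:[A-Za-z2-7]{24})+)@([A-Za-z][A-Za-z0-9+.\-]*:\S*)"
)


def match_txt_records(records: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (RDGN, endpoint URI) from the first matching record, else None."""
    for rdata in records:
        match = _RD_DNS_RE.fullmatch(rdata)
        if match is not None:
            return match.group(1), match.group(2)
    return None  # no record matched: resolution fails
```

For instance, a record reading "RDGN ANONANONANONANONANONANON@https://example.org/rdg" (an invented endpoint URI) would yield the Anonymous Graph number and that URI.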

Path

The path is a human-readable, case-sensitive name of a resource descriptor.

Although the syntax defines paths as hierarchical, resource descriptors are considered to be nodes of a graph (i.e. not of a tree). There is no concept of parents and children within the namespace. In other words, the existence of /a/b/c does not imply the existence of /a nor the existence of /a/b.

For clarity: Paths MUST NOT end with a U+002F SOLIDUS character (/) and empty path segments are not permitted, i.e. there MUST NOT be any two consecutive U+002F SOLIDUS characters (//) within the path.
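
The structural constraints on paths can be checked with a few lines of Python. This sketch (with a hypothetical function name) checks only the placement of SOLIDUS characters; it does not validate that each segment consists of characters permitted by the segment-nz rule.

```python
def has_valid_path_structure(path: str) -> bool:
    """Check that a path starts with "/" and contains no empty segments."""
    if not path.startswith("/"):
        return False
    # A trailing "/" or a "//" both show up as an empty segment after splitting.
    segments = path[1:].split("/")
    return all(segment != "" for segment in segments)
```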

Each resource descriptor should contain knowledge that is narrow in scope. Descriptors are to form a graph structure that applications traverse. The names ought to be chosen so that the amount of knowledge within one descriptor is concise and can be processed relatively fast. Large amounts of information should be split among multiple descriptors so that applications do not waste time processing unnecessary data.

For example, if a Compact Disc is to be described, only information about the disc itself should be present and nothing else. Even if there are songs on the disc, a song is not a disc – it should have its own descriptor.

The /index descriptor

Every graph MUST contain at least the descriptor under the path /index.

It is the first descriptor an application retrieves. It contains information about the classes of the graph and references to other descriptors that exist within the graph.

The list of referenced descriptors need not be exhaustive. Only nearby nodes SHOULD be referenced, i.e. the minimal set required to reach all other nodes.

It is RECOMMENDED that other descriptors with the same purpose also use the path segment index, preferably as the last one.

Query

The query component is a serialized list of name-value pairs.

The serialization algorithm is as follows: For each pair:

  1. Encode both the name and value using UTF-8. [RFC3629]
  2. If output is not empty, append a U+0026 AMPERSAND character.
  3. Append the name.
  4. Append a U+003D EQUALS SIGN character.
  5. Append the value.
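
The serialization steps can be sketched as follows. The function name and the sample parameter values are illustrative assumptions. The sketch follows the listed steps literally and applies no percent-encoding, since none is described here.

```python
from typing import Iterable, Tuple


def serialize_query(pairs: Iterable[Tuple[str, str]]) -> bytes:
    """Serialize name-value pairs into the octets of a query component."""
    output = bytearray()
    for name, value in pairs:
        name_octets = name.encode("utf-8")    # step 1
        value_octets = value.encode("utf-8")  # step 1
        if output:
            output += b"&"                    # step 2
        output += name_octets                 # step 3
        output += b"="                        # step 4
        output += value_octets                # step 5
    return bytes(output)
```

For example, the pairs ("format", "text/turtle") and ("signers", "acct:alice@example.org") serialize to format=text/turtle&signers=acct:alice@example.org.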

The space of query parameters is defined separately for the retrieval function and for the submission function.

Normalization

Do the following in order to normalize an rd URI.

  1. If the authority is a registered name, dereference the name and modify the authority component accordingly.
  2. Remove the query component.
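
A sketch of these two steps using Python's urllib.parse is given below. The resolve_name callback is a hypothetical stand-in for dereferencing a registered name to an RDGN; the function name is likewise an assumption.

```python
import re
from typing import Callable
from urllib.parse import urlsplit, urlunsplit

_RDGN_RE = re.compile(r"(?:[A-Za-z2-7]{24})+")


def normalize_rd_uri(uri: str, resolve_name: Callable[[str], str]) -> str:
    """Normalize an rd URI: resolve a registered name, then drop the query."""
    parts = urlsplit(uri)
    if parts.scheme != "rd":
        raise ValueError("not an rd URI")
    authority = parts.netloc
    if _RDGN_RE.fullmatch(authority) is None:
        # Step 1: the authority is a registered name, so dereference it.
        authority = resolve_name(authority)
    # Step 2: rebuild the URI with an empty query component.
    return urlunsplit(("rd", authority, parts.path, "", parts.fragment))
```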

URI comparison

When comparing URIs according to this framework:

  • if given a URI reference, resolve it to the full URI;
  • if the scheme is rd, normalization is REQUIRED, otherwise normalization is RECOMMENDED;
  • compare URIs by components, not by the whole string; for example, the URI http://example.com/?# is equal to http://example.com/.
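
Component-wise comparison can be illustrated with urllib.parse.urlsplit, which yields an empty string both for an absent and for an empty query or fragment. The sketch assumes both inputs are already absolute and normalized; the function name is an assumption.

```python
from urllib.parse import urlsplit


def uris_equal(a: str, b: str) -> bool:
    """Compare two absolute URIs component by component."""
    # SplitResult is a tuple of (scheme, netloc, path, query, fragment).
    return urlsplit(a) == urlsplit(b)
```

With this function, http://example.com/?# and http://example.com/ compare as equal, whereas a plain string comparison would report them as different.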

Descriptor definition

descriptor
|-piece-id => RDF-graph
|             |-signer-id => signature
|             |-signer-id => signature
|-piece-id => RDF-graph
              |-signer-id => signature
              |-signer-id => signature

A resource descriptor is a unique mapping of a piece identifier to a descriptor piece.

A piece identifier is a character string which has the same syntax as the Message-ID field of Internet Messages. [RFC2822]

A descriptor piece is a pair of an RDF graph and a unique mapping of a (signer) URI to a signature.

An RDF graph is a set of RDF statements. [RDF] The set MUST NOT be empty.

A signature is a tuple of: the time of expiration, a character string identifying the signature scheme, and an opaque data object (sometimes called a blob). The object contains the result of applying the scheme.

A signature is an assessment by the signer that all of the statements contained within the signed RDF graph are correct and true.

A piece expires when all of its signatures expire, i.e. all assessments of information truthfulness expire.
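
The definitions above can be modelled with a few data classes. This is one possible in-memory shape, not a prescribed one: the class and field names are assumptions, an RDF statement is simplified to a triple of strings, and None stands for a signature that never expires.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, FrozenSet, Optional, Tuple

Statement = Tuple[str, str, str]  # simplified stand-in for an RDF statement


@dataclass
class Signature:
    expires: Optional[datetime]  # None means the signature never expires
    scheme: str                  # identifier of the signature scheme
    blob: bytes                  # opaque result of applying the scheme

    def expired(self, now: datetime) -> bool:
        return self.expires is not None and self.expires <= now


@dataclass
class Piece:
    graph: FrozenSet[Statement]
    # Unique mapping of signer URI to signature.
    signatures: Dict[str, Signature] = field(default_factory=dict)

    def expired(self, now: datetime) -> bool:
        # A piece expires when all of its signatures have expired.
        return all(sig.expired(now) for sig in self.signatures.values())


# A descriptor is a unique mapping of piece identifier to piece.
Descriptor = Dict[str, Piece]
```

In this sketch a piece without any signatures counts as expired, since the condition holds vacuously.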

Graph requirements

All rd URIs in the graph MUST be in the normalized form.

All ni URIs MUST have an empty authority component.

There MUST NOT be any data URIs in the graph.

Signatures

This document does not define any signature schemes. It only defines how signatures are expressed and what data is signed.

sig-scheme = 1*(ALPHA / DIGIT / "-")
sig-expire = date-time / sig-expkey
sig-expkey = %s"never"

The date-time rule is imported from [RFC3339].

The identifier of a signature scheme MUST match the sig-scheme rule.

The time of expiration is a character string. It MUST match the sig-expire rule. It is either a specific date and time or a keyword.

The only defined keyword is never, indicating that the signature never expires.

Data necessary for verifying a signature is obtained via the signer URI. The signature scheme defines how to utilize the URI. In general, these URIs SHOULD identify a user account or similar object.

The opaque data object contains the result of applying the signature scheme over an output sequence of octets, generated as follows.

  1. Let output be an empty sequence of octets.
  2. Let graph be the RDF graph of the piece.
  3. Append the piece identifier to output.
  4. Append the URI of the signer to output.
  5. Append the time of expiration to output.
  6. Append the scheme identifier to output.
  7. Encode graph into its application/prs.inumi.rdg-graph representation [RDG-GRAPH] and append the result to output.
  8. Return output.

This algorithm permits re-encoding of the RDF graph into another representation without invalidating a signature.
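
The construction of the signed octet sequence can be sketched as below. Several points are assumptions of this sketch rather than statements of this document: the character strings are encoded as UTF-8, no delimiters are inserted between the fields (none are listed in the steps), and the graph is passed in already encoded, because the application/prs.inumi.rdg-graph representation is defined elsewhere.

```python
def signing_input(
    piece_id: str,
    signer_uri: str,
    expires: str,
    scheme: str,
    encoded_graph: bytes,
) -> bytes:
    """Build the octet sequence over which a signature scheme is applied."""
    output = bytearray()                   # step 1
    output += piece_id.encode("utf-8")     # step 3 (UTF-8 is an assumption)
    output += signer_uri.encode("utf-8")   # step 4
    output += expires.encode("utf-8")      # step 5, e.g. "never"
    output += scheme.encode("utf-8")       # step 6
    output += encoded_graph                # step 7, encoded by the caller
    return bytes(output)                   # step 8
```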

Resolution / Interface

The interface of a resolver has two functions: retrieval and submission.

Both of these functions take a descriptor identifier as input. Parameters are extracted from the query component of the identifier and then the URI is normalized, removing the query component.

Descriptor identifiers without a path component cannot be resolved. The syntax permits such identifiers so that the graph itself may be referenced, but graphs themselves cannot be dereferenced.

Graph data is stored at Resource-descriptor Graph Endpoints. An endpoint is referenced by an endpoint URI, designating a way through which a database can be accessed. Definition of such an endpoint is out of scope of this document.

The rd-graph.home.arpa domain name

This document defines the DNS domain name rd-graph.home.arpa.

It is a name within a residential home network. [RFC8375] Availability of this name is REQUIRED to comply with this document. It is configured locally per site as an opt-in to the RDG system.

Queries for this name MUST return at least one reference to a host, which implements at least one type of an endpoint with a default URI. If possible, SRV records [RFC2782] should be used.

A default endpoint URI is one constructed from the hostname only.

These hosts may be located within the home network or somewhere else. The address might as well be that of a loopback interface.

Common parameters

This section lists parameters for both retrieval and submission.

The endpoints parameter

The value of the endpoints parameter is a space-separated list of endpoint URIs which designate endpoints that are to be contacted.

This parameter overrides any resolver-specific configuration.

Retrieval

Input is a descriptor identifier.

Output is a data object whose format depends on the format parameter.

The format parameter

The format parameter identifies the format of the representation. The value is either a media type or a URI of the data format.

By default, the value is multipart/prs.inumi.rdg-descriptor. [RDG-MIME]

The signers parameter

The signers parameter is a space-separated list of signer URIs.

The output MUST contain only those pieces which are signed by at least one signer whose URI is listed in the signers list.

The schemes parameter

The schemes parameter is a space-separated list of signature schemes.

The output MUST contain only those pieces which have at least one signature generated using a scheme listed in the schemes list.
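
The effect of the signers and schemes parameters can be sketched as a filter over pieces. Here each piece is reduced to a mapping of signer URI to scheme identifier, and the function name is hypothetical. When both parameters are given, the sketch applies the two conditions independently; whether a single signature has to satisfy both is not stated in this document.

```python
from typing import Dict, Optional, Set

# piece identifier -> (signer URI -> signature scheme identifier)
Pieces = Dict[str, Dict[str, str]]


def select_pieces(
    pieces: Pieces,
    signers: Optional[Set[str]] = None,
    schemes: Optional[Set[str]] = None,
) -> Pieces:
    """Keep only the pieces that satisfy the given signers and schemes lists."""
    selected: Pieces = {}
    for piece_id, signatures in pieces.items():
        if signers is not None and not any(uri in signers for uri in signatures):
            continue  # no signature by a listed signer
        if schemes is not None and not any(
            scheme in schemes for scheme in signatures.values()
        ):
            continue  # no signature made with a listed scheme
        selected[piece_id] = signatures
    return selected
```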

Submission

Input is a descriptor identifier and a descriptor piece.

Output is a list of pairs of an endpoint URI and a character string.

The first character of the string indicates the status. It is one of: success (S), partial (P) or failure (F). Subsequent characters SHOULD contain a human-readable message to the user.

Success means that all submitted data was accepted.

Partial means that only a portion of the submitted data was accepted.

Failure means that none of the submitted data was accepted.

The resolver contacts endpoints one after another and submits the received descriptor piece via the endpoint’s API. For each endpoint, a pair indicating the result is appended to the list.

Endpoint behaviour

Endpoints MUST process the submitted descriptor piece as follows.

A piece with an empty graph is interpreted as a piece reference, in case the protocol used does not allow for explicit references.

If the graph in the piece is a reference:

  1. Find a descriptor by its identifier; if not found, return failure.
  2. Find a piece by reference; if not found, return failure.
  3. Verify the signatures in the submitted piece over the referenced graph.
  4. Discard all signatures that failed to verify.
  5. Return failure if no signatures remain.
  6. Update the signatures under the piece with the submitted ones.
  7. Return success (all signatures valid) or partial.

Otherwise (if the graph is not a reference):

  1. Find a descriptor by its identifier; if not found, create it.
  2. Find a piece with an identifier equal to that of the submitted piece; if found, return failure.
  3. Verify the signatures in the submitted piece over the submitted graph.
  4. Discard all signatures that failed to verify.
  5. Return failure if no signatures remain.
  6. Insert the piece into the descriptor.
  7. Return success (all signatures valid) or partial.
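
Both procedures can be sketched together over an in-memory store. All names here are assumptions made for illustration: the store layout, the verify callback (which stands in for whatever the signature scheme prescribes) and the simplified graph type. Unlike step 1 of the second procedure, the sketch creates a missing descriptor only at insertion time, so that a failed submission leaves no empty descriptor behind.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Graph = FrozenSet[Tuple[str, str, str]]  # simplified RDF graph
Signatures = Dict[str, bytes]            # signer URI -> opaque signature
# descriptor identifier -> piece identifier -> (graph, signatures)
Store = Dict[str, Dict[str, Tuple[Graph, Signatures]]]
Verify = Callable[[str, Graph, str, bytes], bool]


def process_submission(
    store: Store,
    descriptor_id: str,
    piece_id: str,
    graph: Graph,
    signatures: Signatures,
    verify: Verify,
) -> str:
    """Process a submitted piece and return "S", "P" or "F"."""
    is_reference = len(graph) == 0  # an empty graph is a piece reference
    if is_reference:
        pieces = store.get(descriptor_id)
        if pieces is None or piece_id not in pieces:
            return "F"  # descriptor or piece not found
        target_graph, stored = pieces[piece_id]
    else:
        if piece_id in store.get(descriptor_id, {}):
            return "F"  # a piece with this identifier already exists
        target_graph, stored = graph, {}

    # Keep only the signatures that verify over the target graph.
    valid = {
        uri: sig
        for uri, sig in signatures.items()
        if verify(piece_id, target_graph, uri, sig)
    }
    if not valid:
        return "F"

    if is_reference:
        stored.update(valid)  # update signatures under the existing piece
    else:
        store.setdefault(descriptor_id, {})[piece_id] = (graph, valid)
    return "S" if len(valid) == len(signatures) else "P"
```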

Note that a new signature may have an expiration time equal to the current time or in the past, which effectively revokes it.

Pieces that have expired SHOULD be removed and those that have not SHOULD be kept until they do.

It is up to the specific endpoint when and which pieces are kept. Users have no guarantee that endpoints will keep storing their data. It is most desirable that users have their own endpoints, instead of relying on a third party for storing and serving the data.

Security considerations

RDGN registries

Registries SHOULD be protected in some way from flood registration attacks. Such an attack would unnecessarily pollute the RDGN space.

Registries MAY also deregister numbers for which there is no information available within the network that they operate in and which were not registered by means of interchanging data with a registry from another network. The other network MUST be contacted for verification of RDGN status.

Endpoints

If an endpoint stores RDF graph data in the submitted format, it ought to remove all comments in the serialization, in order to prevent malicious users from using the endpoint as a data storage by including comments with arbitrary data.

The statements in the graph SHOULD also be processed and validated. This document recommends putting pieces from unrecognized users into a quarantine to be later reviewed by a human being. Data from trusted users could skip the quarantine in general, but it is recommended to also quarantine it once in a while.

IANA considerations

To be written.