(or “Why JSON Hyper Schema means JSON doesn’t need XML’s namespacing colon cancer”)

I recently posted a proposal for an addition to JSON Schema, called JSON Hyper Schema, for defining the properties of a JSON structure that represent links or references within data structures. It is intended to provide the same linking capabilities as JSON Referencing, but in a much more flexible manner, so that schemas can describe link information in existing data structures without requiring a fixed convention. Here I want to explain one further benefit of using this type of schema: satisfying the goals of namespacing in JSON.

Many useful mechanisms and conventions in the JSON world, such as JSON-RPC and JSON Schema, have been built on similar prior work from the XML world. When done properly, these JSON counterparts have adopted some of the best ideas from XML and learned from its mistakes. Namespacing is an important concept from XML, and there have occasionally been discussions about a potential need for namespacing in JSON. XML’s namespacing is indeed valuable because it lets multiple parties define meanings in a distributed way, and this goal is valuable in JSON as well. JSON Hyper Schema provides a mechanism for defining hypertext/hyperlink semantics for URI-based linking within JSON data structures, and that mechanism forms a foundation capable of achieving these goals.

A naive import of ideas from XML might simply attempt to apply the semantics, and even the syntax, of XML to JSON. Such an approach fails to account for the fact that JSON is a very different paradigm from XML, fundamentally created for simple object-oriented data structures. To move forward properly with namespacing in JSON, rather than looking at XML’s syntax, let’s look at the goals of namespacing. First, XML and JSON share a similar idea: both provide a means of defining the meaning of entities within their structure through names. In XML, tag names give meaning to the contents of elements, and attribute names give meaning to attribute values. Likewise, in JSON, property names give meaning to property values. The primary goal of namespacing is to build on this by allowing meanings to be defined in a distributed way by various parties. The key here is that different groups can ascribe authoritative meaning beyond what a short opaque string (a tag name or property name) can convey on its own.

JSON Schema provides a natural means for this goal. One of the basic functionalities provided by the JSON Schema specification is to define the meaning and nature of properties within an instance JSON data structure. A property definition is a schema with the purpose of describing the value of a property. JSON Schema defines a number of attributes that can be used to define the type of the value, the constraints on the value, and a description of the value, all of which come together to ascribe cumulative meaning to properties of JSON objects.

The base JSON Schema format for describing properties lays the foundation for namespacing’s goal of distributed definition by providing a format for the definitions, but on its own it does not provide a means of linking to the authoritative definition. Distributed efforts almost always rely on a global registry to coordinate different parties, and when the challenges of maintaining such a registry are considered, the solution is URIs. URIs are the basis of the web and of XML namespacing, and they are the only reasonable means of globally referencing resources. Here JSON Hyper Schema provides the means to associate URIs with definitions so that they can be universally referenced. Because JSON Hyper Schema is self-descriptive, agents can unambiguously determine a URI for every property definition. With the JSON Hyper Schema specification to lean on, a JSON structure can link to an authoritative definition of the meaning of its properties.

JSON Hyper Schemas can be referenced from instances by Link headers or media type parameters. A simple example illustrates how JSON properties have universally locatable definitions:

Content-Type: application/json; schema=http://www.book-warehouse.com/book-schema
[
  {"title": "Oliver Twist", "price": 16.99},
  {"title": "Robinson Crusoe", "price": 15.99}
]
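To locate any definitions, an agent first has to read the schema URI out of the media type parameter. The following Python sketch shows one way to do that. `parse_content_type` is a hypothetical helper, and it assumes a simplified header grammar with no quoted semicolons inside parameter values, so a production client would be better served by a complete media type parser.

```python
def parse_content_type(header: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into its media type and its parameters.

    Simplified sketch: assumes no parameter value contains a quoted ';'.
    """
    media_type, *raw_params = header.split(";")
    params: dict[str, str] = {}
    for raw in raw_params:
        name, sep, value = raw.strip().partition("=")
        if sep:  # skip empty segments, e.g. from a trailing ';'
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type.strip().lower(), params


header = "application/json; schema=http://www.book-warehouse.com/book-schema"
media_type, params = parse_content_type(header)
print(media_type)
print(params.get("schema"))
```

Splitting on the first `=` only matters here because the parameter value is itself a URI, which may contain characters such as `:` and `/`.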

This message gives the authoritative URI for the schema. Looking at the schema, we can see that each property has a corresponding definition with an authoritative URI as well:

Content-Type: application/schema+json; schema=http://json-schema.org/hyper-schema
{
  "properties": {
     "title": {"type": "string", "description": "The title of the book"},
     "price": {"type": "number", "description": "The price of the book in US dollars"}
  }
}

Since JSON Hyper Schema specifies how fragment identifiers can reference individual entities in the schema using dot-delimited property names, we can construct a full URI for the definitions of the title and price properties. The title’s definition URI is http://www.book-warehouse.com/book-schema#properties.title and the price’s definition URI is http://www.book-warehouse.com/book-schema#properties.price. Just as with XML’s namespaces, this gives each property a fully qualified URI.
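The mapping between property names and definition URIs works in both directions: an agent can build the URI from the schema URI and a property name, and it can follow the dot-delimited fragment back into the schema. A minimal Python sketch, assuming the schema has already been fetched and parsed into a dict; `definition_uri` and `resolve_fragment` are hypothetical helper names.

```python
from urllib.parse import urldefrag

# A copy of the book schema from the example, as it might look once parsed.
BOOK_SCHEMA = {
    "properties": {
        "title": {"type": "string", "description": "The title of the book"},
        "price": {"type": "number", "description": "The price of the book in US dollars"},
    }
}


def definition_uri(schema_uri: str, property_name: str) -> str:
    """Build the fully qualified URI of a top-level property definition."""
    return f"{schema_uri}#properties.{property_name}"


def resolve_fragment(schema: dict, fragment: str) -> dict:
    """Follow a dot-delimited fragment such as 'properties.title' into a schema.

    Assumes property names never contain '.' themselves.
    """
    node = schema
    for segment in fragment.split("."):
        node = node[segment]  # raises KeyError if the path does not exist
    return node


schema_uri = "http://www.book-warehouse.com/book-schema"
for name in BOOK_SCHEMA["properties"]:
    uri = definition_uri(schema_uri, name)
    _, fragment = urldefrag(uri)  # split off the part after '#'
    print(uri, "->", resolve_fragment(BOOK_SCHEMA, fragment)["description"])
```

The dot-delimited scheme becomes ambiguous if a property name itself contains a dot; the sketch simply assumes that never happens.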

Collisions

An oft-cited motivation for namespacing is collision avoidance. When simple identifiers are used for element and property names without coordination, it is easy to see how different parties could use the same name with conflicting meanings. However, while it is easy to theorize about potential collisions, applying collision avoidance techniques can create an issue that is more frequent and more serious than collisions: missed “collisions”. Missed collisions may sound good, but when two parties use the exact same identifier (remember that the available vocabulary is quite large, so an exact match is unlikely to be accidental), more often than not they actually mean the same thing. Blindly avoiding collisions misses many opportunities to share use and meaning across entities. For example, suppose we were combining our schema with a schema for book information from another party at http://uk-books.com/schema/book:

{
  "properties": {
     "title": {"type": "string", "description": "The title"},
     "author": {"type": "string", "description": "The author"},
     "price": {"type": "number", "description": "The price in pounds"}
  }
}

Here the title property clearly has the same meaning as in the first schema. If a namespacing technique blindly separated the two title properties into distinct properties, we would need to specify the title twice in an instance structure. The surface-level effect is obvious data redundancy, but ignoring DRY principles can have more serious consequences for keeping data synchronized: if the same title value were stored in two properties, someone could easily update one without realizing they needed to update the other.

A simple look at linguistics illustrates the value of a shared vocabulary. Our ability to communicate with others in our region rests on mutually accepted meanings for words. If individuals in a society refused to accept or understand any words not defined by themselves or their own family, it would undermine the gradual process of reaching consensus on the usage and meaning of words that is necessary for a language to develop and evolve. Effective communication across a substantial population relies on this convergence of meanings.

The key point here is that property definitions with identical names that are combined from different sources should default to being treated as the same property. Mapping these definitions to separate properties to avoid collisions should be treated as the exception rather than the rule.
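That default policy (share by name, rename only by exception) can be sketched in a few lines of Python. This is an illustrative assumption about how a combining tool might behave, not part of any specification: `merge_properties` and its `renames` map are hypothetical, and the two example schemas are abbreviated to their property names.

```python
from typing import Optional

US = "http://www.book-warehouse.com/book-schema"
UK = "http://uk-books.com/schema/book"

# Abbreviated copies of the two example schemas; only property names matter here.
SCHEMAS = {
    US: {"properties": {"title": {}, "price": {}}},
    UK: {"properties": {"title": {}, "author": {}, "price": {}}},
}


def merge_properties(
    schemas: dict[str, dict],
    renames: Optional[dict[tuple[str, str], str]] = None,
) -> dict[str, list[str]]:
    """Map each local property name to the definition URIs it combines.

    Identical names merge into one property by default; `renames`,
    keyed by (schema_uri, property_name), is the explicit exception.
    """
    renames = renames or {}
    merged: dict[str, list[str]] = {}
    for schema_uri, schema in schemas.items():
        for name in schema.get("properties", {}):
            local_name = renames.get((schema_uri, name), name)
            merged.setdefault(local_name, []).append(f"{schema_uri}#properties.{name}")
    return merged


# The two "price" definitions differ (dollars vs. pounds), so only that one is renamed.
merged = merge_properties(SCHEMAS, renames={(UK, "price"): "ukPrice"})
for local_name, uris in merged.items():
    print(local_name, uris)
```

Both title definitions end up attached to a single "title" property, which is exactly the shared meaning that blind collision avoidance would have missed.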

Collisions (really avoiding them)

But exceptions still happen: sometimes different parties use identical names for properties that really have different meanings and need different values. The hyperlinking capabilities that JSON Hyper Schema affords can still solve this situation. When we need to avoid a collision, we can map our own property names to specific property definitions from various schemas, creating a composite schema with explicitly chosen names. In our two schemas, one price property expects a price in US dollars and the other a price in UK pounds. We can create a composite schema:

{
  "extends": {"$ref": "http://www.book-warehouse.com/book-schema"},
  "properties": {
     "ukPrice": {"$ref": "http://uk-books.com/schema/book#properties.price"}
  }
}

We have now created a new schema that maps the property name “ukPrice” to the property definition from the uk-books.com domain at http://uk-books.com/schema/book#properties.price, while inheriting the title and price definitions from the book-warehouse schema.
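To see what an agent ends up with, the composite schema can be flattened into its effective property definitions by following `extends` and each `$ref`. The Python sketch below assumes the `extends` reference points at the book schema at http://www.book-warehouse.com/book-schema, and it uses a hypothetical in-memory `REGISTRY` in place of real HTTP fetches; `deref` and `effective_properties` are likewise illustrative names.

```python
from urllib.parse import urldefrag

# Hypothetical in-memory registry standing in for fetching schemas over HTTP.
REGISTRY = {
    "http://www.book-warehouse.com/book-schema": {
        "properties": {
            "title": {"type": "string", "description": "The title of the book"},
            "price": {"type": "number", "description": "The price of the book in US dollars"},
        }
    },
    "http://uk-books.com/schema/book": {
        "properties": {
            "title": {"type": "string", "description": "The title"},
            "author": {"type": "string", "description": "The author"},
            "price": {"type": "number", "description": "The price in pounds"},
        }
    },
}


def deref(value: dict) -> dict:
    """Replace a {"$ref": uri} object with the definition it points to."""
    if "$ref" not in value:
        return value
    base, fragment = urldefrag(value["$ref"])
    node = REGISTRY[base]
    for segment in filter(None, fragment.split(".")):  # empty fragment = whole schema
        node = node[segment]
    return node


def effective_properties(schema: dict) -> dict:
    """Collect property definitions; local ones override inherited ones."""
    props = {}
    if "extends" in schema:
        props.update(effective_properties(deref(schema["extends"])))
    for name, definition in schema.get("properties", {}).items():
        props[name] = deref(definition)
    return props


composite = {
    "extends": {"$ref": "http://www.book-warehouse.com/book-schema"},
    "properties": {
        "ukPrice": {"$ref": "http://uk-books.com/schema/book#properties.price"},
    },
}
for name, definition in effective_properties(composite).items():
    print(name, "->", definition["description"])
```

This sketch resolves only one level of `$ref`; a referenced definition that itself contained a `$ref` would be returned unresolved.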

Alternatively, one could map property names to property definitions from different schemas in order to combine definitions that have different property names in their containing schemas.

Extensible Schemas

One particular kind of schema that can benefit from authoritative definitions is the meta-schema. Schemas are a quintessential example of data that can benefit from additional properties beyond those defined by the JSON Schema specification itself. JSON Schema defines a number of properties for schemas, but it can be very convenient to include extra data in schemas, such as suggested visual layout, persistence information, and associated class implementation information. With JSON Hyper Schema, one can define a meta-schema that describes the structure and additional properties of your schemas, giving a concrete, URL-backed meaning to the properties you add to them. For example, we could define a meta-schema:

http://some-site.com/deprec-schema:

Content-Type: application/schema+json; schema=http://json-schema.org/hyper-schema
{
  "extends": {"$ref": "http://json-schema.org/hyper-schema"},
  "properties": {
     "deprecated": {
        "type": "string",
        "description": "When present, this indicates that the associated property is deprecated. The schema value provides information on what property should be used instead."
     }
  }
}

Now we can use the “deprecated” property in our own schemas:

Content-Type: application/schema+json; schema=http://some-site.com/deprec-schema
{
  "properties": {
     "name":{"type": "string", "deprecated": "Please use 'title' instead"},
     "title":{"type": "string", "description": "The title of this document"},
     ....
  }
}
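An agent that recognizes the http://some-site.com/deprec-schema meta-schema can then act on the extra property, for instance by flagging instances that still use deprecated names. A minimal Python sketch, assuming the agent has already matched the schema's media type parameter to that meta-schema; `deprecated_properties` and `deprecated_usages` are hypothetical helpers, not part of any JSON Schema library, and the `....` placeholder from the example is simply omitted.

```python
# A schema written against the deprec-schema meta-schema (placeholder omitted).
DOCUMENT_SCHEMA = {
    "properties": {
        "name": {"type": "string", "deprecated": "Please use 'title' instead"},
        "title": {"type": "string", "description": "The title of this document"},
    }
}


def deprecated_properties(schema: dict) -> dict[str, str]:
    """Map each deprecated property name to its deprecation message."""
    return {
        name: definition["deprecated"]
        for name, definition in schema.get("properties", {}).items()
        if "deprecated" in definition
    }


def deprecated_usages(instance: dict, schema: dict) -> list[str]:
    """Return a warning message for each deprecated property the instance uses."""
    return [
        f"{name!r} is deprecated: {message}"
        for name, message in deprecated_properties(schema).items()
        if name in instance
    ]


for warning in deprecated_usages({"name": "Draft notes"}, DOCUMENT_SCHEMA):
    print(warning)
```

Because "deprecated" has a URI-backed definition in the meta-schema, a different agent that sees a property with the same name in a schema governed by some other meta-schema has no reason to assume the same semantics.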

Namespacing, the JSON way

JSON Hyper Schema is a sufficient building block for realizing the goals of namespacing: meanings from distributed parties can be authoritatively ascribed to elements of a data structure, with control over property name sharing and conflict resolution. What is elegant about this approach is that essentially nothing needs to change in the instance data structures themselves. URI attribution is done through media type parameters in combination with schemas. The actual data stays unchanged; it is still the original simple, easy-to-read, compact JSON. JSON namespacing not only fits with JSON technically, but it fits in spirit: data is simple and light.