JSON Namespacing

By on September 2, 2009 12:23 pm

(or “Why JSON Hyper Schema means JSON doesn’t need XML’s namespacing colon cancer”)

I recently posted a proposal for an addition to JSON Schema, called JSON Hyper Schema, for defining which properties of a JSON structure represent links or references within data structures. It is intended to provide the same linking capabilities as JSON Referencing, but in a much more flexible manner: schemas can describe link information in existing data structures without requiring a fixed convention. Here I want to explain one further benefit of this type of schema: it satisfies the goals of namespacing in JSON.

Many useful mechanisms and conventions in the world of JSON, such as JSON-RPC and JSON Schema, have been based on similar prior work in the XML world. Done properly, JSON counterparts have adopted some of the best ideas from XML and learned from its mistakes. Namespacing is an important concept from XML, and there have occasionally been discussions about the potential need for namespacing in JSON. XML’s namespacing is indeed valuable in its capability to provide distributed definition of meanings from multiple parties, and this goal is of value in JSON as well. JSON Hyper Schema provides a mechanism for defining hypertext/hyperlink semantics within JSON data structures, and this URI-based linking forms a foundation capable of achieving these goals.

A naive import of the ideas from XML might simply attempt to apply the semantics and even the syntax of XML to JSON. Such an approach fails to account for the fact that JSON is a very different paradigm from XML, fundamentally created for simple object-oriented data structures. To move forward properly with namespacing in JSON, rather than looking at the syntax in XML, let’s look at the goals of namespacing. First, XML and JSON do share a similar idea: both define the meaning of entities within their structure with names. In XML, tag names give meaning to the contents of elements, and attribute names give meaning to the attribute values. Likewise in JSON, property names give meaning to the property values. The primary goal of namespacing is to build on this to allow distributed definition of meanings from various parties. The key here is that authoritative meaning can be ascribed by different groups, beyond what can be conveyed by a short opaque string (a tag name or property name).

JSON Schema provides a natural means for this goal. One of the basic functionalities provided by the JSON Schema specification is to define the meaning and nature of properties within an instance JSON data structure. A property definition is a schema with the purpose of describing the value of a property. JSON Schema defines a number of attributes that can be used to define the type of the value, the constraints on the value, and a description of the value, all of which come together to ascribe cumulative meaning to properties of JSON objects.

The base JSON Schema format for describing properties forms the foundation for namespacing’s goal of distributed definition by providing a format for the definition, but on its own it does not provide a means of linking to the authoritative definition. Distributed efforts almost always rely on a global registry to coordinate different parties, and when the challenges of building such a registry are considered, the solution is URIs. URIs are the basis of the web and of XML namespacing, and they are the only reasonable means of globally referencing resources. Here JSON Hyper Schema provides the means to associate URIs with definitions so that they can be universally referenced. Because JSON Hyper Schema is self-descriptive, agents can unambiguously determine a URI for every property definition. With the JSON Hyper Schema specification to lean on, JSON structures can link to an authoritative definition of the meaning of their properties.

JSON Hyper Schemas can be referenced from instances by Link headers or media type parameters. A simple example illustrates how JSON properties have universally locatable definitions:

Content-Type: application/json; schema=http://www.book-warehouse.com/book-schema
[
  {"title": "Oliver Twist", "price": 16.99},
  {"title": "Robinson Crusoe", "price": 15.99}
]

This message gives the authoritative URI for the schema. With a look at the schema, we can see how each property has a corresponding definition with an authoritative URI as well:

Content-Type: application/schema+json; schema=http://json-schema.org/hyper-schema
{
  "properties": {
     "title": {"type": "string", "description": "The title of the book"},
     "price": {"type": "number", "description": "The price of the book in US dollars"}
  }
}
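Both messages above carry their schema URI in a schema media type parameter. As a rough illustration of how a client might pull that URI out of a Content-Type header, here is a minimal Python sketch. The schema_param helper is hypothetical, and its parsing is deliberately simplified: it does not handle quoted values that themselves contain a semicolon.

```python
from typing import Optional


def schema_param(content_type: str) -> Optional[str]:
    """Return the value of the 'schema' media type parameter, if present.

    Simplified parser (an assumption, not a full RFC-compliant one):
    it splits on ';' and ignores empty or malformed parameters.
    """
    _media_type, *params = content_type.split(";")
    for param in params:
        name, sep, value = param.strip().partition("=")
        if sep and name.strip().lower() == "schema":
            # Strip optional surrounding double quotes from the value.
            return value.strip().strip('"')
    return None


header = "application/json; schema=http://www.book-warehouse.com/book-schema"
print(schema_param(header))  # http://www.book-warehouse.com/book-schema
```

A client that already recognizes the returned URI can skip fetching the schema; only an unrecognized URI needs to be dereferenced.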

Since JSON Hyper Schema specifies how fragment identifiers can be used to reference individual entities in the schema using dot delimited property names, we can construct a full URI for the definition of the title and price properties. The title’s definition URI is http://www.book-warehouse.com/book-schema#properties.title and the price’s definition URI is http://www.book-warehouse.com/book-schema#properties.price. Just as with XML’s namespaces, each JSON namespace provides a fully qualified URI for each property.
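A minimal Python sketch of the fragment convention just described: definition_uri builds the fully qualified URI for a top-level property, and resolve_fragment walks a dot-delimited fragment through a schema held in memory. Both helper names are hypothetical, and the sketch assumes no property name itself contains a dot (the dot-delimited form would be ambiguous in that case).

```python
from typing import Any

BOOK_SCHEMA_URI = "http://www.book-warehouse.com/book-schema"

# A Python transcription of the book schema from the example above.
book_schema = {
    "properties": {
        "title": {"type": "string", "description": "The title of the book"},
        "price": {"type": "number",
                  "description": "The price of the book in US dollars"},
    }
}


def definition_uri(schema_uri: str, prop: str) -> str:
    """Build the fragment URI for a top-level property definition."""
    return f"{schema_uri}#properties.{prop}"


def resolve_fragment(schema: dict, fragment: str) -> Any:
    """Follow a dot-delimited fragment such as 'properties.title'.

    Assumes no property name along the path contains a '.'.
    """
    node: Any = schema
    for part in fragment.split("."):
        node = node[part]
    return node


uri = definition_uri(BOOK_SCHEMA_URI, "title")
print(uri)  # http://www.book-warehouse.com/book-schema#properties.title
_base, _, frag = uri.partition("#")
print(resolve_fragment(book_schema, frag)["description"])  # The title of the book
```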

Collisions

An oft-cited motivation for namespacing is collision avoidance. When simple identifiers are used for element and property names without coordination, it is easy to see how different parties can use the same name with conflicting meanings. However, while it is easy to theorize about potential collisions, collision avoidance techniques can cause an issue that is more frequent and more serious than collisions themselves: missed “collisions”. Missed collisions may sound good, but when two parties use the exact same identifier (remember that the pool of available words is fairly large), more often than not they actually mean the same thing. Blindly avoiding collisions misses many opportunities for shared use and shared meaning. For example, suppose we were combining our book schema with a schema for book information from another party at http://uk-books.com/schema/book:

{
  "properties": {
     "title": {"type": "string", "description": "The title"},
     "author": {"type": "string", "description": "The author"},
     "price": {"type": "number", "description": "The price in pounds"}
  }
}

Here the title property clearly has the same meaning as in the first schema. If a namespacing technique blindly separated the two title properties into distinct properties, we would need to specify the title twice in an instance structure. The surface-level effect is obvious data redundancy, but ignoring the DRY principle can have more serious consequences for keeping data synchronized: if the same title value were stored in two properties, someone could easily update one without realizing they needed to update the other.

A simple look at linguistics illustrates the value of shared vocabulary. Our ability to communicate with others within our region is based on the fact that we have mutually accepted meanings for words. If individuals in a society refused to accept or understand words that were not defined by each individual or even by their family, it would undermine the process of gradually reaching mutual consensus on the usage and meanings of words that is necessary for the development and evolution of a language. Effective communication across a substantial population relies on this convergence of meanings.

The key point here is that property definitions with identical names that are combined from different sources should default to being treated as the same property. Mapping these definitions to separate properties to avoid collisions should be treated as the exception rather than the rule.
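This default can be sketched in Python. The merge_properties function below is a hypothetical helper, not part of any JSON Schema specification: it collects definitions by property name, so a name that appears in both book schemas becomes a single shared property. The schema dicts are transcriptions of the two examples above.

```python
# Transcriptions of the two book schemas from the examples above.
warehouse_schema = {
    "properties": {
        "title": {"type": "string", "description": "The title of the book"},
        "price": {"type": "number",
                  "description": "The price of the book in US dollars"},
    }
}
uk_schema = {
    "properties": {
        "title": {"type": "string", "description": "The title"},
        "author": {"type": "string", "description": "The author"},
        "price": {"type": "number", "description": "The price in pounds"},
    }
}


def merge_properties(*schemas: dict) -> dict:
    """Group property definitions by name across schemas.

    Identical names default to one shared property, so each name maps to
    the list of definitions contributed by the different schemas.
    """
    merged: dict = {}
    for schema in schemas:
        for name, definition in schema.get("properties", {}).items():
            merged.setdefault(name, []).append(definition)
    return merged


merged = merge_properties(warehouse_schema, uk_schema)
print(sorted(merged))        # ['author', 'price', 'title']
print(len(merged["title"]))  # 2: one "title" property, two definitions
```

Note that this default also merges the two price properties, even though one is in US dollars and the other in pounds; that is exactly the kind of exception that calls for explicit remapping.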

Collisions (really avoiding them)

Still, exceptions do happen: sometimes different parties use identical names for properties that really have different meanings and need different values. The hyperlinking capabilities of JSON Hyper Schema can still be used to solve this situation. When we need to avoid collisions, we can map our own property names to specific property definitions from various schemas, creating a composite schema with explicitly chosen names. In these two schemas, one price property expects a price in US dollars and the other a price in UK pounds. We can create a composite schema:

{
  "extends": {"$ref": "http://www.book-warehouse.com/book-schema"},
  "properties": {
     "ukPrice": {"$ref": "http://uk-books.com/schema/book#properties.price"}
  }
}

We now have created a new schema that maps the property with the name “ukPrice” to the property definition from the uk-books.com domain of http://uk-books.com/schema/book#properties.price.
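To show how an agent might interpret such a composite schema, here is a Python sketch that resolves "$ref" values and "extends" against a hypothetical in-memory registry (standing in for fetching each schema by its URI), using the dot-delimited fragment convention. The registry contents and function names are assumptions for illustration, and the sketch takes the extends reference to point at http://www.book-warehouse.com/book-schema.

```python
from typing import Any

# Hypothetical in-memory registry standing in for fetching each schema by URI.
REGISTRY: dict = {
    "http://www.book-warehouse.com/book-schema": {
        "properties": {
            "title": {"type": "string", "description": "The title of the book"},
            "price": {"type": "number",
                      "description": "The price of the book in US dollars"},
        }
    },
    "http://uk-books.com/schema/book": {
        "properties": {
            "title": {"type": "string", "description": "The title"},
            "author": {"type": "string", "description": "The author"},
            "price": {"type": "number", "description": "The price in pounds"},
        }
    },
}


def resolve_ref(ref: str) -> Any:
    """Look up a schema URI, then walk its dot-delimited fragment (if any)."""
    base, _, fragment = ref.partition("#")
    node: Any = REGISTRY[base]
    if fragment:
        for part in fragment.split("."):
            node = node[part]
    return node


def effective_properties(schema: dict) -> dict:
    """Property definitions of a schema, including those inherited via 'extends'."""
    props: dict = {}
    parent = schema.get("extends")
    if parent is not None:
        if "$ref" in parent:
            parent = resolve_ref(parent["$ref"])
        props.update(effective_properties(parent))
    # Locally named properties override inherited ones; "$ref" values are
    # replaced by the definitions they point to.
    for name, definition in schema.get("properties", {}).items():
        if "$ref" in definition:
            definition = resolve_ref(definition["$ref"])
        props[name] = definition
    return props


composite = {
    "extends": {"$ref": "http://www.book-warehouse.com/book-schema"},
    "properties": {
        "ukPrice": {"$ref": "http://uk-books.com/schema/book#properties.price"},
    },
}

props = effective_properties(composite)
print(sorted(props))                    # ['price', 'title', 'ukPrice']
print(props["ukPrice"]["description"])  # The price in pounds
```

The same mechanism covers the alternative case: pointing a locally chosen name at a definition that uses a different name in its own schema is just another "$ref" entry.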

Alternatively, one could map property names to property definitions from different schemas in order to combine definitions that may have different property names in their containing schemas.

Extensible Schemas

One kind of schema that particularly benefits from authoritative definitions is the meta-schema. Schemas are a quintessential example of data that can benefit from additional properties beyond those defined by the JSON Schema specification itself. JSON Schema defines a number of properties for schemas, but it can be very convenient to include extra data on schemas, such as suggested visual layout, persistence information, and associated class implementation information. With JSON Hyper Schema, one can define a meta-schema that describes the structure and additional properties of your schemas, giving a concrete, URL-backed meaning to the properties you add. For example, we could define a meta-schema:

http://some-site.com/deprec-schema:

Content-Type: application/schema+json; schema=http://json-schema.org/hyper-schema
{
  "extends": {"$ref": "http://json-schema.org/hyper-schema"},
  "properties": {
     "deprecated": {
        "type": "string",
         "description": "When present, this indicates that the associated property is deprecated. The value provides information on which property should be used instead."
     }
  }
}

Now we can use the “deprecated” property in our own schemas:

Content-Type: application/schema+json; schema=http://some-site.com/deprec-schema
{
  "properties": {
     "name":{"type": "string", "deprecated": "Please use 'title' instead"},
     "title":{"type": "string", "description": "The title of this document"},
     ....
  }
}
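An agent that recognizes the http://some-site.com/deprec-schema meta-schema could act on the new property, for instance by warning when an instance uses a deprecated property. A Python sketch under that assumption: deprecated_properties and deprecation_warnings are hypothetical helpers, and the schema dict transcribes only the two complete property definitions from the example above.

```python
# Transcription of the complete property definitions from the example schema.
doc_schema = {
    "properties": {
        "name": {"type": "string", "deprecated": "Please use 'title' instead"},
        "title": {"type": "string", "description": "The title of this document"},
    }
}


def deprecated_properties(schema: dict) -> dict:
    """Map each property carrying a 'deprecated' attribute to its note."""
    return {
        name: definition["deprecated"]
        for name, definition in schema.get("properties", {}).items()
        if "deprecated" in definition
    }


def deprecation_warnings(instance: dict, schema: dict) -> list:
    """List a warning for every deprecated property used in an instance."""
    notes = deprecated_properties(schema)
    return [f"{name}: {notes[name]}" for name in instance if name in notes]


print(deprecation_warnings({"name": "Oliver Twist"}, doc_schema))
# ["name: Please use 'title' instead"]
```

An agent that does not recognize the meta-schema can still treat the schema as an ordinary JSON Schema and simply ignore the extra attribute.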

Namespacing, the JSON way

JSON Hyper Schema is a sufficient building block upon which we can realize the goals of namespacing: meanings from distributed parties can be authoritatively ascribed to elements of a data structure, with control over property name sharing and conflict resolution. What is elegant about this approach is that basically nothing needs to be modified in the instance data structures themselves. URI attribution is done through media type parameters in combination with schemas. The actual data stays unchanged; it is still the original simple, easy-to-read, compact JSON. JSON namespacing not only fits with JSON technically, it fits in spirit: data is simple and light.

Comments

  • Really interesting stuff! I like your approach. It’s clean and simple. What are the chances this will be officially approved?

  • The worst part of XML namespaces is the use of URLs as identifiers, instead of something much more logical and much less confusing like Java package namespaces (tld.organisation.x.y)

    It’s almost as pointless as YAML (“hey! XML is complicated! let’s make it simple with a spec twice as long!”)

  • Charles Ward

    Interesting approach. Nice use of Content-Type parameters. The way you’re handling collisions is novel, but I’m not sure whether I like it.

    I know that <xyz xmlns="http://example.org/" /> means the same thing in any XML document that I look at. With your scheme, to know what {"xyz": ""} means I have to parse the schema associated with the document first, right?

    Similarly, since the entire document is in one namespace it looks like I have to write a new schema if I want to combine parts of two existing schemas. It’s occasionally very nice to just drop a chunk of one document into another one, which namespaced XML makes very easy.

    niczar, how are URLs more confusing than Java package names? The only difference is syntax, and URLs have the advantage that you can stick something at the URL.

  • Adam

    I like this but I think encoding the schema in the headers misses the point, at least if you’re trying to build a REST client.

    When we create html files, we don’t add schema=http://www.w3.org/tr/html4 to the content type field.

    Rather, we assume (because it’s been requested in the Accept header) that the client making the request will know what text/html is and how to render it.

    What we should be doing is sending “application/mycustomtype+json” as the content type. This way a client can know what to do with the data in my document, something a schema cannot do.

    It’s also important to note that using custom namespaces also allows for optimized parsing. Most HTML browsers are not validating clients; they simply ignore the schema and work with the data, because none of the important information about what to do with a media type can be communicated in a schema.

  • Paul Prescod

    This is very similar to a convention from the SGML/XML world called “architectural forms.” The war between XML Namespaces and architectural forms was long and protracted. The fundamental argument against them is that now the JSON document is dependent on the schema for its interpretation. Instead of the schema being merely a tool for validation, it actually injects meaning into the document. After decades of combining these two things in SGML, the XML world chose to separate them. They invented schema languages that were “just” validation languages (not XML Schema, but RELAX and others like it) and they invented syntaxes for making XML documents more independent of their schemas.

    Actually, this was the biggest difference between XML and SGML, the independence of instance documents from their schema.

    Once you head down the path of “documents depend on schemas for their semantics”, you have to address issues of reliability, security, latency and so forth.

  • @niczar: If you actually prefer Java’s namespaces, which merely attempt to mimic real URIs, over real URIs with their true locator power, then you definitely won’t like JSON Hyper Schema based JSON.

    @Charles: You only need to parse the schema if it is an unrecognized schema (one that maps a recognized schema’s meaning to the property names in the document). You are certainly correct that in certain cases, XML-style namespacing is simpler than schema-based property name remapping. However, a key point is that the complexity is pay-per-use. The vast majority of use cases (a single namespace, multiple namespaces without any collisions, and multiple namespaces where colliding names share meaning) are much simpler with schema-based namespacing, since the document doesn’t require any namespace prefix resolution, it can be interpreted as a plain JSON document, and the client can assert understanding by simply matching on recognized schemas. Increased complexity is only necessary in rare edge cases (even though it is really easy to theorize about those edge cases).

    @Adam: I didn’t mean to preclude the use of media types like application/mycustomtype+json. I certainly would still encourage the use of this form of media type. Schema parameters are then simply extra information that provides an authoritative location for looking at the media type definition in a structured format (but a client wouldn’t need to look if it understood the media type). Also, you could certainly use an Accept header with schema parameters in the list of acceptable media types, so that a server may be able to deliver the data using a schema that you recognize (to help avoid parsing schemas to find recognizable schemas in the hierarchy).

    @Paul: I agree that having “documents depend on schemas for their semantics” is more complex and less desirable. But as I said to Adam, it is pay-per-use and rarely needed. Furthermore, I think it would be virtually impossible to get all JSON users to switch to XML-style namespaces. One of the key differences between XML and JSON in terms of usage patterns is that JSON often has an unambiguous mapping from language objects and storage structures to the serialized data format (one of the big reasons for its popularity). Property names are often defined by the underlying source rather than by a serialization strategy, so it is critical that any namespacing augmentation can be applied unobtrusively.

  • Christoph

    Hi, I’m wondering how to describe a link in JSON Schema.
    This is my JSON representation:

    {
      "mail": {
        "id": "23",
        "fwd": "24",
        "attachment": {
          "$ref": "http://dl.example.org/3443"
        }
      }
    }

    And the corresponding schema:

    {
      "type": "object",
      "hrefProperty": "$ref",
      "properties": {
        "id": {
          "locatorProperty": "self"
        },
        "fwd": {
          "locatorProperty": "next"
        },
        "attachment": {
          "type": "object",
          "properties": { ?????
          }
        }
      }
    }
    Look at the attachment-property. How do I describe it?