This entry is part 11 of 12 in the series Server-Side JavaScript, Pintura, and Persevere 2.0

Node is a server-side JavaScript platform that is well suited to Comet-style applications, which use long-lived connections to let the server send messages to the browser asynchronously. In fact, the front page of nodejs.org opens with an example of a web application that waits a couple of seconds before sending a response without any type of blocking; the code is asynchronous and efficient.

However, building a real-life real-time application involves more than a platform that gives you asynchronous communication; there are a number of other important techniques to understand. We will look at these techniques and introduce the Tunguska project, which provides some helpful tools to assist in building such applications. While a number of Comet projects attempt to provide a black-box solution to Comet, Tunguska recognizes that most real-time applications involve deep integration into the application and its security, messaging, and data structures. Consequently, Tunguska is a set of tools for building real-time applications rather than a closed black box that can’t easily be integrated. Let’s look at some of these tools.

Connections/Message Queues

The most fundamental concept in building real-time applications is the ability to asynchronously deliver messages to the browser whenever they occur. With new technologies like WebSockets, this is relatively straightforward. You can create a connection to the server and then the server is free to send messages to the browser at any time. However, WebSocket-capable browsers are still nowhere near widely enough deployed to rely on, and the WebSocket protocol is still in development: a recent revision rendered the first version incompatible, and more changes are likely to occur due to the immature design.

The most widely used technique for Comet is known as long-polling. With long-polling, an HTTP request is made to the server, which waits until it has a message to send. The message is sent to the browser as the response, and upon receiving the response the client makes another request to the server. The complication of this strategy is that the server does not have a continuous, uninterrupted stream for sending messages to the client. There is a time gap between each message/response being sent from the server and the next long-polling request from the browser reaching the server.
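Seen from the client, the long-polling cycle can be sketched roughly as follows. This is only an illustration and not part of Tunguska: `longPoll`, `requestMessage`, `onMessage`, and `shouldContinue` are hypothetical names, and `requestMessage` stands in for whatever function performs the actual HTTP request and returns a promise for the response.

```javascript
// Minimal long-polling loop (illustrative sketch, not part of Tunguska).
// requestMessage is a hypothetical function that performs one HTTP request
// and returns a promise that resolves once the server finally responds.
async function longPoll(requestMessage, onMessage, shouldContinue) {
  while (shouldContinue()) {
    try {
      // the server holds this request open until it has something to send
      const message = await requestMessage();
      onMessage(message);
    } catch (error) {
      // on a network error or timeout, wait briefly before reconnecting
      await new Promise(function (resolve) { setTimeout(resolve, 1000); });
    }
    // the loop now issues the next request; anything the server produces
    // before that request arrives falls into the time gap described above
  }
}
```

In a browser, `requestMessage` would typically wrap an XMLHttpRequest or similar call to the Comet URL.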

However, we obviously don’t want to miss the messages that were generated during these time gaps. Therefore we need to provide a virtual connection by creating a queue where messages can be stored during the gaps between a response and the next request. These message queues allow us to effectively emulate a continuous connection, whereby messages can be delivered at any time to the virtual connection. If a long-polling request is currently waiting, the server can deliver the message immediately; otherwise the message goes into the queue until the next request arrives from the browser.
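The idea of a virtual connection can be sketched in a few lines of plain JavaScript. The names used here (`VirtualConnection`, `poll`, `queued`, `waiting`) are hypothetical and chosen for illustration; this is not how Tunguska implements its queues.

```javascript
// Illustrative virtual connection: buffers messages between long-poll requests.
// VirtualConnection and poll are hypothetical names, not Tunguska's API.
function VirtualConnection() {
  this.queued = [];    // messages waiting for the next request
  this.waiting = null; // callback of the long-poll request currently held open
}

// Called by application code whenever a message is produced.
VirtualConnection.prototype.send = function (message) {
  if (this.waiting) {
    // a request is being held open: respond to it right away
    const respond = this.waiting;
    this.waiting = null;
    respond([message]);
  } else {
    // we are in a time gap: keep the message for the next request
    this.queued.push(message);
  }
};

// Called by the HTTP handler when a long-poll request arrives.
VirtualConnection.prototype.poll = function (respond) {
  if (this.queued.length > 0) {
    // deliver everything that accumulated during the gap
    respond(this.queued.splice(0, this.queued.length));
  } else {
    // nothing to send yet: hold the request until send() is called
    this.waiting = respond;
  }
};
```

Each response here carries an array of messages, so that several messages queued during one gap can be delivered together.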

Tunguska provides this type of virtual connection, or message queueing, with its queue module. Message queues are identified by a client id and are accessed with the getQueue(clientId) function, which returns the queue object for the client id passed as the argument. The queue object extends JavaScript’s Array (providing array functions for accessing queued events) and adds, as its key functionality, a send() function for delivering events, a message event, and a close event (the events are accessible via addListener() or observe()).

var getQueue = require("tunguska/queue").getQueue;
// get the virtual connection (message queue) for a client id
var queue = getQueue("32a3fa5");
queue.addListener("message", function(message){
   // this will be fired by the send call below
});
// put a message in the queue identified by that client id
queue.send("Hello");

The queue module does more than just provide a queue object. It also times out inactive connections, which avoids unbounded memory usage and memory leaks from an accumulation of unused connection objects. It also handles distributed message queues (across multiple processes/machines), as we will see later.
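To illustrate the timeout idea, here is a minimal sketch of a registry that discards queues that have been idle for too long. It is an assumption-laden example rather than Tunguska's implementation: `createQueueRegistry`, `sweep`, and the explicit `now` parameter are hypothetical and exist mainly to keep the example deterministic.

```javascript
// Illustrative registry that drops queues nobody has touched for a while.
// createQueueRegistry and sweep are hypothetical names, not Tunguska's API.
function createQueueRegistry(idleLimitMs) {
  const entries = new Map(); // clientId -> { queue, lastUsed }
  return {
    // return the queue for a client, creating it on first use
    getQueue: function (clientId, now) {
      let entry = entries.get(clientId);
      if (!entry) {
        entry = { queue: [], lastUsed: now };
        entries.set(clientId, entry);
      }
      entry.lastUsed = now; // any access counts as activity
      return entry.queue;
    },
    // remove every queue that has been idle longer than the limit and
    // return the number of queues that remain
    sweep: function (now) {
      for (const [clientId, entry] of entries) {
        if (now - entry.lastUsed > idleLimitMs) {
          entries.delete(clientId);
        }
      }
      return entries.size;
    }
  };
}
```

In a running server, `sweep(Date.now())` would typically be called from a `setInterval` timer, and `getQueue` would be passed `Date.now()` as well.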

Tunguska also provides a “jsgi/comet” module for higher-level HTTP interaction. A message queue, which acts as a virtual connection to a client, can be obtained from a request object (JSGI or Node) passed into its getClientConnection(request) function. This function will look for the “Client-Id” header to use as the client id for connection identification. If a connection exists with the given client id, it will be returned; otherwise a new connection object/message queue will be created and returned.

The connection object is based on the WebSocket API. Sending a message to the client is as simple as calling the send(message) function. The connection object is also an array. New messages are pushed onto the array and any listeners are notified.

var comet = require("tunguska/jsgi/comet");
// get the virtual connection for this request
var connection = comet.getClientConnection(request);
// put a message in the queue identified by this request.
connection.send({from:"Kris", body:"hi"});

Long-polling handler

A long-polling request handler can now deliver messages in two steps. It first checks the queue for any messages that are ready for instant delivery. If no messages are ready to deliver to the client, it listens for message events to trigger delivery. A JSGI middleware appliance is provided with the jsgi/comet module to do just that. We can easily insert this appliance into a middleware stack:

var cometApp = comet.Broadcaster(nextApp);

The optional nextApp parameter can be used to define another middleware appliance that can handle any needed subscriptions and/or other wiring to event triggers to relay messages to the virtual connection object. For example:

var cometApp = comet.Broadcaster(function(request){
  var connection = comet.getClientConnection(request);
  // user function for registering a user to get user events
  listenForMessagesForUser(
    request.remoteUser, // if we have an authenticated user
    function(message){
      // this can be called when a message needs to be sent to the user
      connection.send(JSON.stringify(message));
    });
});

Message Routing

One of the key aspects of real-time applications is message routing. When an action or event occurs, the message needs to be routed to the appropriate party. In a chat application, when someone sends a chat message, we need to route and send the message to the recipient. A frequently used mechanism for this is a publish/subscribe hub. A hub consists of “channels” or “topics”: messages can be published to channels on the hub, and subscribers can listen for messages on these channels. This paradigm decouples sending messages from listening for messages. Message publishers do not need to be concerned with who is subscribed, and subscribers do not need to be concerned with who publishes messages or how they are published.

Pub/Sub Hub

There are certainly alternate approaches to topic-based pub/sub hubs for message routing. Rather than subscribing to a particular named topic, subscribers could provide message filter functions that sift through all messages for the messages of interest. However, this approach does not scale well. As more diverse topics and messages are sent through a system, each filter has to process more and more non-matching messages, and performance suffers. On the other hand, pub/sub hubs scale well. A normal hash-table based hub maintains constant time message routing regardless of the number of topics. This highly scalable approach fits well with the Node philosophy of making it easy to do the right thing (in terms of scalability) and hard to do the wrong thing. By using named topics, it is very easy to create highly scalable message routing systems.

Creating a very simple pub/sub hub in JavaScript is actually extremely easy. JavaScript’s hash-based objects make a great core of a hub. You can simply create properties on a “hub” object with names that correspond to topics and values that are arrays of listeners. However, more interesting features than simple topic based message routing are often desirable. Tunguska provides a “hub” module with a rich set of publish/subscribe capabilities.
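The simple property-based hub described above might look something like the following sketch. It is a bare-bones illustration (`createHub` and its methods are names chosen for this example) and is not the code of Tunguska's hub module.

```javascript
// Minimal topic-based hub (illustrative; Tunguska's hub module does much more).
function createHub() {
  // topic name -> array of listener functions; the object has no prototype,
  // so topic names such as "constructor" cannot collide with inherited keys
  const topics = Object.create(null);
  return {
    subscribe: function (topic, listener) {
      (topics[topic] = topics[topic] || []).push(listener);
    },
    unsubscribe: function (topic, listener) {
      const listeners = topics[topic] || [];
      const index = listeners.indexOf(listener);
      if (index !== -1) listeners.splice(index, 1);
    },
    publish: function (topic, message) {
      // one property lookup finds the listeners, however many topics exist;
      // copy the array so listeners may unsubscribe while being notified
      (topics[topic] || []).slice().forEach(function (listener) {
        listener(message);
      });
    }
  };
}
```

Because publishing is a single property lookup followed by a loop over that topic's listeners, the routing cost does not grow with the number of other topics in the hub.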

The Tunguska hub supports globbing or wildcard-based subscriptions for listening for a set of topics (rather than only being able to register for a single topic at once).

var hub = require("tunguska/hub");
hub.subscribe("foo/*", function(event){
  // will be fired by the publish below
});
hub.publish("foo/bar", "hi");
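As a rough illustration of how a wildcard subscription could be matched against published topics, the sketch below converts a pattern into a regular expression. The semantics chosen here (a `*` matches any run of characters) are an assumption made for the example and are not a statement of Tunguska's actual matching rules; `topicMatches` is a hypothetical helper.

```javascript
// Hypothetical wildcard matcher. The semantics ("*" matches any run of
// characters) are an assumption, not a description of Tunguska's rules.
function topicMatches(pattern, topic) {
  const source = pattern
    .split("*")
    // escape regular-expression metacharacters in the literal parts
    .map(function (part) {
      return part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    // each "*" in the pattern becomes "match anything"
    .join(".*");
  return new RegExp("^" + source + "$").test(topic);
}
```

Unlike an exact topic, a wildcard pattern cannot be found with a single property lookup, so a hub that supports patterns has to test them against each published topic or index them in some other way.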

The hub also supports “typed” messages, so that messages can be filtered and routed by type as well as topic (essentially giving a hub a two-dimensional namespace). The hub itself uses typed messages to broadcast subscription events, sending out a “monitored” event when a topic starts to have subscribers or ceases to have any subscribers.

hub.subscribe("foo/*", "monitored", function(event){
  // will be fired by the subscribe action below
});
hub.subscribe("foo/bar", function(){...});
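The notion of a “monitored” event can be illustrated with a small sketch that reports when a topic gains its first subscriber or loses its last one. The callback and the shape of the event object (`topic`, `monitored`) are hypothetical and chosen for this example; they are not Tunguska's event format.

```javascript
// Illustrative hub fragment that reports when topics start or stop being
// monitored. createMonitoredHub and the event shape are hypothetical.
function createMonitoredHub(onMonitored) {
  const topics = Object.create(null); // topic -> array of listeners
  return {
    // register a listener and return a function that removes it again
    subscribe: function (topic, listener) {
      const listeners = topics[topic] = topics[topic] || [];
      listeners.push(listener);
      if (listeners.length === 1) {
        // first subscriber: the topic is now being monitored
        onMonitored({ topic: topic, monitored: true });
      }
      return function unsubscribe() {
        const index = listeners.indexOf(listener);
        if (index === -1) return; // already removed
        listeners.splice(index, 1);
        if (listeners.length === 0) {
          // last subscriber gone: the topic is no longer monitored
          onMonitored({ topic: topic, monitored: false });
        }
      };
    }
  };
}
```

A notification like this lets other parts of a system react only to topics that someone is actually listening to.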

Tunguska’s hub also supports echo suppression, allowing subscribers and publishers to indicate a client identifier so that published messages do not echo back to the client. These components are important for various applications and are essential for Tunguska’s distributed messaging system, where hubs can be linked together in a cluster.

var clientHub = hub.fromClient("42a92f3");
clientHub.subscribe("foo/bar", function(event){
  // will *not* be fired by the publish below
});
clientHub.publish("foo/bar", "for everyone else");
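One way echo suppression could work is to remember the client id with each subscription and skip matching subscriptions when publishing. The sketch below is an assumption about the general technique, not Tunguska's implementation; only the `fromClient(...)` shape is borrowed from the example above, and `createEchoSuppressingHub` is a hypothetical name.

```javascript
// Illustrative hub with echo suppression. The internals are hypothetical;
// only the fromClient(...) shape mirrors the example above.
function createEchoSuppressingHub() {
  const topics = Object.create(null); // topic -> [{ clientId, listener }]

  function subscribe(topic, listener, clientId) {
    (topics[topic] = topics[topic] || []).push({
      clientId: clientId,
      listener: listener
    });
  }

  function publish(topic, message, clientId) {
    (topics[topic] || []).forEach(function (subscription) {
      // skip listeners registered under the publisher's own client id
      if (clientId !== undefined && subscription.clientId === clientId) return;
      subscription.listener(message);
    });
  }

  return {
    subscribe: function (topic, listener) { subscribe(topic, listener); },
    publish: function (topic, message) { publish(topic, message); },
    // a view of the hub in which every call carries the given client id
    fromClient: function (clientId) {
      return {
        subscribe: function (topic, listener) { subscribe(topic, listener, clientId); },
        publish: function (topic, message) { publish(topic, message, clientId); }
      };
    }
  };
}
```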

Clustering

Building Comet applications on a single Node process (single threaded) can be deceptively easy. Scaling Comet applications to multiple concurrent processes is where the bulk of the complexity usually arises. Inter-process communication, non-deterministic request delegation, message framing, and true parallel concurrency can combine to create a formidable challenge compared to the rather simple single-process application development. Fortunately, Tunguska provides clustering support for linking distributed pub/sub hubs together, allowing applications to maintain a simple hub-based message routing system without having to worry about the inter-process interaction complexity.

The core component for building Tunguska clusters is the “connector” module. The “connector” module provides a means of using connections to other processes to coherently link hubs together. These connectors will listen for subscriptions and send subscription requests to other processes, allowing for collection of all messages for a given topic from all processes. A connector can be created by passing it a connection object (that follows the WebSocket API), and the connector will handle the messaging concerns for propagating subscriptions and message routing.
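To make the idea of propagating subscriptions more concrete, here is a heavily simplified sketch that links exactly two hubs over a WebSocket-like connection (an object with `send()` and `onmessage`). Everything in it is hypothetical: the frame format, the `onsubscribe` hook, and the helpers `createHub`, `connect`, and `createConnectionPair` are inventions for this example and do not describe Tunguska's connector or its wire protocol.

```javascript
// Illustrative only: a tiny hub plus a connector linking two hubs over a
// WebSocket-like connection. This is not Tunguska's connector or protocol.
function createHub() {
  const topics = Object.create(null);
  const hub = {
    onsubscribe: null, // hypothetical hook, called on every subscription
    subscribe: function (topic, listener) {
      (topics[topic] = topics[topic] || []).push(listener);
      if (hub.onsubscribe) hub.onsubscribe(topic);
    },
    publish: function (topic, message) {
      (topics[topic] || []).slice().forEach(function (l) { l(message); });
    }
  };
  return hub;
}

function connect(hub, connection) {
  let fromPeer = false;                  // true while handling a peer frame
  const requested = Object.create(null); // topics already requested from peer
  hub.onsubscribe = function (topic) {
    // only local subscriptions are propagated, and only once per topic
    if (fromPeer || requested[topic]) return;
    requested[topic] = true;
    connection.send(JSON.stringify({ type: "subscribe", topic: topic }));
  };
  connection.onmessage = function (event) {
    const frame = JSON.parse(event.data);
    fromPeer = true;
    try {
      if (frame.type === "subscribe") {
        // the peer wants this topic: forward local publications to it
        hub.subscribe(frame.topic, function (message) {
          if (fromPeer) return; // echo suppression: never send a message back
          connection.send(JSON.stringify(
            { type: "message", topic: frame.topic, message: message }));
        });
      } else if (frame.type === "message") {
        hub.publish(frame.topic, frame.message);
      }
    } finally {
      fromPeer = false;
    }
  };
}

// In-memory stand-in for a pair of connected WebSocket-like objects.
function createConnectionPair() {
  const a = { onmessage: null };
  const b = { onmessage: null };
  a.send = function (data) { if (b.onmessage) b.onmessage({ data: data }); };
  b.send = function (data) { if (a.onmessage) a.onmessage({ data: data }); };
  return [a, b];
}
```

Without the `fromPeer` check, a message forwarded from one hub would be forwarded straight back by the other, which is why echo suppression matters for clustering. A real connector also has to handle more than two processes, unsubscription, and connection failures, all of which this sketch leaves out.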

WebSocket connection objects can easily be utilized with actual WebSocket TCP connections to other machines. For inter-process communication on a single machine, multi-node provides easy access to a WebSocket-based connection to the sibling HTTP serving processes. Linking a set of processes with Tunguska and multi-node is very simple:

var multiNode = require("multi-node/multi-node"),
    Connector = require("tunguska/connector").Connector;
// start the multiple processes
var nodes = multiNode.listen({port: 80, nodes: 4}, serverObject);

// add a listener for each connection to the other sibling process
nodes.addListener("node", function(stream){
  // create a new connector using the framed WS stream
  Connector("local-workers", multiNode.frameStream(stream));
});

Jack/JSGI Support on Other Platforms

In addition to Node, Tunguska can run on any other JSGI-compliant server, including Jack 0.3, and should be easily adaptable to RingoJS as well. The main complication with different platforms is creating connectors with alternate forms of cross-process or cross-thread communication. Jack 0.3 utilizes threads for concurrency that implement the WebWorker API, and it includes special events for connecting to other workers so that workers can be fully connected. Achieving fully connected workers, where every worker can talk to every other worker, is problematic with the standard worker API, since it relies on the less ubiquitous SharedWorker API and requires a high level of name coordination. To use the Tunguska connectors with Jack, you utilize the “jack-connector” module, which provides “worker” events that return connection streams that can be used with the connectors:

require("tunguska/jack-connector").addListener("worker", function(connection){
  Connector("local-workers", connection);
});

Tunguska provides a solid set of tools for building scalable, non-blocking, real-time applications with the flexibility and modularity needed to integrate with essential application logic including custom serialization, authorization/access-control, and middleware routing to truly leverage the asynchronous I/O capabilities of Node. Next up, we will look at how Tunguska, which is a sub-project of Persevere, fits into the greater Persevere ecosystem for data integration with real-time data views, and RESTful content negotiation.