JavaScriptDB: Persevere’s New High-Performance Storage Engine

April 20, 2009, 8:47 pm

The latest beta of Persevere features a new native object storage engine called JavaScriptDB that provides high-end scalability and performance. Persevere now outperforms the common PHP and MySQL combination for accessing data via HTTP by about 40% and outperforms CouchDB by 249%. The new storage engine is designed and optimized specifically for persisting JavaScript and JSON data with dynamic object structures. It is also built for extreme scalability, with support for up to 9,000 petabytes of JSON/JS data in addition to any binary data.

These statistics are even more impressive when one considers all the additional functionality Persevere provides while outperforming these other storage systems. MySQL uses traditional fixed-structure schemas requiring homogeneous records in a table, while JavaScriptDB (as well as CouchDB) supports storage of heterogeneous objects of any structure in tables. Persevere/JavaScriptDB goes further, with the flexibility to evolve schemas and handle partial schemas. Persevere also provides integrated server-side JavaScript (SSJS) with persistence, Comet-driven data change notifications, JSONQuery, a standards-based HTTP interface with content negotiation, a JSON-RPC interface to SSJS, cross-domain handling, CSRF protection, and more. All of these are features one would have to add on top of other storage systems, making them even slower. Persevere includes this functionality out of the box while still maintaining extremely fast performance.

Test Scenario

These tests were performed on Mac OS X with a 2GHz dual-core Intel processor and 1 GB of 667 MHz DDR2 memory. The PHP/Apache/MySQL setup used MAMP 1.7.2, which includes PHP 5.2.6, MySQL 5.0.41, and Apache 2.0.59. The CouchDB tests were performed with CouchDBX version 0.8 (which uses CouchDB 0.8.1). Persevere nightly builds from late March were used for the JavaScriptDB tests.

Three different operations were performed in the tests:

  • Insert/POST operation to create a new object
  • Update/PUT operation to update an object
  • Query/GET to search for objects by an (indexed) field/property
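These three operations map onto Persevere's REST interface. A minimal JavaScript sketch of the request shapes (the table name `Customer` and the exact JSONQuery string here are illustrative assumptions, not taken from the actual benchmark files):

```javascript
// Sketch of the three benchmark operations as HTTP request descriptors.
// The query URL uses the article's [?prop='value'] JSONQuery notation.
function makeRequests(table, id, obj, prop, value) {
  return [
    // Insert: POST a new JSON object to the table URL
    { method: 'POST', url: '/' + table + '/', body: JSON.stringify(obj) },
    // Update: PUT the object to its id-based URL
    { method: 'PUT', url: '/' + table + '/' + id, body: JSON.stringify(obj) },
    // Query: GET with a JSONQuery filter on an indexed property
    { method: 'GET', url: '/' + table + "/[?" + prop + "='" + value + "']" }
  ];
}

const reqs = makeRequests('Customer', 1, { name: 'Ann' }, 'name', 'Ann');
```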

In the PHP/MySQL tests, all operations were handled by a very simple PHP script that first ran a quick security-check query against a small table (actually empty for the tests) to emulate the security capabilities of Persevere and CouchDB (although CouchDB's security capabilities are limited, and would probably often require additional logic), and then executed the main query against MySQL, whether an INSERT, UPDATE, or SELECT. All the created objects/records had four properties/fields in all three systems. In MySQL, two fields were indexed: one the primary key, the other the field queried on in the query requests. In CouchDB, a simple view was created that indexed a single property; this view was used for the requests that queried by index. Both the Persevere and CouchDB tests used their standard HTTP interfaces for creating, updating, and querying.

Tests were carried out by an HTTP client running 10 threads concurrently, each issuing a sequence of 200 of each type of request. The “full test” performed create, update, and query requests; the “write test” performed create and update requests; and the “read test” performed only query requests. The files used to perform the benchmarks are available here.
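The shape of such a benchmark driver can be sketched in a few lines of JavaScript. This is not the actual test harness; the request function is injected so the sketch stays self-contained:

```javascript
// Minimal sketch of the benchmark driver: `workers` concurrent workers
// each issue `requestsPerWorker` requests, and the whole run is timed.
// The real harness used 10 threads of 200 requests each.
async function runBenchmark(doRequest, workers, requestsPerWorker) {
  const start = Date.now();
  let completed = 0;
  await Promise.all(Array.from({ length: workers }, async () => {
    for (let i = 0; i < requestsPerWorker; i++) {
      await doRequest(); // e.g. an HTTP POST, PUT, or GET
      completed++;
    }
  }));
  const seconds = (Date.now() - start) / 1000;
  return { completed, seconds };
}
```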

Test Conclusions

Two sets of tests were run—one using fast commits, which do not wait for committed data to actually be physically written to the disk (allowing normal OS write-back caching), and one using high-integrity commits, which force the committed data to the disk. JavaScriptDB has a setting for choosing which style of commits to use. In MySQL, the MyISAM storage engine was used for fast commits, and the InnoDB storage engine was used for high-integrity commits. CouchDB always uses high-integrity commits.

While PHP is not the fastest language, the script used for these tests was trivial, and it is unlikely that PHP code execution significantly detracted from the overall performance of the PHP/MySQL combination. With this simple, streamlined PHP script, a very fast classic setup was used, and alternative languages would be unlikely to improve performance by much. Yet Persevere's JavaScriptDB still beat this setup by a significant margin. With more complex request handlers that provide more of the functionality Persevere already includes, the margin would likely increase even further. Quite simply, the classic application server + MySQL database setup is hard-pressed to compete with Persevere in terms of performance for most normal database interactions.

So how does Persevere achieve this level of performance with the JavaScriptDB storage? The dynamic object-oriented nature of the data that is stored in JavaScriptDB is much different than that of a traditional relational database, so a number of innovative approaches were employed.

Direct Data-Bound Object Representation

One of the central concepts of Persevere is that all persisted data is mapped to JavaScript objects. This enables server-side JavaScript functions and handlers to easily interact with persisted data, and provides a convenient in-memory representation of data that allows for intuitive, normal object-oriented data interaction. However, in Persevere this is more than just a convenient API—it also facilitates efficient memory utilization by providing a single in-memory representation that can be reused at multiple levels.

In a traditional application stack, a record must have a separate in-memory representation at each level of the stack. A database may have an in-memory representation before serializing result sets back to the application. The application may have a result-set-level representation, which might then be mapped to an object representation. Every one of these levels consumes more memory, and these extra layers increase latency and overhead as well. In addition, most database-driven applications rely on TCP/IP communication with the database, which consumes a large amount of resources too. With JavaScriptDB, the single in-memory object is efficiently reused at the database level for all result sets and data caching. This not only means less memory consumption, it also translates to more efficient CPU cache utilization for Persevere, and direct low-latency access to data.

Shared Cache of Objects with Copy-on-Write

Not only are in-memory objects shared between the application level and the database level, but Persevere also utilizes a shared cache of objects between threads to ensure that any given record/object exists in memory at most once. Traditional application frameworks process separate HTTP requests concurrently, and each request has its own result set and copy of the data, which can lead to significant duplication of data in memory. With Persevere, objects are always reused if they are still available in memory.

While this technique is relatively simple for read-only data, Persevere still maintains virtual memory isolation between threads to protect against concurrent access and the ensuing race conditions. Persevere does this by using copy-on-write-style values within objects. When a property is modified, its value is internally converted to a “transactional” value that actually has multiple states depending on which thread is accessing the object. An object can therefore be modified by one thread while another thread accesses the same object without seeing the uncommitted change; property changes become visible when transactions are committed. This technique allows Persevere to maintain transactional isolation between concurrent request handlers while minimizing the records/objects that must be held in memory. Persevere's architecture, combined with JavaScriptDB's integration, minimizes memory consumption, allowing internal caches to be maximized for optimal performance.
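The idea of a copy-on-write “transactional” value can be illustrated with a small sketch (this is not Persevere's actual internal implementation; the class and method names are invented for illustration):

```javascript
// Sketch of a copy-on-write transactional value: each uncommitted
// write is kept per transaction, so other transactions keep seeing
// the committed state until commit makes the change visible.
class TransactionalValue {
  constructor(committed) {
    this.committed = committed;
    this.uncommitted = new Map(); // transaction id -> pending value
  }
  get(txn) {
    return this.uncommitted.has(txn) ? this.uncommitted.get(txn)
                                     : this.committed;
  }
  set(txn, value) {
    this.uncommitted.set(txn, value); // visible only to this transaction
  }
  commit(txn) {
    if (this.uncommitted.has(txn)) {
      this.committed = this.uncommitted.get(txn);
      this.uncommitted.delete(txn);
    }
  }
}

const v = new TransactionalValue('old');
v.set('txnA', 'new');
console.log(v.get('txnB')); // still 'old': txnA's change is isolated
v.commit('txnA');
console.log(v.get('txnB')); // now 'new': the commit made it visible
```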

Persevere utilizes the sophisticated least recently used (LRU) caching capabilities of the Java Virtual Machine's (JVM) soft-referencing mechanism. In-memory objects, as well as JavaScriptDB's indices, are cached via soft-reference tables. This allows the JVM to use an integrated view of reachability and object access timestamps to determine which objects to collect and discard. Objects that are reachable by currently executing code always stay in the cache (since they must stay in memory) for as long as they are reachable; unreachable objects are then discarded according to LRU strategies. Since the JVM's garbage collector handles object collection at a global level, it can also optimally select objects for collection without being constrained by a module-level view. This means that if the indices are not being used frequently, more memory can be allocated to the object cache, and vice versa. Caches are maintained according to usage and reachability, with the JVM's global perspective providing an optimal discarding strategy.
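The JVM's soft references combine reachability with recency for free; only the LRU half can be sketched in plain JavaScript. A `Map` preserves insertion order, so re-inserting an entry on access keeps the least recently used entry first for eviction:

```javascript
// Sketch of the LRU half of the caching strategy described above
// (JavaScript has no soft references, so reachability-aware
// collection is left to the JVM in the real system).
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map(); // iteration order = least to most recently used
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);     // move to most-recently-used position
    this.map.set(key, value);
    return value;
  }
  put(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // evict the least recently used entry (first key in order)
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```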

Append-based Database Storage

JavaScriptDB uses an append-based database format to store data. Many traditional databases synchronously commit data to a transaction log file before committing it to the main storage table, which requires multiple writes. JavaScriptDB, on the other hand, appends transactional data directly to the main storage file, so writes can be committed with a single I/O operation. This also enables JavaScriptDB to efficiently maintain a version history of the database and its records. The storage file is essentially a running log of transactions, and these transactions are exposed as the Transaction table. By storing data as a sequential set of transactions, JavaScriptDB not only persists data quickly, it also provides efficient access to the transactions that have taken place and a version history of the database and the objects within it.

Adaptive On-Demand Concurrent Indexing

JavaScriptDB features a dynamic approach to indexing that minimizes the configuration and management required to create and maintain tables, and maximizes performance. By default, JavaScriptDB indexes all properties of persisted objects, so typical queries can almost always run in fast O(log n) time. However, the indexer does not block write operations while completing index updates when objects are added, deleted, or modified. Rather, indexing takes place concurrently in background threads as asynchronous tasks. As objects are indexed, the index update operations are delegated to the appropriate index nodes, and these updates are also executed as asynchronous tasks. When an index is needed for a query, any outstanding updates along the node tree path are completed so the query can execute.

It is worth noting that this on-demand indexing does not mean that the entire index must be updated to execute a query. Often (and usually in the case of large databases) an object may be updated that affects an index node that isn’t used in a subsequent query. In this case, the query can still execute without waiting for the index node to be updated. JavaScriptDB properly orchestrates concurrent indexing such that nodes are updated through lower-priority background threads when possible, and immediately updated on-demand as necessary. This allows write operations to take place very quickly, while still allowing indexes to be ready for fast query operations as well. This also allows Persevere to utilize resources and CPU processing more evenly and smoothly. Background processes can take CPU time as needed when client requests are not demanding immediate data retrieval.
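The write-then-flush-on-demand pattern described above can be sketched as follows (a simplified single-index model; the real system delegates updates to individual index nodes and background threads):

```javascript
// Sketch of on-demand index maintenance: writes only queue index
// update tasks; a query flushes the pending tasks it needs before
// answering, leaving background threads to do the rest.
class LazyIndex {
  constructor() {
    this.entries = new Map(); // indexed value -> Set of object ids
    this.pending = [];        // queued update tasks
  }
  queueUpdate(id, value) {
    this.pending.push({ id, value }); // the write returns immediately
  }
  flush() { // bring the index up to date (on demand or in background)
    for (const { id, value } of this.pending) {
      if (!this.entries.has(value)) this.entries.set(value, new Set());
      this.entries.get(value).add(id);
    }
    this.pending = [];
  }
  query(value) {
    this.flush(); // complete outstanding updates before answering
    return [...(this.entries.get(value) || [])];
  }
}
```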

Furthermore, JavaScriptDB uses adaptive indexing techniques. If a particular index has gone unused for some time while many objects have been added to a table with the corresponding property, JavaScriptDB will stop proactively updating the index to conserve resources. When an index is no longer proactively updated, it is only updated on demand, when a query is performed that requires it. Once the index is updated, it resumes proactive updates (at least until disuse causes it to go back to the non-proactive state). This approach allows JavaScriptDB to automatically perform appropriate and efficient indexing with minimal manual configuration. JavaScriptDB also supports manual configuration of indexes, for situations where you want explicit control of indexing.
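This adaptive policy amounts to a small state machine per index. A sketch, with an invented threshold and return values purely for illustration:

```javascript
// Sketch of the adaptive policy: after too many writes with no
// intervening query, an index stops being proactively updated and
// only catches up when a query actually needs it.
function makeAdaptivePolicy(disuseThreshold) {
  let writesSinceQuery = 0;
  return {
    onWrite() {
      writesSinceQuery++;
      // proactive while recently queried; lazy-only once disused
      return writesSinceQuery <= disuseThreshold ? 'update-now' : 'defer';
    },
    onQuery() {
      writesSinceQuery = 0; // usage resumes proactive updates
      return 'catch-up-then-answer';
    }
  };
}
```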

Batched Writes in High-Integrity Mode

One of the most expensive operations a database can perform is a forced synchronous disk write. These writes are necessary in high-integrity commit mode, where the commit does not return until the database is certain the data has actually been written to the disk, fulfilling the durability component of ACID compliance. Each such write can take around 10ms. To improve the performance of high-integrity commits, Persevere detects when multiple writes are taking place concurrently and batches them together into a single synchronous disk write. When a number of concurrent write requests are being sent to Persevere, this can significantly reduce the number of synchronous writes that must take place and greatly improve performance.
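This "group commit" idea can be sketched with promises: commits that arrive while a sync is pending piggyback on the same forced write. The `syncToDisk` callback is injected so the sketch stays self-contained; it is not Persevere's actual API.

```javascript
// Sketch of group commit: concurrent commits share one forced disk
// write instead of each paying the ~10ms fsync cost.
function makeGroupCommitter(syncToDisk) {
  let pending = null; // promise for the next batched sync, if any
  return function commit() {
    if (!pending) {
      // first committer in the batch schedules the single sync
      pending = Promise.resolve().then(() => {
        pending = null; // commits arriving after this start a new batch
        return syncToDisk();
      });
    }
    return pending; // later committers piggyback on the same sync
  };
}
```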

Pluggable Storage

Persevere uses a pluggable storage system. JavaScriptDB is one of several data source plugins (and the default data source) that can be used with Persevere. Persevere supports heterogeneous storage configurations, which means you can leverage the performance and flexibility of JavaScriptDB without abandoning existing relational databases or other data sources. Custom data sources can even be created for unique storage systems.
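To make the idea concrete, here is a hypothetical sketch of what a data source contract could look like (the method names are illustrative, not Persevere's actual plugin SPI): any backend that can create, update, fetch, and query objects can sit behind the same HTTP/JSONQuery front end.

```javascript
// Hypothetical in-memory data source satisfying an illustrative
// create/update/get/query contract.
function makeMemoryDataSource() {
  const objects = new Map();
  let nextId = 1;
  return {
    create(obj) { const id = nextId++; objects.set(id, obj); return id; },
    update(id, obj) { objects.set(id, obj); },
    get(id) { return objects.get(id); },
    query(predicate) { return [...objects.values()].filter(predicate); }
  };
}
```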

The ServerJS working group is also considering a standard API for database interaction that might allow JavaScriptDB to be used as a standalone database engine by other Rhino-based frameworks like Helma (of course, Persevere + JavaScriptDB can already be used with existing JavaScript modules, and it can be used as a database for Java applications through its Java API).

Future Improvements

This is the first release of JavaScriptDB, so there are still significant opportunities to continue improving and refining this storage engine. Currently, JavaScriptDB does not utilize indices for nested object queries (the equivalent of inner joins in relational DBs). Consequently, queries of the form [?prop1='something'] execute in O(log n) time, but queries of the form [?prop1.prop2='something'] execute in O(n) time. Future versions will provide fast O(log n) execution for a much broader range of queries. A later release will also provide true ACID compliance (the current version does not fulfill the atomicity constraint). Finally, replication/clustering services will be added in the future as well, for distributing the Persevere workload across multiple servers.
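The difference between the two query forms comes down to a keyed index lookup versus a full scan, which can be sketched as follows (an illustrative model of the limitation, not JavaScriptDB's query engine):

```javascript
// Top-level properties are answered from a keyed index structure;
// nested paths like prop1.prop2 currently fall back to scanning.
function queryByIndex(index, value) {
  return index.get(value) || []; // O(log n)-style keyed lookup
}

function queryByScan(objects, path, value) {
  const [outer, inner] = path.split('.');
  // O(n): every object must be examined for the nested property
  return objects.filter(o => o[outer] && o[outer][inner] === value);
}
```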

Real Value

As Alex Payne pointed out, the economy may be ending the era of disregarding system performance and efficiency with the excuse of buying more servers. More servers cost more money, and architectures like Persevere that can efficiently handle large numbers of users and heavy traffic with minimal hardware resources translate to real money saved.

Persevere combines numerous advanced capabilities for web-accessible data including standards-based HTTP interface, JSONQuery, JSON-RPC, server side JavaScript, Comet-based data notifications, robust security, and more. Now these capabilities are available with speed and scalability that outperforms the most common web application systems, allowing you to build high-performance client/server Ajax web applications with unprecedented ease, efficiency, and value.

Update: Jan Lehnardt pointed out that CouchDB is now at version 0.9.0 and that OS X is not the optimal platform for CouchDB, so the latest version of CouchDB can presumably improve upon the CouchDB performance shown in these tests. Hopefully we can progress towards better benchmarking tools for this new breed of databases.


  • Kris, I recently played with Helma NG on Google appengine and it worked fine. Now would it make sense (assuming it is possible at all) to use Persevere with Helma on appengine and use bigtable as pluggable storage? What features of Persevere would we lose in such a scenario (one I can think of is unrestricted Comet, appengine limits request lifetime to 30 sec)?

  • Hi,

    just a quick comment from the CouchDB team: Impressive numbers! Cool stuff and congrats. CouchDB is still in alpha and is not yet optimized for speed, so we might be able to catch up a little :)

    CouchDBX 0.8 is not even an official release and way out of date, it’d be great if you could run the tests with an 0.9 version again and preferably not on Mac OS X because it can only save data reliably very slowly (and CouchDB errs on the side of safe storage).

    Lastly, CouchDB is not meant to be extremely fast for single transactions, only “fast enough”. Where CouchDB shines is the number of concurrent requests you can hit CouchDB with and it’ll handle gracefully.

    We’re very interested in helping out with meaningful benchmarks in the future, if you’d like to collaborate, please do get in touch.


  • Neat stuff Kris. Just a clarification — JavaScriptDB is something created from scratch as part of the Persevere codebase, and not based on this SourceForge project: (I think that project is just JS files anyway).

    I kept looking for a link to read more about JavaScriptDB specifically and came across that SourceForge link, but it looks like this link might be the most appropriate:

  • @James: That’s correct, it’s created from scratch as part of Persevere. And my Spanish isn’t very good ;).


  • Why Dojo?
    Do you need some help for mootools? I’m ready to help if you need ;)

  • @Roberto: There has certainly been some discussion of running Persevere on GAE in the Persevere and ServerJS mailing lists, and AFAICT it should work; it is a matter of creating a data source adapter for GAE’s BigTable to work with Persevere. It’s on the roadmap.

    @Nunzio: Did I mention Dojo in this article? Persevere should work with any library. If you would be willing to help with making mootools work with Persevere, that would be awesome, I would certainly be glad to assist you.

    @Jan: My apologies for being out of date with CouchDB. Is 0.9 a new release? I thought I was using the latest version when I ran the tests several weeks ago. I used CouchDBX because running from the source seemed to take a lot of work.

    I would be glad to collaborate on more meaningful benchmarks. Obviously a lot of this was rather ad-hoc, since we are kind of pioneering new space with JSON DBs, and so I would be glad to work together on better benchmarks.

    It should also be noted that the tester was firing off requests in 10 concurrent threads, and all the tested servers were keeping the CPU close to pegged. This was definitely a test of concurrent request handling, and not isolated single transaction speed.

  • @Kris

    Yeah, let’s collaborate. We published 0.9 two months back and there’s a new CouchDBX release candidate here: But note that Mac OS X is not the best host system to run this because of the fsync() / F_FULLFSYNC issue.

    Concurrency: I’d like to see 100s (and 1000s) of threads doing fewer requests from multiple source hosts to get closer to reality. Maybe we should look into Tsung for generating and orchestrating even more organic load for all the REST/KV stores out there.

  • Hey Kris,

    Awesome work! This sounds absolutely fantastic from a DB user perspective. A couple of questions from the DB admin perspective:

    1. what is database packing like? (does it happen live, and does it put a lot of load on the system) This was always something that bothered me about the ZODB. It uses a similar append-only file storage, which works great in general. But packing was quite expensive in CPU/disk access. On the plus side, packing would happen while the system is live.

    2. with indexes on all of the object properties, do you find that the database grows very quickly?

    One final note: a full-text search index would be a great addition.

    Those questions aside, this sounds like a stellar release.


  • @Kevin: Packing is not implemented yet, but I was planning on tracking free/reusable segments, and overwriting them with new data when packing is enabled. Thus, it would run live, and I believe it would be reasonably fast.

    As far as indexing, it seems like the main storage file has generally stayed larger than or similar in size to the index file. As noted in the post, properties that are not queried also go inactive, and so they don’t increase the index file either after a while.



  • @Kris That packing approach is a lot harder than what the ZODB does (write a new file that skips over old/unreferenced objects). It’ll be awesome if you can make it work well.

  • pcdinh

    Sounds very cool indeed. Will it be extended to support more languages other than JavaScript/Rhino?



  • Client side db is always exciting!

  • Erik Bengtson


    I tried your test case with DataNucleus REST on GAE, and I was able to top 50 transactions/sec, but the quota limit was triggered and I couldn’t go further. From the error messages of GAE, I believe GAE could handle thousands of tx/sec if you contract for enough quota.

    Here’s the URL if someone wants to try

  • @pcdinh: JavaScriptDB can be used by any JS environment via Persevere’s HTTP interface (more direct calls from a non-JVM based platform would obviously require a bridge). This post explains how to use Persevere from any other SSJS environment (this shows Jaxer):

    @布里斯班: This is a server side DB, not client side (although it is designed for client side ease of use).

    @Erik Bengtson: With adequate funding, I would certainly think that GAE can support thousands of tx/sec with Google’s plethora of servers.

  • Anony Mouse

    Could this benchmark be any more USELESS? Simply testing the performance of reads/writes shows nothing about real-world performance. How about a realistic workload with a realistic application?

    Also, only 260 reads/sec using PHP+MySQL on your machine is a lot below optimal – you should look into optimizing the configuration (e.g. lighttpd+FastCGI+bytecode caching+persistent connections/UNIX sockets – shall I say REAL WORLD again?)

  • @Anony: Testing real applications is certainly important, but basic reads/writes are some of the important building blocks from which applications are composed, and isolating the different aspects of performance is important for understanding the performance characteristics of a server. I certainly encourage real-world application tests on these technologies, but for an initial performance demonstration, complex tests can be suspect, since it may be difficult to demonstrate and understand the performance levels and bottlenecks that contributed to the overall results. Simple tests provide more generically applicable information than a specific application test. Also, I wanted to include CouchDB in the tests, and we simply can’t run very sophisticated queries or operations with CouchDB; it is very limited in what it can handle.

    The tests were also intended to be simple in terms of PHP code to minimize code execution and differences due to optimization of configuration of PHP/Apache, and focus on DB interaction.

  • Blacktiger

    I’d love to see a comparison between Persevere running JavaScriptDB and Persevere running against MySQL. What kind of performance hit do you get from that?

  • Ahmed Refaat

    Whatever happened to JavascriptDB? I thought it was a great project. Kris, is the source code available?