ECMAscript • Gated Logic • nevali.net

As you may know, I do a lot of work with PHP. One of my projects is an MVC-style application framework. This framework was developed with the express intention of being able to take advantage of running in an application server environment, wherein a single script (or set of scripts) will be capable of processing multiple requests. This is the model used by, for example, ASP and JSP, and has distinct efficiency advantages (not least because many things, such as determining configuration, compiling scripts, and connecting to databases, need only be done once instead of once per request).
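
To make the efficiency argument concrete, here is a minimal sketch of the application-server model, written in ECMAscript purely for illustration. The names loadConfig and handleRequest are hypothetical and don't come from my framework or from any real server; the point is only that the expensive setup runs once, while the handler runs once per request.

```javascript
// Hypothetical sketch: loadConfig and handleRequest are illustrative names,
// not part of any real framework or server API.
var setupCount = 0;

// Expensive, one-time work. In a real server this is where configuration
// would be read and database connections opened.
function loadConfig() {
    setupCount++;
    return { siteName: 'example', greeting: 'Hello' };
}

// Runs once, when the long-lived process first loads the script.
var config = loadConfig();

// Runs once per request, reusing the state prepared above.
function handleRequest(request) {
    return config.greeting + ', ' + request.user + ' (' + config.siteName + ')';
}

// Simulate three requests arriving at the same process.
var users = ['alice', 'bob', 'carol'];
var responses = [];
for (var i = 0; i < users.length; i++) {
    responses.push(handleRequest({ user: users[i] }));
}
```

Under the traditional one-script-per-request model, everything above handleRequest would be repeated for every single request.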

The problem with actually implementing this is that PHP doesn’t make it especially easy. It’s not by any means impossible, but it’s far easier to develop a PHP extension than it is to develop a new PHP SAPI (server interface), if for no other reason than that a new SAPI has to be developed as a patch to the PHP sources.

PHP isn’t without its problems as a whole, though, and I’ve talked about some of them in the past. I like PHP’s syntax and language capabilities (although I think some of them are a little lacking—at least, until PHP 6), but in certain respects the rot has already set in. Java is a reasonable alternative from a language perspective, but it isn’t really comparable once you start to examine it:

  • Applications must be compiled in advance (no edit-refresh-edit-refresh cycle as you get with Perl, Python or PHP scripts)
  • The runtime is monstrous (takes forever to build, and even longer to learn)
  • Java isn’t one component in a typical stack, it generally is the stack: you end up with web servers written in Java, databases written in Java, and so on and so forth.

PHP’s modularity is one of its better aspects, despite the namespace rot. Extensions are small, lightweight, and plug in easily. Writing extensions is comparatively trivial, even for things which do unholy things to the runtime. Building PHP itself takes no more than a couple of minutes on modern hardware, and deploying it really is utterly trivial.

However, that namespace rot is a big deal, and my experience suggests that the focus is still on traditional scripts rather than big framework-driven applications: language facilities which make building frameworks easy (e.g., the Reflection API) are fairly new, don’t have a huge amount of test coverage, and are horribly unstable at times. I’ve lost track of the number of hours I’ve spent debugging strange interactions between call_user_func_array() and ReflectionClass::newInstanceArgs() which often manifest themselves as Segmentation Fault (Core dumped). A scripting language whose interpreter crashes when handed perfectly valid (if perhaps unusual) scripts is suffering from a case of epic fail.

Moreover, thread-safety in PHP is a pretty new addition, and few would consider running mod_php5 with a multi-threaded Apache server in a production environment. Realistically, were I to build an application server, solid thread-safe code would be a requirement.

As such, I’ve been looking at alternatives, and of all the various options ECMAscript appears to (perhaps surprisingly) offer the best fit, specifically focussing on SpiderMonkey:

  • The syntax is similar, being traced in the case of both PHP and ECMAscript back to that of C
  • The core ECMAscript language doesn’t have much in the way of a runtime library, making it ripe for building on top of
  • SpiderMonkey is thread-safe
  • Where the APIs can be made similar, the differences between a lump of PHP and a lump of ECMAscript are fairly slight, and mostly boil down to syntactic sugar; this assumes that the original PHP code was written in the style you’d want to write your ECMAscript in, of course
  • ECMAscript has good support for regular expressions
  • ECMAscript has widespread support (Adobe, Apple, Opera and Mozilla are all pushing the boundaries), and all the people who are experienced with Flex and Adobe AIR already know how to write ECMAscript code
  • Embedding SpiderMonkey is completely trivial. A program which creates a runtime, adds a few built-in functions, then loads and executes a script can be written in less than 100 lines of liberally-spaced C. Writing a basic environment for executing something a bit beefier than test scripts, with support for configuration and plug-ins (for things like database APIs, XML parsing, remote connections, and so on) would be fairly easy to wrap up in a couple of thousand lines of code, and the plug-ins wouldn’t be particularly difficult to write either (adding new objects to a SpiderMonkey runtime is straightforward).

SpiderMonkey does have downsides, however. ECMAscript 4 support (partial, for the moment), by way of Adobe’s Tamarin engine, won’t appear until SpiderMonkey 2 (slated for inclusion in Firefox 4). This is a long way off, and although it will bring great benefits, until it happens we’re stuck with the same kind of hacks that scripts in web pages employ in order to do inheritance of classes and the like. Tamarin actually works right now (I don’t know if you could consider it “stable” at the moment), but it doesn’t have its own compiler: it’s a VM for bytecode-compiled scripts, which is why it won’t be integrated into the Mozilla tree properly until SpiderMonkey 2. You can compile scripts right now that can be executed by Tamarin, but you have to use Adobe’s open source Flex compiler (which is written in Java, curiously) to do it. This is worlds apart from SpiderMonkey’s current JS_CompileFile API, which just works.

The other problem is that I would be writing code now against SpiderMonkey 1.7 (the current stable release) in the expectation that the API won’t change significantly when SpiderMonkey 2.0 is released. I do have one thing on my side: the Mozilla codebase uses the same API, and updating that would take considerably more work than updating my paltry effort. Still, there are no guarantees, and if I’m proved wrong it could well be a lot of wasted effort. Finally, is it even worth building a rich pluggable scripting environment when the version of the language you support is, frankly, horrible?

My gut feeling is that it is: I’m going to take a gamble that the SpiderMonkey APIs won’t change significantly, and that we’ll start to see releases of SpiderMonkey pre-2.0 with Tamarin integrated within the next few months (work on the compiler is already underway and the resulting experimental code can be built today). And yes, while ECMAscript as implemented by SpiderMonkey 1.7 doesn’t support classes, packages and inheritance (the big things Adobe included from ECMAscript 4), it’s a rich enough language that it can be useful in its own right (as evidenced by all the interesting things people are doing with it on web pages).
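
For anybody who hasn’t run into them, the inheritance hacks in question look something like the following. This is a minimal sketch of one common prototype-chaining pattern, not code from any particular library; Model, UserModel and the inherit helper are names made up for the example.

```javascript
// A base "class": a constructor function plus methods on its prototype.
function Model(name) {
    this.name = name;
}
Model.prototype.describe = function () {
    return 'Model ' + this.name;
};

// Hypothetical helper: links Child's prototype to Parent's through an empty
// surrogate, so that Parent's constructor isn't run just to set up the chain.
function inherit(Child, Parent) {
    function Surrogate() {}
    Surrogate.prototype = Parent.prototype;
    Child.prototype = new Surrogate();
    Child.prototype.constructor = Child;
}

// A "subclass": calls the parent constructor by hand, much as you would call
// parent::__construct() in PHP.
function UserModel(name, email) {
    Model.call(this, name);
    this.email = email;
}
inherit(UserModel, Model);

// Override a method and call the parent's version explicitly.
UserModel.prototype.describe = function () {
    return Model.prototype.describe.call(this) + ' <' + this.email + '>';
};

var user = new UserModel('alice', 'alice@example.com');
var description = user.describe();
```

It works, and user is an instance of both UserModel and Model, but every step that a class-based language does for you has to be spelled out by hand.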

And so, my plan is this: I will create a library which itself embeds SpiderMonkey. This library will be little but a thin veneer, but will include the necessary support for certain bits of built-in functionality, for configuration, and for module loading. Most importantly, it will define a server interface (SAPI). Using the SAPI, I’ll build a simple command-line interpreter (not dissimilar to the current js interpreter included with SpiderMonkey). Other SAPIs, such as a CGI interpreter, will follow. Next come the modules: those for database connectivity are probably the most important. By the time all of that’s done, I fully expect SpiderMonkey 2 to be nearing stable release, if that hasn’t already happened.
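
The real SAPI would live in the C library, but the idea behind it can be sketched in ECMAscript itself. Everything here is an assumption made for illustration: the getRequest and write methods, and the CommandLineSapi and CgiSapi names, are hypothetical and don’t describe SpiderMonkey or the eventual design. What it shows is that the application only ever talks to one small interface, so front ends can be swapped without touching the application.

```javascript
// Hypothetical sketch: none of these names belong to SpiderMonkey or to any
// real library. The application depends only on getRequest() and write().
function runApplication(sapi) {
    var request = sapi.getRequest();
    sapi.write('You asked for ' + request.path);
}

// A command-line front end: the path comes from an argument list.
function CommandLineSapi(args) {
    this.args = args;
    this.output = [];
}
CommandLineSapi.prototype.getRequest = function () {
    return { path: this.args[0] || '/' };
};
CommandLineSapi.prototype.write = function (text) {
    this.output.push(text);
};

// A CGI-style front end: the path comes from environment variables, and a
// header block is emitted before the first piece of output.
function CgiSapi(env) {
    this.env = env;
    this.output = [];
}
CgiSapi.prototype.getRequest = function () {
    return { path: this.env.PATH_INFO || '/' };
};
CgiSapi.prototype.write = function (text) {
    if (this.output.length === 0) {
        this.output.push('Content-Type: text/plain\r\n\r\n');
    }
    this.output.push(text);
};

// The same application runs unchanged behind either front end.
var cli = new CommandLineSapi(['/reports']);
runApplication(cli);

var cgi = new CgiSapi({ PATH_INFO: '/reports' });
runApplication(cgi);
```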

In case anybody’s wondering why I’m building stuff on top of ECMAscript and don’t just build stuff in Ruby, Python, or any one of a hundred other scripting languages, there are a few answers:

  • Syntax (I like the C-like syntax of ECMAscript)
  • Why should scripts on the server have to be written in a different language to scripts on the client?
  • It’s fun
  • Why not?

I’m not about to abandon PHP, incidentally; I just can’t see it going where I want it to any time soon. An awful lot of people out there run PHP 4 still; the impression I’ve got is that the vast majority of people who write programs in PHP have very little experience of programming outside of their work with PHP, and so they are generally limited by what PHP can currently do, rather than being able to see what it could do. You have to wonder what proportion of PHP developers out there have ever used, say, interfaces, or run up against a situation where some form of multiple inheritance might be useful (that’s a whole other post, incidentally), or have been forced to wonder why, in a fairly dynamic language like PHP, the only way to call the constructor of a class whose name is stored in a variable, passing it an argument list held in an array, is via the Reflection API.