TL;DR
The Mozilla Platform keeps improving: native file management for JavaScript is a work in progress that aims to provide a high-performance, JavaScript-friendly API for manipulating the file system.
The Mozilla Platform, JavaScript and Files
The Mozilla Platform is the application development framework behind Firefox, Thunderbird, Instantbird, Camino, Songbird and a number of other applications.
While the performance-critical components of the Mozilla Platform are developed in C/C++, an increasing number of components and add-ons are implemented in pure JavaScript. Although JavaScript cannot yet hope to match the speed or robustness of C++, the richness and dynamism of the language permit the creation of extremely flexible and developer-friendly APIs, as well as quick prototyping and concise implementation of complex algorithms without fear of memory errors, with features such as higher-level programming, asynchronous programming and now clean and efficient multi-threading. If you combine this with the impressive speed-ups JavaScript has experienced in recent years, it is easy to understand why the language has become a key element in the current effort to make the Mozilla Platform and its add-ons faster and more responsive at all levels.
Many improvements to the JavaScript platform are pushing the boundary of what can be done in JavaScript. Core Modules, strict mode and the let construct are powerful tools that empower developers to produce reusable, clean and safe JavaScript libraries. The Mozilla Platform offers XPConnect and now js-ctypes, two extremely powerful technologies that let privileged JavaScript masquerade as C/C++ and get access to the low-level features of the platform. Other technologies such as Web Workers expose low-level operating system features through fast, JavaScript-friendly APIs (note that the Mozilla Platform has exposed threads and processes to JavaScript at least since 2005 – Web Workers are faster, nicer, and play much more nicely with the runtime, in particular with respect to garbage collection and the memory model).
Today, I would like to introduce one such improvement: native file management for JavaScript, also known as OS.File.
Since JavaScript has become a key component of the Mozilla Platform, the platform needs a great library for manipulating files in JavaScript. While both XPConnect and js-ctypes can be (and have been) used for this purpose, our objective with this library is to go well beyond the file management APIs that have been exposed to JavaScript so far, on any platform, in terms of:
- expressiveness;
- integration with the JavaScript side of the Mozilla Platform;
- operating system-level features;
- performance;
- extensibility.
This library is a work in progress by the Mozilla Performance Team, and we hope that a fully working prototype will be available by early January. Not everything is implemented yet, and all sorts of adjustments can still be made based on your feedback.
Once we have delivered, we hope that you will use this library for your future work on the Mozilla Platform, whether you are extending the platform, developing an add-on or an application, or refactoring some existing feature.
Let me emphasize that this is a Mozilla Platform API (hence the “OS” prefix), not a Web API. As opposed to the HTML5 File object, this API gives full access to the system, without any security limitation, and is definitely not meant to be scriptable by web applications, under any circumstances.
Manipulating files, the JavaScript way
Reading from a file
Let us start with something simple: reading from a file.
First, open the library:
Components.utils.import("resource://gre/modules/osfile.jsm");
OS.File is a JavaScript module; in other words, it is shared between all users in the same thread. This is particularly important for speed, as it gives us the ability to perform aggressive caching of certain data.
Once you have opened the module, you may read your file:
var fileName = "/home/yoric/hello";
var contents = OS.File.openForReading.using(fileName, function (myFile) {
  return myFile.readString();
});
This extract:
- opens file "/home/yoric/hello" for reading;
- reads the contents of the file as a string (assuming ASCII encoding);
- closes the file;
- reports an error if anything goes wrong either during opening or during reading;
- places the result in variable contents.
This short listing already demonstrates a few interesting elements of the API. Firstly, notice the use of function using. This function performs scope-bound resource management to ensure that the file is properly closed once it is no longer needed, even in the presence of errors. This has roughly the same role as a finally block in Java or a destructor on a C++ auto-pointer. I will return to the topic of resource management later. For the moment, suffice it to say that closing a file through using or method close is optional but recommended, as open files are a limited resource on all operating systems.
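The contract of using can be illustrated outside the Mozilla Platform. The following is a minimal sketch, not the OS.File implementation: withResource and FakeFile are hypothetical names, and the "file" is an in-memory stand-in so that the example runs in any JavaScript engine. It shows the essential guarantee: the resource is closed in a finally block whether the callback returns normally or throws.

```javascript
// Hypothetical in-memory stand-in for an open file (not part of OS.File).
class FakeFile {
  constructor(contents) {
    this.contents = contents;
    this.closed = false;
  }
  readString() {
    if (this.closed) throw new Error("file is closed");
    return this.contents;
  }
  close() {
    this.closed = true;
  }
}

// Hypothetical helper sketching scope-bound resource management:
// open the resource, hand it to the callback, and always close it.
function withResource(open, callback) {
  const resource = open();
  try {
    return callback(resource);
  } finally {
    // Runs both on normal return and when the callback throws.
    resource.close();
  }
}

// Normal case: the value is returned and the file is closed afterwards.
const file = new FakeFile("hello");
const contents = withResource(() => file, (f) => f.readString());
console.log(contents, file.closed); // hello true

// Error case: the error still reaches the caller...
const failing = new FakeFile("oops");
try {
  withResource(() => failing, () => {
    throw new Error("read failed");
  });
} catch (e) {
  console.log("caught:", e.message); // caught: read failed
}
// ...but the file has been closed anyway.
console.log(failing.closed); // true
```

Putting close in finally rather than after the callback is what makes cleanup independent of how the callback exits.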
Had we decided to let JavaScript close the file by itself at some point in the future, we could simply have written:
var fileName = "/home/yoric/hello";
var contents = OS.File.openForReading(fileName).readString();
Secondly, consider OS.File.openForReading. As its name suggests, this function/object serves to open an existing file for reading, and it fails if the file does not exist yet. The API provides such functions for all common scenarios, all of which accept optional flags to customize Unix-style file rights, Windows-style sharing properties and other Unix- or Windows-style attributes. Alternatively, function/object/constructor OS.File is the general way to control all details of file opening.
The extracts above do not demonstrate any feature that could not have been achieved with XPConnect. However, let us briefly compare our extracts with an XPConnect-based implementation along similar lines:
- the OS.File implementation consists of 2 to 4 lines, including resource cleanup and error handling / a comparable XPConnect-based implementation requires about 30 lines;
- the OS.File implementation works both on the main thread and on a background thread / a comparable XPConnect-based implementation works only on the main thread;
- benchmarks are not available yet, but I expect the OS.File implementation to be slightly faster, due to lower overhead and an optimized implementation of readString;
- in case of error, the OS.File implementation raises an exception with constructor OS.File.Error / the XPConnect-based implementation raises a generic XPConnect exception;
- if the file does not exist, the OS.File implementation raises an error while executing OS.File.openForReading / the XPConnect-based implementation raises an error later in the process;
- if executed on the main thread, the OS.File implementation will print a warning.
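Raising a dedicated error type, rather than a generic exception, lets callers tell file-system failures apart from programming errors with a simple instanceof test. The sketch below is hypothetical: FileSystemError, its operation and path fields, and openForReadingSketch are illustrative names, not the actual fields of OS.File.Error, and the "file system" is a plain Map so that the code runs anywhere. It also mirrors the early failure described above: opening a missing file fails immediately, not at read time.

```javascript
// Hypothetical error type, in the spirit of OS.File.Error.
class FileSystemError extends Error {
  constructor(operation, path, message) {
    super(`${operation} failed on ${path}: ${message}`);
    this.name = "FileSystemError";
    this.operation = operation; // e.g. "open"
    this.path = path;
  }
}

// In-memory stand-in for the file system (assumption for this sketch).
const fakeDisk = new Map([["/home/yoric/hello", "Hello, world"]]);

// Fails at open time if the file does not exist, instead of later.
function openForReadingSketch(path) {
  if (!fakeDisk.has(path)) {
    throw new FileSystemError("open", path, "no such file");
  }
  return { readString: () => fakeDisk.get(path) };
}

let caught = null;
try {
  openForReadingSketch("/home/yoric/missing");
} catch (e) {
  if (e instanceof FileSystemError) {
    caught = e; // a file-system problem: report it to the user
  } else {
    throw e; // anything else is a bug: let it propagate
  }
}
console.log(caught.message); // open failed on /home/yoric/missing: no such file
```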
Note that OS.File handles this and closures in the usual JavaScript fashion, which makes it possible to make our extracts even more concise, as follows:
var fileName = "/home/yoric/hello";
var contents = OS.File.openForReading.using(fileName, function () {
  return this.readString();
});
or, equivalently,
var fileName = "/home/yoric/hello";
var contents = OS.File.openForReading.using(fileName,
  OS.File.prototype.readString);
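Both extracts rely on the callback being invoked with the open file as this. A minimal sketch of that calling convention, assuming a hypothetical usingSketch helper and an in-memory SketchFile class (neither is part of OS.File), passes the file both as this and as the first argument through Function.prototype.call, so all three callback styles work:

```javascript
// Hypothetical in-memory file used only for this sketch.
class SketchFile {
  constructor(contents) {
    this.contents = contents;
    this.closed = false;
  }
  readString() {
    return this.contents;
  }
  close() {
    this.closed = true;
  }
}

// Calls `callback` with the file bound to `this` AND passed as argument,
// then closes the file whatever happens.
function usingSketch(file, callback) {
  try {
    return callback.call(file, file);
  } finally {
    file.close();
  }
}

// 1. Explicit parameter.
const a = usingSketch(new SketchFile("hi"), function (f) { return f.readString(); });
// 2. Implicit `this` (requires a regular function, not an arrow function).
const b = usingSketch(new SketchFile("hi"), function () { return this.readString(); });
// 3. A prototype method passed as-is.
const c = usingSketch(new SketchFile("hi"), SketchFile.prototype.readString);
console.log(a, b, c); // hi hi hi
```

Arrow functions ignore the this supplied by call, which is why the second style needs a regular function expression.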
Of course, OS.File is not limited to strings. Indeed, to return a typed array, simply replace readString with readBuffer. For better performance, it is also possible to reuse an existing buffer. This is done by replacing readBuffer with readTo.
Also, OS.File is not limited to reading entire files. Indeed, all read/write functions accept an optional argument that may be used to specify the portion of the file to read:
var fileName = "/home/yoric/hello";
var contents = OS.File.openForReading.using(fileName,
  {fileOffset: 10, bytes: 100},
  OS.File.prototype.readString);
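One way to picture this optional argument is as a window over the bytes of the file. The sketch below is hypothetical: readRange works on an in-memory Uint8Array rather than a real file, and only the option names fileOffset and bytes come from the extract above; the defaults (read from the start, read until the end) and the clamping at end of file are assumptions.

```javascript
// Hypothetical helper: returns the window of `data` selected by
// { fileOffset, bytes }. Defaults: start of file, until end of file.
function readRange(data, options = {}) {
  const offset = options.fileOffset ?? 0;
  const length = options.bytes ?? Math.max(0, data.length - offset);
  if (offset < 0 || length < 0) {
    throw new RangeError("fileOffset and bytes must be non-negative");
  }
  // subarray clamps at the end of the buffer, much like a short read at
  // end of file, and shares memory with `data` instead of copying it.
  return data.subarray(offset, offset + length);
}

// ASCII-only contents, so bytes and characters coincide.
const data = new TextEncoder().encode("0123456789abcdefghij");
const text = new TextDecoder().decode(readRange(data, { fileOffset: 10, bytes: 5 }));
console.log(text); // abcde
```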
Well-known directories
The operations we have demonstrated so far use a hard-coded path, “/home/yoric/hello”. This is not a very good idea, as this path is valid only under Linux, not under Windows or MacOS. We would therefore much prefer to ask the Mozilla Platform to select the path for us. For this purpose, we may replace the first line with:
var fileName = OS.Path.home.get("hello");
This extract:
- uses global object OS.Path (part of library OS.File);
- requests the path to the user’s home directory;
- requests item "hello" at this path.
The extract demonstrates a few things. Firstly, the use of OS.Path. This object contains paths to well-known directories, and can be extended with new directories. Each path has constructor OS.Path and supports a method get that serves to descend into files/directories. Secondly, the use of OS.Path as a path for functions of module OS.File: any function of this module accepts an OS.Path in place of a hard-coded path.
Note that OS.Path objects are purely in-memory constructs. Building an OS.Path does not cause any call to the file system.
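Because such path objects never touch the disk, they can be modelled as immutable lists of path components. A minimal sketch under that assumption: SketchPath is a hypothetical class, the separator is passed explicitly instead of being detected from the operating system, and the home directories are given rather than queried from the platform.

```javascript
// Hypothetical in-memory path: a frozen list of components plus a separator.
class SketchPath {
  constructor(components, separator = "/") {
    this.components = Object.freeze([...components]);
    this.separator = separator;
  }
  // Descends into a child item; returns a NEW path and never touches the disk.
  get(name) {
    if (name.includes(this.separator)) {
      throw new Error(`"${name}" must be a single path component`);
    }
    return new SketchPath([...this.components, name], this.separator);
  }
  toString() {
    return this.components.join(this.separator);
  }
}

const unixHome = new SketchPath(["", "home", "yoric"], "/");
const winHome = new SketchPath(["C:", "Users", "yoric"], "\\");
console.log(unixHome.get("hello").toString()); // /home/yoric/hello
console.log(winHome.get("hello").toString());  // C:\Users\yoric\hello
console.log(unixHome.toString());              // /home/yoric (unchanged)
```

Returning a new object from get keeps well-known paths such as a home directory safe to share between callers.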
As before, something similar is feasible with XPConnect. Comparing with an XPConnect-based implementation, we may notice that:
- the OS.File implementation consists of 1 line / a comparable XPConnect-based implementation consists of 1 to 4 lines, depending on the use of additional libraries;
- the OS.File implementation works both on the main thread and on a background thread / again, XPConnect works only on the main thread;
- benchmarks are not available yet, but I expect the OS.File implementation to be slightly faster, due to lower overhead and the use of caching.
Behaving nicely
The operations we have demonstrated so far are synchronous. This is probably not problematic for file opening, but reading a large file synchronously from the main thread is a very bad idea, as it will freeze the user interface until completed. It is therefore a good idea either to send the operation to a background thread or to ensure that reading takes place in small chunks.
OS.File supports both scenarios by integrating with the (work-in-progress) libraries Promise and Schedule, both of which will be introduced in another post, once their API has stabilized.
The first step to reading asynchronously is to open library Promise. We will take the opportunity to open Schedule as well:
Components.utils.import("resource://gre/modules/promise.jsm");
Components.utils.import("resource://gre/modules/schedule.jsm");
Now that the modules are open, we may use asynchronous reading and writing functions:
var promisedContents = OS.File.openForReading(fileName).
  readString.async();
This operation schedules progressive reading of the file and returns immediately. Note that we do not close the file, as this would stop reading, probably before the operation is complete. The result of the operation, promisedContents, is a Promise, i.e. an object that will eventually hold a value, and that may be observed or polled, as follows:
promisedContents.onsuccess(function (contents) {
  console.log("It worked", contents);
});
promisedContents.onerror(function (error) {
  console.log("It failed", error);
});
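The Promise library itself is not described in this post, so the following is only a sketch of the observable behaviour shown above, under the assumption that a callback must fire exactly once whether it is registered before or after the value arrives. SketchPromise, resolve and reject are hypothetical names; only onsuccess and onerror mirror the extract.

```javascript
// Hypothetical promise-like object with onsuccess/onerror observers.
class SketchPromise {
  constructor() {
    this.state = "pending"; // "pending" | "resolved" | "rejected"
    this.value = undefined;
    this.successCallbacks = [];
    this.errorCallbacks = [];
  }
  resolve(value) {
    if (this.state !== "pending") return; // settle at most once
    this.state = "resolved";
    this.value = value;
    this.successCallbacks.forEach((cb) => cb(value));
  }
  reject(error) {
    if (this.state !== "pending") return;
    this.state = "rejected";
    this.value = error;
    this.errorCallbacks.forEach((cb) => cb(error));
  }
  onsuccess(cb) {
    if (this.state === "resolved") cb(this.value); // late observer
    else if (this.state === "pending") this.successCallbacks.push(cb);
  }
  onerror(cb) {
    if (this.state === "rejected") cb(this.value);
    else if (this.state === "pending") this.errorCallbacks.push(cb);
  }
}

const log = [];
const promised = new SketchPromise();
promised.onsuccess((contents) => log.push("early: " + contents));
promised.resolve("Hello");
promised.onsuccess((contents) => log.push("late: " + contents));
promised.onerror(() => log.push("never called"));
console.log(log); // [ 'early: Hello', 'late: Hello' ]
```

For simplicity, this sketch runs callbacks synchronously; a real library might instead defer them to a later turn of the event loop so that ordering is the same for early and late observers.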
Similarly, reading from a background thread is a simple operation:
var promisedContents = Schedule.bg(function () {
  importScripts("resource://gre/modules/osfile.jsm");
  var fileName = "/home/yoric/hello";
  return OS.File.openForReading.using(fileName, function (myFile) {
    return myFile.readString();
  });
});
The call to Schedule.bg “simply” sends a task to a background thread and ensures that any result, error, etc. is routed back to the promise. The promised value itself is used exactly as in the previous example.
Once again, we may compare with the XPConnect-based implementation:
- the OS.File-based implementation of asynchronous reading takes 3 lines, including opening, closing and resource management / a general XPConnect-based implementation of asynchronous reading takes about 10-15 lines, although reading from a hard-coded path or a resource inside the Mozilla Platform can be reduced to 5-6 lines;
- the OS.File implementation of background reading takes 5 lines / XPConnect does not expose sufficient features to permit background reading, although such features could certainly be implemented in C++ and exposed through XPConnect;
- the OS.File-based implementation only works for files / the XPConnect-based implementation works for just about any construction;
- benchmarks are not available yet, but I expect the OS.File implementation to be faster than the XPConnect-based implementation, due to a less generic implementation and lower overhead;
- the promises used in the OS.File-based implementation encourage writing code in natural order, in which the code that uses a value appears after the code that fetches the value / the XPConnect-based implementation encourages backwards coding, in which the function that uses a value appears before the code that fetches the value (aka “asynchronous spaghetti programming”).
API summary
The API defines the following constructors:
- OS.File – all operations upon an open file, including reading, writing, accessing or altering information, flushing and closing the file;
- OS.Dir – all operations upon an open directory, including listing its contents, walking through the directory, opening an item of the directory and removing an item of the directory;
- OS.Path – all operations on paths that do not involve opening a directory, including concatenation and climbing up and down the tree;
- OS.File.Error – all file-system-related errors.
and the following global objects:
- OS.File – opening a file, with or without auto-cleanup;
- OS.Dir – opening a directory;
- OS.Path – well-known directories and files.
Speed
Writing fast, cross-platform file manipulation code is a complex task. Indeed, some platforms accelerate opening a file from a directory (e.g. Linux), while other platforms do not offer such operations (e.g. MacOS, Windows). Some platforms let applications collect all information regarding a file with a single system call (Unix), while others spread the work across several system calls (Windows). The amount of information that may be obtained about a file without additional system calls varies from OS to OS, as does the maximal length of a path (e.g. under Windows, the value of MAX_PATH does not reflect the actual maximal length of a path), etc.
The design of OS.File takes this into account, as well as the experience of previous generations of file manipulation APIs in the Mozilla Platform (prfile and nsIFile/nsILocalFile), and works hard to minimize the number of system calls required for each operation and to let experts fine-tune their code for performance. While benchmarks are not available yet, we hope that this will make it possible to write IO code that runs much faster, in particular on platforms with slow file systems (e.g. Android).
In addition, although this should have a much smaller impact, OS.File uses the JSAPI as its bridge between C++ and JavaScript; at the time of this writing, the JSAPI is the fastest C++-to-JavaScript bridge on the Mozilla Platform.
Responsiveness
Speed alone is not sufficient to ensure responsiveness. For this purpose, long-running operations are provided with asynchronous variants that divide the work into smaller chunks to avoid freezing the thread. The API does not enforce the use of these asynchronous variants, as experience shows that such a drastic choice is sometimes too constraining for the progressive refactoring of synchronous code towards better asynchronicity.
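The chunking strategy can be sketched with nothing more than a timer. In this hypothetical processInChunks, each step handles at most chunkSize bytes of an in-memory buffer (standing in for a file) and then yields back to the event loop with setTimeout, so other events can run between chunks. The result is delivered through a standard JavaScript Promise rather than the Mozilla Promise library, and the default chunk size is an arbitrary assumption.

```javascript
// Sums the bytes of `data`, chunkSize bytes at a time, yielding to the
// event loop between chunks so that the thread never blocks for long.
function processInChunks(data, chunkSize = 4096) {
  if (!(chunkSize > 0)) {
    throw new RangeError("chunkSize must be positive");
  }
  return new Promise((resolve) => {
    let offset = 0;
    let sum = 0;
    function step() {
      const end = Math.min(offset + chunkSize, data.length);
      for (; offset < end; offset++) {
        sum += data[offset];
      }
      if (offset < data.length) {
        setTimeout(step, 0); // let pending events run, then continue
      } else {
        resolve(sum);
      }
    }
    step();
  });
}

const buffer = new Uint8Array(10000).fill(1);
processInChunks(buffer, 1024).then((sum) => {
  console.log(sum); // 10000
});
```

A smaller chunkSize keeps the thread more responsive at the cost of more scheduling overhead; a larger one favours throughput.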
Every operation can be moved to a background thread thanks to the Schedule module. At the time of this writing, it is not possible to send an open file from one thread to another, but we have a pretty clear idea of how to do this, so it should become possible at some point in the future.
What now?
As mentioned, this is a work in progress. I am currently hard at work on building a complete prototype by the end of December, with the hope of landing something soon afterwards. I expect that benchmarking will continue after this stage to fine-tune some low-level choices and improve the API. If you wish to follow progress – or vote for this feature – we have a Bugzilla tracking bug on the topic, and a whole host of subbugs.
Note that this API will not replace nsIFile, although once it has landed, some of our JavaScript code will progressively migrate from nsIFile to OS.File.
If you have any feedback, now is a great time to send it. Would you use this API? Do you need a specific or obscure feature that is currently missing from the Mozilla Platform, or that risks being lost?
In future posts, I will introduce further examples and detail some of the choices that we have made to ensure the best possible speed on all platforms.
Stay tuned!