If you’re one of those hipster programmers who love Clojure, Ruby, Scala, Erlang, or whatever, you probably deeply loathe Java and all of its giant configuration files and bloated APIs of AbstractFactoryFactoryInterfaces.
I used to hate all that stuff too. But you know what? After working for all these months on these huge pieces of Twitter infrastructure, I’ve started to love the AbstractFactoryFactories.
Let me explain why. Consider this little Scala program. It uses “futures”, which are a way to schedule computation to be done in parallel with the main flow of a program. They are sometimes a natural way of modeling the most efficient scheduling of program execution. Usually you schedule in advance some expensive work that can be done in parallel and then you do something else in the meantime. Only when you really need the result of the original computation do you block and wait (and hopefully only very briefly, since you scheduled the work way in advance!). Here is a “typical” Java-ish Futures library used from Scala:
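```scala
import java.util.concurrent.{Callable, FutureTask, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

// poolSize, maxPoolSize, keepAlive and name are supplied by the surrounding configuration.
private val executor = new ThreadPoolExecutor(
  poolSize, maxPoolSize, keepAlive.inSeconds, TimeUnit.SECONDS,
  new LinkedBlockingQueue[Runnable],
  new NamedPoolThreadFactory(name))

// Wrap the work in a FutureTask and schedule it; call future.get() when the result is needed.
val future = new FutureTask[Unit](new Callable[Unit] { def call(): Unit = doSomeWork })
executor.execute(future)
```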
If you come from a dynamic language like Ruby or Python you will probably have a visceral reaction like “Yeck! Look at all that horrible boilerplate. Convention over configuration!” Wouldn’t it be nice if you could just do something like:
```scala
val future = new Future { doSomeWork }
```
It seems nice, but its nicety is just an illusion. All that boilerplate is really important when you work at massive scale, where efficiency really matters. Magic numbers like the thread pool size, and choices like the kind of queue you use to schedule work, can vastly impact the performance of your application. And the “right” configuration depends entirely on the nature of the problem you’re solving and how callers of this code behave. What all of this weird boilerplate provides is a way to configure the behavior of the system; it doesn’t assume there’s one right way of doing things. And that is precisely how modular software behaves: modular code is code designed to grow past the assumptions of just one user. Modularity really matters when your software isn’t a little throw-away program.
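The same point can be sketched with Scala’s standard `scala.concurrent` library, swapped in here purely for illustration (it is not the library used above). The pool is configured explicitly and handed to `Future` as an `ExecutionContext`, so the scheduling policy is a parameter chosen by the caller rather than a hidden default. The pool sizes are arbitrary, assumed values.

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Illustrative tuning knobs; the "right" values depend on the workload.
val poolSize = 4
val maxPoolSize = 4
val keepAliveSeconds = 60L

// Fixed-size pool backed by an unbounded FIFO queue of pending work.
val executor = new ThreadPoolExecutor(
  poolSize, maxPoolSize, keepAliveSeconds, TimeUnit.SECONDS,
  new LinkedBlockingQueue[Runnable]())

// The configured pool is injected; Future itself assumes nothing about threads.
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(executor)

// Schedule the expensive work early...
val future: Future[Long] = Future { (1L to 1000000L).sum }
// ...do something else here, and block only when the result is actually needed.
val result: Long = Await.result(future, 5.seconds)

executor.shutdown()
```

Swapping in a different pool (or a different queue) changes the behavior without touching the code that creates the `Future`.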
Twitter recently open-sourced Querulous, a minimal database querying library for Scala. We use it in several projects at Twitter, but it was designed principally to meet the extreme demands of FlockDB, our distributed, fault-tolerant graph database. FlockDB demands extremely low-latency (sub-millisecond) response times for individual queries, so any excessive indirection from an ORM would be unacceptable. Furthermore, because FlockDB processes tens of thousands of queries per second across dozens of shards, it must collect extensive statistics on the performance and health of the various shards in order to direct traffic to the most efficient place.
So Querulous
was designed for querying databases at low
latency, massive scale, and with easy operability. It has flexible
timeouts, extensive logging, and rich statistics. But as FlockDB
became more mature and sophisticated, the demands grew greater. We
needed different health-check and timeout strategies in different
contexts. It became clear that Querulous
would need to be made extremely modular and extremely configurable to work at all.
So we set about rewriting Querulous using my favorite modularity techniques: Dependency Injection, Factories, and Decorators. In other words, everything you hate about Java.
The design patterns of modularity
In order for code to be modular it must have few hard-coded
assumptions. In Object-Oriented software this means something very
particular since the essence of an Object-Oriented program is that its
structure is organized around the types of objects. Therefore, the most
fundamental, anti-modular assumption in Object-Oriented software is the concrete type of objects. Any time you write new MyClass
in your code you’ve hardcoded an assumption about the concrete class of
the object you’re allocating. This makes it impossible, for example,
for someone to later add logging around method invocations of that
object, or timeouts, or whatever isn’t anticipated a priori.
In a very dynamic language like Ruby, open classes and method aliasing (e.g., alias_method_chain) mitigate this problem, but they don’t solve it. If you manipulate a class to add logging, all instances of that class will have logging; you can’t take a surgical approach and say “just objects instantiated in this context”. (Update: some people are asking “what about metaclasses?” Metaclasses do not solve this problem at all, because if you do not have control over the caller of Foo.new then you cannot later add new behavior to the metaclass; it has to be hardcoded at the site of manufacture. The point of this technique is to avoid knowing in advance what behavior you will add in, to make it configurable!)
There are standard design patterns to mitigate this, namely Dependency Injection, Factories, and Decorators. By injecting a Factory (a function that manufactures objects) as a parameter to a function that needs to create objects, you allow a programmer to later change his mind about what Factory to inject; and this means the programmer can change the concrete types of objects as his heart desires. And by using Decorators, the programmer can mix and match functionality easily, stack one thing on top of another like so many legos. Let’s look at an example.
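Before the Querulous version, here is the whole idea in miniature. All names (`Greeter`, `PlainGreeter`, `LoggingGreeter`, `welcome`) are hypothetical illustrations, not Querulous classes. `welcome` receives a factory instead of calling `new` itself, so one caller can wrap its objects in a logging Decorator while every other caller keeps the plain behavior: the surgical, per-context change that reopening a class cannot give you.

```scala
import scala.collection.mutable.ListBuffer

trait Greeter { def greet(name: String): String }

class PlainGreeter extends Greeter {
  def greet(name: String): String = s"Hello, $name"
}

// Decorator: adds logging, then delegates the real work to whatever it wraps.
class LoggingGreeter(underlying: Greeter, log: ListBuffer[String]) extends Greeter {
  def greet(name: String): String = {
    log += s"greet($name)"
    underlying.greet(name)
  }
}

// Dependency Injection: the factory is a parameter, so the concrete type is the caller's choice.
def welcome(names: List[String], makeGreeter: () => Greeter): List[String] = {
  val greeter = makeGreeter()
  names.map(greeter.greet)
}

val log = ListBuffer.empty[String]

// Context 1: plain objects, no logging.
val plain = welcome(List("ada"), () => new PlainGreeter)
// Context 2: the same code path, but only these objects are decorated.
val logged = welcome(List("bob"), () => new LoggingGreeter(new PlainGreeter, log))
```

Only the second call site pays for logging, and neither `PlainGreeter` nor `welcome` had to change.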
Here I have a Query object, with methods like #execute(). I want to add timeouts around all queries. I start by creating a QueryProxy that routes query execution (#select() and #execute()) through an over-ridable method, #delegate:
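```scala
import java.sql.ResultSet

// Routes select() and execute() through one overridable hook, delegate().
abstract class QueryProxy(query: Query) extends Query {
  def select[A](f: ResultSet => A) = delegate(query.select(f))
  def execute() = delegate(query.execute())
  def cancel() = query.cancel()

  // By default just run the call; Decorators override this to wrap it.
  protected def delegate[A](f: => A): A = f
}
```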
Then, to implement timeouts, I create a Query Decorator:
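```scala
import java.util.concurrent.TimeoutException

class TimingOutQuery(query: Query, timeout: Duration) extends QueryProxy(query) {
  // Run the delegated call under a Timeout; if it fires, cancel the running query.
  override protected def delegate[A](f: => A): A = {
    try {
      Timeout(timeout) { f } { cancel() }
    } catch {
      case _: TimeoutException => throw new SqlTimeoutException
    }
  }
}
```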
This Decorator delegates the execution of the query to the underlying query object, but it wraps that execution in a Timeout.
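Because every execution funnels through `delegate`, a new Decorator only has to override that one method. Here is a self-contained sketch under simplifying assumptions: `Query` is a stand-in for Querulous’s interface (rows are plain strings instead of a `ResultSet`), and `FakeQuery` and `CountingQuery` are hypothetical. The Decorator counts delegated calls, a toy version of statistics gathering.

```scala
// Simplified stand-in for Querulous's Query; not the real interface.
trait Query {
  def select[A](f: String => A): Seq[A]
  def execute(): Int
  def cancel(): Unit
}

// Same shape as QueryProxy above: select and execute go through delegate.
abstract class QueryProxy(query: Query) extends Query {
  def select[A](f: String => A): Seq[A] = delegate(query.select(f))
  def execute(): Int = delegate(query.execute())
  def cancel(): Unit = query.cancel()
  protected def delegate[A](f: => A): A = f
}

// Hypothetical in-memory query used instead of a database.
class FakeQuery(rows: Seq[String]) extends Query {
  def select[A](f: String => A): Seq[A] = rows.map(f)
  def execute(): Int = rows.size
  def cancel(): Unit = ()
}

// A stats Decorator: overriding the single hook covers every delegated method.
class CountingQuery(query: Query) extends QueryProxy(query) {
  var calls = 0
  override protected def delegate[A](f: => A): A = {
    calls += 1
    f
  }
}

val counted = new CountingQuery(new FakeQuery(Seq("a", "b")))
val upper = counted.select(_.toUpperCase)
val affected = counted.execute()
counted.cancel() // not routed through delegate, so not counted
```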
As an aside, it is interesting to note that the Decorator pattern is just the Object-Oriented equivalent of function composition in a functional language. Scala makes this especially explicit, since functions are objects and any object that responds to the method #apply() can be called like a function. A Decorator around an object that only implements #apply() is pure function composition, as you would see in Haskell, ML, and so forth. I might phrase this as: function composition is a degenerate case of the Decorator pattern.
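A small sketch of that correspondence, using hypothetical functions: a class whose `apply` adds behavior and then delegates to a wrapped `Function1` computes exactly what `andThen` composition computes, and because the Decorator is itself a function, it composes further.

```scala
// The component being decorated: a plain function.
val square: Int => Int = x => x * x

// Decorator in OO style: an object with apply that delegates, then adds behavior.
class Doubling(underlying: Int => Int) extends (Int => Int) {
  def apply(x: Int): Int = underlying(x) * 2
}

// The same decoration written as function composition.
val doublingComposed: Int => Int = square andThen (_ * 2)

val doubling = new Doubling(square)
val viaDecorator = doubling(3)           // (3 * 3) * 2 = 18
val viaComposition = doublingComposed(3) // (3 * 3) * 2 = 18

// The Decorator is a function too, so it can be decorated again.
val decoratedTwice: Int => Int = doubling andThen (_ + 1)
```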
The implementation of the Timeout function is shown for the curious. It uses threads and is weird but cool.
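```scala
import java.util.{Timer, TimerTask}
import java.util.concurrent.TimeoutException

// Duration is the same time type used by the other examples (it provides inMillis).
object Timeout {
  // One daemon thread fires all scheduled timeouts.
  val timer = new Timer("Timer thread", true)

  def apply[T](timeout: Duration)(f: => T)(onTimeout: => Unit): T = {
    @volatile var cancelled = false
    // A non-positive timeout means "no timeout".
    val task =
      if (timeout.inMillis > 0) Some(schedule(timeout, { cancelled = true; onTimeout }))
      else None
    try {
      f
    } finally {
      // f finished (or failed): drop the pending timer task.
      task foreach { t =>
        t.cancel()
        timer.purge()
      }
      // If the timer fired while f was running, report the timeout.
      if (cancelled) throw new TimeoutException
    }
  }

  private def schedule(timeout: Duration, f: => Unit) = {
    val task = new TimerTask() {
      override def run(): Unit = { f }
    }
    timer.schedule(task, timeout.inMillis)
    task
  }
}
```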
(An alternative implementation of Timeout could use Futures, but that’s a subject for another blog post.)
Modularity and testing techniques
One of the principal advantages of (or stated another way, one of the principal motivations for) writing Decorator-oriented code is how easy it is to write isolated unit tests of that code. To test the timeout functionality of the TimingOutQuery, we don’t need to interact with a database at all. We can write behavioral/mockish tests like this:
```scala
val latch = new CountDownLatch(1)

// A fake query that "hangs" in select until it is cancelled (or 2 seconds pass).
val query = new FakeQuery(List(resultSet)) {
  override def cancel() = { latch.countDown() }
  override def select[A](f: ResultSet => A) = {
    latch.await(2.second.inMillis, TimeUnit.MILLISECONDS)
    super.select(f)
  }
}

val timingOutQuery = new TimingOutQuery(query, timeout)

// The timeout must fire, surface as SqlTimeoutException, and cancel the query.
timingOutQuery.select { r => 1 } must throwA[SqlTimeoutException]
latch.getCount mustEqual 0
```
If the timeout functionality were just inlined into the #select() method of the source code of the Query class, or “bolted on” with alias_method_chain in Ruby (or added as “advice” in some AOP shit), you could not write this test without talking to the database and somehow finding a query that takes long enough that it will actually hit the timeout. Because we instead use Decorators, we can test the code with a fake query that implements the Query interface but doesn’t talk to the database at all. Here we use a CountDownLatch to “halt” execution for a bounded amount of time, thus triggering the timeout.
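Here is a self-contained version of that test that runs without a database or a test framework. It is a sketch under assumptions: `Query` is a simplified stand-in (rows are plain strings), `Timeout` is a millisecond-based copy of the helper above (using an `AtomicBoolean` instead of a `@volatile` local), `SqlTimeoutException` and `FakeQuery` are defined locally, and plain `assert` replaces the specs matchers.

```scala
import java.util.{Timer, TimerTask}
import java.util.concurrent.{CountDownLatch, TimeUnit, TimeoutException}
import java.util.concurrent.atomic.AtomicBoolean

// Simplified stand-in for Querulous's Query.
trait Query {
  def select[A](f: String => A): Seq[A]
  def cancel(): Unit
}

class SqlTimeoutException extends Exception("query timed out")

// Millisecond-based variant of the Timeout helper.
object Timeout {
  private val timer = new Timer("Timer thread", true) // daemon thread

  def apply[T](timeoutMs: Long)(f: => T)(onTimeout: => Unit): T = {
    val cancelled = new AtomicBoolean(false)
    val task = new TimerTask {
      def run(): Unit = { cancelled.set(true); onTimeout }
    }
    timer.schedule(task, timeoutMs)
    try f
    finally {
      task.cancel()
      timer.purge()
      if (cancelled.get) throw new TimeoutException
    }
  }
}

// Decorator: runs select under a Timeout and cancels the query if it fires.
class TimingOutQuery(query: Query, timeoutMs: Long) extends Query {
  def select[A](f: String => A): Seq[A] =
    try Timeout(timeoutMs)(query.select(f))(cancel())
    catch { case _: TimeoutException => throw new SqlTimeoutException }
  def cancel(): Unit = query.cancel()
}

// Hypothetical in-memory query: no database involved.
class FakeQuery(rows: Seq[String]) extends Query {
  def select[A](f: String => A): Seq[A] = rows.map(f)
  def cancel(): Unit = ()
}

val latch = new CountDownLatch(1)

// "Hangs" in select until cancel() releases the latch (or 2 seconds pass).
val slowQuery = new FakeQuery(Seq("row")) {
  override def cancel(): Unit = latch.countDown()
  override def select[A](f: String => A): Seq[A] = {
    latch.await(2, TimeUnit.SECONDS)
    super.select(f)
  }
}

val timedOut =
  try { new TimingOutQuery(slowQuery, 100).select(_.length); false }
  catch { case _: SqlTimeoutException => true }

// A fast query finishes well inside its timeout and returns normally.
val fast = new TimingOutQuery(new FakeQuery(Seq("ab", "c")), 1000).select(_.length)
```

The timer thread sets the flag before it calls `cancel()`, so by the time the latch releases the "hung" select, the Decorator is guaranteed to report the timeout.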
Tying it together with Factories
Back to our original mission. So now we have a way of layering timeout functionality on top of a Query object. But how do we ensure that Timeouts get used when we want them to? The thing that glues this all together is to make sure that everybody who needs to instantiate a Query object never, ever calls new Query directly. Instead, we provide a Factory as a parameter to the method that needs to manufacture the object. The programmer chooses which Factory to provide at runtime. Here is a Factory that makes TimingOutQueries:
```scala
class TimingOutQueryFactory(queryFactory: QueryFactory, timeout: Duration) extends QueryFactory {
  def apply(connection: Connection, query: String, params: Any*) = {
    // Build the underlying query with the wrapped factory, then decorate it.
    new TimingOutQuery(queryFactory(connection, query, params: _*), timeout)
  }
}
```
Since TimingOutQueries are decorators around regular Queries, to manufacture a TimingOutQuery you have to first manufacture a regular Query. In this example, the TimingOutQueryFactory takes another Factory as an argument. This could be a simple QueryFactory or something more complex, which allows Factories to be composed indefinitely. With this we stack together timeouts, logging, statistics gathering, and debugging like so many Lego bricks. This smacks of the oft-ridiculed Java AbstractFactoryFactoryInterface. But let me put it bluntly: AbstractFactoryFactoryInterfaces are how you write real, modular software, not little fart applications.
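A runnable sketch of factories composing, assuming simplified, hypothetical stand-ins: this `QueryFactory` takes only a SQL string (the real one above also takes a connection and parameters), and the logging and counting factories are illustrations rather than Querulous classes. Each decorating factory wraps another factory, so the whole stack is assembled in one expression at the wiring site, and the code that runs queries never calls `new`.

```scala
import scala.collection.mutable.ListBuffer

// Simplified, hypothetical stand-ins for Query and QueryFactory.
trait Query { def execute(): Int }
trait QueryFactory { def apply(sql: String): Query }

// Base factory: pretends to run the SQL and reports one affected row.
class SimpleQueryFactory extends QueryFactory {
  def apply(sql: String): Query = new Query { def execute(): Int = 1 }
}

// Decorating factory #1: every query it makes logs before delegating.
class LoggingQueryFactory(factory: QueryFactory, log: ListBuffer[String]) extends QueryFactory {
  def apply(sql: String): Query = {
    val inner = factory(sql)
    new Query {
      def execute(): Int = { log += s"executing: $sql"; inner.execute() }
    }
  }
}

// Decorating factory #2: every query it makes bumps a shared counter.
class CountingQueryFactory(factory: QueryFactory) extends QueryFactory {
  var executions = 0
  def apply(sql: String): Query = {
    val inner = factory(sql)
    new Query {
      def execute(): Int = { executions += 1; inner.execute() }
    }
  }
}

// Code that runs queries only ever sees "some QueryFactory".
def runAll(factory: QueryFactory, statements: Seq[String]): Int =
  statements.map(sql => factory(sql).execute()).sum

// Wiring: stack the decorators; swap, reorder, or drop them here without touching runAll.
val log = ListBuffer.empty[String]
val counting = new CountingQueryFactory(new LoggingQueryFactory(new SimpleQueryFactory, log))

val affected = runAll(counting, Seq("UPDATE a SET x = 1", "DELETE FROM b"))
```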
This seems like a bit of a mind-fuck because here we have Factory Decorators that take Decorated Factories that make Decorated Queries. It’s so meta! (Actually, “meta” in Greek means nothing like “meta” in English. “Meta” plus the accusative means “after”, so Aristotle’s Metaphysics is actually just a book “after [the book on] physics”. Anyway.) So all these crazy FactoryFactoryDecorators sound kind of scary at first, but this is just the kind of abstraction on top of abstraction, and closure under composition, that allows complex software to be made simple. Manage complexity by taking many things and re-conceiving of them as just one thing; this one thing is then combined with many other things, and the process is repeated up the ladder of abstraction until you reach the Godhead.
Taking this to the next level
To take this even further, let’s add a new feature: per-query timeouts. At one point in the history of FlockDB, there was a global 3-second timeout. This was really stupid given that our most common query has a latency of 0.5 ms and a standard deviation of 2 ms. If you have a global timeout, you must set it around your most expensive query, not your most common query (otherwise, your most expensive query will always time out!). But in a production system, cheap frequent queries that start exceeding 2 standard deviations can take down your site. So a sensible timeout for these frequent queries is something like 5 ms. But we had it set to 3,000 ms!! Yikes. So let’s change it!
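One way to read the 5 ms figure, assuming the cutoff is the mean latency of the common query plus two standard deviations:

$$
t_{\text{timeout}} \approx \mu + 2\sigma = 0.5\,\text{ms} + 2 \times 2\,\text{ms} = 4.5\,\text{ms} \approx 5\,\text{ms}
$$

Measured against that, the global 3,000 ms timeout was 600 times too generous for the most common query.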
```scala
class PerQueryTimingOutQueryFactory(queryFactory: QueryFactory, timeouts: Map[String, Duration])
    extends QueryFactory {
  def apply(connection: Connection, query: String, params: Any*) = {
    // Look up this query's timeout; Map#apply throws if the query has none configured.
    new TimingOutQuery(queryFactory(connection, query, params: _*), timeouts(query)) // YAY
  }
}
```
That’s it. We’ve now implemented a new Timeout strategy in one line of code! And wiring it all together is a piece of cake! Querulous makes no assumptions about how best to implement a timing-out strategy; it doesn’t even assume you’ll want timeouts (in fact, there are some cases where you don’t want any timeouts). Querulous achieves modularity by providing an “injection point” for the programmer to layer on custom functionality. It takes QueryFactories as a parameter to the method, which can return arbitrarily decorated Queries.
I love this example because it’s so simple and yet it’s no toy. It also emphasizes the value of Dependency Injection more generally than just with Factories. We could have written the TimingOutQuery with a static global constant (probably the most common programming technique):
```scala
class TimingOutQuery(query: Query) extends QueryProxy(query) {
  val TIMEOUT = 3.seconds
  // ...
}
```
But instead it is injected as a parameter to the constructor of the TimingOutQuery:
```scala
class TimingOutQueryFactory(queryFactory: QueryFactory, timeout: Duration) extends QueryFactory {
  def apply(connection: Connection, query: String, params: Any*) = {
    new TimingOutQuery(queryFactory(connection, query, params: _*), timeout)
  }
}
```
This enables the factory to invoke a function to choose the appropriate timeout for each query. In the PerQueryTimingOutQueryFactory, we just look some shit up in a hash table (timeouts(query)) and we’re done.
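Since a Scala `Map[String, V]` is itself a function from `String` to `V`, the injected timeout “policy” can be any function at all. A self-contained sketch under assumptions: timeouts are plain `Long` milliseconds instead of `Duration`, the factory records the chosen timeout instead of enforcing it, and `TimedQuery`, `QueryFactory`, `ConstantTimeoutFactory`, `TimeoutPolicyFactory`, and the SQL string are hypothetical. It contrasts the hard-coded constant with an injected lookup.

```scala
import scala.util.Try

// Simplified stand-ins: the chosen timeout is recorded, not enforced.
case class TimedQuery(sql: String, timeoutMs: Long)
trait QueryFactory { def apply(sql: String): TimedQuery }

// Hard-coded: every query gets the same constant.
class ConstantTimeoutFactory extends QueryFactory {
  val TIMEOUT_MS = 3000L
  def apply(sql: String): TimedQuery = TimedQuery(sql, TIMEOUT_MS)
}

// Injected: the caller decides how each query's timeout is chosen.
class TimeoutPolicyFactory(timeoutFor: String => Long) extends QueryFactory {
  def apply(sql: String): TimedQuery = TimedQuery(sql, timeoutFor(sql))
}

val frequentQuery = "SELECT * FROM edges WHERE source_id = ?"
val perQuery: Map[String, Long] = Map(frequentQuery -> 5L)

val constant = new ConstantTimeoutFactory
// A Map is a String => Long, so it can be passed directly...
val strict = new TimeoutPolicyFactory(perQuery)
// ...but Map#apply throws for unknown queries, so a fallback default may be safer.
val lenient = new TimeoutPolicyFactory(sql => perQuery.getOrElse(sql, 3000L))

val unknownIsError = Try(strict("SELECT 1")).isFailure
```

Whether an unconfigured query should fail loudly (`strict`) or fall back to a generous default (`lenient`) is itself a policy decision, and injecting the function leaves it to the caller.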
Yes, all this FactoryFactory bullshit is exactly what you hate about Java. But what’s amazing is not just how short this code is, but that it could be configured by any programmer anywhere, regardless of whether they have access to the source code that actually instantiates and executes queries. Any user of Querulous can decide whether she wants timeouts or not, and whether she also wants debugging, stats gathering, and so forth; Querulous hard-codes no assumptions. So, yay modularity.
Source: http://magicscalingsprinkles.wordpress.com/2010/02/08/why-i-love-everything-you-hate-about-java/