From good old batch systems to actor based parallelism
March 27, 2009
I was recently reading on JavaWorld a serie of 2 articles discussing the advantages of actor-based parallelism over traditional approaches involving shared objects and locking mechanisms. Those articles can be found here: part1, part2.
The first article gives a very good introduction of how actor-based parallelism works so I won't spoil the web with a worse introduction of my own brew. Let's just remind us the key properties of the actor model: light-weight threads running in parallel, sharing nothing, but communicating with each other by sending each other immutable messages. Each actor has an input queue of messages to process, called an inbox. That's it. No shared memory, no deadlock-prone and brain hammering IPC mechanisms.
Two things struck me with the actor model.
It's beautiful. The actor model works and is dead simple to use. It only requires to think differently. To me, that is a clear sign of a beautiful design.
The other thing that stroke me is the similarity between some designs used in older batch systems and the actor model.
But let me tell you first a bit of my own history: I have been working quite many years now with a large financial system that was designed and built as a batch system. It is made of a constellation of small and relatively simple programs that communicate with each other by dropping files into each other's inbox. An inbox is just a directory with a specific location. All those programs run one after the other in a specific order and according to a daily schedule. And this schedule is repeated day after day. What we get in the end is like a big state machine that slowly but surely shuffles all of its tickets through various business flows.
This way of designing batch systems is sometimes called 'the inbox model'. It is quite standard and has been used for a long time. And it works very well.
What stroke me is how close this design is to the actor model. Instead of light-weight threads we have stand-alone processes. Instead of messages we have files, but those files are immutable in the same way as the messages passed between actors: they are not modified between the emitter and the receiver. Those stand-alone programs have inboxes very alike those of actors. In the end, the main difference is that the stand-alone processes in a batch system run sequentially while actors run in parallel. Which is where you may think:
"Wait a minute! If the processes in a batch system already implement the same message passing mechanism as actors, why couldn't they be run in parallel?"
And the answer is: they can. Assuming no other information passing mechanism is in the way, you could take such a batch system and make it run in parallel with relatively little modification. This insight gave me a feeling of awe and respect toward the inventors of the inbox model.
Of course, real life is not that simple. Most of the time, the processes of a batch system also communicate with each other via some database. In a way, a shared database realm is just like a shared memory and we are therefore back in the headache of shared-state parallelism. Too bad.
The first article gives a very good introduction of how actor-based parallelism works so I won't spoil the web with a worse introduction of my own brew. Let's just remind us the key properties of the actor model: light-weight threads running in parallel, sharing nothing, but communicating with each other by sending each other immutable messages. Each actor has an input queue of messages to process, called an inbox. That's it. No shared memory, no deadlock-prone and brain hammering IPC mechanisms.
Two things struck me with the actor model.
It's beautiful. The actor model works and is dead simple to use. It only requires to think differently. To me, that is a clear sign of a beautiful design.
The other thing that stroke me is the similarity between some designs used in older batch systems and the actor model.
But let me tell you first a bit of my own history: I have been working quite many years now with a large financial system that was designed and built as a batch system. It is made of a constellation of small and relatively simple programs that communicate with each other by dropping files into each other's inbox. An inbox is just a directory with a specific location. All those programs run one after the other in a specific order and according to a daily schedule. And this schedule is repeated day after day. What we get in the end is like a big state machine that slowly but surely shuffles all of its tickets through various business flows.
This way of designing batch systems is sometimes called 'the inbox model'. It is quite standard and has been used for a long time. And it works very well.
What stroke me is how close this design is to the actor model. Instead of light-weight threads we have stand-alone processes. Instead of messages we have files, but those files are immutable in the same way as the messages passed between actors: they are not modified between the emitter and the receiver. Those stand-alone programs have inboxes very alike those of actors. In the end, the main difference is that the stand-alone processes in a batch system run sequentially while actors run in parallel. Which is where you may think:
"Wait a minute! If the processes in a batch system already implement the same message passing mechanism as actors, why couldn't they be run in parallel?"
And the answer is: they can. Assuming no other information passing mechanism is in the way, you could take such a batch system and make it run in parallel with relatively little modification. This insight gave me a feeling of awe and respect toward the inventors of the inbox model.
Of course, real life is not that simple. Most of the time, the processes of a batch system also communicate with each other via some database. In a way, a shared database realm is just like a shared memory and we are therefore back in the headache of shared-state parallelism. Too bad.