Workflow engines for scientific computing
I participated in a physics simulation "cross training boot camp" on Tuesday, 13 April 2010 at CSC. My presentation (see slides) was about using workflow engines in scientific computing. I have to admit up front that I am not a huge fan of the workflow paradigm, even though I am also a developer of an analysis platform with workflow capabilities.
Preparing the presentation got me thinking about the basic question about workflow engines: What is the difference between workflows and programming languages? Programming languages have proven successful, well, at least some of them. Why do we need workflow engines if they seem to do basically the same thing? Here is one point of view, though there are other important ones (such as ease of use).
Programming languages (except for some really marginal ones) are designed to be Turing complete. That is a theoretical way to say that everything that can be done with a computer can be done with any such programming language. Of course, if the problem at hand fits the language poorly, it can be awkward. But as long as the language is Turing complete, it can be done.
Workflow engines are usually not Turing complete, at least not by design. I know that BPEL (the business process orchestration workflow language) has been proven Turing complete, but typically workflow languages aren't. Many workflow engines allow conditional statements and iteration, but in the restricted forms they usually offer, that is not enough for Turing completeness.
So, generic programming languages are designed to be genuinely more powerful than workflow engines. It is obvious that there are tasks where you need to use a programming language, not a workflow engine. But what about the other way around? If workflow engines are "underpowered" compared to programming languages, what are they useful for?
Expressive power does not come for free. More is not always better here. A common theme in computer science and mathematics is that when you increase the power of your system, it becomes harder to manipulate. For example, there are a lot of nice meta-level tricks that you can do to sentences in propositional logic. Predicate logic has more expressive power, but then many of the tricks do not work any more.
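One such trick can be sketched in a few lines of Python: in propositional logic you can decide whether a sentence is always true simply by trying every truth assignment. The function name `is_tautology` and the two example formulas are made up for this illustration. No comparable brute-force procedure exists for predicate logic, where validity is undecidable in general.

```python
from itertools import product

def is_tautology(formula, variables):
    """Return True if formula holds under every truth assignment.

    formula is a function taking one boolean keyword argument per
    variable name; variables lists those names.
    """
    return all(
        formula(**dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )

# (p -> q) or (q -> p) holds for every assignment; p -> q alone does not.
either_way = lambda p, q: (not p or q) or (not q or p)
implies = lambda p, q: not p or q

print(is_tautology(either_way, ["p", "q"]))
print(is_tautology(implies, ["p", "q"]))
```

The check always finishes because there are only finitely many assignments to try, which is exactly the kind of guarantee that disappears when the language gets more expressive.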
This idea can be carried over to workflow engines. Because workflow description languages are simple, there should be more tricks that can be applied. The obvious one is of course visualisation: most workflow systems are visual and show the structure as boxes and arrows. Such a structure is not directly visible in typical programming languages; understanding a program takes human work to build the representation mentally by reading the source code.
This actually goes very deep: Reasoning about programs often reduces to the so-called halting problem, which is not solvable for Turing complete languages in the general case. So you cannot say for sure if an arbitrary program is ever going to stop, but you can do it for many workflow descriptions. The halting problem is mostly a theoretical tool, but it has many practical implications. The limits are much tighter for reasoning about the behaviour of real programming languages than for less expressive workflow languages.
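As a minimal sketch of why termination can be decidable for workflows, assume a workflow is nothing more than a set of steps with dependencies between them, and that every individual step finishes on its own. Then the whole workflow finishes exactly when the dependency graph has no cycle, and that is easy to check. The dictionary representation and the step names below are hypothetical, not taken from any particular engine.

```python
def is_acyclic(workflow):
    """Return True if every step can eventually be scheduled.

    workflow maps each step name to the list of steps it depends on.
    """
    steps = set(workflow)
    for deps in workflow.values():
        steps.update(deps)
    # Unfinished dependencies of every step, including steps that are
    # only mentioned as a dependency.
    remaining = {s: set(workflow.get(s, ())) for s in steps}
    ready = [s for s, deps in remaining.items() if not deps]
    done = 0
    while ready:
        step = ready.pop()
        done += 1
        for other, deps in remaining.items():
            if step in deps:
                deps.remove(step)
                if not deps:
                    ready.append(other)
    # Steps stuck on a cycle never become ready.
    return done == len(steps)

pipeline = {"normalise": [], "filter": ["normalise"], "cluster": ["filter"]}
looping = {"a": ["b"], "b": ["a"]}

print(is_acyclic(pipeline))
print(is_acyclic(looping))
```

Note that the checker itself is an ordinary loop that is guaranteed to stop, because each step is processed at most once. Nothing similar can be written for arbitrary programs in a Turing complete language.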
Unfortunately this possibility is not used very widely. You get graphical workflows, but not much else. Some engines allow automatic parallelisation (once again, close to impossible with programming languages). They could also support monitoring of the workflows, load balancing, distributed data storage (in a cloud, perhaps?), etc. There are possibilities in the workflow paradigm, but they are not utilised very well.
The message to take home is: If you do workflows, do it with good tools. Because from a tooling point of view the workflow paradigm is easier than the generic programming paradigm, the tools available should be significantly more powerful. Unfortunately programming languages tend to win, due to their massively larger user bases and longer history. If you find yourself writing workflow descriptions with a simple text editor, then you are really getting the worst of both worlds.
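To illustrate why parallelisation comes almost for free, here is a rough sketch that assumes the workflow is a plain dependency graph. Steps whose dependencies are all finished can run at the same time, so the graph can be cut into stages mechanically. The function `parallel_stages` and the step names are invented for this example and do not describe how any specific engine schedules work.

```python
def parallel_stages(workflow):
    """Group steps into stages; steps within a stage can run concurrently.

    workflow maps each step name to the list of steps it depends on.
    """
    remaining = {step: set(deps) for step, deps in workflow.items()}
    # Steps that only appear as a dependency have no dependencies themselves.
    for deps in list(remaining.values()):
        for dep in deps:
            remaining.setdefault(dep, set())
    stages = []
    while remaining:
        ready = sorted(s for s, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("workflow contains a cycle")
        stages.append(ready)
        for step in ready:
            del remaining[step]
        for deps in remaining.values():
            deps.difference_update(ready)
    return stages

workflow = {
    "load": [],
    "normalise": ["load"],
    "quality_check": ["load"],
    "report": ["normalise", "quality_check"],
}

for stage in parallel_stages(workflow):
    print(stage)
```

Here "normalise" and "quality_check" end up in the same stage, so an engine could hand them to two workers without the user ever asking for it.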
It will be interesting to see if the workflow world ever catches up with programming tool developers in the level of tool support. For workflows to be widely used, there need to be better tools. And for those tools to emerge, there need to be more people using workflows. Chicken and egg, I'd say.
Our solution in Chipster has been to integrate the workflow engine with the rest of the application. So even if there is not that much expressive power in the workflow system itself, the integration with a larger data analysis environment makes it powerful. And of course the ease of use is better for workflows, which is important for us.