@CSC Blogs

The bloggers are experts working at CSC - IT Center for Science Ltd. The opinions published on the blogs are their own.

Workflow engines for scientific computing

Submitted by Aleksi Kallio, posted on 2010-06-10 15:59, last modified 2011-07-11 15:17
What is the difference between workflows and programming languages? Programming languages have proven successful. Why do we need workflow engines if they seem to do basically the same thing?

I participated in a physics simulation "cross training boot camp" on Tuesday, 13 April 2010, at CSC. My presentation (see slides) was about using workflow engines in scientific computing. I have to admit up front that I am not a huge fan of the workflow paradigm, even though I am a developer of an analysis platform with workflow capabilities.

Preparing the presentation got me thinking about the basic question of workflow engines: what is the difference between workflows and programming languages? Programming languages have proven successful (well, at least some of them). Why do we need workflow engines if they seem to do basically the same thing? Here is one point of view, though there are other important ones as well (such as ease of use).

Programming languages (except for some really marginal ones) are designed to be Turing complete. That is a theoretical way of saying that any computation a computer can carry out can be expressed in any of these languages. Of course, if the problem at hand fits the language poorly, expressing it can be awkward, but as long as the language is Turing complete, it can be done.

Workflow engines are usually not Turing complete, at least not by design. I know that BPEL (the business process orchestration workflow language) has been shown to be Turing complete, but typically workflow languages aren't. Many workflow engines allow conditional statements and iteration, typically in a restricted form such as running a step once for each input, but that is not enough for Turing completeness.

So, general-purpose programming languages are designed to be genuinely more powerful than workflow engines. Clearly there are tasks where you need a programming language, not a workflow engine. But what about the other way around? If workflow engines are "underpowered" compared to programming languages, what are they useful for?

Expressive power does not come for free, and more is not always better. A common theme in computer science and mathematics is that the more powerful a system becomes, the harder it is to reason about and manipulate. For example, there are a lot of nice meta-level tricks you can apply to sentences in propositional logic. Predicate logic has more expressive power, but many of those tricks no longer work.
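One such trick, as a standard textbook example rather than anything from the presentation: validity of a propositional sentence is decidable, because a sentence with n variables has only 2^n truth assignments and a program can simply check them all. The minimal Python sketch below assumes sentences are written as Python functions of booleans; the names `implies`, `is_valid`, `peirce` and `plain_implication` are illustrative. Brute force is exponential, but it always terminates. Predicate logic offers no such exhaustive check: its quantifiers can range over infinite domains, and validity there is only semi-decidable.

```python
from itertools import product
from typing import Callable


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b


def is_valid(formula: Callable[..., bool], n_vars: int) -> bool:
    """Decide validity by brute force over all 2**n_vars truth assignments."""
    return all(formula(*values) for values in product([False, True], repeat=n_vars))


def peirce(p: bool, q: bool) -> bool:
    """Peirce's law ((p -> q) -> p) -> p, a classical tautology."""
    return implies(implies(implies(p, q), p), p)


def plain_implication(p: bool, q: bool) -> bool:
    """p -> q, which is falsified by p = True, q = False."""
    return implies(p, q)


if __name__ == "__main__":
    print(is_valid(peirce, 2))             # True
    print(is_valid(plain_implication, 2))  # False
```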

The same idea carries over to workflow engines. Because workflow description languages are simple, more tricks can be applied to them. The obvious one is visualisation: most workflow systems are visual and show the structure as boxes and arrows. Typical programming languages do not expose such a structure; understanding a program takes human effort to build that representation mentally by reading the source code.
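To make the point concrete, here is a minimal Python sketch, assuming a workflow is stored as a plain dictionary that maps each step to the steps it depends on (the step names and the `to_dot` helper are hypothetical, not taken from Chipster or any particular engine). Because the description is just a graph, drawing it is mechanical: the function emits Graphviz DOT text with one box per step and one arrow per dependency.

```python
# Hypothetical workflow description: each step maps to the steps it depends on.
# The step names are illustrative and not taken from any particular engine.
WORKFLOW = {
    "normalize": [],
    "filter": ["normalize"],
    "cluster": ["filter"],
    "plot": ["cluster", "filter"],
}


def to_dot(workflow: dict[str, list[str]]) -> str:
    """Render a workflow as Graphviz DOT text: one box per step, one arrow per dependency."""
    lines = ["digraph workflow {", "  node [shape=box];"]
    for step in workflow:
        lines.append(f'  "{step}";')
    for step, deps in workflow.items():
        for dep in deps:
            # Arrows point from the step producing data to the step consuming it.
            lines.append(f'  "{dep}" -> "{step}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(WORKFLOW))
```

Piping the output into Graphviz's `dot` tool (for example `dot -Tpng`) draws the diagram. Doing the same for an arbitrary program would first require recovering its structure from the source code.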

This actually goes very deep: reasoning about programs often reduces to the so-called halting problem, which is undecidable for Turing complete languages in the general case. So you cannot say for sure whether an arbitrary program will ever stop, but you can for many workflow descriptions. The halting problem is mostly a theoretical tool, but it has many practical implications: the limits on what can be automatically deduced about behaviour are much tighter for real programming languages than for less expressive workflow languages.
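The contrast can be made concrete with a minimal Python sketch. It assumes a workflow is a set of steps with dependencies and no loop constructs (true of many workflow engines, though not all), and that each individual step eventually finishes; the names `execution_order` and `always_finishes` are hypothetical. Under those assumptions, "will this workflow stop?" reduces to "is the dependency graph acyclic?", which Kahn's topological sort answers in time linear in the number of steps and dependencies. No analogous check exists for an arbitrary program in a Turing complete language.

```python
from collections import deque
from typing import Optional

# Hypothetical workflow description: each step maps to the steps it depends on.
WORKFLOW = {
    "normalize": [],
    "filter": ["normalize"],
    "cluster": ["filter"],
    "plot": ["cluster", "filter"],
}


def execution_order(workflow: dict[str, list[str]]) -> Optional[list[str]]:
    """Return a dependency-respecting order of steps, or None if the graph has a cycle.

    Assumes every dependency is itself a step in the workflow.
    """
    # Number of unfinished dependencies per step.
    pending = {step: len(deps) for step, deps in workflow.items()}
    # Reverse edges: which steps consume each step's output.
    consumers: dict[str, list[str]] = {step: [] for step in workflow}
    for step, deps in workflow.items():
        for dep in deps:
            consumers[dep].append(step)

    ready = deque(step for step, n in pending.items() if n == 0)
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for consumer in consumers[step]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    # Steps on a cycle never reach zero pending dependencies, so they are never scheduled.
    return order if len(order) == len(workflow) else None


def always_finishes(workflow: dict[str, list[str]]) -> bool:
    """Assuming each step terminates, an acyclic workflow always finishes."""
    return execution_order(workflow) is not None
```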

Unfortunately, this possibility is not exploited very widely. Graphical workflows you have, but not much else. Some engines offer automatic parallelisation (once again, very hard to do automatically for arbitrary programs in general-purpose languages). They could also support workflow monitoring, load balancing, distributed data storage (in a cloud, perhaps?) and so on. The workflow paradigm has potential, but it is not utilised very well.
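Automatic parallelisation follows from the same graph structure. The minimal Python sketch below assumes the dictionary-of-dependencies format used for illustration above and a hypothetical `run_step` callback standing in for real work; `parallel_stages` and `run` are illustrative names, not any engine's API. Steps whose dependencies have all finished are grouped into a stage, and every step in a stage is run concurrently with the standard library's `ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical workflow: two independent inputs are loaded and normalised
# separately, then merged. Each step maps to the steps it depends on.
WORKFLOW = {
    "load_a": [],
    "load_b": [],
    "normalize_a": ["load_a"],
    "normalize_b": ["load_b"],
    "merge": ["normalize_a", "normalize_b"],
}


def parallel_stages(workflow: dict[str, list[str]]) -> list[list[str]]:
    """Group steps into stages; steps within one stage do not depend on each other."""
    done: set[str] = set()
    remaining = dict(workflow)
    stages = []
    while remaining:
        # A step is ready once all of its dependencies finished in earlier stages.
        stage = sorted(s for s, deps in remaining.items() if all(d in done for d in deps))
        if not stage:
            raise ValueError("workflow contains a cycle")
        stages.append(stage)
        done.update(stage)
        for step in stage:
            del remaining[step]
    return stages


def run(workflow: dict[str, list[str]], run_step: Callable[[str], object]) -> list[object]:
    """Execute stage by stage, running the steps of each stage concurrently."""
    results: list[object] = []
    with ThreadPoolExecutor() as pool:
        for stage in parallel_stages(workflow):
            # Consuming pool.map's results waits for the whole stage and keeps input order.
            results.extend(pool.map(run_step, stage))
    return results


if __name__ == "__main__":
    print(parallel_stages(WORKFLOW))
```

Running level by level is the simplest scheduling policy; an engine could instead start each step as soon as its own inputs are ready, which avoids waiting for the slowest step of a stage.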

The take-home message is: if you use workflows, use good tools. Because the workflow paradigm is easier to build tools for than general-purpose programming, the available tools should be significantly more powerful. Unfortunately, programming languages tend to win on tooling, thanks to their massively larger user bases and longer history. If you find yourself writing workflow descriptions in a plain text editor, you are really getting the worst of both worlds.

It will be interesting to see whether the workflow world ever catches up with programming tools in the level of support. For workflows to be widely used, there need to be better tools, and for those tools to emerge, more people need to be using workflows. Chicken and egg, I'd say.

Our solution in Chipster has been to integrate the workflow engine with the rest of the application. So even if the workflow system itself does not have that much expressive power, its integration with a larger data analysis environment makes it powerful. And of course workflows are easier to use, which is important for us.

 

This entry is cross-posted in the following blogs:

Chipster

Nice analysis

Posted by SakariO at 2010-08-16 10:12
I fully agree with the analysis, but there are still a few things to note about workflow engines.

Typical workflow solutions that are connected to human operations or roles need to implement role and task management. This can mean, for example, that tasks are escalated to a supervisor or that an assigned task is canceled.

The best workflow engines with human workflow/task management capabilities simplify the implementation effort. Their features can save you from writing thousands of lines of code, provided you know how to use the product.

The biggest challenge today is the workflow products themselves: they differ hugely in maturity, and each requires product-specific knowledge.

Human interaction can be important

Posted by Aleksi Kallio at 2010-08-18 13:27
You're right, integrating humans into workflow processes is something I did not discuss. It is quite rare in scientific use, unlike in business. Some scientific workflow engines (e.g., Taverna) offer some support for human interaction, but I believe it is very basic compared to commercial products.

If human interaction is central and if the workflow engine supports it well, then that is a major plus for the workflow solution.