Last year I wrote my first cloud prediction blog post. I have to be honest, predicting “cloud” is a bit of a daunting task, so this year I'll explicitly focus on a more specific area: Cloud computing in research.
Please keep in mind that these are the predictions of a polite yet opinionated person, and not the company's.
First prediction: In 2019, the term Cloud will continue to be used both way too narrowly and way too widely. “Cloud means Kubernetes” and “Cloud means IaaS” are at the narrow end of the spectrum, while the European Open Science Cloud is at the wide end.
European Open Science Cloud (EOSC)
Let’s start with a big, visible topic. I can feel the heat of the directors breathing down my neck as I write this section. So, yes, this is quite a political subject. However, EOSC is of course tightly connected with scientific cloud use, and these are my predictions, so let’s get started.
Congratulations to EOSC, which was officially launched last November! For many people, EOSC was a large, ephemeral, formless entity that does... something, at the political level. EOSC does have a list of actual services. But that’s pretty much it: a list of links. Should the word “cloud” even be included?
Well, yes and no. It’s understandable that kick-starting something on this scale takes a while, and as a lot of my colleagues can testify, I’m a big fan of “release early”. However, is it a “Cloud” yet?
EOSC needs to solve the basic issues of a federated cloud marketplace. How are resources granted and paid for? How is reporting done? Are the SLAs common? What AAI systems and principles are there, and can both users and providers integrate with them?
The services must be easily consumable by users, and there must be clear integration points for providers. Being a service catalog that relies on tons of different contact points between vendors and users, where every pairing has its own process, is neither flexible nor fast. The process planning must be done thoroughly, with an eye on automatability. Resource providers must be a focus in this planning, but they are less important than the end users.
“In 2019, the term Cloud will continue to be used both way too narrowly and way too widely.”
Having the framework in place to connect services to user communities is a great goal. The real “C” of EOSC is a cloud, with at least some of the NIST cloud characteristics (on-demand self-service, resource pooling, and measured service), for integrating services, users, and user communities. It’s not a Kubernetes or Nextcloud service for European scientists.
When a researcher who needs scientific IT resources can go to the EOSC site, find a suitable service, figure out the cost scheme for their use, and be able to get to work during the same day, EOSC will be successful.
EOSC is ambitious, and I’m afraid that overly high early expectations will be detrimental to it. The hard problems (e.g. making authorization, costs, contracts, SLAs and reporting trivial for customers and providers) must be solved, but that takes time. If EOSC can deliver good basic rules and tools for federation, with a focus on making it easy for end users, it will be a great step forward. Not only will researchers benefit, but providers gain economies of scale by building services for larger audiences.
Will the “C” in EOSC be there in 2019? I doubt it, at least not a large part of it. Will EOSC be completely useless in 2019? No, but it will only be able to serve some selected use cases. I expect greater benefits to be reaped within 3–5 years, IF there’s active development in a good direction.
FPGAs and scientific code
Accelerators are not a new thing in data centers. Deep learning and cryptocurrency mining have made the largest waves when it comes to using GPGPUs for acceleration. However, they are not the only workloads that benefit: more and more scientific codes are also using GPGPUs for acceleration.
It looks like the next step is FPGAs. Recently FPGAs have become available in commercial IaaS services. Generic accelerator support is also maturing in e.g. OpenStack with the Cyborg service.
FPGAs are often used for accelerating deep learning workloads. However, as with any other type of acceleration, a wide range of computation benefits from FPGAs. I think we’ll see more and more forays into FPGAs for scientific computation. Apart from the early adopters, growth will probably be slow, as it is a new computing paradigm. However, cloud services will provide an easy way to dip your toes in, for both application developers and users.
Scientific data storage
In many cases, scientific data storage usage still follows old patterns. Copy the data from a laptop/USB disk/lab server to a computing cluster/VM/etc. and compute on it. Copy the results somewhere, maybe back to the laptop, and play around with them. Maybe you copy the data somewhere else for visualization or further processing, and you juggle a few copies and versions, trying not to mix them up.
These models are neither efficient nor easy to use. Future workflows will revolve much more around the data itself. The data is either produced directly into, or uploaded to, generally accessible storage, most likely an object storage service.
As the data is accessible from wherever you need it, you’ll point the computational platforms at the data, and the data location won’t change (except for temporary copies made for computation) throughout the whole analysis workflow.
As these data services are accessible from anywhere you need, it becomes easy to combine tools from many different providers, which can operate on the data no matter where it resides. A lot of tooling still needs to be built, but I expect the tools and processes to mature and become usable.
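The shape of that data-centric workflow can be sketched in a few lines. Everything below is a toy illustration with made-up names: the `ObjectStore` class is an in-memory stand-in for a real object storage service, and the two "tools" stand in for independent compute platforms that are pointed at the same canonical data instead of receiving copies.

```python
class ObjectStore:
    """In-memory stand-in for an object storage service (e.g. S3-style)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        """Upload data once, to one canonical location."""
        self._objects[key] = data

    def get(self, key):
        """Any tool reads the same object by key; no copies to juggle."""
        return self._objects[key]


def mean_tool(store, key):
    """One 'compute platform' pointed at the data."""
    values = store.get(key)
    return sum(values) / len(values)


def range_tool(store, key):
    """A second, independent tool reading the same canonical object."""
    values = store.get(key)
    return max(values) - min(values)


store = ObjectStore()
store.put("experiment-42/raw-values", [2.0, 4.0, 6.0])  # produced once

print(mean_tool(store, "experiment-42/raw-values"))   # 4.0
print(range_tool(store, "experiment-42/raw-values"))  # 4.0
```

The point of the pattern is that both tools reference the data by its one location; adding a third tool from yet another provider requires no new copy and no new versioning headache.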
Again, this change will take time, as it needs changing user behavior. However, the rising demand for FAIR (Findable, Accessible, Interoperable, Reusable) principles for research data will probably accelerate this, since the same models make it easier to at least provide the “A” for FAIR.
OpenStack and open infrastructure
In my (anecdotal) experience, the number of OpenStack installations by scientific infrastructure providers grew significantly last year.
The IaaS paradigm has made it easier to manage infrastructure systematically, for both the customers and the providers of the infrastructure. IaaS fills a different need than e.g. the HPC clusters that scientific computing service providers have traditionally run.
While HPC clusters are somewhat easily usable by end users, IaaS services provide a more generic infrastructure layer. However, in many scientific OpenStack use cases (and I’m sure other use cases too), IaaS is still often seen as an end product for users, rather than a generic improvement in infrastructure management.
The OpenStack Summit renamed itself the Open Infrastructure Summit, reflecting the trend that OpenStack is no longer merely a cloud product to be consumed; it is one part of a software-defined infrastructure.
The focus has started moving from “Do we have an IaaS offering?” to “Is our whole IT infrastructure software defined?”. In the latter question, OpenStack is part of the answer, but not the whole answer.
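The core of "software defined" can be sketched as a desired-state pattern: infrastructure is declared as data, and a reconciler computes which actions move reality toward the declaration. All names below are hypothetical, a minimal stand-in for what tools like OpenStack Heat or Terraform do at much larger scale.

```python
# Desired state: what we declare we want running.
desired = {"web-vm", "db-vm", "analysis-vm"}

# Current state: what actually exists right now.
current = {"web-vm", "old-test-vm"}


def reconcile(desired, current):
    """Compute the actions needed to move current state to desired state."""
    to_create = sorted(desired - current)  # declared but missing
    to_delete = sorted(current - desired)  # present but no longer declared
    return to_create, to_delete


create, delete = reconcile(desired, current)
print("create:", create)  # create: ['analysis-vm', 'db-vm']
print("delete:", delete)  # delete: ['old-test-vm']
```

Because the declaration is just data, the same spec can drive VMs, networks, and DNS alike, which is why the question shifts from a single IaaS product to the whole infrastructure.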
“This will have a big impact on the availability of scientific IT resources, but it will also push OpenStack itself a bit behind the scenes.”
I think many OpenStack installations for scientific use will follow this path. They will no longer be “an OpenStack installation for purpose X” but “scientific IT resources, usable by X, Y, and Z, where, by the way, our organization also runs its web pages.”
It will take some time, as it does require quite a high level of maturity from the organization. This will have a big impact on the availability of scientific IT resources, but it will also push OpenStack itself a bit behind the scenes.
That’s not a bad thing, since the services built on top of the open infrastructure are more interesting than the infrastructure itself.
Except of course to cloud-geeks like me.