Sharing enables screening 16 million structures in 7 minutes

Screenshot from the Schrödinger drug discovery suite.

In silico research, including simulations and data-driven methods, is integral to drug discovery. Bio and life science researchers are the biggest user group in CSC’s clientele. Easily accessible graphical user interfaces make the latest tools available to an ever-larger user community, and large storage services facilitate processing and maintaining data.

The benefits of collaboration in this field have recently been highlighted by joint COVID-19 projects. In fact, the work described below is part of the Repurposing drugs for COVID project funded by the Academy of Finland. Access to raw or preprocessed data can significantly speed up research, and below we give one such example from the University of Eastern Finland, conducted by Dr. Tuomo Laitinen and his team.

Schrödinger Shape offers faster screening

In the early phases of drug design, there is often a need to quickly screen a large number of molecules to narrow them down to the ones worth a closer look. These screens are often based on the molecular 3D geometry: a large molecular library is scanned for molecules similar to the one(s) that exhibit the desired effect. Traditionally, this has been done using methods like molecular docking or pharmacophore searches. Schrödinger has developed a new algorithm that exploits a GPU to perform similar geometry matching, resulting in a significant speedup.

The Schrödinger drug discovery suite, which is available for academic research to students and staff at all Finnish universities, offers a tool called Shape for this.

– A small molecule library can be easily processed via the graphical user interface on your local laptop, but if you want to screen 16 million molecules, you want to use the CSC supercomputer. Doing this efficiently is a little more involved, but it was definitely worth it, summarizes Laitinen.

It turned out that once the large and generally useful Molport library has been preprocessed for the final screening step with Phase, new kinds of molecules that are interesting to other researchers can be screened against it very quickly with GPUs.

– The library preprocessing steps took quite a bit of work. Sorting out the correct flags and other syntactic details was nontrivial for a very large database and 3-day maximum runtimes. The complete library was split into appropriately sized batches for processing on the Puhti supercomputer. The actual number crunching took almost two weeks in CPU time, says Arun Tonduru from the UEF team.
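The batching idea described in the quote can be sketched generically. The Python example below is only an illustration and not the UEF team's actual workflow: it does not call any Schrödinger tool, and the throughput figure (`molecules_per_hour`), the safety margin and all function names are hypothetical assumptions. It shows how a batch size could be derived from a measured throughput and a queue time limit, and how a list of molecule identifiers could then be cut into batches of that size.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_size_for_time_limit(molecules_per_hour: int, limit_hours: int,
                              safety_percent: int = 80) -> int:
    """Largest batch expected to finish within the queue time limit.

    molecules_per_hour is a HYPOTHETICAL throughput that would have to be
    measured with a small test run; it is not a figure from the article.
    safety_percent leaves headroom so a slow batch is not killed at the limit.
    """
    if molecules_per_hour < 1 or limit_hours < 1:
        raise ValueError("throughput and time limit must be positive")
    return max(1, molecules_per_hour * limit_hours * safety_percent // 100)


def count_batches(total_items: int, batch_size: int) -> int:
    """Number of batches needed to cover total_items (ceiling division)."""
    return -(-total_items // batch_size)


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Assumed values for illustration only: 1000 molecules per hour per job,
    # and the 3-day (72 h) maximum runtime mentioned in the quote.
    size = batch_size_for_time_limit(molecules_per_hour=1000, limit_hours=72)
    print("batch size:", size)
    print("batches for 16 million molecules:", count_batches(16_000_000, size))

    # Tiny demonstration of the splitting itself.
    for index, batch in enumerate(batched(["m1", "m2", "m3", "m4", "m5"], 2)):
        print(index, batch)
```

Each batch would then be submitted as a separate job. Sizing batches with some headroom below the runtime limit is a design choice that trades a few extra jobs for a lower risk of losing a nearly finished batch.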

Dr. Laitinen continued the heavy lifting by combining the results into a single file.

– Finally, once the shape file was ready, the actual screening against the 16 million structures (or 160 million conformations) on a GPU took only 7 minutes, marvels Laitinen.
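To put the quoted figures in perspective, the short Python calculation below derives the implied average number of conformations per structure and the screening throughput. It uses only the numbers in the quote (16 million structures, 160 million conformations, 7 minutes). The function name is hypothetical, and the results are rough averages rather than measured benchmarks.

```python
def screening_throughput(structures: int, conformations: int,
                         minutes: float) -> dict:
    """Average rates implied by a screening run of the given size and length."""
    seconds = minutes * 60
    return {
        "conformations_per_structure": conformations / structures,
        "structures_per_second": structures / seconds,
        "conformations_per_second": conformations / seconds,
    }


if __name__ == "__main__":
    # Figures taken from the quote above.
    rates = screening_throughput(16_000_000, 160_000_000, 7)
    for name, value in rates.items():
        print(f"{name}: {value:,.0f}")
```

With these inputs the library averages 10 conformations per structure, and the GPU run works through roughly 380,000 conformations per second.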

FAIR sharing is encouraged

Since the original library is already in the public domain, preprocessing it does not reveal intellectual secrets, and the result can, and should, be made available to other researchers according to the FAIR principles. The final screening step is easy, but preparing the processed shape file is more complicated, requires many times more computational resources, and produces a file that takes up 139 GB of disk space, so sharing such files is encouraged!

This and other datasets can be found and reused on Puhti:

If you want to screen your molecule against this dataset, follow this detailed tutorial on how to do it on Puhti.

The Authors

Professor Antti Poso, Dr. Tuomo Laitinen and Dr. Arun Tonduru are computational drug discovery researchers at the University of Eastern Finland, and Atte Sillanpää is a Development Manager at CSC with a background in chemistry.

Antti Poso, Tuomo Laitinen, Arun Tonduru and Atte Sillanpää