Mateusz Bieniek's blog

Molecular Dynamics, Academia and others

Amazon GPU services – estimate your costs

I was just reading a benchmark comparing Google’s TPUv2 with Nvidia’s recently released V100 in the field of machine learning. The authors used Nvidia GPUs from the Amazon cloud! This is an interesting concept, as computation is increasingly done in “the cloud” or in specially designed HPC clusters. How expensive is it? $8.4 per hour for 4 V100 GPUs, which if used for one month without a break would come to around six thousand dollars (8.4 × 24 × 30 ≈ $6,050).
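The arithmetic behind that figure can be sketched in a few lines of Python. The function name and the 30-day month are my own assumptions for illustration; only the $8.4 hourly rate and the 4 GPUs come from the text above.

```python
HOURS_PER_DAY = 24


def rental_cost(hourly_rate_usd, days, hours_per_day=HOURS_PER_DAY):
    """Total cost of renting an instance for `days` days, `hours_per_day` hours each day."""
    return hourly_rate_usd * hours_per_day * days


# $8.4 per hour for the 4x V100 instance; a 30-day month is an assumption.
monthly = rental_cost(8.4, days=30)
per_gpu_hour = 8.4 / 4

print(f"One month non-stop: ${monthly:,.0f}")
print(f"Per GPU per hour:   ${per_gpu_hour:.2f}")

# Running only 8 hours a day cuts the bill to a third.
print(f"One month, 8 h/day: ${rental_cost(8.4, days=30, hours_per_day=8):,.0f}")
```

Plugging in your own hourly rate and expected usage gives a quick first estimate before committing to either renting or buying.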

This cost seems prohibitive at first, but once you account for the cost of setting everything up yourself, the initial investment, and the possible access to cheaper GPUs, it could pay off well, not to mention that it is Amazon’s business to update the GPUs regularly. The direct cost could also have a small advantage – it would force researchers to think more deeply about which simulations they should run and which they should not – a dilemma which is currently mostly ignored.

Another important factor to consider is the very fast interconnect, such as InfiniBand, needed between cluster nodes. My own tests have shown that a single GPU can replace around 4 of the best CPU nodes. The 4 nodes by themselves are terribly expensive, but I believe the networking costs are equally huge. And one could go further and use two GPUs on a single motherboard. All the money and resources thrown at networking, space and connections might simply not be necessary. Having access to 2 GPUs such as the V100, or a lower-end, cheaper GPU, could be the right way to go.

And of course, for 6 thousand dollars, most people would prefer to buy their own GPUs – with the gaming GTX series being the best value for money. To estimate how useful such an investment would be for your own system, it might be worth spending a few dollars and testing the system on Amazon GPUs.

Loading every n-th frame from a large trajectory – VMD, PyMOL and MDAnalysis

Both VMD and PyMOL take ages to load a large trajectory at a large interval.

I have a 100GB trajectory of 2 microseconds of simulated time, recorded every 10 picoseconds. Every 1000th frame should give me 200 frames with a 10ns spacing.
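As a quick sanity check of those numbers, here is a small Python sketch. The function name is hypothetical, and it assumes the trajectory holds exactly total time / save interval frames; a real file may contain one extra frame for time zero, which would give 201 selected frames instead of 200.

```python
def strided_frames(total_time_ps, save_interval_ps, stride):
    """Return (frames in the trajectory, frames kept by the stride, spacing in ps)."""
    n_frames = total_time_ps // save_interval_ps
    # Same selection rule as Python slicing with [::stride].
    n_selected = len(range(0, n_frames, stride))
    spacing_ps = save_interval_ps * stride
    return n_frames, n_selected, spacing_ps


# 2 microseconds = 2,000,000 ps, saved every 10 ps, keeping every 1000th frame.
n_frames, n_selected, spacing_ps = strided_frames(2_000_000, 10, 1000)

print(f"frames in trajectory: {n_frames}")
print(f"frames selected:      {n_selected}")
print(f"spacing:              {spacing_ps / 1000:.0f} ns")
```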

PyMOL with load_traj tells you which frame is being scanned, but it freezes the user interface completely. Finally, after 30 minutes, the command line frame counter reached the end of the trajectory. That’s when my computer started slowing down. I opened ‘htop’ and could see that PyMOL was taking more and more memory – finishing at slightly over 8GB of RAM. Apparently, PyMOL only started loading the frames into memory after iterating over all of them – which is difficult to justify. Furthermore, it is difficult to imagine how loading 200 frames can make PyMOL eat 8GB of RAM. Even using the trick for large trajectories, “set defer_builds_mode, 3”, does not help.

How does VMD perform? The interface is almost frozen during the process, but you can see small redraws each time a new frame is loaded. Loading every 1000th frame of the trajectory is equally slow. The memory usage during loading is very low and a single CPU core is working flat out, which I guess means that the trajectory is being uncompressed. This time there were no unpleasant surprises with the RAM usage, but the process of loading the trajectory also took around 30 minutes.

What if we could extract the frames with some other software? Gromacs tools behave equally slowly. However, MDAnalysis is doing something magical:

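import MDAnalysis

u = MDAnalysis.Universe('npt2us_protCent_pbcMol.gro', 'npt2us_protCent_pbcMol.xtc')
with MDAnalysis.Writer("npt2us_protCent_pbcMol_10nsStep.xtc", u.atoms.n_atoms) as W:
    for ts in u.trajectory[::1000]:  # every 1000th frame, i.e. every 10ns
        W.write(u.atoms)  # write the current frame to the new trajectory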

And this takes just about 20 seconds to extract every 1000th frame and save it into another compressed .xtc file. How is it possible that the other, more widely accepted packages do this so inefficiently?

Now that I have the 200 frames extracted, I can check if PyMOL really needs 8GB of RAM to load them. And yes, it does. For the record, the 200 frames occupy 93MB on the drive, and equally little in memory when VMD loads them. In addition, PyMOL still takes a lot of time to load the 200 frames. Whatever data structures it uses underneath, the user’s comfort is not the priority here.



MDAnalysis: 0.17.0
PyMOL: 2.1.0
VMD: 1.9.4a9

News from around the world – translation

Google Translate has been around for many years now, but progress in language translation, although steady, is slow. Are we ever going to have a tool that perfectly translates information? I doubt it. That is because the context is often lacking, and the language itself is evolving. The context here does not necessarily refer to what the article is about, but to the wider context, the world situation, the state of things. Human translators might play an important role in proofreading even in the far future, until real artificial intelligence is created.

Information flow is increasingly important. News providers race with each other to regularly deliver the best (read “most-wanted-to-be-read”) content. Then there are services that go one step further – aggregators that group news together and create entire communities around it. But how about going further still? We could create an international news platform which would have news in all languages, along with a translation into your own language. This is already happening, indirectly – news providers read news from other newspapers and in other languages, and write new content based on that. But we could do better.

Imagine a platform that allows bilingual readers to read news and proofread the translation at the same time. Let’s assume that there is some kind of community credit and acknowledgement which they are happy to receive. Once they finish the article they put a stamp on it – the translation conveys the original message. Then, readers across the world get a variety of news from all over the world, with community-driven translations that have been publicly verified.

However, translating an article is not just about the content. There is also the context and other relevant background information that the user might need in order to understand the message. Surprisingly, there is no interactive news reading service where this background information can be easily queried. If, somehow, the community could attach pieces of information in some sensible, structured way, reading news would become easier for a reader new to the topic. This might be happening organically – through the comments, where discussions and all kinds of exchanges can take place.

Should Gromacs adopt MDAnalysis?

The Gromacs team is limited in size and resources despite working on one of the most ubiquitous molecular dynamics packages out there.

Their tools are not perfect. A mailing list is used for asking questions. This takes away from the community and structure that a typical forum can offer. The messages are later archived and can be searched through, but they cannot be grouped together the way Stack Overflow does it. The website is equally dead – outdated comments and documentation can be found around every corner. The code is available, but it does not use any publicly available service such as GitHub, which could significantly simplify the development cycle, discussion, and issue creation. Yet most importantly, moving to a service like GitHub would give Gromacs access to a huge pool of competent programmers.

With all this code to be maintained, tested and optimised further for the latest hardware, why would Gromacs want to handle analysis when packages such as MDAnalysis keep showing up? Gromacs tools are poorly described and tricky to use. They follow inconsistent rules, making them even harder to work out. I myself have reported multiple bugs, one of them being a small bug where you are told that the analysis step is given in ps, but in fact it depends on your own choice of unit. Tutorials from others are often your only way forward.

Just today I was trying to analyse my simulation and due to the bug reported above I calculated RMSD every 100 nanoseconds giving me just 5 frames. Yet, for some reason, the analysis lasted as long as computing it for all frames. This most likely means that Gromacs reads in every frame sequentially regardless of how many frames it jumps over. I imagine that there is an advantage in that all frames are “checked” during the analysis, but it is not reasonable to do this every time in every type of analysis.

MDAnalysis does not have this issue. It creates some kind of index file and is capable of browsing through the trajectory very quickly. Despite the package not being fully ready and having many small issues, it is a very promising and growing package.
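To illustrate why an index makes such a difference, here is a minimal sketch of the general idea. It is not MDAnalysis’s actual code: the toy file format (a 4-byte length header followed by the frame bytes) and all function names are invented for this example. The point is that frames of varying size cannot be located by arithmetic alone, but once the starting offset of every frame is known, any frame can be reached with a single seek.

```python
import os
import struct
import tempfile


def write_toy_trajectory(path, frames):
    """Write frames as variable-length records: 4-byte length header, then payload."""
    with open(path, "wb") as fh:
        for payload in frames:
            fh.write(struct.pack("<I", len(payload)))
            fh.write(payload)


def build_offsets(path):
    """Scan the file once, reading only the headers, and record where each frame starts."""
    offsets = []
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        pos = 0
        while pos < size:
            offsets.append(pos)
            (length,) = struct.unpack("<I", fh.read(4))
            pos += 4 + length
            fh.seek(pos)  # skip the payload without reading or decoding it
    return offsets


def read_frame(path, offsets, index):
    """Jump straight to one frame using the stored offset."""
    with open(path, "rb") as fh:
        fh.seek(offsets[index])
        (length,) = struct.unpack("<I", fh.read(4))
        return fh.read(length)


# Toy data: 10,000 frames of varying size.
frames = [f"frame {i} ".encode() * (i % 5 + 1) for i in range(10_000)]
path = os.path.join(tempfile.mkdtemp(), "toy.traj")
write_toy_trajectory(path, frames)

offsets = build_offsets(path)  # done once; could be saved next to the trajectory
every_1000th = [read_frame(path, offsets, i) for i in range(0, len(offsets), 1000)]
print(f"indexed {len(offsets)} frames, read {len(every_1000th)} of them")
```

With the offsets stored on disk, only the first pass has to walk through the whole file; every later strided read touches just the frames that were asked for.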

Gromacs should adapt to the new age and embrace new tools and the large open source community, rather than overambitiously trying to achieve everything itself. Instead, Gromacs could slowly transition its set of tools to MDAnalysis, an experience which over time would help them to move the community to GitHub.