I was just reading about a benchmark comparing Google’s TPUv2 with Nvidia’s recently released V100 in the field of machine learning. The authors ran the Nvidia GPUs on the Amazon cloud! This is an interesting development, as computation is increasingly done in “the cloud” or in specially designed HPC clusters. How expensive is it? $8.40 per hour for four V100 GPUs, which, if used for a month without a break, comes to around six thousand dollars.
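The monthly figure is easy to verify with a back-of-the-envelope calculation, assuming a 30-day month of non-stop use:

```python
# Back-of-the-envelope monthly cost for renting 4x V100 in the cloud.
# The $8.40/hour rate comes from the text; 30 days is an assumption
# for "one month without a break".
hourly_rate = 8.40         # USD per hour for four V100 GPUs
hours_per_month = 24 * 30  # running non-stop for 30 days

monthly_cost = hourly_rate * hours_per_month
print(f"${monthly_cost:,.2f} per month")  # $6,048.00 per month
```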
This cost seems prohibitive at first, but once you account for the cost of setting everything up, the initial hardware investment, and possible access to cheaper GPUs, it could pay off well, not to mention that it is Amazon’s business to update its GPUs regularly. The direct cost could even have a small advantage: it would force researchers to think more carefully about which simulations to run and which to skip, a dilemma that is currently mostly ignored.
Another important factor to consider is the very fast interconnect, such as InfiniBand, between cluster nodes. My own tests have shown that a single GPU can replace around four of the best CPU nodes. The four nodes are terribly expensive by themselves, but I believe the networking costs are equally huge. One could go further and put two GPUs on a single motherboard. All the money and resources thrown at networking, space, and interconnects might simply not be necessary. Having access to two GPUs such as the V100, or a lower-end and cheaper GPU, could be the right way to go.
And of course, for six thousand dollars, most people would prefer to buy their own GPUs, with the gaming GTX series being the best value for money. To estimate how useful such an investment would be for your own workload, it might be worth spending a few dollars first and testing it on Amazon’s GPUs.
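One way to frame the buy-versus-rent decision is a break-even estimate: how many hours of cloud use would cover the price of your own card? The sketch below uses the article’s $8.40/hour rate split across four GPUs; the $800 card price is a hypothetical placeholder, not a real quote:

```python
# Rough break-even sketch: renting one cloud GPU vs. buying your own.
# The per-GPU cloud rate is the article's $8.40/hour divided by 4 GPUs;
# the card price below is a hypothetical placeholder, not an actual quote.
cloud_rate_per_gpu = 8.40 / 4  # USD per hour for one V100 in the cloud
own_gpu_price = 800.0          # hypothetical price of a gaming-series card

break_even_hours = own_gpu_price / cloud_rate_per_gpu
print(f"Owning pays off after about {break_even_hours:.0f} hours")  # ~381 hours
```

With these assumed numbers, a few weeks of continuous use already favors owning the hardware, which is why a short, cheap cloud test before buying makes sense.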