TL;DR: A fast GPU is very important when you begin to learn deep learning, because it allows you to gain practical experience quickly, and that experience is key to building the expertise with which you can apply deep learning to new problems.
Without this rapid feedback, it simply takes too much time to learn from your mistakes, and deep learning can become discouraging and frustrating.
With quick feedback, you will be able to detect patterns that give you hints about which parameters or layers need to be added, removed, or adjusted.
So overall, one can say that a single GPU should be sufficient for almost any task, but multiple GPUs are becoming more and more important for accelerating your deep learning models.
Later I ventured further down that road and developed a new 8-bit compression technique that lets you parallelize dense or fully connected layers much more efficiently with model parallelism than 32-bit methods do.
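The core idea behind such compression is that the tensors exchanged between GPUs can be sent as 8-bit codes plus a scale factor instead of full 32-bit floats, cutting communication bandwidth to a quarter. The sketch below shows plain linear 8-bit quantization as an illustration; the function names are my own, and the actual technique referenced above is more sophisticated than this.

```python
import numpy as np

def quantize_8bit(x):
    """Compress a float32 tensor to int8 codes plus one per-tensor scale.

    Illustrative linear quantization only -- a stand-in for the more
    elaborate 8-bit scheme mentioned in the text.
    """
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return codes, scale

def dequantize_8bit(codes, scale):
    """Reconstruct an approximate float32 tensor from codes and scale."""
    return codes.astype(np.float32) * scale

# Round-trip a random activation/gradient tensor.
x = np.random.randn(4, 4).astype(np.float32)
codes, scale = quantize_8bit(x)
x_hat = dequantize_8bit(codes, scale)
```

Only `codes` (1 byte per element) and the single `scale` need to cross the GPU interconnect; the worst-case reconstruction error of this linear scheme is half the scale per element.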
However, I also found that parallelization can be horribly frustrating.
There are other libraries that support parallelism, but they are either slow (like TensorFlow, with only a 2x-3x speedup), difficult to use for multiple GPUs (Theano), or both.
If you put value on parallelism, I recommend using either PyTorch or CNTK.
The only deep learning library that currently implements efficient algorithms across GPUs and across computers is CNTK, which uses Microsoft's specialized parallelization algorithms: 1-bit quantization (efficient) and block momentum (very efficient).
With CNTK and a cluster of 96 GPUs, you can expect a near-linear speedup of about 90x-95x.
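To give a feel for why 1-bit quantization works at all, here is a minimal sketch of the trick at its heart: each gradient element is reduced to its sign times a shared scale, and the quantization error is kept locally and fed back into the next step (error feedback), so no information is permanently lost. This follows the spirit of CNTK's 1-bit SGD; the function name and the single-scalar scale are simplifying assumptions of mine, not CNTK's actual implementation.

```python
import numpy as np

def one_bit_sgd_step(grad, residual):
    """One round of 1-bit gradient quantization with error feedback.

    Simplified sketch in the spirit of 1-bit SGD: transmit 1 bit per
    element plus one float scale, and carry the quantization error
    forward so it is corrected in later steps.
    """
    g = grad + residual                 # add back last step's quantization error
    scale = np.mean(np.abs(g))          # one reconstruction value (simplified)
    quantized = np.where(g >= 0, scale, -scale)  # sign bit * scale
    new_residual = g - quantized        # error stays local, fed back next step
    return quantized, new_residual

grad = np.random.randn(8).astype(np.float32)
residual = np.zeros_like(grad)
q, residual = one_bit_sgd_step(grad, residual)
```

Because the residual exactly accounts for what the 1-bit message could not carry, the quantized gradient plus the stored error always sums back to the true accumulated gradient, which is what makes the compression nearly lossless over many steps.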
PyTorch might be the next library to support efficient parallelism across machines, but it is not there yet.
So how do you select the GPU which is right for you?
This blog post will delve into that question and offer advice that will help you make the choice that is right for you.