
Nvidia Powered
Deep Learning Hardware

Deep Learning uses multi-layered ("deep") neural networks, loosely modelled on the brain, to solve tasks that have previously eluded scientists. Because the arithmetic inside each layer consists of many thousands of independent operations, neural networks run best on highly parallel processors. For this reason you'll be able to train your network much faster on GPUs than on CPUs, which are better suited to serial tasks.
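As a concrete illustration, here is a minimal PyTorch sketch (the framework and the layer sizes are our own choices, not tied to any product on this page) showing that moving a model onto a GPU is essentially a one-line device change:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A small stand-in network; any model is moved to the device the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).to(device)

# Input batches must live on the same device as the model.
x = torch.randn(64, 784, device=device)
logits = model(x)
print(logits.shape, "computed on", device)
```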

Performance GPU vs CPU

  • DGX-1: 170 teraFLOPS
  • Dual-CPU server: 3 teraFLOPS

The figures above show the dramatic performance difference, in teraFLOPS, between the DGX-1 Deep Learning server and a traditional server with two Intel Xeon E5-2697 v3 CPUs: a gap of more than 50x. You can, of course, train Deep Learning networks without a GPU, but the task is so computationally expensive that in practice it is almost exclusively done on GPUs.
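If you want a rough feel for figures like these on your own hardware, timing a large matrix multiplication and converting it to sustained teraFLOPS is a simple starting point. The sketch below is an illustrative micro-benchmark (the sizes and repetition count are arbitrary), not the methodology behind the numbers above:

```python
import time
import torch

def measured_tflops(device: str, n: int = 4096, reps: int = 10) -> float:
    """Estimate sustained single-precision teraFLOPS from an n x n matmul."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)  # warm-up (CUDA initialisation, algorithm selection)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(reps):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()  # GPU kernels run asynchronously
    elapsed = time.perf_counter() - start
    flops = 2 * n**3 * reps        # an n x n matmul is ~2*n^3 floating-point ops
    return flops / elapsed / 1e12  # scale to teraFLOPS

print("CPU:", round(measured_tflops("cpu"), 2), "teraFLOPS")
if torch.cuda.is_available():
    print("GPU:", round(measured_tflops("cuda"), 2), "teraFLOPS")
```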

Training Time GPU vs CPU

  • GPU server with 4x Tesla M40: 0.4 days
  • Dual-CPU server: 5.9 days

The figures above show the real-world time saved when training on a server with four Tesla M40 cards versus one with two Intel Xeon E5-2699 v3 CPUs: roughly a 15x speed-up.
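In the same spirit, a quick way to compare training throughput on your own hardware is to time a fixed number of optimiser steps on each device. The model, batch size, and step count below are arbitrary placeholders, not the workload behind the figures above:

```python
import time
import torch

def time_steps(device: torch.device, steps: int = 50) -> float:
    """Time a fixed number of dummy training steps on the given device."""
    model = torch.nn.Linear(2048, 2048).to(device)
    optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(128, 2048, device=device)
    target = torch.randn(128, 2048, device=device)

    start = time.perf_counter()
    for _ in range(steps):
        optimiser.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), target)
        loss.backward()
        optimiser.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work to finish
    return time.perf_counter() - start

cpu_time = time_steps(torch.device("cpu"))
print(f"CPU: {cpu_time:.2f}s")
if torch.cuda.is_available():
    gpu_time = time_steps(torch.device("cuda"))
    print(f"GPU: {gpu_time:.2f}s ({cpu_time / gpu_time:.1f}x faster)")
```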

Which GPU(s) should I use in my Deep Learning Server?

Raw processing power isn’t the only consideration

Most Deep Learning frameworks make use of cuDNN (the CUDA Deep Neural Network library), which runs only on NVIDIA GPUs. So how do you decide which GPUs to get?
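In PyTorch, for instance, you can confirm that cuDNN is available and let it auto-tune its convolution algorithms for your input shapes. This is a minimal sketch, not a tuning guide:

```python
import torch

# cuDNN is picked up automatically when an NVIDIA GPU and driver are present.
print("cuDNN available:", torch.backends.cudnn.is_available())
print("cuDNN version:  ", torch.backends.cudnn.version())

# With fixed input shapes, letting cuDNN benchmark its convolution
# algorithms once at start-up usually speeds up the rest of training.
torch.backends.cudnn.benchmark = True
```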

The cheaper cards offer a large amount of raw processing power, but the more expensive cards provide more on-board memory, which lets you train larger models and process bigger batches in parallel. The more expensive cards also lend themselves to being clustered over high-end InfiniBand networks.
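To see why on-board memory matters, a back-of-the-envelope estimate is useful: during training, the weights, their gradients, and any optimiser state must all be resident on the card, before counting activations. The rule of thumb below is our own simplification, not a vendor formula:

```python
def training_memory_gb(num_params: int, bytes_per_value: int = 4) -> float:
    """Rough lower bound on card memory needed to train a model with
    plain SGD + momentum: one copy each of weights, gradients, and
    momentum buffers (activations come on top of this)."""
    copies = 3  # weights + gradients + momentum; assumption, varies by optimiser
    return num_params * bytes_per_value * copies / 1024**3

# Example: a 200-million-parameter model in FP32
print(f"{training_memory_gb(200_000_000):.1f} GB")  # ~2.2 GB before activations
```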


Deep Learning GPUs Compared

The comparison below highlights the key features and performance characteristics of the most popular GPUs for Deep Learning.

TITAN X (2016)

  • Architecture: Pascal
  • CUDA Cores: 3584
  • Memory: 12GB
  • Memory Bandwidth: 480GB/sec
  • Memory Type: GDDR5X
  • ECC Support: No
  • Interconnect Bandwidth: 32GB/sec
  • Double-Precision (FP64) Performance: 0.34 teraFLOPS
  • Single-Precision (FP32) Performance: 11 teraFLOPS
  • Half-Precision (FP16) Performance: 0.17 teraFLOPS

Tesla K40

  • Architecture: Kepler
  • CUDA Cores: 2880
  • Memory: 12GB
  • Memory Bandwidth: 288GB/sec
  • Memory Type: GDDR5
  • ECC Support: Yes
  • Interconnect Bandwidth: 32GB/sec
  • Double-Precision (FP64) Performance: 1.43 teraFLOPS
  • Single-Precision (FP32) Performance: 4.29 teraFLOPS
  • Half-Precision (FP16) Performance: N/A

Tesla K80

  • Architecture: Kepler
  • CUDA Cores: 2496 per GPU
  • Memory: 12GB per GPU
  • Memory Bandwidth: 240GB/sec per GPU
  • Memory Type: GDDR5
  • ECC Support: Yes
  • Interconnect Bandwidth: 32GB/sec
  • Double-Precision (FP64) Performance: 2.91 teraFLOPS
  • Single-Precision (FP32) Performance: 8.74 teraFLOPS
  • Half-Precision (FP16) Performance: N/A

Tesla M40

  • Architecture: Maxwell
  • CUDA Cores: 3072
  • Memory: 24GB
  • Memory Bandwidth: 288GB/sec
  • Memory Type: GDDR5
  • ECC Support: Yes
  • Interconnect Bandwidth: 32GB/sec
  • Double-Precision (FP64) Performance: 0.21 teraFLOPS
  • Single-Precision (FP32) Performance: 7.00 teraFLOPS
  • Half-Precision (FP16) Performance: N/A

Tesla P100 (PCI-E)

  • Architecture: Pascal
  • CUDA Cores: 3584
  • Memory: 12GB or 16GB
  • Memory Bandwidth: 540 or 720GB/sec
  • Memory Type: HBM2
  • ECC Support: Yes
  • Interconnect Bandwidth: 32GB/sec
  • Double-Precision (FP64) Performance: 4.70 teraFLOPS
  • Single-Precision (FP32) Performance: 9.30 teraFLOPS
  • Half-Precision (FP16) Performance: 18.70 teraFLOPS

Tesla P100 (NVLink)

  • Architecture: Pascal
  • CUDA Cores: 3584
  • Memory: 16GB
  • Memory Bandwidth: 720GB/sec
  • Memory Type: HBM2
  • ECC Support: Yes
  • Interconnect Bandwidth: 160GB/sec
  • Double-Precision (FP64) Performance: 5.30 teraFLOPS
  • Single-Precision (FP32) Performance: 10.60 teraFLOPS
  • Half-Precision (FP16) Performance: 21.20 teraFLOPS
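The half-precision figures matter because Pascal cards such as the Tesla P100 execute FP16 arithmetic at roughly twice their FP32 rate while halving the memory used per value. Below is a minimal PyTorch sketch of casting a model to half precision (our own illustration, with arbitrary sizes; it requires a CUDA-capable card):

```python
import torch

assert torch.cuda.is_available(), "this sketch needs an NVIDIA GPU"

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(32, 1024, device="cuda")

y32 = model(x)    # single-precision (FP32) forward pass
print(y32.dtype)  # torch.float32

# Cast the weights and inputs to half precision; each value now takes
# 2 bytes instead of 4, and FP16-capable cards run the maths faster.
model.half()
y16 = model(x.half())
print(y16.dtype)  # torch.float16
```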
 
Contact us for further information