Training Deeper Models by GPU Memory Optimization on TensorFlow
An innovative approach to fighting memory resource limitation
This article is part of the Academic Alibaba series and is taken from the paper entitled “Training Deeper Models by GPU Memory Optimization on TensorFlow” by Chen Meng, Minmin Sun, Jun Yang, Minghui Qiu, and Yang Gu, accepted at the 2017 Conference on Neural Information Processing Systems (NIPS). The full paper can be read here.
As training deep learning models on GPUs becomes increasingly popular, the problems of model complexity and limited memory resources become ever more salient. Alibaba’s Machine Learning team has developed effective GPU memory optimization strategies that overcome these limitations and integrate seamlessly into TensorFlow.
Deep learning plays an increasingly important role in a wide range of applications. Training a deep learning model essentially involves parallel linear algebra computation, which is well suited to GPUs. However, due to physical constraints, a GPU usually has far less device memory than host memory: the latest high-end NVIDIA P100 GPU is equipped with 12–16 GB of device memory, while a CPU server commonly has 128 GB of host memory. Meanwhile, the trend in deep learning models is toward “deeper and wider” architectures. For example, ResNet consists of up to 1,001 layers, and a Neural Machine Translation (NMT) model consists of 8 layers with an attention mechanism. Most layers in an NMT model are sequential layers unrolled along the time dimension, which brings non-negligible memory consumption.
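To see why unrolled sequential layers are memory-hungry, a rough back-of-envelope estimate helps: every activation produced in the forward pass must be kept in device memory until the backward pass consumes it. The sketch below illustrates the order of magnitude; the batch size, sequence length, and hidden size are illustrative assumptions, not figures from the paper.

```python
# Rough estimate of activation memory for an unrolled sequence model.
# All sizes below are hypothetical, chosen only for illustration.
batch_size = 128        # sentences per batch
seq_len = 100           # unrolled time steps
hidden_size = 1024      # units per recurrent layer
num_layers = 8          # as in the NMT example above
bytes_per_value = 4     # float32

# One hidden-state tensor per time step, per layer, kept for backprop:
activations = batch_size * seq_len * hidden_size * num_layers * bytes_per_value
print(f"hidden states alone: ~{activations / 2**30:.2f} GiB")  # ~0.39 GiB
```

A recurrent cell actually stores several intermediate tensors per step (gates, candidate states, and so on), and attention adds score tensors that grow with the square of the sequence length, so the real footprint is a multiple of this figure. That is a large share of a P100’s 12–16 GB before parameters, gradients, and optimizer state are even counted.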
Read the full paper here.