Training Deeper Models by GPU Memory Optimization on TensorFlow

An innovative approach to fighting memory resource limitation

This article is part of the Academic Alibaba series and is taken from the paper entitled “Training Deeper Models by GPU Memory Optimization on TensorFlow” by Chen Meng, Minmin Sun, Jun Yang, Minghui Qiu and Yang Gu, accepted by the 2017 Conference on Neural Information Processing Systems (NIPS). The full paper can be read here.


As training deep learning models on GPUs becomes increasingly popular, the problems of model complexity and limited memory resources become ever more salient. Alibaba's Machine Learning team has devised effective GPU memory optimization strategies that overcome these memory limitations and integrate seamlessly into TensorFlow.

Deep learning now plays an increasingly important role in a wide variety of applications. The core logic of training deep learning models is parallel linear algebra, which is well suited to GPUs. However, due to physical constraints, a GPU usually has far less device memory than a server has host memory: the latest high-end NVIDIA P100 GPU is equipped with 12–16 GB of device memory, whereas a CPU server can hold 128 GB of host memory. Meanwhile, the trend in deep learning models is toward "deeper and wider" architectures. For example, ResNet can consist of up to 1,001 layers, and a Neural Machine Translation (NMT) model with attention consists of 8 layers, most of which are sequential layers unrolled horizontally over the input sequence, bringing non-negligible memory consumption.
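To see why unrolled sequential layers are costly, note that backpropagation must retain the activations of every timestep in every layer. The sketch below estimates that activation footprint with a deliberately simplified model; all sizes (batch size, hidden size, sequence length) are illustrative assumptions, not figures from the paper, and real models also store gate activations, attention buffers, and optimizer state.

```python
def activation_memory_gb(batch, seq_len, hidden, layers, bytes_per_float=4):
    """Rough activation memory kept for backprop: one hidden-state
    tensor per timestep, per layer, per example (a simplification)."""
    floats = batch * seq_len * hidden * layers
    return floats * bytes_per_float / 1024 ** 3

# A hypothetical 8-layer NMT-style model unrolled over 100 timesteps:
# even this lower-bound estimate is a sizeable slice of a 12-16 GB GPU
# once gates, attention, gradients, and weights are added on top.
print(round(activation_memory_gb(batch=128, seq_len=100,
                                 hidden=1024, layers=8), 2))  # → 0.39
```

Because this footprint grows linearly with sequence length and depth, a deeper or wider model quickly exhausts device memory, which is exactly the limitation the paper's optimization strategies target.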



Alibaba Tech

First-hand, detailed, and in-depth information about Alibaba’s latest technology → Search “Alibaba Tech” on Facebook
