Manage GPU Memory When Using TensorFlow and PyTorch

Modern machine learning frameworks take advantage of GPUs to accelerate training and evaluation. Typically, the major frameworks use NVIDIA CUDA to map deep learning graphs to operations that are then run on the GPU. CUDA requires the program to explicitly manage memory on the GPU, and there are multiple strategies to do this. Unfortunately, TensorFlow does not release GPU memory until the program ends, and while PyTorch can release memory, it is difficult to guarantee that it actually does. Both frameworks can side-step this problem with process isolation.

TensorFlow

By default, TensorFlow tries to allocate as much GPU memory as it can. The theory is that if memory is allocated in one large block, subsequently created variables will be closer together in memory, which improves performance. This behavior can be tuned with the tf.config API. We’ll point out a couple of functions here:

Control GPU Utilization with TensorFlow

Restrict Which GPU TensorFlow Can Use

If your system has multiple GPUs, you can restrict which one TensorFlow uses in the following way:

import tensorflow as tf

# Get a list of the physical GPU devices
gpus = tf.config.list_physical_devices('GPU')

# Restrict TensorFlow to only use the first GPU.
tf.config.set_visible_devices(gpus[:1], device_type='GPU')
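As a quick sanity check (purely illustrative, not required), you can ask TensorFlow which devices are now visible:

# Confirm that only the first physical GPU is visible to TensorFlow
print(tf.config.get_visible_devices('GPU'))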

Restrict How Much Memory TensorFlow Can Allocate on a GPU

You can create a logical device with the maximum amount of memory you want TensorFlow to allocate using the following:

import tensorflow as tf

# First, get a list of the physical GPU devices
gpus = tf.config.list_physical_devices('GPU')

# Restrict to only the first GPU.
tf.config.set_visible_devices(gpus[:1], device_type='GPU')

# Create a LogicalDeviceConfiguration with the appropriate memory limit
# (memory_limit is specified in megabytes)
log_dev_conf = tf.config.LogicalDeviceConfiguration(
    memory_limit=2*1024 # 2 GB
)

# Apply the logical device configuration to the first GPU
tf.config.set_logical_device_configuration(
    gpus[0],
    [log_dev_conf])
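Note that these tf.config calls must run before TensorFlow initializes the GPU (for example, before any tensors are placed on it); otherwise they raise a RuntimeError. As a rough check that the cap was applied, you can list the resulting logical devices:

# The capped GPU shows up as a logical device
logical_gpus = tf.config.list_logical_devices('GPU')
print(logical_gpus)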

PyTorch

Currently, PyTorch has no mechanism to directly limit memory consumption; however, it does provide mechanisms for monitoring memory consumption and clearing the GPU memory cache. If you are careful to delete all Python variables that reference CUDA memory, PyTorch will eventually garbage collect that memory. We review these methods here.
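As a minimal sketch of that workflow (assuming a CUDA-capable GPU and that no other tensors are alive), you can watch the allocator’s counters, drop every reference to your CUDA tensors, and then flush PyTorch’s cache:

import gc
import torch

# Allocate a tensor on the GPU and inspect the allocator's bookkeeping
x = torch.randn(1024, 1024, device='cuda')
print(torch.cuda.memory_allocated())  # bytes held by live tensors
print(torch.cuda.memory_reserved())   # bytes held by PyTorch's caching allocator

# Delete every Python reference to the CUDA tensor so it can be garbage collected
del x
gc.collect()

# Release the cached, now-unused blocks back to the driver
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())  # should drop back toward zero
print(torch.cuda.memory_reserved())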

See more at the “How can I release the unused gpu memory?” PyTorch discussion.

Process Isolation

To ensure you can clean up any GPU memory when you’re finished, you can also try process isolation. This requires you to define a picklable Python function, which you then send to a separate Python process with the multiprocessing module. Upon completion, the worker process terminates and cleans up its memory, ensuring you don’t leave any unneeded variables behind. This is the strategy employed by DRYML, which provides a function decorator to manage process creation and the retrieval of results. We’ll present a simple example here showing how you might do it.

# The following will not work in an interactive python shell. It must be run as a standalone script.

# Define the function in which we want to allocate memory.
def memory_consumer():
    import tensorflow as tf
    test_array = tf.random.uniform((5,5))
    av = tf.reduce_mean(test_array)
    print(av)

if __name__ == "__main__":
    import multiprocessing as mp
    # Set the multiprocessing start method to 'spawn' since we only want to consume
    # memory for the problem at hand; 'fork' copies all current variables (which may be a lot).
    mp.set_start_method('spawn')
    # Create a separate process that will call the function
    p = mp.Process(target=memory_consumer)
    # Start the process
    p.start()
    # Wait for the process to complete by joining.
    p.join()

There are other approaches, such as multiprocessing pools, which handle process creation and return-value management for you, as sketched below.
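For example, a minimal sketch using a pool with the 'spawn' start method might look like the following (the function and its argument are purely illustrative):

import multiprocessing as mp

def gpu_work(size):
    # Import inside the worker so the parent process never touches the GPU
    import tensorflow as tf
    return float(tf.reduce_mean(tf.random.uniform((size, size))))

if __name__ == "__main__":
    # A 'spawn' context avoids fork-copying the parent's state into the worker
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=1) as pool:
        result = pool.apply(gpu_work, (5,))
    # Once the pool has shut down, the worker process (and its GPU memory) is gone
    print(result)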

Learn more about how DRYML does process isolation in the DRYML GitHub repository.