Manage GPU Memory When Using TensorFlow and PyTorch

Modern machine learning frameworks take advantage of GPUs to accelerate training and evaluation. Typically, the major frameworks use NVIDIA CUDA to map deep learning graphs to operations that are then run on the GPU. CUDA requires the program to explicitly manage memory on the GPU, and there are multiple strategies to do this. Unfortunately, TensorFlow does not release GPU memory until the program ends, and while PyTorch can release memory, it is difficult to guarantee that it actually does. Both frameworks can side-step this problem with process isolation.

TensorFlow

By default, TensorFlow tries to allocate as much GPU memory as it can. The theory is that if memory is allocated in one large block, subsequently created variables will be closer together in memory, which improves performance. This behavior can be tuned with the tf.config API. We’ll point out a couple of functions here:

Control GPU Utilization with TensorFlow

Restrict Which GPU TensorFlow Can Use

If your system has multiple GPUs, you can restrict which one TensorFlow uses in the following way:

import tensorflow as tf

# Get a list of the physical GPU devices
gpus = tf.config.list_physical_devices('GPU')

# Restrict TensorFlow to only use the first GPU.
tf.config.set_visible_devices(gpus[:1], device_type='GPU')
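As a quick sanity check (purely illustrative, not required), you can ask TensorFlow which devices are now visible:

# Confirm that only the first physical GPU is visible to TensorFlow
print(tf.config.get_visible_devices('GPU'))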

Restrict How Much Memory TensorFlow Can Allocate on a GPU

You can create a logical device with the maximum amount of memory you want TensorFlow to allocate using the following:

import tensorflow as tf

# First, get a list of the physical GPU devices
gpus = tf.config.list_physical_devices('GPU')

# Restrict to only the first GPU.
tf.config.set_visible_devices(gpus[:1], device_type='GPU')

# Create a LogicalDeviceConfiguration with the appropriate memory limit
# (memory_limit is specified in megabytes)
log_dev_conf = tf.config.LogicalDeviceConfiguration(
    memory_limit=2*1024 # 2 GB
)

# Apply the logical device configuration to the first GPU
tf.config.set_logical_device_configuration(
    gpus[0],
    [log_dev_conf])
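Note that these tf.config calls must run before TensorFlow initializes the GPU (for example, before any tensors are placed on it); otherwise they raise a RuntimeError. As a rough check that the cap was applied, you can list the resulting logical devices:

# The capped GPU shows up as a logical device
logical_gpus = tf.config.list_logical_devices('GPU')
print(logical_gpus)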

PyTorch

Currently, PyTorch has no mechanism to directly limit memory consumption; however, it does provide mechanisms for monitoring memory consumption and clearing the GPU memory cache. If you are careful to delete all Python variables that reference CUDA memory, PyTorch will eventually garbage collect that memory. We review these methods here.
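As a minimal sketch of that workflow (assuming a CUDA-capable GPU and that no other tensors are alive), you can watch the allocator’s counters, drop every reference to your CUDA tensors, and then flush PyTorch’s cache:

import gc
import torch

# Allocate a tensor on the GPU and inspect the allocator's bookkeeping
x = torch.randn(1024, 1024, device='cuda')
print(torch.cuda.memory_allocated())  # bytes held by live tensors
print(torch.cuda.memory_reserved())   # bytes held by PyTorch's caching allocator

# Delete every Python reference to the CUDA tensor so it can be garbage collected
del x
gc.collect()

# Release the cached, now-unused blocks back to the driver
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())  # should drop back toward zero
print(torch.cuda.memory_reserved())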

See more at the “How can I release the unused gpu memory?” PyTorch discussion.

Process Isolation

To ensure you can clean up any GPU memory when you’re finished, you can also try process isolation. This requires you to define a picklable Python function, which you then send to a separate Python process with the multiprocessing module. Upon completion, the worker process terminates and cleans up its memory, ensuring you don’t leave any unneeded variables behind. This is the strategy employed by DRYML, which provides a function decorator to manage process creation and the retrieval of results. We’ll present a simple example here showing how you might do it.

# The following will not work in an interactive python shell. It must be run as a standalone script.

# Define the function in which we want to allocate memory.
def memory_consumer():
    import tensorflow as tf
    test_array = tf.random.uniform((5,5))
    av = tf.reduce_mean(test_array)
    print(av)

if __name__ == "__main__":
    import multiprocessing as mp
    # Set the multiprocessing start method to 'spawn' since we only want to consume
    # memory for the problem at hand; 'fork' copies all current variables (which may be a lot).
    mp.set_start_method('spawn')
    # Create a separate process that will call the function
    p = mp.Process(target=memory_consumer)
    # Start the process
    p.start()
    # Wait for the process to complete by joining.
    p.join()

There are other approaches, such as multiprocessing pools, which handle process creation and return-value management for you, as sketched below.
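For example, a minimal sketch using a pool with the 'spawn' start method might look like the following (the function and its argument are purely illustrative):

import multiprocessing as mp

def gpu_work(size):
    # Import inside the worker so the parent process never touches the GPU
    import tensorflow as tf
    return float(tf.reduce_mean(tf.random.uniform((size, size))))

if __name__ == "__main__":
    # A 'spawn' context avoids fork-copying the parent's state into the worker
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=1) as pool:
        result = pool.apply(gpu_work, (5,))
    # Once the pool has shut down, the worker process (and its GPU memory) is gone
    print(result)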

Learn more about how DRYML does process isolation in the DRYML GitHub repository.