Manage GPU Memory When Using TensorFlow and PyTorch
Modern machine learning frameworks take advantage of GPUs to accelerate training and evaluation. Typically, the major platforms use NVIDIA CUDA to map deep learning graphs to operations that are then run on the GPU. CUDA requires the program to explicitly manage memory on the GPU, and there are multiple strategies to do this. Unfortunately, TensorFlow does not release GPU memory until the end of the program, and while PyTorch can release memory, it is difficult to ensure that it actually does. This can be side-stepped with process isolation, which works for both frameworks.
TensorFlow
By default, TensorFlow tries to allocate as much memory as it can on the GPU.
The theory is that if the memory is allocated in one large block, subsequently created variables will be closer together in memory, which improves performance.
This behavior can be tuned in TensorFlow using the tf.config API. We’ll point out a couple of functions here:
List the GPUs currently usable by the python process:
tf.config.list_physical_devices('GPU')
Set the specific devices TensorFlow will use:
tf.config.set_visible_devices(devices, device_type=None)
See the TensorFlow set_visible_devices documentation (https://www.tensorflow.org/api_docs/python/tf/config/set_visible_devices).
Set a gpu device object to use memory growth mode:
tf.config.experimental.set_memory_growth(gpu, True)
In this mode, TensorFlow only allocates the memory it needs and grows the allocation over time; see the TensorFlow set_memory_growth documentation and the short example after this list.
When used as a python with context block, restricts all tensors created inside the block to the specified device:
tf.device('/device:GPU:2')
Get the memory usage of the device:
tf.config.experimental.get_memory_usage(device)
An object that allows the user to set special requirements on a particular device:
tf.config.LogicalDeviceConfiguration
This can be used to restrict the amount of memory TensorFlow will use. See the TensorFlow LogicalDeviceConfiguration documentation.
A method to apply a LogicalDeviceConfiguration to a device:
tf.config.set_logical_device_configuration(device, logical_devices)
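Putting a few of these calls together, here is a minimal sketch (assuming at least one GPU is visible and that memory growth is set before any operation initializes the device) that enables memory growth, places a computation on a specific GPU, and reports memory usage:
import tensorflow as tf

# Enable memory growth on every visible GPU so TensorFlow only allocates
# what it needs instead of reserving the whole card up front. This must be
# done before the GPUs are initialized.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# Restrict tensors created inside this block to the first GPU.
with tf.device('/device:GPU:0'):
    x = tf.random.uniform((1000, 1000))
    y = tf.matmul(x, x)

# Report how much memory has been allocated on that device.
print(tf.config.experimental.get_memory_usage('GPU:0'))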
Control GPU Utilization with TensorFlow
Restrict Which GPU TensorFlow Can Use
If TensorFlow can use multiple GPUs, you can restrict which one it uses in the following way:
# Get a list of GPU devices
gpus = tf.config.list_physical_devices('GPU')
# Restrict TensorFlow to only use the first GPU.
tf.config.set_visible_devices(gpus[:1], device_type='GPU')
Restrict How Much Memory TensorFlow Can Allocate on a GPU
You can create a logical device with the maximum amount of memory you want TensorFlow to allocate using the following:
# First, Get a list of GPU devices
gpus = tf.config.list_physical_devices('GPU')
# Restrict to only the first GPU.
tf.config.set_visible_devices(gpus[:1], device_type='GPU')
# Create a LogicalDevice with the appropriate memory limit
log_dev_conf = tf.config.LogicalDeviceConfiguration(
    memory_limit=2*1024  # 2 GB
)
# Apply the logical device configuration to the first GPU
tf.config.set_logical_device_configuration(
    gpus[0],
    [log_dev_conf]
)
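As a quick check (a sketch assuming the configuration above has already been applied before any GPU work), you can list the logical devices TensorFlow will actually use:
# The first physical GPU now backs a logical device with the 2 GB cap.
logical_gpus = tf.config.list_logical_devices('GPU')
print(logical_gpus)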
PyTorch
Currently, PyTorch has no mechanism to directly limit memory consumption; however, it does provide mechanisms for monitoring memory consumption and clearing the GPU memory cache. If you are careful to delete all python variables referencing CUDA memory, PyTorch will eventually garbage collect the memory. We review these methods here.
Report the amount of GPU memory currently occupied by tensors on a given device:
torch.cuda.memory_allocated(device=None)
Release all unoccupied cached memory currently held by the caching allocator:
torch.cuda.empty_cache()
See more at the “How can I release the unused gpu memory?” PyTorch discussion.
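A minimal sketch of this workflow (assuming a CUDA-capable GPU; torch.cuda.memory_reserved, which is not listed above, is used here to show the cache shrinking):
import torch

device = torch.device('cuda:0')

# Allocate a tensor on the GPU and check how much memory tensors occupy.
x = torch.rand((1000, 1000), device=device)
print(torch.cuda.memory_allocated(device))

# Delete every python reference to the tensor; the allocation returns to
# PyTorch's caching allocator but is not yet handed back to the driver.
del x
print(torch.cuda.memory_reserved(device))

# Release the unoccupied cached memory back to the driver.
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved(device))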
Process Isolation
To guarantee that GPU memory is cleaned up when you’re finished, you can also try process isolation. This requires you to define a picklable python function, which you then send to a separate python process with the multiprocessing module. Upon completion, the other process terminates and cleans up its memory, ensuring you don’t leave any unneeded variables behind. This is the strategy employed by DRYML, which provides a function decorator to manage process creation and retrieval of results. We’ll present a simple example here showing how you might do it.
# The following will not work in an interactive python shell. It must be run as a standalone script.

# Define the function in which we want to allocate memory.
def memory_consumer():
    import tensorflow as tf
    test_array = tf.random.uniform((5, 5))
    av = tf.reduce_mean(test_array)
    print(av)

if __name__ == "__main__":
    import multiprocessing as mp
    # Set the multiprocessing start method to 'spawn' since we're interested in
    # only consuming memory for the problem at hand. 'fork' copies all current
    # variables (which may be a lot).
    mp.set_start_method('spawn')
    # Call that function using a separate process
    p = mp.Process(target=memory_consumer)
    # Start the process
    p.start()
    # Wait for the process to complete by joining.
    p.join()
There are other approaches, such as multiprocessing pools, which handle process creation and return value management for you.
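As a minimal sketch of that approach (not DRYML’s implementation; it assumes a standalone script, like the example above, with the worker function returning its result instead of printing it):
import multiprocessing as mp

# Worker function; the GPU is only touched inside the worker process.
def memory_consumer():
    import tensorflow as tf
    test_array = tf.random.uniform((5, 5))
    return float(tf.reduce_mean(test_array))

if __name__ == "__main__":
    # A spawn-context pool starts fresh worker processes and collects return
    # values for you; GPU memory is released when the workers exit.
    with mp.get_context('spawn').Pool(processes=1) as pool:
        result = pool.apply(memory_consumer)
    print(result)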
Learn more about how DRYML does process isolation in the DRYML GitHub repository.