Good Cluster Citizenship
You share TGI RAILS with many other users, and what you do on the system affects others. Exercise good citizenship to ensure that your activity does not adversely impact the system and the research community with whom you share it. Here are some rules of thumb:
Don’t run production jobs on the login nodes.
Don’t stress file systems with known-harmful access patterns, such as many thousands of small files in a single directory.
If you encounter an issue, submit an informative help ticket that includes your loaded modules (the output of module list) and any stdout/stderr messages, if possible.
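To check whether your own directories follow the small-files pattern mentioned above, something like the following sketch may help. It is not an official RAILS tool: the function name crowded_directories and the default threshold of 10,000 files are assumptions chosen for illustration, and the page does not define an exact cutoff. Walking a very large tree is itself metadata-heavy, so point it at a modest subtree.

```python
import os


def crowded_directories(root, threshold=10_000):
    """Return (path, file_count) pairs for directories under `root`
    that directly contain more than `threshold` files.

    The threshold is an illustrative assumption, not a documented limit.
    """
    crowded = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Count only the files sitting directly in this directory.
        if len(filenames) > threshold:
            crowded.append((dirpath, len(filenames)))
    # Largest directories first, so the worst offenders are easy to spot.
    return sorted(crowded, key=lambda item: item[1], reverse=True)
```

For example, crowded_directories("my_project_dir") (a hypothetical path) returns an empty list when no directory exceeds the threshold. Directories that do show up are candidates for bundling into an archive or spreading across subdirectories.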
Login node usage policy and limits
Login nodes are shared among all users and are intended for file management, job submission, and other tasks that do not require significant computational resources. To keep the login nodes responsive and usable by all, per-user limits on effective CPU-core use and memory on each node are enforced through Linux cgroups. Processes that consume significant CPU resources are terminated automatically, and any work killed this way needs to be run on a compute node instead. Please see the Running Jobs page for information on submitting jobs to the compute nodes.
Current limits summary:
CPU: an equal share of 60 cores, divided among the users who have running processes
Memory: 60 GB “high” threshold and 64 GB maximum (allocation slows once usage passes the “high” threshold)
Processes that have an excessively high cumulative CPU time will be terminated.
Orphaned processes (usually left behind by VS Code) will be terminated.
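To make the arithmetic in the summary above concrete, here is a minimal sketch that takes the numbers literally. The 60-core pool and the 60 GB and 64 GB thresholds come from the summary. The function names and the status labels are assumptions made for illustration, and the real cgroup enforcement may behave differently (for example, the CPU share may only matter when the node is contended).

```python
TOTAL_SHARED_CORES = 60  # core pool from the limits summary
MEMORY_HIGH_GB = 60      # "high" threshold: allocation slows past this point
MEMORY_MAX_GB = 64       # "max" threshold from the limits summary


def cpu_share(active_users, total_cores=TOTAL_SHARED_CORES):
    """Equal per-user share of the core pool, read literally from the summary."""
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    return total_cores / active_users


def memory_status(used_gb):
    """Classify a user's memory use against the two thresholds.

    The labels are illustrative, not messages produced by the system.
    """
    if used_gb >= MEMORY_MAX_GB:
        return "at max"
    if used_gb >= MEMORY_HIGH_GB:
        return "above high"
    return "below high"


print(cpu_share(12))      # 5.0 cores each when 12 users have processes
print(memory_status(61))  # above high
```

Under this reading, the more users have processes on a login node, the smaller each user's share becomes, which is one more reason to move real work to the compute nodes.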
If you have code running on a login node and it suddenly stops, look for an email explaining that your processes were killed. If you don’t understand why your process was killed, please submit a ticket, and we’ll be happy to discuss the issue with you.
Acceptable Use Policies
As a RAILS user, you agree to follow these acceptable/appropriate use policies: