Lustre Configuration

Capacity to Inode Ratio

Ratio: 1TB Quota to 1,500,000 inodes

An inode is a record that describes a file, directory, or link. Inodes are stored in a dedicated flash pool on Taiga that has a finite capacity. This quota ratio ensures that the inode pool doesn’t run out of space before the capacity pools do.

For example, if your project has a 10TB quota on Taiga, it has a quota of 15 million inodes.
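
The ratio can be applied to any capacity quota with simple arithmetic. The following is a minimal sketch; the function name `inode_quota` is hypothetical and is not part of any Taiga tooling.

```python
INODES_PER_TB = 1_500_000  # ratio stated above: 1TB of quota to 1,500,000 inodes


def inode_quota(capacity_tb):
    """Return the inode quota that accompanies a capacity quota given in TB."""
    return round(capacity_tb * INODES_PER_TB)


print(inode_quota(10))  # 15000000, matching the 10TB example above
```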

Block Size

File System Block Size: 2MB

A block size of 2MB was chosen for the Taiga filesystem to balance throughput performance against file space efficiency. Larger block sizes help large streaming data movement go faster. In general, large I/O to the filesystem is encouraged when possible.
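
One way to issue large I/O from an application is to read and write in block-sized chunks rather than a few bytes at a time. The sketch below assumes that "2MB" means 2 × 1024 × 1024 bytes; the helper `copy_in_blocks` is a hypothetical name, and in-memory streams stand in for real files.

```python
import io

BLOCK_SIZE = 2 * 1024 * 1024  # assumed to match the 2MB file system block size


def copy_in_blocks(src, dst, block_size=BLOCK_SIZE):
    """Copy src to dst using block-sized reads; return how many reads returned data."""
    reads = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:  # an empty result means end of stream
            break
        dst.write(chunk)
        reads += 1
    return reads


# In-memory streams stand in for files opened with open(path, "rb") / open(path, "wb").
src = io.BytesIO(b"x" * (5 * 1024 * 1024))  # 5MB of data
dst = io.BytesIO()
print(copy_in_blocks(src, dst))  # 3 reads: 2MB + 2MB + 1MB
```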

Default Stripe Count

  • Stripe Count: 1

  • Number of Flash OSTs in Taiga: 32

  • Number of HDD OSTs in Taiga: 38

Lustre can stripe data over multiple object storage targets (OSTs) to increase performance and help balance data across the disks. The default stripe count for Taiga is 1, but this value is overridden as the file being written gets larger; this behavior is determined by the progressive file layout (PFL) configured for Taiga.
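
To illustrate what striping means, the sketch below shows the round-robin arithmetic of a simple raid0 layout: consecutive stripe-sized chunks of a file go to consecutive stripes. It is a simplified model of a single, non-PFL layout, and the function name `stripe_index` is hypothetical.

```python
MIB = 1024 * 1024


def stripe_index(offset, stripe_size, stripe_count):
    """Return which stripe (0-based) holds the byte at the given file offset."""
    return (offset // stripe_size) % stripe_count


# With a 2MiB stripe size and 4 stripes, chunks rotate across stripes 0, 1, 2, 3, 0, ...
for offset_mib in (0, 2, 4, 6, 8):
    print(offset_mib, stripe_index(offset_mib * MIB, 2 * MIB, 4))
```

With a stripe count of 1, every offset maps to stripe 0, so the whole file lives on a single OST.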

Run lfs getstripe to see how many OSTs a file is striped across. The following example shows a file on Delta that has FIDs in 3 components since it is 500MB in size:

[user@client]$ lfs getstripe /taiga/nsf/delta/bbka/$user/testfiles/file_0
   /taiga/nsf/delta/bbka/$user/testfiles/file_0
     lcm_layout_gen:    6
     lcm_mirror_count:  1
     lcm_entry_count:   4
       lcme_id:             1
       lcme_mirror_id:      0
       lcme_flags:          init
       lcme_extent.e_start: 0
       lcme_extent.e_end:   65536
         lmm_stripe_count:  1
         lmm_stripe_size:   65536
         lmm_pattern:       raid0
         lmm_layout_gen:    0
         lmm_stripe_offset: 11
         lmm_pool:          ddn_nvme
         lmm_objects:
         - 0: { l_ost_idx: 11, l_fid: [0x9c00013a0:0x2926ad:0x0] }

       lcme_id:             2
       lcme_mirror_id:      0
       lcme_flags:          init
       lcme_extent.e_start: 65536
       lcme_extent.e_end:   536870912
         lmm_stripe_count:  1
         lmm_stripe_size:   2097152
         lmm_pattern:       raid0
         lmm_layout_gen:    0
         lmm_stripe_offset: 173
         lmm_pool:          ddn_hdd
         lmm_objects:
         - 0: { l_ost_idx: 173, l_fid: [0xdc00013a1:0x30e07:0x0] }

       lcme_id:             3
       lcme_mirror_id:      0
       lcme_flags:          init
       lcme_extent.e_start: 536870912
       lcme_extent.e_end:   8589934592
         lmm_stripe_count:  4
         lmm_stripe_size:   2097152
         lmm_pattern:       raid0
         lmm_layout_gen:    0
         lmm_stripe_offset: 175
         lmm_pool:          ddn_hdd
         lmm_objects:
         - 0: { l_ost_idx: 175, l_fid: [0xe800013a1:0x30cd3:0x0] }
         - 1: { l_ost_idx: 155, l_fid: [0x4c00013a1:0x28bcf:0x0] }
         - 2: { l_ost_idx: 165, l_fid: [0xa400013a1:0x29058:0x0] }
         - 3: { l_ost_idx: 172, l_fid: [0xe400013a1:0x30d77:0x0] }

       lcme_id:             4
       lcme_mirror_id:      0
       lcme_flags:          0
       lcme_extent.e_start: 8589934592
       lcme_extent.e_end:   EOF
         lmm_stripe_count:  -1
         lmm_stripe_size:   2097152
         lmm_pattern:       raid0
         lmm_layout_gen:    0
         lmm_stripe_offset: -1
         lmm_pool:          ddn_hdd
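
When the output is long, it can help to summarize it per component. The following is a rough sketch that parses text in the format shown above; it assumes the field names appear exactly as in this example, and the function name `parse_components` is hypothetical. The sample text is copied from components 2 and 3 of the output above.

```python
import re

FIELD_RE = re.compile(
    r"(lcme_id|lcme_extent\.e_start|lcme_extent\.e_end|lmm_stripe_count|lmm_pool):\s+(\S+)$"
)
OBJECT_RE = re.compile(r"- \d+: \{ l_ost_idx: (\d+),")


def parse_components(text):
    """Return one dict per layout component, including the OST indices it uses."""
    components = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        field = FIELD_RE.match(line)
        if field:
            key, value = field.groups()
            if key == "lcme_id":  # each lcme_id line starts a new component
                components.append({"osts": []})
            if components:
                components[-1][key] = value
            continue
        obj = OBJECT_RE.match(line)
        if obj and components:
            components[-1]["osts"].append(int(obj.group(1)))
    return components


sample = """\
lcme_id:             2
lcme_extent.e_start: 65536
lcme_extent.e_end:   536870912
  lmm_stripe_count:  1
  lmm_pool:          ddn_hdd
  lmm_objects:
  - 0: { l_ost_idx: 173, l_fid: [0xdc00013a1:0x30e07:0x0] }
lcme_id:             3
lcme_extent.e_start: 536870912
lcme_extent.e_end:   8589934592
  lmm_stripe_count:  4
  lmm_pool:          ddn_hdd
  lmm_objects:
  - 0: { l_ost_idx: 175, l_fid: [0xe800013a1:0x30cd3:0x0] }
  - 1: { l_ost_idx: 155, l_fid: [0x4c00013a1:0x28bcf:0x0] }
  - 2: { l_ost_idx: 165, l_fid: [0xa400013a1:0x29058:0x0] }
  - 3: { l_ost_idx: 172, l_fid: [0xe400013a1:0x30d77:0x0] }
"""

for comp in parse_components(sample):
    print(comp["lcme_id"], comp["lmm_pool"], comp["lmm_stripe_count"], comp["osts"])
```

A component with no entries in its OST list has not been instantiated yet, as with component 4 in the output above.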

Progressive File Layout (PFL)

Taiga deploys a PFL that performs these two key functions:

  • Allows us to keep the initial 64KB of every file on NVMe flash.

    This improves small-file I/O performance by keeping it on faster media, and it keeps that noisy traffic off the spinning media, which prefer larger I/O patterns. In turn, workloads doing large I/O get better throughput because they have clearer access to the HDDs that make up the bulk of Taiga’s capacity.

  • Allows us to dynamically set the stripe count of files so that the bigger a file grows, the more stripes it gets.

    This improves the performance of the system and keeps OST usage rates more balanced, which leads to better overall system responsiveness. You can override the stripe count of a new file with lfs setstripe, or change an existing file’s stripe count with lfs migrate; however, these actions are strongly discouraged. You should use the system defaults except in rare cases.

PFL Implementation Details:

Component 1:
- Size: 0B to 64KB
- Pool: NVME
- Stripe Count: 1

Component 2:
- Size: 64KB to 512MB
- Pool: HDD
- Stripe Count: 1

Component 3:
- Size: 512MB to 8GB
- Pool: HDD
- Stripe Count: 4

Component 4:
- Size: 8GB+
- Pool: HDD
- Stripe Count: 20 (about 50% of the HDD OSTs)
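
The component list above can be read as a lookup table from file offset to pool and stripe count. The sketch below encodes it that way, assuming binary units (64KB = 65536 bytes, as in the lfs getstripe output) and extents that include their start and exclude their end. The pool names are taken from that output; the function name `component_for_offset` is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# (extent start, extent end or None for end of file, pool, stripe count)
PFL_COMPONENTS = [
    (0, 64 * KIB, "ddn_nvme", 1),
    (64 * KIB, 512 * MIB, "ddn_hdd", 1),
    (512 * MIB, 8 * GIB, "ddn_hdd", 4),
    (8 * GIB, None, "ddn_hdd", 20),
]


def component_for_offset(offset):
    """Return (component number, pool, stripe count) for a byte offset in a file."""
    for number, (start, end, pool, count) in enumerate(PFL_COMPONENTS, start=1):
        if offset >= start and (end is None or offset < end):
            return number, pool, count
    raise ValueError("offset must be non-negative")


print(component_for_offset(0))         # (1, 'ddn_nvme', 1)
print(component_for_offset(1 * GIB))   # (3, 'ddn_hdd', 4)
print(component_for_offset(10 * GIB))  # (4, 'ddn_hdd', 20)
```

A file only gains a component once data is written at an offset inside that component's extent, so small files never leave the first one or two components.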