Lustre® Basics

Basic Components of a Lustre® System

  • Metadata Server (MDS) – The MDS makes metadata stored in the MDT available to Lustre clients. Each MDS manages the names and directories in the Lustre filesystem and provides network request handling for the MDT.
  • Metadata Target (MDT) – The MDT stores metadata.
  • Object Storage Server (OSS) – The OSS node provides file service and network request handling for one or more local OSTs.
  • Object Storage Target (OST) – The OST stores file data (chunks of files) as data objects on one or more OSSs. A single file may be striped across one or more OSTs. When a file is striped across multiple OSTs, chunks of the file will exist on more than one OST.
  • Lustre Clients – The nodes that mount the Lustre file system are considered to be Lustre clients. The service and computational nodes on Titan are, for example, Lustre clients.

[Figure: Basic_Cluster]

Basic Open/Write

The Metadata Server (MDS) stores information about each file, including the number, layout, and location of the file’s stripe(s). A file’s data are stored on one or more Object Storage Targets (OSTs).

To access a file, the Lustre client must obtain a file’s information from the MDS. Once this information is obtained, the client will interact directly with the OSTs on which the file is striped.

Warning: Interaction with the Metadata Server (MDS) is expensive. Limiting tasks that require MDS access (e.g., directory operations, creating/opening/closing files, stat-ing files) will help improve file system performance.

It is important to note that Spider 2, OLCF’s Lustre parallel file system, is a shared resource and as such, the I/O performance observed can vary depending on the particular user jobs running on the system. Information about OLCF I/O best practices can be found here.

More detailed information on Lustre can be found on the Lustre wiki.

Striping Basics

A file may exist on one OST or multiple OSTs. If chunks of a file exist on multiple OSTs, the file is striped across the OSTs.

Advantages of Striping a File Across Multiple OSTs

  • File Size – By placing chunks of a file on multiple OSTs, the space required by the file can also be spread over the OSTs. Therefore, a file’s size is not limited to the space available on a single OST.
  • Bandwidth – By placing chunks of a file on multiple OSTs, the I/O bandwidth can also be spread over the OSTs. In this manner, a file’s I/O bandwidth is not limited to a single OST.

Disadvantages of Striping a File Across Multiple OSTs

  • Increased Overhead – By placing chunks of a file across multiple OSTs, the overhead needed to manage the file separation, network connections, and multiple OSTs increases.
  • Increased Risk – By placing chunks of a file across multiple OSTs, the odds that an event will take one of the file’s OSTs down or impact data transfer increase.

File/Directory Stripe Patterns

When a file or directory is created it will inherit the parent directory’s stripe settings. However, users have the ability to alter a directory’s stripe pattern and set a new file’s stripe pattern.

Users have the ability to alter the following stripe settings:

  Setting        Default                        Description
  stripe count   4                              Number of OSTs to stripe over
  stripe size    1 MB                           File chunk size in bytes
  stripe index   -1 (allow system to select)    OST on which first stripe will be placed

Warning: Stripe counts over 512 have a negative impact on system and file performance. Please do not create files with stripe counts over 512.

File Chunk Creation

When a file’s size is greater than the set stripe size, the file will be broken down into enough chunks of the specified stripe size to contain the file.
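The rule above amounts to dividing the file size by the stripe size and rounding up. The sketch below shows that arithmetic in plain shell. The function name chunk_count is hypothetical, and it assumes both sizes are given in bytes with 1 MB taken as 1048576 bytes.

```shell
# Hypothetical helper: number of stripe-size chunks needed to hold a file.
# Both arguments are in bytes; the quotient is rounded up.
chunk_count() {
    file_size=$1
    stripe_size=$2
    echo $(( (file_size + stripe_size - 1) / stripe_size ))
}

# A 6.5 MB file (6815744 bytes) with a 1 MB stripe size (1048576 bytes).
chunk_count 6815744 1048576
```

A file no larger than the stripe size yields a single chunk, which matches the case where no splitting occurs.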

File Chunk Placement

When a file contains multiple chunks, and a stripe count greater than 1 is used, the file’s chunks will be placed on OSTs in a round robin fashion.
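Round-robin placement can be pictured as simple modular arithmetic. The following is a minimal sketch, not Lustre's internal code: the function name stripe_slot is hypothetical, sizes are in bytes, and the result is a 0-based position within the file's own set of OSTs rather than a system-wide OST number.

```shell
# Hypothetical helper: which of the file's OSTs (0-based position within the
# file's stripe set) holds the chunk containing a given byte offset.
stripe_slot() {
    offset=$1
    stripe_size=$2
    stripe_count=$3
    chunk_no=$(( offset / stripe_size ))      # 0-based chunk number
    echo $(( chunk_no % stripe_count ))       # round-robin over the stripe set
}

# With 1 MB stripes over 3 OSTs, print the slot for each of the first six chunks.
for i in 0 1 2 3 4 5; do
    stripe_slot $(( i * 1048576 )) 1048576 3
done
```

With a stripe count of 1 every chunk maps to slot 0, so the whole file stays on a single OST.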

Basic Striping Examples

The following example shows Lustre striping patterns over 3 OSTs for 3 different file sizes. In all three cases the default stripe size of 1 MB and the default stripe index (i.e., -1) are used. Note that object 6 in File A is not pictured because the corresponding data has not been written, resulting in a sparse file; Lustre does not create unnecessary objects in the underlying file system.

  • File A is > 5 MB and < 7 MB, stripe count = 3
  • File B is < 1 MB
  • File C is > 1 MB and < 2 MB, stripe count = 1

[Figure: File_Striping]

Choosing a Stripe Pattern

A stripe pattern should be set based on a code’s I/O requirements. When choosing a stripe pattern consider the following:

Stripe Count

Over-striping can negatively impact performance by causing small chunks to be written to each OST. This may underutilize the OSTs and the network. In contrast, under-striping can place too much stress on individual OSTs, which may also cause resource and request contention. The following table provides general guidelines that can be used when choosing a stripe count for your files:

  File size             Recommended stripe count   Notes
  ≤ 1 TB                4                          This is the default stripe count
  > 1 TB and ≤ 50 TB    File size / 100 GB         An 18 TB file would use 18 TB / 100 GB = 180 stripes
  > 50 TB               512                        Stripe counts > 512 can have a negative impact on performance. See warning below.

When a large file uses few stripes, its individual chunks can occupy large portions of an OST, leaving insufficient storage space available for other files which can result in I/O errors. Following these guidelines will ensure that chunks on individual OSTs have a reasonable size.
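The guidelines can be written as a small shell function. This is only a sketch: the name recommended_stripe_count is hypothetical, sizes are taken in GB with decimal units (1 TB = 1000 GB, as in the 18 TB example), and rounding the quotient up is an assumption, since the guidelines do not say how to round.

```shell
# Hypothetical helper: recommended stripe count for a file size given in GB.
# Assumes 1 TB = 1000 GB and rounds "file size / 100 GB" up (an assumption).
recommended_stripe_count() {
    size_gb=$1
    if [ "$size_gb" -le 1000 ]; then
        echo 4                                # default stripe count
    elif [ "$size_gb" -le 50000 ]; then
        echo $(( (size_gb + 99) / 100 ))      # file size / 100 GB, rounded up
    else
        echo 512                              # upper limit from the guidelines
    fi
}

# Recommended stripe count for an 18 TB file.
recommended_stripe_count 18000
```

The value printed could then be passed to lfs setstripe -c when creating the file or its parent directory.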

Warning: Stripe counts over 512 have a negative impact on system and file performance. Please do not create files with stripe counts over 512. If you are working with files of 50 TB or more, please contact help@olcf.ornl.gov for more guidelines specific to your use case.

Stripe Size

The default stripe size is 1 MB; i.e., Lustre sends data in 1 MB chunks. It is not recommended to set a stripe size less than 1 MB or greater than 4 MB.

Stripe Index

The default stripe index allows the system to select the OST on which the first data chunk will be placed. The default stripe index should always be used allowing the system to choose the OST(s). Forcing the use of specific OST(s) by setting the stripe index prevents the system from managing the OST load and can unnecessarily cause high OST load. It can also cause a new file to be striped across an OST that is down or write only; this would cause a file creation to unnecessarily fail.

Warning: The default stripe index, i.e. (-1), should always be used. Forcing the use of specific OST(s) by setting the stripe index hinders the system’s OST management.

Viewing the Striping Information

The lfs getstripe command can be used to view the striping attributes of a file or directory. The following example shows that file1 has a stripe count of (6) and is striped across OSTs 19, 59, 70, 54, 39, and 28:

  $ lfs getstripe -q dir/file1
    19      28675008      0x1b58bc0      0
    59      28592466      0x1b44952      0
    70      28656421      0x1b54325      0
    54      28652653      0x1b5346d      0
    39      28850966      0x1b83b16      0
    28      28854363      0x1b8485b      0
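When only the OST numbers are needed, for example to feed them to another command, the first column of this output can be extracted with awk. The sketch below assumes the four-column layout shown above; the function name ost_indices is hypothetical.

```shell
# Hypothetical helper: print the OST index (first column) of each stripe
# from `lfs getstripe -q` output read on standard input.
ost_indices() {
    awk 'NF >= 4 && $1 ~ /^[0-9]+$/ { print $1 }'
}

# Sample lines copied from the example above. On a Lustre client this could
# instead be:  lfs getstripe -q dir/file1 | ost_indices
ost_indices <<'EOF'
    19      28675008      0x1b58bc0      0
    59      28592466      0x1b44952      0
    70      28656421      0x1b54325      0
EOF
```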

The following example shows that directory dir1 has a stripe count of (6), the stripe size is set to (0) (i.e., use the default), and the stripe index/offset is set to (-1) (i.e., use the default).

  $ lfs getstripe dir1
    ...
    dir1
    stripe_count: 6 stripe_size: 0 stripe_offset: -1
    ...

More details can be found in the lfs man page (man lfs).

Altering the Striping Pattern

A user can change the attributes for an existing directory or set the attributes when creating a new file in Lustre by using the lfs setstripe command. An existing file’s stripe may not be altered.

Warning: The default stripe index, i.e., (-1), should always be used. Forcing the use of specific OST(s) by setting the stripe index hinders the system’s OST management.

Note: Files and directories inherit attributes from the parent directory.

Creating New Files

The following will create a zero-length file named file1 with a stripe count of (16):

  $ lfs setstripe -c 16 file1

To alter the stripe of an existing file, you can create a new file with the needed attributes using setstripe and copy the existing file to the created file. To alter the stripe of a large number of files, you can create a new directory with the needed attributes and copy the existing files into the newly created directory. In this manner the files should inherit the directory’s attributes.
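The copy-based approach for a single file might be scripted as follows. This is a sketch under stated assumptions, not an OLCF-provided tool: the function name restripe_file and the temporary file suffix are hypothetical, the file is assumed not to be in use by another process, and enough free space for a second copy is assumed.

```shell
# Hypothetical helper: give FILE a new stripe count by copying it into a
# freshly created file that already has the desired layout.
restripe_file() {
    count=$1
    file=$2
    tmp="$file.restripe.$$"        # temporary name, assumed not to exist yet

    # Create an empty file with the requested stripe count.
    lfs setstripe -c "$count" "$tmp" || return 1

    # Copy the data into the new layout, then replace the original.
    if cp -p "$file" "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (run on a Lustre client):  restripe_file 16 file1
```

Running lfs getstripe on the file afterwards is a simple way to confirm that the new stripe count took effect.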

Alter Existing Directories

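The following example will change the stripe count of directory dir1 to (2). Files subsequently created in dir1 inherit this setting; existing files keep their current striping.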

  $ lfs setstripe -c 2 dir1

More details can be found in the lfs man page (man lfs).

Viewing OST Storage

The lfs df command can be used to determine the amount of data stored on each Object Storage Target (OST).

The following example shows the size, used, and available space in human readable format for the /lustre/atlas2 filesystem:

  $ lfs df -h /lustre/atlas2
     UUID                     bytes        Used     Available  Use%      Mounted on
   atlas2-OST0000_UUID        14.0T        3.0T       10.3T    22%   /lustre/atlas2[OST:0]
   atlas2-OST0001_UUID        14.0T        3.0T       10.3T    22%   /lustre/atlas2[OST:1]
   atlas2-OST0002_UUID        14.0T        3.2T       10.1T    24%   /lustre/atlas2[OST:2] 
   atlas2-OST0003_UUID        14.0T        3.0T       10.3T    23%   /lustre/atlas2[OST:3]
   atlas2-OST0004_UUID        14.0T        3.2T       10.0T    24%   /lustre/atlas2[OST:4]
   ...
   atlas2-OST03ee_UUID        14.0T        2.0T       11.2T    15%   /lustre/atlas2[OST:1006]
   atlas2-OST03ef_UUID        14.0T        2.0T       11.2T    15%   /lustre/atlas2[OST:1007]

   filesystem summary:        13.8P        3.0P       10.1P    23%   /lustre/atlas2

Note: A "No space left on device" error will be returned during file I/O if one of the file’s associated OSTs becomes 100% utilized. An OST may become 100% utilized even if there is space available on the filesystem.

You can see a file or directory’s associated OST(s) with lfs getstripe. lfs df can then be used to see the usage on each OST.
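To spot OSTs that are approaching capacity, the lfs df output can be filtered on its Use% column. The sketch below assumes the column layout shown in the example above; the function name full_osts and the chosen threshold are hypothetical.

```shell
# Hypothetical helper: list OSTs at or above a Use% threshold, reading
# `lfs df` output on standard input. Header and summary lines are skipped
# because their first field does not look like an OST UUID.
full_osts() {
    awk -v limit="$1" '$1 ~ /-OST[0-9a-f]+_UUID$/ && $5 + 0 >= limit { print $1, $5 }'
}

# Sample lines copied from the example above. On a Lustre client this could
# instead be:  lfs df /lustre/atlas2 | full_osts 90
full_osts 24 <<'EOF'
atlas2-OST0000_UUID        14.0T        3.0T       10.3T    22%   /lustre/atlas2[OST:0]
atlas2-OST0002_UUID        14.0T        3.2T       10.1T    24%   /lustre/atlas2[OST:2]
atlas2-OST0003_UUID        14.0T        3.0T       10.3T    23%   /lustre/atlas2[OST:3]
EOF
```

Comparing the reported OSTs with the indices from lfs getstripe shows whether a particular file is at risk of hitting a full OST.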

More details can be found in the lfs man page (man lfs).