tokio.connectors.nersc_lfsstate module

Tools to parse and index the outputs of Lustre’s lfs and lctl commands to quantify Lustre fullness and health. Assumes inputs are generated by NERSC’s Lustre health monitoring cron jobs, which periodically issue the following commands:

echo "BEGIN $(date +%s)" >> osts.txt
/usr/bin/lfs df >> osts.txt

echo "BEGIN $(date +%s)" >> ost-map.txt
/usr/sbin/lctl dl -t >> ost-map.txt

Accepts either plain ASCII text files or gzip-compressed text files.
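As a rough illustration of how a reader might accept both plain and gzip-compressed inputs, the sketch below checks for the gzip magic bytes before opening the file. The helper name `open_maybe_gzip` is hypothetical and is not part of this module; the module's own detection logic may differ.

```python
import gzip

def open_maybe_gzip(path):
    """Open `path` for text reading, decompressing it if it is gzip data.

    Hypothetical helper: detects gzip by its two-byte magic number
    (0x1f 0x8b) rather than by file extension.
    """
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")
```

Checking the magic number instead of a `.gz` suffix keeps the sketch working for compressed files that were renamed.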

class tokio.connectors.nersc_lfsstate.NerscLfsOstFullness(cache_file=None)[source]

Bases: dict

Subclass of dictionary that self-populates with Lustre OST fullness.

__init__(cache_file=None)[source]

Load the fullness of OSTs

Parameters:cache_file (str, optional) – Path to a cache file to load instead of issuing the lfs df command
__repr__()[source]

Serialize OST fullness into a format that resembles lfs df.

Returns:Serialization of the OST fullness in a format similar to lfs df. Columns are
  • Name of OST (e.g., snx11025-OST0001_UUID)
  • Total kibibytes on OST
  • Used kibibytes on OST
  • Available kibibytes on OST
  • Percent capacity used
  • Mount point, role, and OST ID
Return type:str
_save_cache(output)[source]

Serialize object into a form resembling the output of lfs df.

Parameters:output (file) – File-like object into which resulting text should be written.
load_ost_fullness_file()[source]

Parse the cached output of OST fullness generated by lfs df.

Parses a file containing the concatenated outputs of lfs df, each preceded by a line of the form BEGIN 0000, where 0000 is the UNIX epoch time at which that output was captured.
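The grouping of lines under their preceding BEGIN marker can be sketched as follows. This is a minimal illustration, not the method's actual implementation; the function name `split_by_timestamp` is an assumption made for the example.

```python
def split_by_timestamp(lines):
    """Group raw lines under the most recent 'BEGIN <epoch>' marker.

    Hypothetical helper: returns {epoch_seconds: [lines...]}. Lines that
    appear before the first marker, and blank lines, are discarded.
    """
    sections = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("BEGIN "):
            current = int(line.split()[1])
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections
```

Each list of lines could then be matched one at a time against a pattern for lfs df output.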

save_cache(output_file=None)[source]

Serialize object into a form resembling the output of lfs df.

Parameters:output_file (str) – Path to a file to which the serialized output should be written. If None, print to stdout.
class tokio.connectors.nersc_lfsstate.NerscLfsOstMap(cache_file=None)[source]

Bases: dict

Subclass of dictionary that self-populates with Lustre OST-OSS mapping.

__init__(cache_file=None)[source]

Load the mapping of OSTs to OSSes.

Parameters:cache_file (str, optional) – Path to a cache file to load instead of issuing the lctl dl -t command
__repr__()[source]

Serialize OST map into a format that resembles lctl dl -t.

Returns:Serialization of the OST to OSS mapping in a format similar to lctl dl -t. Fixed-width columns are
  • index: OST/MDT index
  • status: up/down status
  • role: osc, mdc, etc.
  • role_id: name with unique identifier for target
  • uuid: UUID of target
  • ref_count: number of references to target
  • nid: LNET identifier of the target
Return type:str
_save_cache(output)[source]

Serialize object into a form resembling the output of lctl dl -t.

Parameters:output (file) – File-like object into which resulting text should be written.
get_failovers()[source]

Determine OSSes which are likely affected by a failover.

Identifies the OSTs that have probably failed over and, for each timestamp and file system, returns a list of abnormal OSSes along with the expected number of OSTs per OSS.

Returns:Dictionary keyed by timestamps and whose values are dicts of the form:
{
    'mode': int,
    'abnormal_ips': [list of str]
}

where mode is the statistical mode of the number of OSTs per OSS, and abnormal_ips is a list of strings containing the IP addresses of OSSes whose OST counts differ from the mode for that timestamp.

Return type:dict
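The mode-based heuristic described above can be illustrated with a small sketch. It assumes a flat, hypothetical mapping of OST name to OSS IP address for a single timestamp and file system; the function name `find_abnormal_osses` and the input shape are assumptions for this example and do not reflect the method's internals.

```python
from collections import Counter

def find_abnormal_osses(ost_to_oss):
    """Flag OSSes whose OST count differs from the most common count.

    Hypothetical helper: `ost_to_oss` maps OST name -> OSS IP address.
    When several counts are equally common, the one encountered first wins.
    """
    osts_per_oss = Counter(ost_to_oss.values())
    mode = Counter(osts_per_oss.values()).most_common(1)[0][0]
    abnormal = sorted(ip for ip, count in osts_per_oss.items() if count != mode)
    return {"mode": mode, "abnormal_ips": abnormal}
```

In a failover, one OSS typically takes over the OSTs of its partner, so one server shows more OSTs than the mode and another shows fewer (or vanishes from the map entirely, which this sketch would not detect).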
load_ost_map_file()[source]

Parse the cached output of an OST map generated by lctl dl -t.

Reads the input OST map from the path given by the cache_file attribute and populates self with a nested structure of the form:

{ timestamp(int) : { file_system: { ost_name : { keys: values } } } }
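To make the nesting concrete, the sketch below walks a plain dict of the same shape and counts OSTs per file system at each timestamp. The literal values, the file system key, and the inner key names (`status`, `nid`) are assumptions chosen for illustration; the keys actually stored by the class may differ.

```python
def count_osts(ost_map):
    """Return {timestamp: {file_system: number_of_OSTs}} for a nested map.

    Hypothetical helper operating on any dict shaped like
    {timestamp: {file_system: {ost_name: {...}}}}.
    """
    return {
        timestamp: {fs: len(osts) for fs, osts in by_fs.items()}
        for timestamp, by_fs in ost_map.items()
    }

# Hand-written stand-in for a populated map (values are made up).
example_map = {
    1500000000: {
        "snx11025": {
            "snx11025-OST0007": {"status": "UP", "nid": "10.100.100.12@o2ib1"},
            "snx11025-OST0008": {"status": "UP", "nid": "10.100.100.12@o2ib1"},
        },
    },
}
```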
save_cache(output_file=None)[source]

Serialize object into a form resembling the output of lctl dl -t.

Parameters:output_file (str) – Path to a file to which the serialized output should be written. If None, print to stdout.
tokio.connectors.nersc_lfsstate._REX_LFS_DF = <_sre.SRE_Pattern object>

Regular expression to extract OST fullness levels

Matches output of lfs df which takes the form:

snx11035-OST0000_UUID 90767651352 54512631228 35277748388  61% /scratch2[OST:0]

where the columns are

  • OST/MDT UUID
  • kibibytes total
  • kibibytes in use
  • kibibytes available
  • percent fullness
  • file system mount, role, and ID

Carries the implicit assumption that all OSTs are prefixed with snx.
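A pattern along these lines could match the sample line shown above. It is an illustrative sketch only: the name `LFS_DF_LINE`, the group names, and the helper `parse_lfs_df_line` are assumptions, and the module's actual `_REX_LFS_DF` expression may be written differently.

```python
import re

# Illustrative pattern, not the module's own regular expression.
# Like the original, it assumes every target name begins with "snx".
LFS_DF_LINE = re.compile(
    r"^\s*(?P<target>snx\S+)_UUID"
    r"\s+(?P<total_kib>\d+)\s+(?P<used_kib>\d+)\s+(?P<avail_kib>\d+)"
    r"\s+(?P<pct_used>\d+)%"
    r"\s+(?P<mount>[^\[\s]+)\[(?P<role>[^:\]]+):(?P<target_id>\d+)\]\s*$"
)

def parse_lfs_df_line(line):
    """Return a dict of fields for one lfs df line, or None if no match."""
    match = LFS_DF_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("total_kib", "used_kib", "avail_kib", "pct_used", "target_id"):
        fields[key] = int(fields[key])
    return fields
```

Header and summary lines in lfs df output do not start with snx, so this sketch returns None for them.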

tokio.connectors.nersc_lfsstate._REX_OST_MAP = <_sre.SRE_Pattern object>

Regular expression to match OSC/MDC lines

Matches output of lctl dl -t which takes the form:

351 UP osc snx11025-OST0007-osc-ffff8875ac1e7c00 3f30f170-90e6-b332-b141-a6d4a94a1829 5 10.100.100.12@o2ib1

Intentionally skips MGC, LOV, and LMV lines.
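For comparison, a pattern that accepts only osc and mdc lines might look like the sketch below. The name `OST_MAP_LINE`, the group names (borrowed from the column list in the `__repr__` description), and the helper `parse_ost_map_line` are assumptions; the module's real `_REX_OST_MAP` may differ.

```python
import re

# Illustrative pattern, not the module's own regular expression.
# Restricting the role to osc|mdc is what skips MGC, LOV, and LMV lines.
OST_MAP_LINE = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<status>\S+)\s+(?P<role>osc|mdc)"
    r"\s+(?P<role_id>\S+)\s+(?P<uuid>\S+)"
    r"\s+(?P<ref_count>\d+)\s+(?P<nid>\S+)\s*$"
)

def parse_ost_map_line(line):
    """Return a dict of fields for one lctl dl -t line, or None if skipped."""
    match = OST_MAP_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["index"] = int(fields["index"])
    fields["ref_count"] = int(fields["ref_count"])
    return fields
```

The IP address portion of the NID can then be taken as the text before the `@`, for example `fields["nid"].split("@")[0]`.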