tokio.connectors.nersc_lfsstate module
Tools to parse and index the outputs of Lustre’s lfs and lctl commands to quantify Lustre fullness and health. Assumes inputs are generated by NERSC’s Lustre health monitoring cron jobs, which periodically issue the following:
echo "BEGIN $(date +%s)" >> osts.txt
/usr/bin/lfs df >> osts.txt
echo "BEGIN $(date +%s)" >> ost-map.txt
/usr/sbin/lctl dl -t >> ost-map.txt
Accepts plain ASCII text files or gzip-compressed text files.
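Both cron outputs share the same framing: each snapshot begins with a "BEGIN <epoch>" line followed by the raw command output. A minimal sketch of reading that framing from either a plain or a gzip-compressed file follows. The helper names open_maybe_gzip and split_begin_blocks are hypothetical and not part of this module, and detecting gzip input by its magic bytes is an assumption about how such files could be recognized.

```python
import gzip


def open_maybe_gzip(path):
    """Open path for text reading, decompressing it if it is gzipped.

    Detection checks the two-byte gzip magic number (0x1f 0x8b) instead of
    trusting the file extension.
    """
    with open(path, "rb") as fp:
        magic = fp.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def split_begin_blocks(lines):
    """Group lines into {epoch_seconds: [lines]} using 'BEGIN <epoch>' markers.

    Blank lines and anything before the first BEGIN marker are dropped.
    A repeated timestamp appends to the existing block.
    """
    blocks = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("BEGIN "):
            current = int(line.split()[1])
            blocks.setdefault(current, [])
        elif current is not None and line.strip():
            blocks[current].append(line)
    return blocks


# Usage (hypothetical path):
# with open_maybe_gzip("osts.txt.gz") as fp:
#     blocks = split_begin_blocks(fp)
```

Keying blocks by the integer epoch makes it easy to line up an osts.txt snapshot with the ost-map.txt snapshot taken at the same time.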
- class tokio.connectors.nersc_lfsstate.NerscLfsOstFullness(cache_file=None)
Bases: dict
Subclass of dictionary that self-populates with Lustre OST fullness.
- __init__(cache_file=None)
Load the fullness of OSTs.
Parameters: cache_file (str, optional) – Path to a cache file to load instead of issuing the lfs df command.
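The cache_file parameter lets the same code run either live on a Lustre client or against previously captured output. A sketch of that choice follows, assuming lfs is on the PATH. read_lfs_df is a hypothetical helper rather than this class's actual loader, and it returns raw text without parsing it.

```python
import subprocess


def read_lfs_df(cache_file=None, command=("lfs", "df")):
    """Return raw lfs df text, preferring a cache file when one is given.

    Hypothetical helper that mirrors only the documented cache_file choice.
    """
    if cache_file is not None:
        # Read previously captured output instead of querying Lustre.
        with open(cache_file, "r") as fp:
            return fp.read()
    # No cache: run the command and capture its stdout as text.
    return subprocess.check_output(list(command), universal_newlines=True)
```

Passing command explicitly also makes the live path testable on machines without Lustre installed.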
- __repr__()
Serialize OST fullness into a format that resembles lfs df.
Returns: Serialization of the OST fullness in a format similar to lfs df. Columns are:
  - Name of OST (e.g., snx11025-OST0001_UUID)
  - Total kibibytes on OST
  - Used kibibytes on OST
  - Available kibibytes on OST
  - Percent capacity used
  - Mount point, role, and OST ID
Return type: str
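To make the column list concrete, here is a sketch that renders one OST record as an lfs df-like line. The function name and column widths are illustrative assumptions; the real __repr__ may space columns differently. The percentage is passed in rather than recomputed, because the exact rounding lfs df applies is not documented here.

```python
def format_lfs_df_row(ost_uuid, total_kib, used_kib, avail_kib, pct_used,
                      mount, role, target_id):
    """Render one OST record as a whitespace-separated, lfs df-like line.

    Column widths are illustrative only. pct_used is taken as given rather
    than recomputed from the kibibyte counts.
    """
    return "%-24s %14d %14d %14d %3d%% %s[%s:%d]" % (
        ost_uuid, total_kib, used_kib, avail_kib, pct_used,
        mount, role, target_id)


print(format_lfs_df_row("snx11035-OST0000_UUID", 90767651352, 54512631228,
                        35277748388, 61, "/scratch2", "OST", 0))
```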
- _save_cache(output)
Serialize object into a form resembling the output of lfs df.
Parameters: output (file) – File-like object into which resulting text should be written.
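Because output only needs a write() method, an in-memory buffer works as well as a real file. The sketch below assumes the cache reuses the "BEGIN <epoch>" framing of the cron output so that it can be read back the same way; both that framing and the helper save_fullness_cache are assumptions, not this method's documented behavior.

```python
import io


def save_fullness_cache(rows_by_timestamp, output):
    """Write {epoch_seconds: [lfs df-style lines]} to a file-like object.

    Each snapshot is preceded by a 'BEGIN <epoch>' line, mirroring the cron
    output framing; timestamps are written in ascending order.
    """
    for timestamp in sorted(rows_by_timestamp):
        output.write("BEGIN %d\n" % timestamp)
        for row in rows_by_timestamp[timestamp]:
            output.write(row + "\n")


buf = io.StringIO()
save_fullness_cache(
    {1500000000: ["snx11035-OST0000_UUID 90767651352 54512631228 "
                  "35277748388 61% /scratch2[OST:0]"]},
    buf)
print(buf.getvalue(), end="")
```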
- class tokio.connectors.nersc_lfsstate.NerscLfsOstMap(cache_file=None)
Bases: dict
Subclass of dictionary that self-populates with Lustre OST-OSS mapping.
- __init__(cache_file=None)
Load the mapping of OSTs to OSSes.
Parameters: cache_file (str, optional) – Path to a cache file to load instead of issuing the lctl dl -t command.
- __repr__()
Serialize OST map into a format that resembles lctl dl -t.
Returns: Serialization of the OST-to-OSS mapping in a format similar to lctl dl -t. Fixed-width columns are:
  - index: OST/MDT index
  - status: up/down status
  - role: osc, mdc, etc.
  - role_id: name with unique identifier for target
  - uuid: UUID of target
  - ref_count: number of references to target
  - nid: LNET identifier of the target
Return type: str
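A fixed-width line with the seven columns above can be produced with a single format string. The widths below are illustrative assumptions, taken neither from lctl nor from this module's __repr__, and format_lctl_dl_row is a hypothetical name.

```python
def format_lctl_dl_row(index, status, role, role_id, uuid, ref_count, nid):
    """Render one target as a fixed-width line resembling lctl dl -t output.

    The column widths here are illustrative, not lctl's exact layout.
    """
    return "%3d %-4s %-3s %-40s %-36s %3d %s" % (
        index, status, role, role_id, uuid, ref_count, nid)


print(format_lctl_dl_row(351, "UP", "osc",
                         "snx11025-OST0007-osc-ffff8875ac1e7c00",
                         "3f30f170-90e6-b332-b141-a6d4a94a1829",
                         5, "10.100.100.12@o2ib1"))
```

Right-aligning the index and left-aligning the text columns keeps the output readable with standard tools such as column or awk, since every field stays whitespace-separated.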
- _save_cache(output)
Serialize object into a form resembling the output of lctl dl -t.
Parameters: output (file) – File-like object into which resulting text should be written.
- get_failovers()
Determine OSSes which are likely affected by a failover.
Identify the OSTs that have probably failed over and, for each time stamp and file system, return a list of abnormal OSSes and the expected number of OSTs per OSS.
Returns: Dictionary keyed by timestamps whose values are dicts of the form:
{ 'mode': int, 'abnormal_ips': [list of str] }
where mode is the statistical mode of OSTs per OSS, and abnormal_ips is a list of strings containing the IP addresses of OSSes whose OST counts differ from the mode for that time stamp.
Return type: dict
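The mode-based heuristic can be sketched in a few lines: count the OSTs served by each OSS, take the most common count as the expected value, and flag any OSS that deviates from it. The input shape ({ost_name: oss_ip} per timestamp) and the helper names are hypothetical, and unlike the method description this sketch does not separate file systems.

```python
from collections import Counter


def find_abnormal_osses(ost_to_oss_ip):
    """Return {'mode': int, 'abnormal_ips': [str]} for one snapshot.

    ost_to_oss_ip maps each OST name to the IP address of the OSS that
    currently serves it.
    """
    # Number of OSTs each OSS is serving right now.
    osts_per_oss = Counter(ost_to_oss_ip.values())
    # Most common OST count across OSSes; ties go to the count seen first.
    mode = Counter(osts_per_oss.values()).most_common(1)[0][0]
    # OSSes serving more or fewer OSTs than usual hint at a failover.
    abnormal_ips = sorted(ip for ip, n in osts_per_oss.items() if n != mode)
    return {"mode": mode, "abnormal_ips": abnormal_ips}


def failovers_by_timestamp(maps_by_timestamp):
    """Apply find_abnormal_osses to every {timestamp: {ost: ip}} snapshot."""
    return {ts: find_abnormal_osses(m) for ts, m in maps_by_timestamp.items()}


snapshot = {
    "OST0000": "10.0.0.1", "OST0001": "10.0.0.1",
    "OST0002": "10.0.0.2", "OST0003": "10.0.0.2",
    "OST0004": "10.0.0.3", "OST0005": "10.0.0.3",
    # 10.0.0.4 is down; its two OSTs have failed over to 10.0.0.3.
    "OST0006": "10.0.0.3", "OST0007": "10.0.0.3",
    "OST0008": "10.0.0.5", "OST0009": "10.0.0.5",
}
print(failovers_by_timestamp({1500000000: snapshot}))
# -> {1500000000: {'mode': 2, 'abnormal_ips': ['10.0.0.3']}}
```

Note that only the OSS that absorbed the extra OSTs is flagged; the failed OSS no longer appears in the map at all, so it cannot be counted.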
- tokio.connectors.nersc_lfsstate._REX_LFS_DF = <_sre.SRE_Pattern object>
Regular expression to extract OST fullness levels.
Matches output of lfs df, which takes the form:
snx11035-OST0000_UUID 90767651352 54512631228 35277748388 61% /scratch2[OST:0]
where the columns are:
  - OST/MDT UUID
  - kibibytes total
  - kibibytes in use
  - kibibytes available
  - percent fullness
  - file system mount, role, and ID
Carries the implicit assumption that all OSTs are prefixed with snx.
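A regular expression covering the six columns listed above might look like the following. It is an illustrative reconstruction, not the module's actual _REX_LFS_DF, and it keeps the same snx prefix assumption, so the lfs df header and summary lines (and any target not named snx...) are rejected.

```python
import re

# Illustrative pattern; not the module's actual _REX_LFS_DF.
REX_LFS_DF_SKETCH = re.compile(
    r"^(?P<target>snx\d+-\S+)\s+"        # OST/MDT UUID, assumed snx-prefixed
    r"(?P<total_kib>\d+)\s+"
    r"(?P<used_kib>\d+)\s+"
    r"(?P<avail_kib>\d+)\s+"
    r"(?P<pct_used>\d+)%\s+"
    r"(?P<mount>[^\s\[]+)\[(?P<role>[^:\]]+):(?P<target_id>\d+)\]\s*$"
)


def parse_lfs_df_line(line):
    """Return a dict of typed fields, or None for non-matching lines."""
    match = REX_LFS_DF_SKETCH.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("total_kib", "used_kib", "avail_kib", "pct_used", "target_id"):
        fields[key] = int(fields[key])
    return fields


print(parse_lfs_df_line(
    "snx11035-OST0000_UUID 90767651352 54512631228 35277748388 61% "
    "/scratch2[OST:0]"))
print(parse_lfs_df_line("UUID 1K-blocks Used Available Use% Mounted on"))
```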
- tokio.connectors.nersc_lfsstate._REX_OST_MAP = <_sre.SRE_Pattern object>
Regular expression to match OSC/MDC lines.
Matches output of lctl dl -t, which takes the form:
351 UP osc snx11025-OST0007-osc-ffff8875ac1e7c00 3f30f170-90e6-b332-b141-a6d4a94a1829 5 10.100.100.12@o2ib1
Intentionally skips MGC, LOV, and LMV lines.
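Pulling the OSS address out of each OSC/MDC line is what makes an OST-to-OSS map possible. The sketch below uses an illustrative pattern rather than the module's actual _REX_OST_MAP. It assumes the part of the NID before "@" is the server's IP address, which holds for IP-based LNet networks such as the o2ib example shown but not necessarily for every network type.

```python
import re

# Illustrative pattern; not the module's actual _REX_OST_MAP.
REX_OST_MAP_SKETCH = re.compile(
    r"^\s*(?P<index>\d+)\s+"
    r"(?P<status>\S+)\s+"
    r"(?P<role>osc|mdc)\s+"      # only OSC/MDC rows; mgc, lov, lmv are rejected
    r"(?P<role_id>\S+)\s+"
    r"(?P<uuid>\S+)\s+"
    r"(?P<ref_count>\d+)\s+"
    r"(?P<nid>\S+)\s*$"
)


def oss_ip_from_line(line):
    """Return (role_id, ip) for an OSC/MDC line, or None for other lines."""
    match = REX_OST_MAP_SKETCH.match(line)
    if match is None:
        return None
    # A NID looks like <address>@<network>; keep just the address part.
    ip_addr = match.group("nid").split("@", 1)[0]
    return match.group("role_id"), ip_addr
```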