tokio.connectors.hpss module

Connect to various outputs made available by HPSS

class tokio.connectors.hpss.FtpLog(*args, **kwargs)

Bases: tokio.connectors.common.SubprocessOutputDict

Provides an interface for log files containing HPSS FTP transactions

This connector parses FTP logs generated by HPSS 7.3. Older versions are not supported.

HPSS FTP log files contain transfer records that look something like:

#0   1   2  3        4    5                   6                                  7          8                 9 10 11       12       13  14       15 16

Mon Dec 31 00:06:46 2018 dtn01-int.nersc.gov /home/o/operator/.check_ftp.25651  b          POPN_Cmd          r r  ftp      operator fd  0
Mon Dec 31 00:06:46 2018 0.010               dtn01-int.nersc.gov                33         /home/o/opera...  b o  PRTR_Cmd r        ftp operator fd 0
Mon Dec 31 00:06:48 2018 0.430               sgn-pub-01.nersc.gov               0          /home/g/glock...  b o  RETR_Cmd r        ftp wwwhpss
Mon Feb  4 16:45:04 2019 457.800             sgn-pub-01.nersc.gov               7184842752 /home/g/glock...  b o  RETR_Cmd r        ftp wwwhpss
Fri Jul 12 15:32:43 2019 2.080               gert01-224.nersc.gov               2147483647 /home/n/nickb...  b i  PSTO_Cmd r        ftp nickb    fd 0
Mon Jul 29 15:44:22 2019 0.800               dtn02.nersc.gov                    464566784  /home/n/nickb...  b o  PRTR_Cmd r        ftp nickb    fd 0

which this class deserializes and represents as a dictionary-like object of the form:

{
    "ftp": [
        {
            "bytes": 0,
            "bytes_sec": 0.0,
            "duration_sec": 0.43,
            "end_timestamp": 1546243608.0,
            "hpss_path": "/home/g/glock...",
            "hpss_uid": "wwwhpss",
            "opname": "HL",
            "remote_host": "sgn-pub-01.nersc.gov",
            "start_timestamp": 1546243607.57
        },
        ...
    ],
    "pftp": [
        {
            "bytes": 33,
            "bytes_sec": 3300.0,
            "duration_sec": 0.01,
            "end_timestamp": 1546243606.0,
            "hpss_path": "/home/o/opera...",
            "hpss_uid": "operator",
            "opname": "HL",
            "remote_host": "dtn01-int.nersc.gov",
            "start_timestamp": 1546243605.99
        },
        ...
    ]
}

where the top-level keys are either “ftp” or “pftp”, and their values are lists containing every FTP or parallel FTP transaction, respectively.
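
The mapping from raw columns to record keys can be illustrated with a minimal Python sketch. It is not the connector's own code: parse_ftp_transfer and parse_ftp_log are hypothetical names, and several rules are inferred from the sample records above rather than taken from the source. These inferred rules are the column positions, the "o" to HL direction mapping, treating P-prefixed commands (PRTR_Cmd, PSTO_Cmd) as parallel FTP, and skipping non-transfer lines such as the POPN_Cmd record. The log lines do not record a time zone, so the caller supplies one.

```python
import datetime


def parse_ftp_transfer(line, tzinfo=datetime.timezone.utc):
    """Sketch: turn one HPSS 7.3 FTP transfer record into (protocol, record)."""
    fields = line.split()
    # Columns 0-4 hold the end-of-transfer time, e.g. "Mon Dec 31 00:06:46 2018".
    end = datetime.datetime.strptime(" ".join(fields[0:5]), "%a %b %d %H:%M:%S %Y")
    end_timestamp = end.replace(tzinfo=tzinfo).timestamp()
    duration = float(fields[5])
    nbytes = int(fields[7])
    # A printed duration of 0 would give an infinite rate, so fall back to the
    # smallest value the printed string can express (e.g. "0.000" -> 0.001).
    effective = duration or 10.0 ** -len(fields[5].partition(".")[2])
    record = {
        "bytes": nbytes,
        "bytes_sec": nbytes / effective,
        "duration_sec": duration,
        "end_timestamp": end_timestamp,
        "hpss_path": fields[8],
        "hpss_uid": fields[14],
        # Assumption: direction "o" is out of HPSS (HL); anything else is into HPSS (LH).
        "opname": "HL" if fields[10] == "o" else "LH",
        "remote_host": fields[6],
        "start_timestamp": end_timestamp - duration,
    }
    # Assumption: P-prefixed commands (PRTR_Cmd, PSTO_Cmd) are parallel FTP.
    protocol = "pftp" if fields[11].startswith("P") else "ftp"
    return protocol, record


def parse_ftp_log(text, tzinfo=datetime.timezone.utc):
    """Sketch: group transfer records by protocol, skipping other lines."""
    result = {"ftp": [], "pftp": []}
    for line in text.splitlines():
        fields = line.split()
        # Assumption: transfer records carry a *_Cmd token in column 11.
        if len(fields) < 15 or not fields[11].endswith("_Cmd"):
            continue
        protocol, record = parse_ftp_transfer(line, tzinfo)
        result[protocol].append(record)
    return result
```

With a UTC-8 (PST) offset, the second and third sample lines reproduce the end_timestamp, start_timestamp, and bytes_sec values shown in the example output.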

classmethod from_file(cache_file)

Instantiate from a cache file

classmethod from_str(input_str)

Instantiate from a string

load_str(input_str)

Parse text from an HPSS FTP log

class tokio.connectors.hpss.HpssDailyReport(*args, **kwargs)

Bases: tokio.connectors.common.SubprocessOutputDict

Representation for the daily report that HPSS can generate

classmethod from_file(cache_file)

Instantiate from a cache file

classmethod from_str(input_str)

Instantiate from a string

load_str(input_str)

Parse the HPSS daily report text

class tokio.connectors.hpss.HsiLog(*args, **kwargs)

Bases: tokio.connectors.common.SubprocessOutputDict

Provides an interface for log files containing HSI and HTAR transactions

This connector receives input from an HSI log file which takes the form:

Sat Aug 10 00:05:26 2019 dtn01.nersc.gov      hsi  57074 31117 LH     0  0.02          543608 12356.7 4 /global/project/projectdir... /home/g/glock/... 57074
Sat Aug 10 00:05:28 2019 cori02-224.nersc.gov htar 58888 14301 create LH 0 58178668032 397.20 146472.0  /nersc/projects/blah.tar      5                 58888
Sat Aug 10 00:05:29 2019 myuniversity.edu     hsi  35136 1391  LH     -1 0.03          0      0.0       0                             xyz.bin           /home/g/glock/xyz.bin 35136

but uses both tabs and spaces to denote different fields. This connector then presents this data in a dictionary-like form:

{
    "hsi": [
        {
            "access_latency_sec": 0.03,
            "account_id": 35136,
            "bytes": 0,
            "bytes_sec": 0.0,
            "client_pid": 1391,
            "cos_id": 0,
            "dest_path": "/home/g/glock/xyz.bin",
            "hpss_uid": 35136,
            "opname": "LH",
            "remote_host": "myuniversity.edu",
            "return_code": -1,
            "source_path": "xyz.bin",
            "end_timestamp": 1565420729
        },
        ...
    ],
    "htar": [
        {
            "account_id": 58888,
            "bytes": 58178668032,
            "bytes_sec": 146472.0,
            "client_pid": 14301,
            "cos_id": 5,
            "duration_sec": 397.2,
            "hpss_path": "/nersc/projects/blah.tar",
            "hpss_uid": 58888,
            "htar_op": "create",
            "opname": "LH",
            "remote_ftp_host": "",
            "remote_host": "cori02-224.nersc.gov",
            "return_code": 0,
            "end_timestamp": 1565420728
        }
    ]
}

where the top-level keys are either “hsi” or “htar”, and their values are lists containing every HSI or HTAR transaction, respectively.

The keys generally follow the raw nomenclature used in the HSI logs, which is documented on Mike Gleicher’s website. Perhaps most relevant are the opnames, which can be one of:

  • FU - file unlink. Has no destination filename field or account id.
  • FR - file rename. Has no account id.
  • LH - transfer into HPSS (“Local to HPSS”)
  • HL - transfer out of HPSS (“HPSS to Local”)
  • HH - internal file copy (“HPSS-to-HPSS”)

For posterity, the other key fields are defined as follows:

  • access_latency_sec is the time to open the file. This includes the latency to pull the tape and insert it into the drive.
  • bytes and bytes_sec are the size and rate of data transfer
  • duration_sec is the time to complete the transfer
  • return_code is zero on success, nonzero otherwise
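
The HSI record layout can be illustrated with a minimal Python sketch for complete hsi transfer records (LH, HL, HH). It is not the connector's implementation: parse_hsi_transfer is a hypothetical name, and the column order is inferred by matching the sample lines to the example output. The sketch splits on any whitespace, which sidesteps the tab/space distinction but would break on paths containing spaces. FU and FR records, which lack trailing fields, and htar records, which use a different layout, are rejected rather than parsed.

```python
import datetime


def parse_hsi_transfer(line, tzinfo=datetime.timezone.utc):
    """Sketch: parse one complete HSI transfer record into a dict."""
    # Assumption: plain whitespace splitting is good enough for these fields.
    fields = line.split()
    if len(fields) != 18 or fields[6] != "hsi":
        # FU/FR records (fewer fields) and htar records are not handled here.
        raise ValueError("not a complete hsi transfer record: %r" % line)
    # Columns 0-4 hold the end-of-transfer time, e.g. "Sat Aug 10 00:05:29 2019".
    end = datetime.datetime.strptime(" ".join(fields[0:5]), "%a %b %d %H:%M:%S %Y")
    return {
        "remote_host": fields[5],
        "hpss_uid": int(fields[7]),
        "client_pid": int(fields[8]),
        "opname": fields[9],
        "return_code": int(fields[10]),
        "access_latency_sec": float(fields[11]),
        "bytes": int(fields[12]),
        "bytes_sec": float(fields[13]),
        "cos_id": int(fields[14]),
        "source_path": fields[15],
        "dest_path": fields[16],
        "account_id": int(fields[17]),
        "end_timestamp": int(end.replace(tzinfo=tzinfo).timestamp()),
    }
```

As with the FTP log, the time zone is an assumption supplied by the caller. With a UTC-7 (PDT) offset, the htar sample's 00:05:28 maps to the end_timestamp 1565420728 shown above, and the last hsi sample line maps to 1565420729.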

classmethod from_file(cache_file)

Instantiate from a cache file

classmethod from_str(input_str)

Instantiate from a string

load_str(input_str)

Parse an HSI log file containing HSI and HTAR transactions

tokio.connectors.hpss._find_columns(line, sep='=', gap=' ', strict=False)

Determine the column start/end positions for a header line separator

Takes a line separator such as the one denoted below:

Host             Users      IO_GB
===============  =====  =========
heart               53   148740.6

and returns a list of (start index, end index) tuples that can be used to slice table rows into column entries.

Parameters:
  • line (str) – Text comprised of separator characters and spaces that define the extents of columns
  • sep (str) – The character used to draw the column lines
  • gap (str) – The character separating sep characters
  • strict (bool) – If True, restrict column extents to only include sep characters and not the spaces that follow them.
Returns:

One (start index, end index) tuple per column

Return type:

list of tuples
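
The column-extent idea can be sketched in a few lines of Python. This is a hypothetical re-implementation written from the description above (find_columns is an illustrative name, not the private function itself): each run of sep characters becomes one column, and in non-strict mode the column also absorbs the gap characters that follow it, so right-aligned values still fall inside their column when a row is sliced.

```python
import re


def find_columns(line, sep="=", gap=" ", strict=False):
    """Sketch: return (start, end) extents for each run of sep characters."""
    extents = []
    for match in re.finditer(re.escape(sep) + "+", line):
        start, end = match.span()
        if not strict:
            # Non-strict: extend the column through the gap characters after it.
            while end < len(line) and line[end] == gap:
                end += 1
        extents.append((start, end))
    return extents
```

For the separator line in the example above, slicing the "heart" row with the non-strict extents and stripping each slice yields the entries heart, 53, and 148740.6.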

tokio.connectors.hpss._get_ascii_resolution(numeric_str)

Determines the maximum resolution of an ascii-encoded numerical value

Necessary because HPSS logs contain numeric values at different and often-insufficient resolutions. For example, tiny but finite transfers can show up as taking 0.000 seconds, which results in infinitely fast transfers when calculated naively. This function gives us a means to guess at what the real speed might’ve been.

Does not work with scientific notation.

Parameters: numeric_str (str) – An ASCII-encoded integer or float
Returns: The smallest number that can be expressed using the resolution provided with numeric_str
Return type: float
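
A minimal Python sketch of the resolution guess, assuming "resolution" means one unit in the last printed decimal place. get_ascii_resolution and estimate_rate are hypothetical names used only for illustration. estimate_rate shows how the resolution can stand in for a printed zero duration so that the computed rate stays finite.

```python
def get_ascii_resolution(numeric_str):
    """Sketch: smallest value expressible at the precision of numeric_str."""
    # Count digits after the decimal point; integers have none (resolution 1).
    # Like the function described above, this ignores scientific notation.
    _, _, fraction = numeric_str.strip().partition(".")
    return 10.0 ** -len(fraction)


def estimate_rate(nbytes, duration_str):
    """Hypothetical helper: bytes/sec that never divides by a printed zero."""
    duration = max(float(duration_str), get_ascii_resolution(duration_str))
    return nbytes / duration
```

For example, "0.000" has a resolution of 0.001, so a 1024-byte transfer logged as taking 0.000 seconds is estimated at roughly 1,024,000 bytes/sec instead of an infinite rate.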

tokio.connectors.hpss._hpss_timedelta_to_secs(timedelta_str)

Convert HPSS-encoded timedelta string into seconds

Parameters: timedelta_str (str) – String in the form d-HH:MM:SS, where d is the number of days, HH is hours, MM is minutes, and SS is seconds
Returns: Number of seconds represented by timedelta_str
Return type: int
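
The conversion is simple arithmetic. Here is a minimal Python sketch, where hpss_timedelta_to_secs is an illustrative name. Treating a string without a "d-" prefix as zero days is an assumption added for robustness and is not stated above.

```python
def hpss_timedelta_to_secs(timedelta_str):
    """Sketch: convert "d-HH:MM:SS" into an integer number of seconds."""
    # Assumption: a string without a "d-" prefix means zero days.
    days, _, clock = timedelta_str.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days or 0) * 86400 + hours * 3600 + minutes * 60 + seconds
```

For example, "1-02:03:04" is 86400 + 7200 + 180 + 4 = 93784 seconds.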

tokio.connectors.hpss._parse_section(lines, start_line=0)

Parse a single table of the HPSS daily report

Converts a table from the HPSS daily report into a dictionary. For example, a table may appear as:

Archive : IO Totals by HPSS Client Gateway (UI) Host
Host             Users      IO_GB       Ops
===============  =====  =========  ========
heart               53   148740.6     27991
dtn11                5    29538.6      1694
Total               58   178279.2     29685
HPSS ACCOUNTING:         224962.6

which will return a dict of form:

{
    "system": "archive",
    "title": "io totals by hpss client gateway (ui) host",
    "records": {
        "heart": {
            "io_gb": "148740.6",
            "ops": "27991",
            "users": "53",
        },
        "dtn11": {
            "io_gb": "29538.6",
            "ops": "1694",
            "users": "5",
        },
        "total": {
            "io_gb": "178279.2",
            "ops": "29685",
            "users": "58",
        }
    }
}

This function is robust to invalid data: any line that does not appear to be a valid table row is treated as the end of the table.

Parameters:
  • lines (list of str) – Text of the HPSS report
  • start_line (int) –

    Index of lines defined such that

    • lines[start_line] is the table title
    • lines[start_line + 1] is the table heading row
    • lines[start_line + 2] is the line separating the table heading and the first row of data
    • lines[start_line + 3:] are the rows of the table
Returns:

Tuple of (dict, int) where

  • dict contains the parsed contents of the table
  • int is the index of the last line of the table + 1

Return type:

tuple
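
The table-parsing steps can be sketched in Python. This is a hypothetical sketch (parse_section and column_extents are illustrative names), not the function's actual code. It follows the example output above: keys are lowercased, values stay strings, and records are keyed by the lowercased first column. The test for "a valid table row" is an assumption: a row must have exactly one whitespace-separated token per column, so blank lines and the "HPSS ACCOUNTING:" line end the table. Cells containing spaces would therefore also end it.

```python
import re


def column_extents(separator, sep="="):
    """Non-strict column extents; the last column runs to the end of the row."""
    extents = []
    for match in re.finditer(re.escape(sep) + "+", separator):
        start, end = match.span()
        while end < len(separator) and separator[end] == " ":
            end += 1
        extents.append((start, end))
    if extents:
        extents[-1] = (extents[-1][0], None)
    return extents


def parse_section(lines, start_line=0):
    """Sketch: parse one report table, returning (table, index after last row)."""
    # Title line has the form "System : Table Title".
    system, _, title = lines[start_line].partition(":")
    header = lines[start_line + 1]
    extents = column_extents(lines[start_line + 2])
    keys = [header[start:end].strip().lower() for start, end in extents]
    records = {}
    index = start_line + 3
    while index < len(lines):
        row = lines[index]
        # Assumption: a valid row has exactly one token per column.
        if len(row.split()) != len(extents):
            break
        values = [row[start:end].strip() for start, end in extents]
        # Key each record by its lowercased first column, e.g. "Total" -> "total".
        records[values[0].lower()] = dict(zip(keys[1:], values[1:]))
        index += 1
    table = {"system": system.strip().lower(),
             "title": title.strip().lower(),
             "records": records}
    return table, index
```

For the seven-line example above, this sketch returns the three records and the index 6, which is the index of the "HPSS ACCOUNTING:" line, one past the table's last row.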

tokio.connectors.hpss._rekey_table(table, key)

Converts a list of records into a dict of records

Converts a table of records as returned by _parse_section() of the form:

{
    "records": [
        {
            "host": "heart",
            "io_gb": "148740.6",
            "ops": "27991",
            "users": "53",
        },
        ...
    ]
}

into a table of key-value pairs of the form:

{
    "records": {
        "heart": {
            "io_gb": "148740.6",
            "ops": "27991",
            "users": "53",
        },
        ...
    }
}

Does not handle degenerate (duplicate) keys when re-keying, so only tables with a uniquely identifying key can be rekeyed.

Parameters:
  • table (dict) – Output of the _parse_section() function
  • key (str) – Key to pull out of each element of table['records'] to use as the key for each record
Returns:

Table with records expressed as key-value pairs instead of a list

Return type:

dict
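
The re-keying can be sketched in Python as follows. rekey_table is an illustrative name, and the choices not spelled out above are assumptions. These are: copying the input rather than modifying it, keeping other top-level fields such as "system", and dropping the key field from each record, as the example output suggests. As described above, duplicate keys are not handled. In this sketch a later record silently replaces an earlier one.

```python
def rekey_table(table, key):
    """Sketch: turn table["records"] from a list into a dict keyed by `key`."""
    # Keep every top-level field except the records themselves.
    rekeyed = {name: value for name, value in table.items() if name != "records"}
    rekeyed["records"] = {}
    for record in table["records"]:
        record = dict(record)  # copy so the caller's table is left unchanged
        record_key = record.pop(key)
        # Duplicate keys are not handled: a later record overwrites an earlier one.
        rekeyed["records"][record_key] = record
    return rekeyed
```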