tokio.cli.index_darshanlogs module

Creates an SQLite database that summarizes the key metrics from a collection of Darshan logs. The database is structured so that the amount of I/O performed to each file system can be determined easily.

Schemata:

CREATE TABLE summaries (
    log_id INTEGER,
    fs_id INTEGER,

    bytes_read INTEGER,
    bytes_written INTEGER,
    reads INTEGER,
    writes INTEGER,
    file_not_aligned INTEGER,
    consec_reads INTEGER,
    consec_writes INTEGER,

    mmaps INTEGER,
    opens INTEGER,
    seeks INTEGER,
    stats INTEGER,
    fdsyncs INTEGER,
    fsyncs INTEGER,

    seq_reads INTEGER,
    seq_writes INTEGER,
    rw_switches INTEGER,

    f_close_start_timestamp REAL,
    f_close_end_timestamp REAL,
    f_open_start_timestamp REAL,
    f_open_end_timestamp REAL,

    f_read_start_timestamp REAL,
    f_read_end_timestamp REAL,

    f_write_start_timestamp REAL,
    f_write_end_timestamp REAL,

    FOREIGN KEY (fs_id) REFERENCES mounts (fs_id),
    FOREIGN KEY (log_id) REFERENCES headers (log_id),
    UNIQUE(log_id, fs_id)
);

CREATE TABLE mounts (
    fs_id INTEGER PRIMARY KEY,
    mountpt CHAR,
    fsname CHAR
);

CREATE TABLE headers (
    log_id INTEGER PRIMARY KEY,
    filename CHAR UNIQUE,
    end_time INTEGER,
    exe CHAR,
    exename CHAR,
    jobid CHAR,
    nprocs INTEGER,
    start_time INTEGER,
    uid INTEGER,
    username CHAR,
    log_version CHAR,
    walltime INTEGER
);
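
As an illustration of how this schema can be used to find how much I/O went to each file system, the following minimal sketch builds a cut-down, in-memory copy of the three tables (only a few of the columns listed above) and sums bytes per file system. The rows inserted here are made-up sample values, not data from real Darshan logs.

```python
import sqlite3

# In-memory database with a reduced version of the schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mounts (fs_id INTEGER PRIMARY KEY, mountpt CHAR, fsname CHAR);
CREATE TABLE headers (log_id INTEGER PRIMARY KEY, filename CHAR UNIQUE);
CREATE TABLE summaries (
    log_id INTEGER,
    fs_id INTEGER,
    bytes_read INTEGER,
    bytes_written INTEGER,
    FOREIGN KEY (fs_id) REFERENCES mounts (fs_id),
    FOREIGN KEY (log_id) REFERENCES headers (log_id),
    UNIQUE(log_id, fs_id)
);
""")

# Hypothetical sample rows: two logs touching two file systems.
conn.executemany("INSERT INTO mounts VALUES (?, ?, ?)",
                 [(1, "/scratch", "scratch"), (2, "/home", "home")])
conn.executemany("INSERT INTO headers VALUES (?, ?)",
                 [(1, "a.darshan"), (2, "b.darshan")])
conn.executemany("INSERT INTO summaries VALUES (?, ?, ?, ?)",
                 [(1, 1, 100, 200), (1, 2, 10, 0), (2, 1, 50, 25)])

# Total bytes read and written per file system, across all logs.
rows = conn.execute("""
    SELECT m.fsname, SUM(s.bytes_read), SUM(s.bytes_written)
    FROM summaries AS s
    JOIN mounts AS m ON s.fs_id = m.fs_id
    GROUP BY m.fsname
    ORDER BY m.fsname
""").fetchall()

for fsname, total_read, total_written in rows:
    print(fsname, total_read, total_written)
```

Because summaries holds one row per (log_id, fs_id) pair, a single join against mounts is enough for per-file-system totals; joining headers as well would allow the same totals to be broken down by user or job.
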
tokio.cli.index_darshanlogs.create_headers_table(conn)

Creates the headers table

tokio.cli.index_darshanlogs.create_mount_table(conn)

Creates the mount table

tokio.cli.index_darshanlogs.create_summaries_table(conn)

Creates the summaries table

tokio.cli.index_darshanlogs.get_existing_logs(conn)

Returns list of log files already indexed in db

Scans the summaries table for existing entries and returns the file names corresponding to those entries. We don’t worry about summary rows that don’t correspond to existing header entries because the schema prevents this. Similarly, each log’s summaries are committed as a single transaction so we can assume that if a log file has _any_ rows represented in the summaries table, it has been fully processed and does not need to be updated.

Parameters: conn (sqlite3.Connection) – Connection to database containing existing logs
Returns: Basenames of Darshan log files represented in the database
Return type: list of str
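
The lookup described above can be sketched as a join between summaries and headers. This is a minimal illustration and not the module's actual query: the helper name existing_log_basenames is hypothetical, and the tables are reduced to the columns the join needs.

```python
import os
import sqlite3

def existing_log_basenames(conn):
    """Hypothetical helper: basenames of logs with at least one summaries row."""
    cursor = conn.execute(
        "SELECT DISTINCT h.filename FROM summaries AS s "
        "INNER JOIN headers AS h ON h.log_id = s.log_id "
        "ORDER BY h.filename"
    )
    return [os.path.basename(row[0]) for row in cursor]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE headers (log_id INTEGER PRIMARY KEY, filename CHAR UNIQUE);
CREATE TABLE summaries (log_id INTEGER, fs_id INTEGER, UNIQUE(log_id, fs_id));
""")

# Sample data: log 1 has summary rows, log 2 has a header only.
conn.executemany("INSERT INTO headers VALUES (?, ?)",
                 [(1, "/logs/done.darshan"), (2, "/logs/pending.darshan")])
conn.executemany("INSERT INTO summaries VALUES (?, ?)", [(1, 1), (1, 2)])

indexed = existing_log_basenames(conn)
print(indexed)
```

A log with a header but no summaries rows is not reported, which matches the rule that any row in summaries marks a log as fully processed.
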
tokio.cli.index_darshanlogs.get_file_mount(filename, mount_list)

Return the mount point in which a file is located

Parameters:
  • filename (str) – Fully qualified path to a file or directory
  • mount_list (list of str) – List of mount points
Returns: The member of mount_list in which filename lives; the first string is the mount point, and the second is the logical file system name. Returns None if filename does not match any mounts.

Return type: tuple of (str, str) or None
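
One plausible way to resolve a path to its mount point is a longest-prefix match, sketched below. This is an assumption about the matching rule, not the module's implementation: the function find_mount and the dict that maps mount points to file system names are hypothetical, as are the example paths.

```python
import posixpath

def find_mount(filename, mount_to_fsname):
    """Hypothetical helper: return (mount point, fs name) for the deepest
    mount point containing filename, or None if nothing matches."""
    path = posixpath.normpath(filename)
    best = None
    for mountpt, fsname in mount_to_fsname.items():
        root = posixpath.normpath(mountpt)
        # Match on whole path components so /scratch does not match /scratchy.
        prefix = root.rstrip("/") + "/"
        if path == root or path.startswith(prefix):
            if best is None or len(root) > len(best[0]):
                best = (root, fsname)
    return best

# Made-up mount table for illustration.
mounts = {"/": "rootfs", "/scratch": "scratch", "/scratch/project": "projectfs"}

nested = find_mount("/scratch/project/run1/out.h5", mounts)
lookalike = find_mount("/scratchy/out.h5", mounts)
unmatched = find_mount("/data/out.h5", {"/scratch": "scratch"})
print(nested, lookalike, unmatched)
```

Preferring the longest matching mount point matters when mounts are nested, since a file under /scratch/project would otherwise be attributed to /scratch or to the root file system.
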

tokio.cli.index_darshanlogs.index_darshanlogs(log_list, output_file, threads=1, max_mb=0.0)

Calculate the sum of bytes read/written

Given a list of input files, parse each as a Darshan log in parallel to create a list of scalar summary values corresponding to each log, and insert these into an SQLite database.

The current implementation parses all logs and stores their index values in memory before beginning the database insert process. This can be memory-intensive when processing many millions of logs at once, but it avoids thread contention on the SQLite database.

Parameters:
  • log_list (list of str) – Paths to Darshan logs to be processed
  • output_file (str) – Path to a SQLite database file to populate
  • threads (int) – Number of subprocesses to spawn for Darshan log parsing
  • max_mb (float) – Skip logs of size larger than this value
Returns: Reduced data along different reduction dimensions

Return type: dict
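
The two-phase pattern described above (parse everything in parallel, then insert from a single writer) can be sketched as follows. To stay self-contained, this sketch swaps in a thread pool for the subprocesses the module spawns, and fake_summarize is a hypothetical stand-in for real Darshan log parsing; the demo table is likewise invented for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def fake_summarize(log_path):
    """Hypothetical stand-in for parsing one Darshan log into scalar values."""
    return {"filename": log_path, "bytes_read": len(log_path)}

def index_logs(log_list, conn, workers=2):
    # Phase 1: parse in parallel; every result is held in memory.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_summarize, log_list))

    # Phase 2: a single writer inserts all rows in one transaction,
    # so workers never contend for the database.
    with conn:
        conn.executemany(
            "INSERT INTO demo (filename, bytes_read) "
            "VALUES (:filename, :bytes_read)",
            results,
        )
    return len(results)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (filename CHAR UNIQUE, bytes_read INTEGER)")

count = index_logs(["a.darshan", "bb.darshan"], conn)
stored = conn.execute(
    "SELECT filename, bytes_read FROM demo ORDER BY filename"
).fetchall()
print(count, stored)
```

The trade-off is the one noted above: memory use grows with the number of logs because nothing is written until parsing finishes, but only one connection ever writes to the database.
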

tokio.cli.index_darshanlogs.init_mount_to_fsname()

Initialize regexes to map mount points to file system names

tokio.cli.index_darshanlogs.main(argv=None)

Entry point for the command-line interface

tokio.cli.index_darshanlogs.process_log_list(conn, log_list)

Expand and filter the list of logs to process

Takes log_list as provided by the user and returns a list of Darshan logs that should be added to the index database. It does the following:

  1. Expands log_list from a single-element list pointing to a directory [of logs] into a list of log files
  2. Returns the subset of Darshan logs which do not already appear in the given database.

Relies on the logic of get_existing_logs() to determine whether a log appears in a database or not. If a database is somehow created where the summaries table is fully populated but the headers table is not, this will still return log files corresponding to the missing headers and potentially result in duplicate summaries entries that have no matching header.

Parameters:
  • conn (sqlite3.Connection) – Database containing log data
  • log_list (list of str) – List of paths to Darshan logs or a single-element list to a directory
Returns: Subset of log_list that contains only those Darshan logs that are not already represented in the database referenced by conn.

Return type: list of str
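
The expand-then-filter behavior can be sketched as below. This is a simplified illustration under stated assumptions: expand_and_filter is a hypothetical helper, the directory is listed non-recursively, and the set of already-indexed basenames is passed in directly rather than being read from a database.

```python
import os
import tempfile

def expand_and_filter(log_list, already_indexed):
    """Hypothetical helper: expand a single directory argument into its
    files, then drop any log whose basename is already indexed."""
    if len(log_list) == 1 and os.path.isdir(log_list[0]):
        directory = log_list[0]
        candidates = sorted(
            os.path.join(directory, name) for name in os.listdir(directory)
        )
    else:
        candidates = list(log_list)

    indexed = set(already_indexed)
    return [path for path in candidates
            if os.path.basename(path) not in indexed]

# Demonstration with a temporary directory holding two empty files.
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("old.darshan", "new.darshan"):
        open(os.path.join(tmpdir, name), "w").close()
    todo = expand_and_filter([tmpdir], ["old.darshan"])
    todo_names = [os.path.basename(path) for path in todo]

print(todo_names)
```

Comparing on basenames mirrors what get_existing_logs() is documented to return, so a log is skipped even if it is supplied under a different directory than the one it was first indexed from.
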

tokio.cli.index_darshanlogs.summarize_by_fs(darshan_log, max_mb=0.0)

Generates summary scalar values for a Darshan log

Parameters:
  • darshan_log (str) – Path to a Darshan log file
  • max_mb (float) – Skip logs of size larger than this value
Returns: Contains three keys (summaries, mounts, and headers) whose values are dicts of key-value pairs corresponding to scalar summary values from the POSIX module which are reduced over all files sharing a common mount point.

Return type: dict
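
The reduction over files sharing a mount point can be illustrated with a small sketch. It is not the module's code: reduce_by_mount, the file-to-mount mapping, and the counter values are all hypothetical, and only additive counters are handled (timestamps such as f_open_start_timestamp would need a min or max instead of a sum).

```python
from collections import defaultdict

def reduce_by_mount(per_file_counters, file_to_mount):
    """Hypothetical helper: sum additive per-file counters over all files
    that share a mount point. Files with no known mount are skipped."""
    totals = defaultdict(lambda: defaultdict(int))
    for filename, counters in per_file_counters.items():
        mount = file_to_mount.get(filename)
        if mount is None:
            continue
        for key, value in counters.items():
            totals[mount][key] += value
    # Convert nested defaultdicts to plain dicts for the caller.
    return {mount: dict(counters) for mount, counters in totals.items()}

# Made-up per-file counters, named after columns in the summaries table.
per_file = {
    "/scratch/a.dat": {"bytes_read": 100, "reads": 4},
    "/scratch/b.dat": {"bytes_read": 50, "reads": 1},
    "/home/u/c.txt": {"bytes_read": 7, "reads": 2},
    "/tmp/d.tmp": {"bytes_read": 1, "reads": 1},
}
mount_of = {
    "/scratch/a.dat": "/scratch",
    "/scratch/b.dat": "/scratch",
    "/home/u/c.txt": "/home",
}

reduced = reduce_by_mount(per_file, mount_of)
print(reduced)
```
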

tokio.cli.index_darshanlogs.update_headers_table(conn, header_data)

Adds new header data to the headers table

tokio.cli.index_darshanlogs.update_mount_table(conn, mount_points)

Adds new mount points to the mount table

tokio.cli.index_darshanlogs.update_summaries_table(conn, summary_data)

Adds new summary counters to the summaries table

tokio.cli.index_darshanlogs.vprint(string, level)

Print a message if verbosity is enabled

Parameters:
  • string (str) – Message to print
  • level (int) – Minimum verbosity level required to print