tokio.cli.index_darshanlogs module
Creates an SQLite database that summarizes the key metrics from a collection of Darshan logs. The database is constructed in a way that facilitates the determination of how much I/O is performed to different file systems.
Schemata:
CREATE TABLE summaries (
log_id INTEGER,
fs_id INTEGER,
bytes_read INTEGER,
bytes_written INTEGER,
reads INTEGER,
writes INTEGER,
file_not_aligned INTEGER,
consec_reads INTEGER,
consec_writes INTEGER,
mmaps INTEGER,
opens INTEGER,
seeks INTEGER,
stats INTEGER,
fdsyncs INTEGER,
fsyncs INTEGER,
seq_reads INTEGER,
seq_writes INTEGER,
rw_switches INTEGER,
f_close_start_timestamp REAL,
f_close_end_timestamp REAL,
f_open_start_timestamp REAL,
f_open_end_timestamp REAL,
f_read_start_timestamp REAL,
f_read_end_timestamp REAL,
f_write_start_timestamp REAL,
f_write_end_timestamp REAL,
FOREIGN KEY (fs_id) REFERENCES mounts (fs_id),
FOREIGN KEY (log_id) REFERENCES headers (log_id),
UNIQUE(log_id, fs_id)
);
CREATE TABLE mounts (
fs_id INTEGER PRIMARY KEY,
mountpt CHAR,
fsname CHAR
);
CREATE TABLE headers (
log_id INTEGER PRIMARY KEY,
filename CHAR UNIQUE,
end_time INTEGER,
exe CHAR,
exename CHAR,
jobid CHAR,
nprocs INTEGER,
start_time INTEGER,
uid INTEGER,
username CHAR,
log_version CHAR,
walltime INTEGER
);
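As an illustration of how this schema supports per-file-system accounting, the following sketch builds a small in-memory database with a reduced subset of the columns above and totals the bytes read and written for each fsname. The sample rows and the `io_by_fs` helper are made up for this example and are not part of the module.

```python
import sqlite3

# Reduced version of the schema above; only the columns needed for the query
SCHEMA = """
CREATE TABLE mounts (fs_id INTEGER PRIMARY KEY, mountpt CHAR, fsname CHAR);
CREATE TABLE headers (log_id INTEGER PRIMARY KEY, filename CHAR UNIQUE);
CREATE TABLE summaries (
    log_id INTEGER,
    fs_id INTEGER,
    bytes_read INTEGER,
    bytes_written INTEGER,
    FOREIGN KEY (fs_id) REFERENCES mounts (fs_id),
    FOREIGN KEY (log_id) REFERENCES headers (log_id),
    UNIQUE(log_id, fs_id)
);
"""

def io_by_fs(conn):
    """Hypothetical helper: total bytes read/written per file system name."""
    query = """
        SELECT m.fsname, SUM(s.bytes_read), SUM(s.bytes_written)
        FROM summaries AS s
        INNER JOIN mounts AS m ON m.fs_id = s.fs_id
        GROUP BY m.fsname
        ORDER BY m.fsname
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data: two logs, two file systems
conn.executemany(
    "INSERT INTO mounts VALUES (?, ?, ?)",
    [(1, "/scratch", "scratch1"), (2, "/home", "homefs")],
)
conn.executemany(
    "INSERT INTO headers VALUES (?, ?)",
    [(1, "a.darshan"), (2, "b.darshan")],
)
conn.executemany(
    "INSERT INTO summaries VALUES (?, ?, ?, ?)",
    [(1, 1, 100, 50), (1, 2, 10, 0), (2, 1, 200, 25)],
)
conn.commit()

totals = io_by_fs(conn)
for fsname, bytes_read, bytes_written in totals:
    print(fsname, bytes_read, bytes_written)
```

Because summaries holds one row per (log_id, fs_id) pair, grouping on the joined fsname is enough to aggregate across all indexed logs.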
-
tokio.cli.index_darshanlogs.get_existing_logs(conn)
Returns list of log files already indexed in db
Scans the summaries table for existing entries and returns the file names corresponding to those entries. We don’t worry about summary rows that don’t correspond to existing header entries because the schema prevents this. Similarly, each log’s summaries are committed as a single transaction so we can assume that if a log file has _any_ rows represented in the summaries table, it has been fully processed and does not need to be updated.
Parameters: conn (sqlite3.Connection) – Connection to database containing existing logs
Returns: Basenames of Darshan log files represented in the database
Return type: list of str
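The lookup described above can be sketched as a join between summaries and headers. This is an illustrative stand-in under the documented schema, not the module's actual implementation; `existing_logs_sketch` and the sample rows are hypothetical.

```python
import sqlite3

def existing_logs_sketch(conn):
    """Illustrative stand-in: file names of logs with at least one summaries row."""
    rows = conn.execute(
        "SELECT DISTINCT h.filename FROM summaries AS s "
        "INNER JOIN headers AS h ON h.log_id = s.log_id"
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE headers (log_id INTEGER PRIMARY KEY, filename CHAR UNIQUE);
CREATE TABLE summaries (log_id INTEGER, fs_id INTEGER, UNIQUE(log_id, fs_id));
""")

# job1 has summaries on two file systems; job2 has a header but no summaries
conn.execute("INSERT INTO headers VALUES (1, 'job1.darshan')")
conn.execute("INSERT INTO headers VALUES (2, 'job2.darshan')")
conn.executemany("INSERT INTO summaries VALUES (?, ?)", [(1, 1), (1, 2)])
conn.commit()

indexed = existing_logs_sketch(conn)
print(indexed)
```

In this sketch a log with a header but no summaries rows is not reported as indexed, which matches the rule that any summaries row implies the log was fully processed.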
-
tokio.cli.index_darshanlogs.get_file_mount(filename, mount_list)
Return the mount point in which a file is located
Parameters: - filename (str) – Fully qualified path to a file or directory
- mount_list (list of str) – List of mount points
Returns: The member of mount_list in which filename lives; first string is the mount point, and the second is the logical file system name. Returns None if filename does not match any mounts
Return type:
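A minimal sketch of this kind of lookup follows. It assumes that the longest matching mount point wins and that file system names come from a plain dict; both are assumptions made for illustration, and `file_mount_sketch` and its `fsnames` argument are hypothetical rather than part of the module.

```python
def file_mount_sketch(filename, mount_list, fsnames):
    """Illustrative stand-in: return (mount point, fs name) or None."""
    best = None
    for mount in mount_list:
        # Normalize trailing slashes but keep the root mount as "/"
        prefix = mount.rstrip("/") or "/"
        # Match the mount itself or a path below it, so that
        # "/scratch" does not match "/scratch1/file"
        below = prefix.rstrip("/") + "/"
        if filename == prefix or filename.startswith(below):
            # Assumption: prefer the most specific (longest) mount point
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return None
    # Assumption: fall back to the mount point when no name is known
    return best, fsnames.get(best, best)

match = file_mount_sketch(
    "/scratch/user/out.h5", ["/", "/scratch"], {"/scratch": "scratch1"}
)
no_match = file_mount_sketch("relative/path", ["/scratch"], {})
sibling = file_mount_sketch("/scratch1/x", ["/scratch"], {})
print(match, no_match, sibling)
```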
-
tokio.cli.index_darshanlogs.index_darshanlogs(log_list, output_file, threads=1, max_mb=0.0)
Calculate the sum of bytes read/written
Given a list of input files, parse each as a Darshan log in parallel to create a list of scalar summary values corresponding to each log and insert these into an SQLite database.
Current implementation parses all logs and stores their index values in memory before beginning the database insert process. This can be memory-intensive if processing many millions of logs at once but avoids thread contention on the SQLite database.
Parameters:
Returns: Reduced data along different reduction dimensions
Return type:
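The parse-then-insert pattern described above can be sketched as follows. The sketch uses a thread pool and a placeholder parser purely for illustration; the actual worker model, the `demo` table, and the `fake_summarize` and `index_sketch` names are assumptions and not part of the module.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def fake_summarize(log_path):
    """Placeholder for per-log parsing; returns made-up values."""
    return {"filename": log_path, "bytes_read": len(log_path)}

def index_sketch(log_list, conn, threads=1):
    # Phase 1: parse every log in parallel and hold all results in memory
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(fake_summarize, log_list))
    # Phase 2: a single writer inserts all rows, so workers never touch SQLite
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO demo (filename, bytes_read) "
            "VALUES (:filename, :bytes_read)",
            results,
        )
    return len(results)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (filename CHAR UNIQUE, bytes_read INTEGER)")
inserted = index_sketch(["a.darshan", "bb.darshan"], conn, threads=2)
row_count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
print(inserted, row_count)
```

Keeping all parsed results in memory until the insert phase is what makes the memory cost grow with the number of logs, in exchange for having only one database writer.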
-
tokio.cli.index_darshanlogs.init_mount_to_fsname()
Initialize regexes to map mount points to file system names
-
tokio.cli.index_darshanlogs.process_log_list(conn, log_list)
Expand and filter the list of logs to process
Takes log_list as input by user and returns a list of Darshan logs that should be added to the index database. It does the following:
- Expands log_list from a single-element list pointing to a directory [of logs] into a list of log files
- Returns the subset of Darshan logs which do not already appear in the given database.
Relies on the logic of get_existing_logs() to determine whether a log appears in a database or not. If a database is somehow created where the summaries table is fully populated but the headers table is not, this will still return log files corresponding to the missing headers and potentially result in duplicate summaries entries that have no matching header.
Parameters: - conn (sqlite3.Connection) – Database containing log data
- log_list (list of str) – List of paths to Darshan logs or a single-element list to a directory
Returns: Subset of log_list that contains only those Darshan logs that are not already represented in the database referenced by conn.
Return type: list of str
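The two steps above (expand, then filter) can be sketched as below. This is a simplified illustration: it takes the already-indexed basenames as an argument instead of querying a database, and the `*.darshan` glob pattern and the `process_log_list_sketch` name are assumptions.

```python
import glob
import os
import tempfile

def process_log_list_sketch(log_list, existing_basenames):
    """Illustrative stand-in: expand a directory and drop already-indexed logs."""
    # Step 1: expand a single directory argument into the logs it contains
    if len(log_list) == 1 and os.path.isdir(log_list[0]):
        # Assumption: logs are identified by a .darshan suffix
        candidates = sorted(glob.glob(os.path.join(log_list[0], "*.darshan")))
    else:
        candidates = list(log_list)
    # Step 2: keep only logs whose basename is not already indexed
    known = set(existing_basenames)
    return [path for path in candidates if os.path.basename(path) not in known]

with tempfile.TemporaryDirectory() as tmpdir:
    # Create two empty files standing in for Darshan logs
    for name in ("old.darshan", "new.darshan"):
        open(os.path.join(tmpdir, name), "w").close()
    remaining = process_log_list_sketch([tmpdir], ["old.darshan"])
    todo = [os.path.basename(path) for path in remaining]

print(todo)
```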
-
tokio.cli.index_darshanlogs.summarize_by_fs(darshan_log, max_mb=0.0)
Generates summary scalar values for a Darshan log
Parameters:
Returns: Contains three keys (summaries, mounts, and headers) whose values are dicts of key-value pairs corresponding to scalar summary values from the POSIX module which are reduced over all files sharing a common mount point.
Return type:
-
tokio.cli.index_darshanlogs.update_headers_table(conn, header_data)
Adds new header data to the headers table
-
tokio.cli.index_darshanlogs.update_mount_table(conn, mount_points)
Adds new mount points to the mount table