wandb
- class aitoolbox.torchtrain.callbacks.wandb.AlertConfig(metric_name: str, threshold_value: float, objective: str = 'maximize', wandb_alert_level: wandb.sdk.wandb_alerts.AlertLevel = None)[source]
Bases:
object
- wandb_alert_level: AlertLevel = None
- class aitoolbox.torchtrain.callbacks.wandb.WandBTracking(metric_names=None, batch_log_frequency=None, hyperparams=None, tags=None, alerts=None, wandb_pre_initialized=False, source_dirs=(), log_dir=None, is_project=True, project_name=None, experiment_name=None, local_model_result_folder_path=None, **kwargs)[source]
Bases:
AbstractExperimentCallback
Weights And Biases Logger
Find more on: https://wandb.ai
Note
Before this callback can be used, you need a wandb account and must be authenticated on the machine. Instructions for this process can be found on the wandb GitHub: https://github.com/wandb/client
- Parameters:
metric_names (list or None) – list of metric names tracked in the training history. If left as None, all the metrics in the training history are logged.
batch_log_frequency (int or None) – frequency of batch-level logging. If set to None, batch-level (mid-epoch) logging is skipped and only end-of-epoch logging is executed.
hyperparams (dict or None) – dictionary of used hyperparameters. If set to None, the callback tries to find the hyperparameter dict in the encapsulating TrainLoop running the callback.
tags (list or None) – used for wandb init. From wandb documentation: A list of strings, which will populate the list of tags on this run in the UI. Tags are useful for organizing runs together, or applying temporary labels like “baseline” or “production”. It’s easy to add and remove tags in the UI, or filter down to just runs with a specific tag.
alerts (list[AlertConfig] or None) – list of alerts where each alert configuration is specified as an AlertConfig dataclass. The user should provide the metric_name based on which the alert should be triggered. The last calculated value of the metric is then compared with the provided threshold_value. The objective can be either “maximize” or “minimize”.
wandb_pre_initialized (bool) – whether wandb has already been initialized outside the callback (e.g. at the start of the experiment script). If not, the callback initializes the wandb process.
source_dirs (tuple or list) – list of source code directories which will be stored by wandb. If an empty list is given, the callback tries to get this information from the running TrainLoop. If this is also not available, the callback falls back to wandb's default code saving behavior, which saves the executed Python script.
log_dir (str or None) – save directory location
is_project (bool) – set to True if the wandb project folder should be placed into the TrainLoop-created project folder structure, or to False if you want to save into a specific full path given in the log_dir parameter.
project_name (str or None) – root name of the project
experiment_name (str or None) – name of the particular experiment
local_model_result_folder_path (str or None) – root local path where project folder will be created
**kwargs – additional arguments for wandb.init() wrapped inside this callback
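A minimal sketch of how the parameters above might be assembled, assuming aitoolbox and wandb are installed and the machine is already authenticated. The metric names, hyperparameter values, tag, project and experiment names, and the result folder path are hypothetical placeholders, and build_wandb_callback is an illustrative helper that is not part of the library. The import is deferred into the helper so the configuration can be inspected without the libraries present.

```python
# Keyword arguments for WandBTracking, limited to parameters from the signature above.
# Every concrete value is an illustrative placeholder, not a recommended setting.
tracking_kwargs = {
    "metric_names": ["loss", "accuracy"],  # hypothetical names; None would log all metrics
    "batch_log_frequency": 100,            # None would skip batch-level logging
    "hyperparams": {"lr": 1e-3, "batch_size": 32},
    "tags": ["baseline"],
    "alerts": None,
    "wandb_pre_initialized": False,        # let the callback initialize wandb itself
    "source_dirs": (),
    "log_dir": None,
    "is_project": True,                    # use the TrainLoop-created folder structure
    "project_name": "demo_project",
    "experiment_name": "demo_experiment",
    "local_model_result_folder_path": "~/model_results",
}


def build_wandb_callback(**overrides):
    """Create the callback; the import is lazy so this module loads without aitoolbox."""
    from aitoolbox.torchtrain.callbacks.wandb import WandBTracking

    return WandBTracking(**{**tracking_kwargs, **overrides})
```

Calling build_wandb_callback(tags=["production"]) would override a single value while keeping the rest; any extra keyword not listed in the signature would be passed on to wandb.init() through **kwargs.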
- log_mid_train_loss()[source]
Log the training loss at the batch iteration level
Logs current batch loss and the accumulated average loss.
- Returns:
None
- log_train_history_metrics(metric_names)[source]
Log the train history metrics at the end of the epoch
- Parameters:
metric_names (list) – list of train history tracked metrics to be logged
- Returns:
None
- static send_configured_alerts(alerts, metrics_log)[source]
Send wandb alerts
Sending of alerts depends on current metric values in the metrics_log satisfying the conditions specified in the alert configuration.
- Parameters:
alerts (list[AlertConfig]) – list of alerts where each alert configuration is specified as an AlertConfig dataclass. The user should provide the metric_name based on which the alert should be triggered. The last calculated value of the metric is then compared with the provided threshold_value. The objective can be either “maximize” or “minimize”.
metrics_log (dict) – dict of metric names and their corresponding current values.
- Returns:
None
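The threshold comparison described above can be sketched without wandb. This is an illustration under stated assumptions, not the library's implementation: AlertSpec is a hypothetical stand-in that mirrors the documented AlertConfig fields, alerts_to_send is a hypothetical helper, and the comparison direction (a “maximize” alert fires when the metric exceeds the threshold, a “minimize” alert when it drops below) is an assumption, since the documentation does not state it.

```python
from dataclasses import dataclass


@dataclass
class AlertSpec:
    """Hypothetical stand-in mirroring the documented AlertConfig fields."""
    metric_name: str
    threshold_value: float
    objective: str = "maximize"


def alerts_to_send(alerts, metrics_log):
    """Return the alerts whose condition is met by the current metric values.

    ASSUMPTION: "maximize" fires when value > threshold and "minimize" fires
    when value < threshold. The real callback may compare differently.
    """
    triggered = []
    for alert in alerts:
        if alert.metric_name not in metrics_log:
            continue  # metric was not reported, nothing to compare
        value = metrics_log[alert.metric_name]
        if alert.objective == "maximize":
            fired = value > alert.threshold_value
        elif alert.objective == "minimize":
            fired = value < alert.threshold_value
        else:
            raise ValueError(f"Unknown objective: {alert.objective!r}")
        if fired:
            triggered.append(alert)
    return triggered


alerts = [AlertSpec("accuracy", 0.9), AlertSpec("loss", 0.1, objective="minimize")]
metrics_log = {"accuracy": 0.93, "loss": 0.25}
fired = alerts_to_send(alerts, metrics_log)
print([a.metric_name for a in fired])  # ['accuracy']
```

In the callback itself, each triggered alert would presumably be forwarded to wandb with the configured wandb_alert_level; that step is omitted here so the sketch runs without a wandb session.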