aiaccel.torch.lightning package

Submodules

aiaccel.torch.lightning.abci_environment module

class aiaccel.torch.lightning.abci_environment.ABCIEnvironment[source]

Bases: ClusterEnvironment

Environment class for ABCI.

This class provides methods to interact with the ABCI environment, such as retrieving the world size, global rank, node rank, and local rank.

property creates_processes_externally: bool

Whether the environment creates the subprocesses or not.

static detect() → bool[source]

Detects the environment settings corresponding to this cluster and returns True if they match.

global_rank() → int[source]

The rank (index) of the currently running process across all nodes and devices.

local_rank() → int[source]

The rank (index) of the currently running process inside of the current node.

property main_address: str

The main address through which all processes connect and communicate.

property main_port: int

An open and configured port in the main node through which all processes communicate.

node_rank() → int[source]

The rank (index) of the node on which the current process runs.

set_global_rank(rank: int) → None[source]
set_world_size(size: int) → None[source]
validate_settings(num_devices: int, num_nodes: int) → None[source]

Validates settings configured in the script against the environment, and raises an exception if there is an inconsistency.

world_size() → int[source]

The number of processes across all devices and nodes.
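To make the contract above concrete, here is a minimal pure-Python sketch of how a `ClusterEnvironment` implementation such as `ABCIEnvironment` typically maps scheduler-provided environment variables to the rank methods. The environment variable names (`HYPOTHETICAL_*`) are placeholders for illustration only, not the variables aiaccel actually reads on ABCI.

```python
import os


class ClusterEnvironmentSketch:
    """Sketch of the ClusterEnvironment contract that ABCIEnvironment
    implements. The environment variable names are hypothetical
    placeholders, not the ones aiaccel actually reads."""

    @property
    def creates_processes_externally(self) -> bool:
        # MPI-style launchers spawn one process per rank themselves,
        # so Lightning must not fork additional workers.
        return True

    def world_size(self) -> int:
        return int(os.environ.get("HYPOTHETICAL_WORLD_SIZE", "1"))

    def global_rank(self) -> int:
        return int(os.environ.get("HYPOTHETICAL_RANK", "0"))

    def local_rank(self) -> int:
        return int(os.environ.get("HYPOTHETICAL_LOCAL_RANK", "0"))

    def node_rank(self) -> int:
        # Derive the node index from the global rank and the number of
        # processes launched per node.
        procs_per_node = int(os.environ.get("HYPOTHETICAL_PROCS_PER_NODE", "1"))
        return self.global_rank() // procs_per_node

    def validate_settings(self, num_devices: int, num_nodes: int) -> None:
        # Raise if the Trainer's device/node counts disagree with the cluster.
        if num_devices * num_nodes != self.world_size():
            raise ValueError("Trainer settings disagree with the cluster environment")
```

In practice the real class is passed to Lightning as a plugin, e.g. `Trainer(plugins=[ABCIEnvironment()])`, so that Lightning queries it instead of its default environment detection.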

aiaccel.torch.lightning.callback module

class aiaccel.torch.lightning.callback.SaveMetricCallback(metric_name: str, output_path: str)[source]

Bases: Callback

Lightning callback that saves a metric when fitting ends.

Parameters:
  • metric_name (str) – Name of the metric to save.

  • output_path (str) – Path of the file to write the metric to.

on_fit_end(trainer: Trainer, pl_module: LightningModule) → None[source]

Called when fit ends.
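The following is an illustrative stand-in for what such a callback plausibly does in `on_fit_end`: read one value from the trainer's logged metrics and write it to a file. Lightning does expose logged metrics via `trainer.callback_metrics`, but the exact file format `SaveMetricCallback` writes is an assumption here.

```python
from pathlib import Path


class SaveMetricSketch:
    """Illustrative stand-in for SaveMetricCallback: when fitting ends,
    read one metric from the trainer and write it to a file. The plain
    text format below is an assumption, not aiaccel's actual format."""

    def __init__(self, metric_name: str, output_path: str) -> None:
        self.metric_name = metric_name
        self.output_path = output_path

    def on_fit_end(self, trainer, pl_module) -> None:
        # Lightning exposes values logged with self.log(...) via
        # trainer.callback_metrics.
        value = trainer.callback_metrics[self.metric_name]
        Path(self.output_path).write_text(str(float(value)))
```

The real callback is registered like any other, e.g. `Trainer(callbacks=[SaveMetricCallback("validation/loss", "result.txt")])`.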

aiaccel.torch.lightning.opt_lightning_module module

class aiaccel.torch.lightning.opt_lightning_module.OptimizerConfig(optimizer_generator: Callable[..., optim.Optimizer], params_transformer: Callable[..., Iterator[tuple[str, Any]]] | None = None, scheduler_generator: Callable[..., optim.lr_scheduler.LRScheduler] | None = None, scheduler_interval: str | None = 'step', scheduler_monitor: str | None = 'validation/loss')[source]

Bases: object

Configuration for the optimizer and scheduler in a LightningModule.

Parameters:
  • optimizer_generator (Callable[..., optim.Optimizer]) – A callable that generates the optimizer.

  • params_transformer (Callable[..., Iterator[tuple[str, Any]]] | None) – A callable that transforms the parameters into a format suitable for the optimizer. If None, the parameters are used as is. Defaults to None.

  • scheduler_generator (Callable[..., optim.lr_scheduler.LRScheduler] | None) – A callable that generates the learning rate scheduler. If None, no scheduler is used. Defaults to None.

  • scheduler_interval (str | None) – The interval at which the scheduler is called. Defaults to “step”.

  • scheduler_monitor (str | None) – The metric to monitor for the scheduler. Defaults to “validation/loss”.

optimizer_generator: Callable[..., optim.Optimizer]
params_transformer: Callable[..., Iterator[tuple[str, Any]]] | None = None
scheduler_generator: Callable[..., optim.lr_scheduler.LRScheduler] | None = None
scheduler_interval: str | None = 'step'
scheduler_monitor: str | None = 'validation/loss'
class aiaccel.torch.lightning.opt_lightning_module.OptimizerLightningModule(optimizer_config: OptimizerConfig)[source]

Bases: LightningModule

LightningModule subclass for models that use custom optimizers and schedulers.

Parameters:

optimizer_config (OptimizerConfig) – Configuration object for the optimizer.

optcfg

Configuration object for the optimizer.

Type:

OptimizerConfig

configure_optimizers() → optim.Optimizer | OptimizerLRSchedulerConfig[source]

Configures the optimizer and scheduler for training.

Returns:

The optimizer and scheduler configuration.

Return type:

Union[optim.Optimizer, OptimizerLRSchedulerConfig]
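A plain-Python sketch of how the `OptimizerConfig` fields plausibly compose inside `configure_optimizers` may help: the named parameters optionally pass through `params_transformer`, the result feeds `optimizer_generator`, and if `scheduler_generator` is set, the optimizer is wrapped in Lightning's scheduler-config dictionary with the configured interval and monitor. This is a sketch of the documented behavior, not the actual aiaccel implementation; the `params=`/`optimizer=` keyword names are assumptions.

```python
def configure_optimizers_sketch(config, named_params):
    """Plausible composition of OptimizerConfig's fields (a sketch).
    `config` needs the attributes documented above; real torch objects
    are replaced here by whatever the stub generators return."""
    params = named_params
    if config.params_transformer is not None:
        # e.g. build_param_groups with a `groups` list already bound
        params = config.params_transformer(named_params)
    optimizer = config.optimizer_generator(params=params)
    if config.scheduler_generator is None:
        return optimizer
    scheduler = config.scheduler_generator(optimizer=optimizer)
    # Lightning's OptimizerLRSchedulerConfig dictionary shape.
    return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": scheduler,
            "interval": config.scheduler_interval,
            "monitor": config.scheduler_monitor,
        },
    }
```

With `scheduler_generator=None` the sketch returns the bare optimizer, matching the `optim.Optimizer | OptimizerLRSchedulerConfig` return type above.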

aiaccel.torch.lightning.opt_lightning_module.build_param_groups(named_params: Iterator[tuple[str, Parameter]], groups: list[dict[str, Any]]) → list[dict[str, Any]][source]

Build parameter groups for the optimizer based on the provided patterns.

Parameters:
  • named_params (Iterator[tuple[str, nn.Parameter]]) – An iterator of named parameters.

  • groups (list[dict[str, Any]]) – A list of dictionaries where each dictionary contains a “pattern” key that specifies the parameter names to match (fnmatch), and other optional keys.

Example: In your config file, you might have:

optimizer_config:
  _target_: aiaccel.torch.lightning.OptimizerConfig
  optimizer_generator:
    _partial_: True
    _target_: torch.optim.AdamW
    weight_decay: 0.01
  params_transformer:
    _partial_: True
    _target_: aiaccel.torch.lightning.build_param_groups
    groups:
      - pattern: "*bias"
        lr: 0.01
      - pattern: "*weight"
        lr: 0.001

This will create two parameter groups: one for biases with a learning rate of 0.01 and another for weights with a learning rate of 0.001.
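To illustrate the documented matching behavior, here is a hedged re-implementation sketch using fnmatch: each parameter goes to the first group whose "pattern" matches its name, and the remaining group keys become per-group optimizer options. This is consistent with the docstring above but is not the library's actual code; in particular, what happens to parameters matching no pattern (collected into a default group here) is an assumption.

```python
from fnmatch import fnmatch
from typing import Any, Iterator


def build_param_groups_sketch(
    named_params: Iterator[tuple[str, Any]],
    groups: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Assign each named parameter to the first group whose "pattern"
    fnmatch-matches its name; other group keys (e.g. "lr") become
    per-group optimizer options. A sketch of the documented behavior."""
    # One output bucket per group, carrying its options minus "pattern".
    out = [
        {**{k: v for k, v in g.items() if k != "pattern"}, "params": []}
        for g in groups
    ]
    default: list[Any] = []  # parameters matching no pattern (assumption)
    for name, param in named_params:
        for g, bucket in zip(groups, out):
            if fnmatch(name, g["pattern"]):
                bucket["params"].append(param)
                break
        else:
            default.append(param)
    if default:
        out.append({"params": default})
    return out
```

Applied to the config above, `"*bias"` parameters land in a group with lr 0.01 and `"*weight"` parameters in a group with lr 0.001, which is the list-of-dicts shape torch optimizers accept as parameter groups.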

Module contents

class aiaccel.torch.lightning.ABCIEnvironment[source]

Bases: ClusterEnvironment

Environment class for ABCI.

This class provides methods to interact with the ABCI environment, such as retrieving the world size, global rank, node rank, and local rank.

property creates_processes_externally: bool

Whether the environment creates the subprocesses or not.

static detect() → bool[source]

Detects the environment settings corresponding to this cluster and returns True if they match.

global_rank() → int[source]

The rank (index) of the currently running process across all nodes and devices.

local_rank() → int[source]

The rank (index) of the currently running process inside of the current node.

property main_address: str

The main address through which all processes connect and communicate.

property main_port: int

An open and configured port in the main node through which all processes communicate.

node_rank() → int[source]

The rank (index) of the node on which the current process runs.

set_global_rank(rank: int) → None[source]
set_world_size(size: int) → None[source]
validate_settings(num_devices: int, num_nodes: int) → None[source]

Validates settings configured in the script against the environment, and raises an exception if there is an inconsistency.

world_size() → int[source]

The number of processes across all devices and nodes.

class aiaccel.torch.lightning.OptimizerConfig(optimizer_generator: Callable[..., optim.Optimizer], params_transformer: Callable[..., Iterator[tuple[str, Any]]] | None = None, scheduler_generator: Callable[..., optim.lr_scheduler.LRScheduler] | None = None, scheduler_interval: str | None = 'step', scheduler_monitor: str | None = 'validation/loss')[source]

Bases: object

Configuration for the optimizer and scheduler in a LightningModule.

Parameters:
  • optimizer_generator (Callable[..., optim.Optimizer]) – A callable that generates the optimizer.

  • params_transformer (Callable[..., Iterator[tuple[str, Any]]] | None) – A callable that transforms the parameters into a format suitable for the optimizer. If None, the parameters are used as is. Defaults to None.

  • scheduler_generator (Callable[..., optim.lr_scheduler.LRScheduler] | None) – A callable that generates the learning rate scheduler. If None, no scheduler is used. Defaults to None.

  • scheduler_interval (str | None) – The interval at which the scheduler is called. Defaults to “step”.

  • scheduler_monitor (str | None) – The metric to monitor for the scheduler. Defaults to “validation/loss”.

optimizer_generator: Callable[..., optim.Optimizer]
params_transformer: Callable[..., Iterator[tuple[str, Any]]] | None = None
scheduler_generator: Callable[..., optim.lr_scheduler.LRScheduler] | None = None
scheduler_interval: str | None = 'step'
scheduler_monitor: str | None = 'validation/loss'
class aiaccel.torch.lightning.OptimizerLightningModule(optimizer_config: OptimizerConfig)[source]

Bases: LightningModule

LightningModule subclass for models that use custom optimizers and schedulers.

Parameters:

optimizer_config (OptimizerConfig) – Configuration object for the optimizer.

optcfg

Configuration object for the optimizer.

Type:

OptimizerConfig

configure_optimizers() → optim.Optimizer | OptimizerLRSchedulerConfig[source]

Configures the optimizer and scheduler for training.

Returns:

The optimizer and scheduler configuration.

Return type:

Union[optim.Optimizer, OptimizerLRSchedulerConfig]

aiaccel.torch.lightning.build_param_groups(named_params: Iterator[tuple[str, Parameter]], groups: list[dict[str, Any]]) → list[dict[str, Any]][source]

Build parameter groups for the optimizer based on the provided patterns.

Parameters:
  • named_params (Iterator[tuple[str, nn.Parameter]]) – An iterator of named parameters.

  • groups (list[dict[str, Any]]) – A list of dictionaries where each dictionary contains a “pattern” key that specifies the parameter names to match (fnmatch), and other optional keys.

Example: In your config file, you might have:

optimizer_config:
  _target_: aiaccel.torch.lightning.OptimizerConfig
  optimizer_generator:
    _partial_: True
    _target_: torch.optim.AdamW
    weight_decay: 0.01
  params_transformer:
    _partial_: True
    _target_: aiaccel.torch.lightning.build_param_groups
    groups:
      - pattern: "*bias"
        lr: 0.01
      - pattern: "*weight"
        lr: 0.001

This will create two parameter groups: one for biases with a learning rate of 0.01 and another for weights with a learning rate of 0.001.