aiaccel.torch.lightning package

Submodules

aiaccel.torch.lightning.abci_environment module

class aiaccel.torch.lightning.abci_environment.ABCIEnvironment[source]

Bases: ClusterEnvironment

Environment class for ABCI.

This class provides methods to interact with the ABCI environment, such as retrieving the world size, global rank, node rank, and local rank.

property creates_processes_externally: bool

Whether the environment creates the subprocesses or not.

static detect() → bool[source]

Detects the environment settings corresponding to this cluster and returns True if they match.

global_rank() → int[source]

The rank (index) of the currently running process across all nodes and devices.

local_rank() → int[source]

The rank (index) of the currently running process inside of the current node.

property main_address: str

The main address through which all processes connect and communicate.

property main_port: int

An open and configured port in the main node through which all processes communicate.

node_rank() → int[source]

The rank (index) of the node on which the current process runs.

set_global_rank(rank: int) → None[source]
set_world_size(size: int) → None[source]
validate_settings(num_devices: int, num_nodes: int) → None[source]

Validates settings configured in the script against the environment, and raises an exception if there is an inconsistency.

world_size() → int[source]

The number of processes across all devices and nodes.
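To make the contract above concrete, here is a minimal pure-Python sketch of how a `ClusterEnvironment` implementation such as `ABCIEnvironment` typically maps scheduler-provided environment variables to the rank methods. The environment variable names (`HYPOTHETICAL_*`) are placeholders for illustration only, not the variables aiaccel actually reads on ABCI.

```python
import os


class ClusterEnvironmentSketch:
    """Sketch of the ClusterEnvironment contract that ABCIEnvironment
    implements. The environment variable names are hypothetical
    placeholders, not the ones aiaccel actually reads."""

    @property
    def creates_processes_externally(self) -> bool:
        # MPI-style launchers spawn one process per rank themselves,
        # so Lightning must not fork additional workers.
        return True

    def world_size(self) -> int:
        return int(os.environ.get("HYPOTHETICAL_WORLD_SIZE", "1"))

    def global_rank(self) -> int:
        return int(os.environ.get("HYPOTHETICAL_RANK", "0"))

    def local_rank(self) -> int:
        return int(os.environ.get("HYPOTHETICAL_LOCAL_RANK", "0"))

    def node_rank(self) -> int:
        # Derive the node index from the global rank and the number of
        # processes launched per node.
        procs_per_node = int(os.environ.get("HYPOTHETICAL_PROCS_PER_NODE", "1"))
        return self.global_rank() // procs_per_node

    def validate_settings(self, num_devices: int, num_nodes: int) -> None:
        # Raise if the Trainer's device/node counts disagree with the cluster.
        if num_devices * num_nodes != self.world_size():
            raise ValueError("Trainer settings disagree with the cluster environment")
```

In practice the real class is passed to Lightning as a plugin, e.g. `Trainer(plugins=[ABCIEnvironment()])`, so that Lightning queries it instead of its default environment detection.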

aiaccel.torch.lightning.callback module

class aiaccel.torch.lightning.callback.SaveMetricCallback(metric_name: str, output_path: str)[source]

Bases: Callback

Lightning callback that saves a metric when fitting ends.

Parameters:
  • metric_name (str) – Name of the metric to save.

  • output_path (str) – Path of the file to write the metric to.

on_fit_end(trainer: Trainer, pl_module: LightningModule) → None[source]

Called when fit ends.
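The following is an illustrative stand-in for what such a callback plausibly does in `on_fit_end`: read one value from the trainer's logged metrics and write it to a file. Lightning does expose logged metrics via `trainer.callback_metrics`, but the exact file format `SaveMetricCallback` writes is an assumption here.

```python
from pathlib import Path


class SaveMetricSketch:
    """Illustrative stand-in for SaveMetricCallback: when fitting ends,
    read one metric from the trainer and write it to a file. The plain
    text format below is an assumption, not aiaccel's actual format."""

    def __init__(self, metric_name: str, output_path: str) -> None:
        self.metric_name = metric_name
        self.output_path = output_path

    def on_fit_end(self, trainer, pl_module) -> None:
        # Lightning exposes values logged with self.log(...) via
        # trainer.callback_metrics.
        value = trainer.callback_metrics[self.metric_name]
        Path(self.output_path).write_text(str(float(value)))
```

The real callback is registered like any other, e.g. `Trainer(callbacks=[SaveMetricCallback("validation/loss", "result.txt")])`.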

aiaccel.torch.lightning.opt_lightning_module module

class aiaccel.torch.lightning.opt_lightning_module.OptimizerConfig(optimizer_generator: Callable[..., optim.Optimizer], params_transformer: Callable[..., Iterator[tuple[str, Any]]] | None = None, scheduler_generator: Callable[..., optim.lr_scheduler.LRScheduler] | None = None, scheduler_interval: str | None = 'step', scheduler_monitor: str | None = 'validation/loss')[source]

Bases: object

Configuration for the optimizer and scheduler in a LightningModule.

Parameters:
  • optimizer_generator (Callable[..., optim.Optimizer]) – A callable that generates the optimizer.

  • params_transformer (Callable[..., Iterator[tuple[str, Any]]] | None) – A callable that transforms the parameters into a format suitable for the optimizer. If None, the parameters are used as is. Defaults to None.

  • scheduler_generator (Callable[..., optim.lr_scheduler.LRScheduler] | None) – A callable that generates the learning rate scheduler. If None, no scheduler is used. Defaults to None.

  • scheduler_interval (str | None) – The interval at which the scheduler is called. Defaults to “step”.

  • scheduler_monitor (str | None) – The metric to monitor for the scheduler. Defaults to “validation/loss”.

optimizer_generator: Callable[..., optim.Optimizer]
params_transformer: Callable[..., Iterator[tuple[str, Any]]] | None = None
scheduler_generator: Callable[..., optim.lr_scheduler.LRScheduler] | None = None
scheduler_interval: str | None = 'step'
scheduler_monitor: str | None = 'validation/loss'
class aiaccel.torch.lightning.opt_lightning_module.OptimizerLightningModule(optimizer_config: OptimizerConfig)[source]

Bases: LightningModule

LightningModule subclass for models that use custom optimizers and schedulers.

Parameters:

optimizer_config (OptimizerConfig) – Configuration object for the optimizer.

optcfg

Configuration object for the optimizer.

Type:

OptimizerConfig

configure_optimizers() → optim.Optimizer | OptimizerLRSchedulerConfig[source]

Configures the optimizer and scheduler for training.

Returns:

The optimizer and scheduler configuration.

Return type:

Union[optim.Optimizer, OptimizerLRSchedulerConfig]
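A plain-Python sketch of how the `OptimizerConfig` fields plausibly compose inside `configure_optimizers` may help: the named parameters optionally pass through `params_transformer`, the result feeds `optimizer_generator`, and if `scheduler_generator` is set, the optimizer is wrapped in Lightning's scheduler-config dictionary with the configured interval and monitor. This is a sketch of the documented behavior, not the actual aiaccel implementation; the `params=`/`optimizer=` keyword names are assumptions.

```python
def configure_optimizers_sketch(config, named_params):
    """Plausible composition of OptimizerConfig's fields (a sketch).
    `config` needs the attributes documented above; real torch objects
    are replaced here by whatever the stub generators return."""
    params = named_params
    if config.params_transformer is not None:
        # e.g. build_param_groups with a `groups` list already bound
        params = config.params_transformer(named_params)
    optimizer = config.optimizer_generator(params=params)
    if config.scheduler_generator is None:
        return optimizer
    scheduler = config.scheduler_generator(optimizer=optimizer)
    # Lightning's OptimizerLRSchedulerConfig dictionary shape.
    return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": scheduler,
            "interval": config.scheduler_interval,
            "monitor": config.scheduler_monitor,
        },
    }
```

With `scheduler_generator=None` the sketch returns the bare optimizer, matching the `optim.Optimizer | OptimizerLRSchedulerConfig` return type above.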

aiaccel.torch.lightning.opt_lightning_module.build_param_groups(named_params: Iterator[tuple[str, Parameter]], groups: list[dict[str, Any]]) → list[dict[str, Any]][source]

Build parameter groups for the optimizer based on the provided patterns.

Parameters:
  • named_params (Iterator[tuple[str, nn.Parameter]]) – An iterator of named parameters.

  • groups (list[dict[str, Any]]) – A list of dictionaries where each dictionary contains a “pattern” key that specifies the parameter names to match (fnmatch), and other optional keys.

Example: In your config file, you might have:

optimizer_config:
  _target_: aiaccel.torch.lightning.OptimizerConfig
  optimizer_generator:
    _partial_: True
    _target_: torch.optim.AdamW
    weight_decay: 0.01
  params_transformer:
    _partial_: True
    _target_: aiaccel.torch.lightning.build_param_groups
    groups:
      - pattern: "*bias"
        lr: 0.01
      - pattern: "*weight"
        lr: 0.001

This will create two parameter groups: one for biases with a learning rate of 0.01 and another for weights with a learning rate of 0.001.
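To illustrate the documented matching behavior, here is a hedged re-implementation sketch using fnmatch: each parameter goes to the first group whose "pattern" matches its name, and the remaining group keys become per-group optimizer options. This is consistent with the docstring above but is not the library's actual code; in particular, what happens to parameters matching no pattern (collected into a default group here) is an assumption.

```python
from fnmatch import fnmatch
from typing import Any, Iterator


def build_param_groups_sketch(
    named_params: Iterator[tuple[str, Any]],
    groups: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Assign each named parameter to the first group whose "pattern"
    fnmatch-matches its name; other group keys (e.g. "lr") become
    per-group optimizer options. A sketch of the documented behavior."""
    # One output bucket per group, carrying its options minus "pattern".
    out = [
        {**{k: v for k, v in g.items() if k != "pattern"}, "params": []}
        for g in groups
    ]
    default: list[Any] = []  # parameters matching no pattern (assumption)
    for name, param in named_params:
        for g, bucket in zip(groups, out):
            if fnmatch(name, g["pattern"]):
                bucket["params"].append(param)
                break
        else:
            default.append(param)
    if default:
        out.append({"params": default})
    return out
```

Applied to the config above, `"*bias"` parameters land in a group with lr 0.01 and `"*weight"` parameters in a group with lr 0.001, which is the list-of-dicts shape torch optimizers accept as parameter groups.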

Module contents

class aiaccel.torch.lightning.ABCIEnvironment[source]

Bases: ClusterEnvironment

Environment class for ABCI.

This class provides methods to interact with the ABCI environment, such as retrieving the world size, global rank, node rank, and local rank.

property creates_processes_externally: bool

Whether the environment creates the subprocesses or not.

static detect() → bool[source]

Detects the environment settings corresponding to this cluster and returns True if they match.

global_rank() → int[source]

The rank (index) of the currently running process across all nodes and devices.

local_rank() → int[source]

The rank (index) of the currently running process inside of the current node.

property main_address: str

The main address through which all processes connect and communicate.

property main_port: int

An open and configured port in the main node through which all processes communicate.

node_rank() → int[source]

The rank (index) of the node on which the current process runs.

set_global_rank(rank: int) → None[source]
set_world_size(size: int) → None[source]
validate_settings(num_devices: int, num_nodes: int) → None[source]

Validates settings configured in the script against the environment, and raises an exception if there is an inconsistency.

world_size() → int[source]

The number of processes across all devices and nodes.

class aiaccel.torch.lightning.OptimizerConfig(optimizer_generator: Callable[..., optim.Optimizer], params_transformer: Callable[..., Iterator[tuple[str, Any]]] | None = None, scheduler_generator: Callable[..., optim.lr_scheduler.LRScheduler] | None = None, scheduler_interval: str | None = 'step', scheduler_monitor: str | None = 'validation/loss')[source]

Bases: object

Configuration for the optimizer and scheduler in a LightningModule.

Parameters:
  • optimizer_generator (Callable[..., optim.Optimizer]) – A callable that generates the optimizer.

  • params_transformer (Callable[..., Iterator[tuple[str, Any]]] | None) – A callable that transforms the parameters into a format suitable for the optimizer. If None, the parameters are used as is. Defaults to None.

  • scheduler_generator (Callable[..., optim.lr_scheduler.LRScheduler] | None) – A callable that generates the learning rate scheduler. If None, no scheduler is used. Defaults to None.

  • scheduler_interval (str | None) – The interval at which the scheduler is called. Defaults to “step”.

  • scheduler_monitor (str | None) – The metric to monitor for the scheduler. Defaults to “validation/loss”.

optimizer_generator: Callable[..., optim.Optimizer]
params_transformer: Callable[..., Iterator[tuple[str, Any]]] | None = None
scheduler_generator: Callable[..., optim.lr_scheduler.LRScheduler] | None = None
scheduler_interval: str | None = 'step'
scheduler_monitor: str | None = 'validation/loss'
class aiaccel.torch.lightning.OptimizerLightningModule(optimizer_config: OptimizerConfig)[source]

Bases: LightningModule

LightningModule subclass for models that use custom optimizers and schedulers.

Parameters:

optimizer_config (OptimizerConfig) – Configuration object for the optimizer.

optcfg

Configuration object for the optimizer.

Type:

OptimizerConfig

configure_optimizers() → optim.Optimizer | OptimizerLRSchedulerConfig[source]

Configures the optimizer and scheduler for training.

Returns:

The optimizer and scheduler configuration.

Return type:

Union[optim.Optimizer, OptimizerLRSchedulerConfig]

aiaccel.torch.lightning.build_param_groups(named_params: Iterator[tuple[str, Parameter]], groups: list[dict[str, Any]]) → list[dict[str, Any]][source]

Build parameter groups for the optimizer based on the provided patterns.

Parameters:
  • named_params (Iterator[tuple[str, nn.Parameter]]) – An iterator of named parameters.

  • groups (list[dict[str, Any]]) – A list of dictionaries where each dictionary contains a “pattern” key that specifies the parameter names to match (fnmatch), and other optional keys.

Example: In your config file, you might have:

optimizer_config:
  _target_: aiaccel.torch.lightning.OptimizerConfig
  optimizer_generator:
    _partial_: True
    _target_: torch.optim.AdamW
    weight_decay: 0.01
  params_transformer:
    _partial_: True
    _target_: aiaccel.torch.lightning.build_param_groups
    groups:
      - pattern: "*bias"
        lr: 0.01
      - pattern: "*weight"
        lr: 0.001

This will create two parameter groups: one for biases with a learning rate of 0.01 and another for weights with a learning rate of 0.001.