PyTorch Training

Getting Started

Training in aiaccel is built as a wrapper around PyTorch Lightning and is launched as follows:

python -m aiaccel.torch.apps.train config.yaml
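Because the app wraps PyTorch Lightning, its core job can be pictured as building a trainer, a datamodule, and a task (a LightningModule) from the config and then calling Lightning's `Trainer.fit`. The sketch below illustrates only that flow and is not aiaccel's implementation. `run_training` and its injectable `instantiate` parameter are hypothetical names. The sketch assumes the config is already loaded and that aiaccel-specific keys such as `_base_` and `_inherit_` are already resolved. It also assumes the `trainer` section's `_target_` comes from the base file, because the example config does not set one.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Any


def run_training(
    config: Mapping[str, Any],
    instantiate: Callable[[Any], Any] | None = None,
) -> tuple[Any, Any, Any]:
    """Hypothetical sketch of a Lightning training wrapper (not aiaccel's code).

    Assumes `config` is already loaded and that aiaccel-specific keys such as
    `_base_` and `_inherit_` have been resolved beforehand.
    """
    if instantiate is None:
        # Hydra's recursive instantiation; imported lazily so the sketch can be
        # exercised with a stand-in function when Hydra is not installed.
        from hydra.utils import instantiate

    trainer = instantiate(config["trainer"])        # expected: a lightning.Trainer (assumed)
    datamodule = instantiate(config["datamodule"])  # expected: a LightningDataModule
    task = instantiate(config["task"])              # expected: a LightningModule, e.g. my_task.MyTask

    # Lightning's standard entry point: fit the task on the datamodule's data.
    trainer.fit(task, datamodule=datamodule)
    return trainer, datamodule, task
```

Plain `OmegaConf.load` does not process aiaccel's `_base_` or `_inherit_` keys, so passing a raw config.yaml to this sketch would not reproduce the real app. Use the command above for actual training.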

A config file such as config.yaml typically consists of three top-level sections (trainer, datamodule, and task), as in the following example:

config.yaml
_base_: ${base_config_path}/train_base.yaml

trainer:
  max_epochs: 10

  callbacks:
    - _target_: lightning.pytorch.callbacks.ModelCheckpoint
      filename: "{epoch:04d}"
      save_last: True
      save_top_k: -1

datamodule:
  _target_: aiaccel.torch.lightning.datamodules.SingleDataModule

  train_dataset_fn:
    _partial_: True
    _target_: torchvision.datasets.MNIST

    root: "./dataset"
    train: True
    download: True

    transform:
      _target_: torchvision.transforms.Compose
      transforms:
        - _target_: torchvision.transforms.Resize
          size: [[256, 256]]
        - _target_: torchvision.transforms.Grayscale
          num_output_channels: 3
        - _target_: torchvision.transforms.ToTensor
        - _target_: torchvision.transforms.Normalize
          mean: [0.5]
          std: [0.5]

  val_dataset_fn:
    _partial_: True
    _inherit_: ${datamodule.train_dataset_fn}

    train: False

  batch_size: 128
  wrap_scatter_dataset: False

task:
  _target_: my_task.MyTask
  num_classes: 10

  model:
    _target_: torchvision.models.resnet50
    weights:
      _target_: hydra.utils.get_object
      path: torchvision.models.ResNet50_Weights.DEFAULT

  optimizer_config:
    _target_: aiaccel.torch.lightning.OptimizerConfig
    optimizer_generator:
      _partial_: True
      _target_: torch.optim.Adam
      lr: 1.e-4
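The special keys in the config above appear to follow Hydra's instantiation conventions. `_target_` names the class or function to build, and `_partial_: True` produces a `functools.partial` instead of calling the target. As a result, `train_dataset_fn` becomes a dataset factory rather than a dataset. `_inherit_` lets `val_dataset_fn` reuse every setting of `train_dataset_fn` and override only `train`. The stdlib-only emulation below is meant to make these semantics concrete. It is not Hydra's or aiaccel's implementation, and `locate`, `deep_merge`, `resolve_inherit`, and `instantiate` are hypothetical helpers. The emulation assumes `_inherit_` acts like a deep merge in which local keys win. It passes the referenced node directly in place of the resolved `${...}` interpolation and ignores other Hydra keys such as `_args_`, `_recursive_`, and `_convert_`.

```python
import copy
import functools
import importlib
from typing import Any


def locate(path: str) -> Any:
    """Import 'package.module.attr' and return attr.

    Simplified: nested attributes such as
    torchvision.models.ResNet50_Weights.DEFAULT are not supported here.
    """
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override's keys applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_inherit(node: Any) -> Any:
    """Turn {'_inherit_': parent, **local} into parent deep-merged with local (assumed semantics)."""
    if isinstance(node, list):
        return [resolve_inherit(item) for item in node]
    if not isinstance(node, dict):
        return node
    resolved = {key: resolve_inherit(value) for key, value in node.items()}
    parent = resolved.pop("_inherit_", None)
    return deep_merge(parent, resolved) if parent is not None else resolved


def instantiate(node: Any) -> Any:
    """Build objects from '_target_' nodes; '_partial_: True' yields a functools.partial."""
    if isinstance(node, list):
        return [instantiate(item) for item in node]
    if not isinstance(node, dict):
        return node
    kwargs = {k: instantiate(v) for k, v in node.items() if k not in ("_target_", "_partial_")}
    if "_target_" not in node:
        return kwargs  # plain mapping: just instantiate its children
    target = locate(node["_target_"])
    if node.get("_partial_", False):
        return functools.partial(target, **kwargs)
    return target(**kwargs)


# Usage: builtins.dict stands in for torchvision.datasets.MNIST so the example runs anywhere.
train_fn_cfg = {"_partial_": True, "_target_": "builtins.dict", "root": "./dataset", "train": True}
config = {
    "train_dataset_fn": train_fn_cfg,
    "val_dataset_fn": {"_partial_": True, "_inherit_": train_fn_cfg, "train": False},
}
built = instantiate(resolve_inherit(config))
print(built["val_dataset_fn"]())  # {'root': './dataset', 'train': False}
```

Using a partial for the dataset lets the datamodule decide when (and in which process) to construct each dataset instead of building it at config-load time. This reading of the design is an interpretation, not something the config itself states.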

Distributed Training

This section is a work in progress.

Other Utilities

Other utilities are listed in the API Reference.