2. Training a PyTorch Lightning model
In this notebook, we show how to train a simple CNN model using PyTorch Lightning. We first prepare the data, then define the model, and finally train it for a HAR (Human Activity Recognition) task.
Creating the KuHar LightningDataModule
In order to train a model, we must first create a `LightningDataModule`, which defines the data loaders for training, validation, and test. Here, we will use the standardized KuHar data, so the data directory may look like this:
KuHar/
    test.csv
    train.csv
    validation.csv
The `train.csv` file may look like this:
| accel-x-0 | accel-x-1 | accel-y-0 | accel-y-1 | … | standard activity code |
|---|---|---|---|---|---|
| 0.502123 | 0.02123 | 0.502123 | 0.502123 | … | 0 |
| 0.6820123 | 0.02123 | 0.502123 | 0.502123 | … | 0 |
| 0.498217 | 0.00001 | 1.414141 | 3.141592 | … | 1 |
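As a quick sanity check, we can peek at one of these CSVs with pandas. This is just a sketch; the relative path assumes the directory layout shown above:

```python
import pandas as pd

# Load the training split and inspect its shape, columns, and label distribution
df = pd.read_csv("KuHar/train.csv")
print(df.shape)
print(df.columns[:4].tolist())  # first few feature columns, e.g. accel-x-0, ...
print(df["standard activity code"].value_counts())
```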
As each CSV file contains windowed time signals from two 3-axial sensors, we may use the `MultiModalSeriesCSVDataset` class to handle this data structure. After that, we must create a `LightningDataModule` that defines the data loaders for training, validation, and test. The implementation of a `LightningDataModule` may look like the snippet below:
```python
import lightning as L
from pathlib import Path
from torch.utils.data import DataLoader

from ssl_tools.data.datasets import MultiModalSeriesCSVDataset


class HARDataModule(L.LightningDataModule):
    def __init__(self, data_path: Path, batch_size: int):
        super().__init__()
        self.data_path = data_path
        self.batch_size = batch_size

    def train_dataloader(self):
        dataset = MultiModalSeriesCSVDataset(self.data_path / "train.csv")
        return DataLoader(dataset, batch_size=self.batch_size, shuffle=True)

    ...
```
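The elided methods would likely follow the same pattern. A hedged sketch of the remaining loaders, assuming `validation.csv` and `test.csv` have the same format as `train.csv`:

```python
    # Sketch of the elided methods, mirroring train_dataloader.
    # Validation/test sets are typically not shuffled.
    def val_dataloader(self):
        dataset = MultiModalSeriesCSVDataset(self.data_path / "validation.csv")
        return DataLoader(dataset, batch_size=self.batch_size, shuffle=False)

    def test_dataloader(self):
        dataset = MultiModalSeriesCSVDataset(self.data_path / "test.csv")
        return DataLoader(dataset, batch_size=self.batch_size, shuffle=False)
```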
Facilitating the creation of the LightningDataModule with MultiModalHARSeriesDataModule
If your directory is organized like the one above, the CSVs are collections of time windows of signals, and your `LightningDataModule` implementation would look like the one above, you can use `MultiModalHARSeriesDataModule` to create the `LightningDataModule` easily for you. The `train_dataloader` method will use `train.csv`, `val_dataloader` will use `validation.csv`, and `test_dataloader` will use `test.csv` to create the `MultiModalSeriesCSVDataset` objects and encapsulate them into `DataLoader`s.
To create a `MultiModalHARSeriesDataModule`, we must pass:

- `data_path`: the path to the directory containing the CSV files (`train.csv`, `validation.csv`, and `test.csv`). We use `standartized_balanced/KuHar` in this case;
- `feature_prefixes`: the prefixes of the feature columns in the CSV files. In this case, we have `accel-x`, `accel-y`, `accel-z`, `gyro-x`, `gyro-y`, and `gyro-z`;
- `batch_size`: the batch size for the data loaders; and
- `num_workers`: the number of workers for the data loaders, that is, the number of parallel processes used to load the data.
All data loaders will share the passed parameters, such as `batch_size`, `num_workers`, and `feature_prefixes`.
[1]:
from ssl_tools.data.data_modules.har import MultiModalHARSeriesDataModule

data_path = "/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar/"

data_module = MultiModalHARSeriesDataModule(
    data_path=data_path,
    feature_prefixes=("accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"),
    label="standard activity code",
    features_as_channels=True,
    batch_size=64,
)
data_module
[1]:
MultiModalHARSeriesDataModule(data_path=/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar, batch_size=64)
We can test the data loaders by fetching the first batch of each one. Let's do it (only for `train_dataloader`)!
NOTE: We use the `data_module.train_dataloader()` method to get the data loader for the training set. Note that the `.setup()` method must be called before getting the data loaders; if you don't call it, the data loaders will not be created. However, when training a model, the PyTorch Lightning `Trainer.fit()` method will call `.setup()` automatically for you. We call it here just to show how to fetch data from `train_dataloader` and check that it is working.
[4]:
data_module.setup("fit")  # We just put it here to test. When training a
                          # model, the Trainer will call this method.
train_dataloader = data_module.train_dataloader()

# Pick the first batch to inspect. As the batch size is 64, we will have
# 64 samples. Note that a DataLoader implements the iterator protocol,
# so we can use next() on iter() to fetch one batch.
batch = next(iter(train_dataloader))

# Each batch is a 2-element tuple: the first element is a Tensor with 64
# input samples and the second is a Tensor with 64 labels.
inputs, targets = batch

# (B, C, T) = (Batch size, Channels, Time steps) = (64, 6, 60)
print(f"Inputs shape: {inputs.shape}, Targets shape: {targets.shape}")
Inputs shape: torch.Size([64, 6, 60]), Targets shape: torch.Size([64])
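We can also take a quick look at the labels in this batch. A small sketch using `torch.unique`; the keys are the integer codes from the `standard activity code` column:

```python
import torch

# Count how many samples of each activity code appear in this batch
values, counts = torch.unique(targets, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```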
Training a simple model
We will create a simple 1D CNN PyTorch Lightning model using the `Simple1DConvNetwork`. The model will be trained to classify the activities in the KuHar dataset.
PyTorch Lightning models must implement the `forward`, `training_step`, and `configure_optimizers` methods; the `__init__` method is used to define the model. The `forward` method is the same as in plain PyTorch. The `training_step` method is called for each batch of data during training and should return the loss for that batch. The `configure_optimizers` method defines the optimizer to be used during training.
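To make these three pieces concrete, here is a minimal LightningModule skeleton. This is a generic sketch (a tiny linear classifier, not the actual `Simple1DConvNetwork` implementation), just to show where each method fits:

```python
import lightning as L
import torch
from torch import nn


class TinyHARModel(L.LightningModule):
    def __init__(self, num_classes: int = 6):
        super().__init__()
        # A deliberately tiny network: flatten the (6, 60) window and classify
        self.net = nn.Sequential(
            nn.Flatten(),                      # (B, 6, 60) -> (B, 360)
            nn.Linear(6 * 60, num_classes),
        )
        self.loss_func = nn.CrossEntropyLoss()

    def forward(self, x):
        # Same role as a plain PyTorch forward
        return self.net(x)

    def training_step(self, batch, batch_idx):
        # Called once per batch; must return the loss
        x, y = batch
        loss = self.loss_func(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        # Defines the optimizer used during training
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```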
The `Simple1DConvNetwork` is a simple 1D CNN with 3 convolutional layers and 2 fully connected layers. It is trained using the `Adam` optimizer and the `CrossEntropyLoss` loss function.
Besides that, Lightning models implemented in this framework usually log the training and validation losses.
[5]:
from ssl_tools.models.nets.convnet import Simple1DConvNetwork

model = Simple1DConvNetwork(
    input_shape=(6, 60),   # (number of input channels, number of time steps)
    num_classes=6,         # The number of output classes
    learning_rate=1e-3,    # The learning rate of the Adam optimizer
)
model
[5]:
Simple1DConvNetwork(
  (loss_func): CrossEntropyLoss()
  (features): Sequential(
    (0): Conv1d(6, 64, kernel_size=(5,), stride=(1,))
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Conv1d(64, 64, kernel_size=(5,), stride=(1,))
    (4): ReLU()
    (5): Dropout(p=0.5, inplace=False)
    (6): Conv1d(64, 64, kernel_size=(5,), stride=(1,))
    (7): ReLU()
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=3072, out_features=128, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=128, out_features=6, bias=True)
  )
)
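Before training, we can sanity-check the forward pass with the batch fetched earlier. A sketch, assuming `inputs` is still the (64, 6, 60) tensor from the cell above and the model returns one logit per class:

```python
import torch

# Run the untrained model on one batch, without tracking gradients
with torch.no_grad():
    logits = model(inputs)
print(logits.shape)  # expected: torch.Size([64, 6])
```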
To train a Lightning model, we must create a `Trainer` and call its `fit` method. The `Trainer` is responsible for training the model and has several parameters, such as the number of epochs and the number of GPUs/CPUs to use. We will train our model using the already defined data module. The `fit` method will train the model using the training and validation data loaders. After training, we will test the model using the test data loader and the Trainer's `test` method.
Here, the training will run for 300 epochs (`max_epochs`) and will use only 1 (`devices`) GPU (`accelerator`).
[6]:
import lightning as L

trainer = L.Trainer(
    max_epochs=300,
    accelerator="gpu",
    devices=1,
    strategy="auto",
    num_nodes=1,
)
trainer.fit(model, data_module)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
| Name | Type | Params
------------------------------------------------
0 | loss_func | CrossEntropyLoss | 0
1 | features | Sequential | 43.1 K
2 | classifier | Sequential | 394 K
------------------------------------------------
437 K Trainable params
0 Non-trainable params
437 K Total params
1.749 Total estimated model params size (MB)
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py:293: The number of training batches (22) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
`Trainer.fit` stopped: `max_epochs=300` reached.
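If you want to keep the trained weights, you can save a checkpoint manually. A small sketch; the file name here is just an example:

```python
# Save the trained weights to a checkpoint file (hypothetical file name)
trainer.save_checkpoint("simple_1d_convnet_kuhar.ckpt")

# The checkpoint can later be restored, provided the model class stores its
# hyperparameters (e.g., via save_hyperparameters() in __init__):
# model = Simple1DConvNetwork.load_from_checkpoint("simple_1d_convnet_kuhar.ckpt")
```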
Testing the model
Using the test set
Once the model is trained, we can test it using the test data loader, via the `test` method and passing the data module. The `test` method will set up and use the `test_dataloader` from the data module to test the model and print the test loss and accuracy.
Note that the return value of the `test` method is a list of dictionaries containing the test loss and the test accuracy for each dataloader (just 1, in our case).
[7]:
trainer.test(model, data_module)
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test_acc          │    0.8333333134651184     │
│         test_loss         │    1.9901254177093506     │
└───────────────────────────┴───────────────────────────┘
[7]:
[{'test_loss': 1.9901254177093506, 'test_acc': 0.8333333134651184}]
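Since `test` returns these metrics, we can also capture them programmatically. A small sketch; `verbose=False` suppresses the printed table:

```python
# Run the test loop again and extract the accuracy from the returned list
results = trainer.test(model, data_module, verbose=False)
test_acc = results[0]["test_acc"]
print(f"Test accuracy: {test_acc:.2%}")
```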
Using any other set from the data module
And if we want to test the model using the validation data loader, we can also use the `trainer.test` method, but passing the `val_dataloader`. Remember that, as we are not passing a `LightningDataModule` to the `test` method but a `DataLoader`, we must call the `setup` method ourselves.
[8]:
data_module.setup("fit")
validation_dataloader = data_module.val_dataloader()
trainer.test(model, validation_dataloader)
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test_acc          │    0.5962441563606262     │
│         test_loss         │    14.916933059692383     │
└───────────────────────────┴───────────────────────────┘
[8]:
[{'test_loss': 14.916933059692383, 'test_acc': 0.5962441563606262}]
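As a cross-check, the same validation accuracy can be computed manually with a plain PyTorch loop. A sketch, assuming the model takes `(B, C, T)` tensors and returns per-class logits:

```python
import torch

model.eval()
correct, total = 0, 0
with torch.no_grad():
    for x, y in validation_dataloader:
        # Move inputs to the model's device, predict, and compare to labels
        logits = model(x.to(model.device))
        preds = logits.argmax(dim=1).cpu()
        correct += (preds == y).sum().item()
        total += y.numel()
print(f"Validation accuracy: {correct / total:.2%}")
```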