2. Training a PyTorch Lightning model

In this notebook, we train a simple CNN model using PyTorch Lightning. We start with the data, then define the model, and finally train it on a human activity recognition (HAR) task.

Creating KuHar LightningDataModule

To train a model, we must first create a LightningDataModule, which defines the data loaders for training, validation, and testing. Here, we use the standardized KuHar data. The data directory may look like this:

KuHar/
    test.csv
    train.csv
    validation.csv

The train.csv file may look like this:

accel-x-0	accel-x-1	accel-y-0	accel-y-1	…	standard activity code
0.502123	0.02123	0.502123	0.502123	…	0
0.6820123	0.02123	0.502123	0.502123	…	0
0.498217	0.00001	1.414141	3.141592	…	1

Because each CSV file contains windowed time signals from two tri-axial sensors, we can use the MultiModalSeriesCSVDataset class to handle this data structure. We then wrap the dataset in a LightningDataModule that defines the data loaders for training, validation, and testing. The implementation of the LightningDataModule may look like the snippet below:

from pathlib import Path

import lightning as L
from torch.utils.data import DataLoader
from ssl_tools.data.datasets import MultiModalSeriesCSVDataset

class HARDataModule(L.LightningDataModule):
    def __init__(self, data_path: Path, batch_size: int):
        super().__init__()
        self.data_path = data_path
        self.batch_size = batch_size

    def train_dataloader(self):
        # Build the training dataset from train.csv and shuffle it at every epoch
        dataset = MultiModalSeriesCSVDataset(self.data_path / 'train.csv')
        return DataLoader(dataset, batch_size=self.batch_size, shuffle=True)

    # val_dataloader and test_dataloader follow the same pattern
    ...

Facilitating the creation of the LightningDataModule with MultiModalHARSeriesDataModule

If your directory is organized like the one above, the CSV files contain collections of time windows of signals, and your LightningDataModule implementation would look like the one above, you can use MultiModalHARSeriesDataModule to create the LightningDataModule for you. Its train_dataloader method uses train.csv, val_dataloader uses validation.csv, and test_dataloader uses test.csv to create a MultiModalSeriesCSVDataset and wrap it in a DataLoader.

To create a MultiModalHARSeriesDataModule, we must pass:

data_path: the path to the directory containing the CSV files (train.csv, validation.csv and test.csv). We use standartized_balanced/KuHar in this case;
feature_prefixes: the prefixes of the features in the CSV files. In this case, we have accel-x, accel-y, accel-z, gyro-x, gyro-y and gyro-z;
batch_size: the batch size for the data loaders; and
num_workers: the number of workers for the data loaders. Essentially, the number of parallel processes to load the data.

All data loaders share the passed parameters, such as batch_size, num_workers, and feature_prefixes.
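To make the role of feature_prefixes more concrete, the sketch below groups the columns of one row by prefix, so that each prefix becomes one channel. It is only an illustration under stated assumptions: the helper row_to_sample and the sample values are hypothetical, it uses the standard library only, and it is not the MultiModalSeriesCSVDataset implementation.

```python
import csv
import io

# Hypothetical two-row file with two time steps per axis (values are made up).
raw = (
    "accel-x-0,accel-x-1,accel-y-0,accel-y-1,standard activity code\n"
    "0.5,0.1,0.2,0.3,0\n"
    "0.4,0.0,1.4,3.1,1\n"
)


def row_to_sample(row, feature_prefixes, label):
    """Group the columns of one row by prefix: one channel per prefix."""
    channels = []
    for prefix in feature_prefixes:
        # Columns are assumed to be named "<prefix>-<time step>";
        # sort them by the numeric time-step suffix.
        columns = sorted(
            (name for name in row if name.startswith(prefix + "-")),
            key=lambda name: int(name.rsplit("-", 1)[1]),
        )
        channels.append([float(row[name]) for name in columns])
    return channels, int(row[label])


reader = csv.DictReader(io.StringIO(raw))
samples = [
    row_to_sample(row, ("accel-x", "accel-y"), "standard activity code")
    for row in reader
]
first_channels, first_label = samples[0]
print(first_channels, first_label)
```

With six prefixes and 60 time steps per prefix, the same grouping would yield one 6 x 60 array per sample.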

[1]:

from ssl_tools.data.data_modules.har import MultiModalHARSeriesDataModule

data_path = "/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar/"

data_module = MultiModalHARSeriesDataModule(
    data_path=data_path,
    feature_prefixes=("accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"),
    label="standard activity code",
    features_as_channels=True,
    batch_size=64,
)
data_module

[1]:

MultiModalHARSeriesDataModule(data_path=/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar, batch_size=64)

We can test the data loaders by fetching the first batch from each one. Let’s do it (only for train_dataloader)!

NOTE: We use the data_module.train_dataloader() method to get the data loader for the training set. The .setup() method must be called before getting the data loaders; otherwise, the data loaders will not be created. When training a model, however, the PyTorch Lightning Trainer.fit() method calls .setup() automatically. We call it here only to show how to fetch a batch from train_dataloader and check that it works.

[4]:

data_module.setup("fit")            # We just put it here to test.
                                    #   When training a model, the Trainer will
                                    #   call this method.

train_dataloader = data_module.train_dataloader()

# Pick the first batch to inspect. As batch size is 64, we will have 64 samples.
# Note that a DataLoader is an iterable, not an iterator,
#   so we wrap it in iter() and use next() to fetch one batch.
batch = next(iter(train_dataloader))
# Each batch is a 2-element tuple:
#   First element is a Tensor with 64 input samples
#   and the second is a Tensor with 64 labels.
inputs, targets = batch

# (B, C, T) = (Batch size, Channels, Time steps) = (64, 6, 60)
print(f"Inputs shape: {inputs.shape}, Targets shape: {targets.shape}")

Inputs shape: torch.Size([64, 6, 60]), Targets shape: torch.Size([64])

Training a simple model

We will create a simple 1D CNN PyTorch Lightning model using the Simple1DConvNetwork class. The model will be trained to classify the activities in the KuHar dataset.

PyTorch Lightning models must implement the forward, training_step, and configure_optimizers methods, and the __init__ method is used to define the model. The forward method is the same as the PyTorch forward method. The training_step method is called for each batch of data during training and should return the loss of the batch. The configure_optimizers method defines the optimizer to be used during training.

The Simple1DConvNetwork is a simple 1D CNN model, that has 3 convolutional layers and 2 fully connected layers. It is trained using the Adam optimizer and the CrossEntropyLoss loss function.

In addition, Lightning models implemented in this framework usually log the training and validation losses.

[5]:

from ssl_tools.models.nets.convnet import Simple1DConvNetwork

model = Simple1DConvNetwork(
    input_shape=(6,60), # (The number of input channels, number of time steps)
    num_classes=6,      # The number of output classes
    learning_rate=1e-3, # The learning rate of the Adam optimizer
)

model

[5]:

Simple1DConvNetwork(
  (loss_func): CrossEntropyLoss()
  (features): Sequential(
    (0): Conv1d(6, 64, kernel_size=(5,), stride=(1,))
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Conv1d(64, 64, kernel_size=(5,), stride=(1,))
    (4): ReLU()
    (5): Dropout(p=0.5, inplace=False)
    (6): Conv1d(64, 64, kernel_size=(5,), stride=(1,))
    (7): ReLU()
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=3072, out_features=128, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=128, out_features=6, bias=True)
  )
)
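The in_features=3072 of the first linear layer can be checked by hand. The sketch below applies the standard Conv1d output-length formula to the three convolutions printed above (kernel size 5, stride 1, and, as the printout suggests, no padding); the helper name conv1d_output_length is ours, not part of the library.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (same formula as torch.nn.Conv1d)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


length = 60  # time steps per input window
for _ in range(3):  # three Conv1d layers with kernel_size=5, stride=1
    length = conv1d_output_length(length, kernel_size=5)

flattened = 64 * length  # 64 output channels, flattened for the classifier
print(length, flattened)  # 48 3072
```

Each convolution shortens the signal by 4 time steps (60, 56, 52, 48), and 64 channels of 48 steps flatten to 3072 features.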

To train a Lightning model, we create a Trainer and call its fit method. The Trainer is responsible for running the training loop. It has several parameters, such as the number of epochs and the number and type of devices (GPUs/CPUs) to use.

We will train our model using the data module defined above. The fit method trains the model using the training and validation data loaders. After training, we will evaluate the model on the test data loader with the Trainer’s test method.

Here, training runs for 300 epochs (max_epochs) on a single (devices=1) GPU (accelerator="gpu").

[6]:

import lightning as L

trainer = L.Trainer(
    max_epochs=300,
    accelerator="gpu",
    devices=1,
    strategy="auto",
    num_nodes=1
)
trainer.fit(model, data_module)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name       | Type             | Params
------------------------------------------------
0 | loss_func  | CrossEntropyLoss | 0
1 | features   | Sequential       | 43.1 K
2 | classifier | Sequential       | 394 K
------------------------------------------------
437 K     Trainable params
0         Non-trainable params
437 K     Total params
1.749     Total estimated model params size (MB)

/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py:293: The number of training batches (22) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.

`Trainer.fit` stopped: `max_epochs=300` reached.
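The parameter counts in the summary table above can be reproduced from the layer shapes in the model printout. The sketch below is plain arithmetic; the helper names are ours, and the size estimate assumes 4 bytes per parameter (32-bit floats).

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel."""
    return in_channels * out_channels * kernel_size + out_channels


def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features


# features: Conv1d(6, 64, 5) followed by two Conv1d(64, 64, 5)
features = conv1d_params(6, 64, 5) + 2 * conv1d_params(64, 64, 5)
# classifier: Linear(3072, 128) followed by Linear(128, 6)
classifier = linear_params(3072, 128) + linear_params(128, 6)

total = features + classifier
size_mb = total * 4 / 1e6  # assuming 4 bytes per parameter

print(features, classifier, total, round(size_mb, 3))
# 43072 394118 437190 1.749
```

These values match the 43.1 K, 394 K, and 437 K entries in the table, and most of the parameters sit in the first fully connected layer.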

Testing the model

Using the test set

Once the model is trained, we can evaluate it on the test set by calling the Trainer's test method and passing the data module. The test method sets up the data module, uses its test_dataloader to evaluate the model, and prints the test metrics.

Note that the return of the test method is a list of dictionaries containing the test loss and the test accuracy for each dataloader (just 1, in our case).

[7]:

trainer.test(model, data_module)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test_acc          │    0.8333333134651184     │
│         test_loss         │    1.9901254177093506     │
└───────────────────────────┴───────────────────────────┘

[7]:

[{'test_loss': 1.9901254177093506, 'test_acc': 0.8333333134651184}]
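Because the return value is a list with one dictionary per data loader, individual metrics can be read by index and key. In the sketch below, the list is copied from the output above so that the snippet runs on its own; in the notebook it would be the value returned by trainer.test(model, data_module).

```python
# Copied from the output above; normally: results = trainer.test(model, data_module)
results = [{"test_loss": 1.9901254177093506, "test_acc": 0.8333333134651184}]

metrics = results[0]  # metrics of the first (and only) data loader
summary = f"test accuracy: {metrics['test_acc']:.2%}, test loss: {metrics['test_loss']:.4f}"
print(summary)
```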

Using any other set from data module

To evaluate the model on the validation set, we can also use the trainer.test method, passing the val_dataloader instead. Because we are passing a DataLoader rather than a LightningDataModule, we must call the data module's setup method ourselves first.

[8]:

data_module.setup("fit")
validation_dataloader = data_module.val_dataloader()
trainer.test(model, validation_dataloader)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test_acc          │    0.5962441563606262     │
│         test_loss         │    14.916933059692383     │
└───────────────────────────┴───────────────────────────┘

[8]:

[{'test_loss': 14.916933059692383, 'test_acc': 0.5962441563606262}]