3. Training a self-supervised model: Contrastive Predictive Coding (CPC)

In this notebook, we will train a self-supervised model using the Contrastive Predictive Coding (CPC) method. This method is based on the idea of predicting future elements of a sequence in latent space, and it has been shown to be very effective at learning useful representations for downstream tasks. This framework already provides an implementation of CPC, so we will use it to train the model.

We will pre-train the model using the KuHar dataset, and then we will use the learned representations to train a classifier for the downstream task (fine-tuning). For both training stages, as in the last notebook, we will:

  1. Create a Dataset and then a LightningDataModule to load the data;

  2. Instantiate the CPC model; and

  3. Train the model using PyTorch Lightning.

Every SSL model in this framework can be instantiated in two ways:

  1. Instantiating each element separately, such as the encoder and the autoregressive model, and then passing them to the CPC model; or

  2. Using builder methods to instantiate the model. In this case, we do not need to instantiate each element separately, but we can still customize the model by passing the desired parameters to the builder methods. This is the approach we will use in this notebook.

In summary, the second approach encapsulates the first one, making it easier to use and more convenient for our purposes.
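
For illustration, approach 1 would look roughly like the sketch below. The component names follow the model summary printed later in this notebook, but the constructor signatures of GRUEncoder and CPC are assumptions; consult the framework documentation for the real API.

import torch

# Hypothetical sketch of approach 1 (manual composition); the argument
# names are assumptions, not the framework's documented API.
from ssl_tools.models.layers.gru import GRUEncoder
from ssl_tools.models.ssl.cpc import CPC

encoder = GRUEncoder(encoding_size=128)        # GRU + Linear encoder
density_estimator = torch.nn.Linear(128, 128)  # scores future encodings
auto_regressor = torch.nn.GRU(128, 128, batch_first=True)

model = CPC(
    encoder=encoder,
    density_estimator=density_estimator,
    auto_regressor=auto_regressor,
)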

Pre-training the model

We will pre-train the model using the KuHar dataset. CPC is a self-supervised method, so we do not need labels to train the model. However, CPC assumes that the input data is sequential, that is, an input is a sequence of time-steps comprising different activities. Thus, for HAR, one sample is usually a multi-modal time-series corresponding to the whole recording of a single user.

Creating the LightningDataModule

Our dataset must be organized in the following way:

data/
    train/
        user1.csv
        user2.csv
        ...
    validation/
        user4.csv
        user5.csv
        ...
    test/
        user6.csv
        user7.csv
        ...

And the content of each CSV file should be something like:

timestamp | accel-x | accel-y | accel-z | gyro-x | gyro-y | gyro-z | activity
----------|---------|---------|---------|--------|--------|--------|---------
0         | 0.1     | 0.2     | 0.3     | 0.4    | 0.5    | 0.6    | 0
1         | 0.2     | 0.3     | 0.4     | 0.5    | 0.6    | 0.7    | 0

Where timestamp is the time-stamp of each time-step, accel-x, accel-y, accel-z, gyro-x, gyro-y, and gyro-z are the features, and activity is the label of the time-step.
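
For intuition, one of these files could be inspected with pandas (the path below is illustrative, following the layout shown above):

import pandas as pd

df = pd.read_csv("data/train/user1.csv")  # illustrative path
df.head()  # one row per time-step: timestamp, features, activity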

In this case, we should use the SeriesFolderCSVDataset to load the data. This creates a Dataset where each CSV file is a sample, each row of the CSV file is a time-step, and the columns are the features.
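
A minimal direct-usage sketch is shown below; the import path and parameter names mirror the data module call later in this notebook and are assumptions, so check the framework documentation for the exact signature.

# Hypothetical sketch: load one split directly (import path and
# parameters are assumptions).
from ssl_tools.data.datasets import SeriesFolderCSVDataset

train_dataset = SeriesFolderCSVDataset(
    data_path="data/train",  # folder with one CSV per user
    features=("accel-x", "accel-y", "accel-z",
              "gyro-x", "gyro-y", "gyro-z"),
    label=None,              # self-supervised: no labels needed
)
sample = train_dataset[0]    # the whole series of one user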

If your data is organized as above, where inside the root folder (data/ in this case) there are sub-folders for each split (train/, validation/, and test/), and inside each split folder there are the CSV files, you can use the UserActivityFolderDataModule to create a LightningDataModule for you. This class creates a DataLoader of SeriesFolderCSVDataset for each split (train, validation, and test) and sets up the data correctly.

In this notebook, we will use the UserActivityFolderDataModule to create the LightningDataModule for us. This class requires the following parameters:

  • data_path: the root directory of the data;

  • features: the names of the feature columns;

  • pad: a boolean indicating whether the samples should be padded to the same length, that is, the length of the longest sample in the dataset. The padding scheme replicates each sample from its beginning until the length of the longest sample is reached (see the sketch after the note below).

NOTE: The samples may have different lengths, so, for this method, the batch_size must be 1.
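
The replication padding described above can be pictured with a small standalone sketch (illustrative NumPy code, not the framework's actual implementation):

import numpy as np

def pad_by_replication(sample: np.ndarray, target_len: int) -> np.ndarray:
    # Replicate a (time_steps, features) sample from its beginning until
    # it reaches target_len rows (illustration of the scheme above).
    reps = int(np.ceil(target_len / len(sample)))
    return np.tile(sample, (reps, 1))[:target_len]

short = np.arange(6).reshape(3, 2)     # 3 time-steps, 2 features
padded = pad_by_replication(short, 5)  # time-steps 0, 1, 2, 0, 1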

[1]:
import numpy as np
import torch
from ssl_tools.data.data_modules import UserActivityFolderDataModule

data_path = "/workspaces/hiaac-m4/data/view_concatenated/KuHar_cpc"

data_module = UserActivityFolderDataModule(
    data_path,
    features=("accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"),
    batch_size=1,       # We set to 1 for CPC
    label=None,         # We do not want to return the labels, only data.
    pad=False           # If you want padded data, set it to True.
                        #   This guarantees that all data have the same length.
)

data_module
[1]:
UserActivityFolderDataModule(data_path=/workspaces/hiaac-m4/data/view_concatenated/KuHar_cpc, batch_size=1)

Creating and training the model

Here we will use the builder method build_cpc to instantiate the CPC model. This instantiates a CPC self-supervised model with the default encoder (ssl_tools.models.layers.gru.GRUEncoder), which is a GRU+Linear, and the default autoregressive model (torch.nn.GRU).

We can parametrize the creation of the model by passing the desired parameters to the builder method. The build_cpc method accepts the following parameters:

  • encoding_size: the size of the encoded representation;

  • in_channels: number of input features;

  • gru_hidden_size: number of features in the hidden state of the GRU;

  • gru_num_layers: number of layers in the GRU;

  • learning_rate: the learning rate of the optimizer;

  • window_size: the size of the input windows (X_t) fed to the encoder (GRU).

All parameters are optional, and have default values. You may want to consult the documentation of the method to see the default values and additional parameters.

Note that the LightningModule returned by the build_cpc method is already configured to use the CPC loss, and the Adam optimizer.

[2]:
from ssl_tools.models.ssl.cpc import build_cpc
encoding_size = 128
in_channels = 6
gru_hidden_size = 100
gru_num_layers = 1
learning_rate = 1e-3

model = build_cpc(
    encoding_size=encoding_size,
    in_channels=in_channels,
    gru_hidden_size=gru_hidden_size,
    gru_num_layers=gru_num_layers,
    learning_rate=learning_rate
)
model
[2]:
CPC(
  (encoder): GRUEncoder(
    (rnn): GRU(6, 100, bidirectional=True)
    (nn): Linear(in_features=200, out_features=128, bias=True)
  )
  (density_estimator): Linear(in_features=128, out_features=128, bias=True)
  (auto_regressor): GRU(128, 128, batch_first=True)
  (loss_func): CrossEntropyLoss()
)

We instantiate the Trainer and call the fit method to train the model.

[3]:
import lightning as L

max_epochs = 10
trainer = L.Trainer(max_epochs=max_epochs)
trainer.fit(model, data_module)
Trainer will use only 1 of 2 GPUs because it is running inside an interactive / notebook environment. You may try to set `Trainer(devices=2)` but please note that multi-GPU inside interactive / notebook environments is considered experimental and unstable. Your mileage may vary.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name              | Type             | Params
-------------------------------------------------------
0 | encoder           | GRUEncoder       | 90.5 K
1 | density_estimator | Linear           | 16.5 K
2 | auto_regressor    | GRU              | 99.1 K
3 | loss_func         | CrossEntropyLoss | 0
-------------------------------------------------------
206 K     Trainable params
0         Non-trainable params
206 K     Total params
0.824     Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=10` reached.

This finishes the pre-training stage.

To obtain the latent representations of the data, we call the model’s forward() method on it. In this framework, the forward method of the SSL models returns the latent representations of the input data. Usually this is the output of the encoder, as in this case, but it may vary depending on the model.

We will use the encoder to obtain the latent representations of the data, and then we will use these representations to train a classifier for the downstream task.
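
For example, the representations could be extracted as in the sketch below. This assumes the data module has already been set up (e.g., by trainer.fit) and that each batch is a single unlabeled user series, as configured earlier with label=None and batch_size=1:

# Sketch: extract latent representations with the pre-trained model.
model.eval()
with torch.no_grad():
    batch = next(iter(data_module.train_dataloader()))
    representations = model(batch)  # forward() returns the latent codes
print(representations.shape)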

Fine-tuning the model

After pre-training the model, we will use the learned representations to train a classifier for the downstream task, in this case, the HAR task.

NOTE: It is important that the SSL models implement the forward method to return the latent representations of the input data, so we can use these representations to train the classifier.

Creating the LightningDataModule

Human activity recognition is a supervised classification task that usually receives multi-modal windowed time-series as input, differently from the self-supervised task, which receives the whole time-series of a single user. Thus, we cannot use the same LightningDataModule to load the data for the downstream task.

In this notebook, we will use the windowed time-series version of the KuHar dataset, in which each split is a single CSV file containing the windowed time-series of the users. The dataset is organized as follows:

KuHar/
    train.csv
    validation.csv
    test.csv

Each CSV file may look like this:

accel-x-0 | accel-x-1 | accel-y-0 | accel-y-1 | class
----------|-----------|-----------|-----------|------
0.502123  | 0.02123   | 0.502123  | 0.502123  | 0
0.6820123 | 0.02123   | 0.502123  | 0.502123  | 1
0.498217  | 0.00001   | 1.414141  | 3.141592  | 1

As each CSV file contains time-windowed signals from two 3-axis sensors (accelerometer and gyroscope), we must use the MultiModalSeriesCSVDataset class.
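
A direct-usage sketch is below; the import path is an assumption, and the parameters mirror the MultiModalHARSeriesDataModule call that follows.

# Hypothetical sketch: load one split directly.
from ssl_tools.data.datasets import MultiModalSeriesCSVDataset

train_dataset = MultiModalSeriesCSVDataset(
    data_path="KuHar/train.csv",
    feature_prefixes=("accel-x", "accel-y", "accel-z",
                      "gyro-x", "gyro-y", "gyro-z"),
    label="standard activity code",
    features_as_channels=True,  # samples shaped (channels, window_size)
)
x, y = train_dataset[0]         # one window and its activity label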

As in the last notebook, we will use the MultiModalHARSeriesDataModule to facilitate the creation of the LightningDataModule. This class creates a DataLoader of MultiModalSeriesCSVDataset for each split (train, validation, and test) and sets up the data correctly.

[4]:
from ssl_tools.data.data_modules.har import MultiModalHARSeriesDataModule

data_path = "/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar/"

data_module = MultiModalHARSeriesDataModule(
    data_path=data_path,
    feature_prefixes=("accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"),
    label="standard activity code",
    features_as_channels=True,
    batch_size=64,
    num_workers=0,  # Sequential, for notebook compatibility
)
data_module
[4]:
MultiModalHARSeriesDataModule(data_path=/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar, batch_size=64)

Creating and training the classifier

A model for a downstream task is usually composed of two parts: the backbone model, which generates the representations of the input data (i.e., the encoder), and the prediction head, which receives the representations and outputs the predictions, usually an MLP.

To handle the fine-tuning process, we can design a new model composed of the pre-trained backbone and the prediction head, and then train this new model with the labeled data. To facilitate this process, this framework provides the SSLDiscriminator class, which receives the backbone model and the prediction head and trains the classifier with the labeled data.

In summary, the SSLDiscriminator class is a LightningModule that generates the representations of the input data using the backbone model (that is, the forward method of the pre-trained backbone) and then uses the prediction head to output the predictions, something like y_hat = prediction_head(backbone(sample)). The predictions and labels are then used to compute the loss and train the model. By default, the SSLDiscriminator is trained using the Adam optimizer with a parametrizable learning_rate.

It is worth mentioning that the SSLDiscriminator class’s forward method receives the input data and the labels, and returns the predictions. This is different from the forward method of the self-supervised models, which receives only the input data and returns its latent representations.
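
Conceptually, its training step looks like the pseudocode below (an illustration of the idea above, not the framework's actual implementation):

# Illustrative pseudocode of SSLDiscriminator's training logic
def training_step(self, batch, batch_idx):
    x, y = batch
    representations = self.backbone(x)  # forward() of the SSL model
    y_hat = self.head(representations)  # prediction head (e.g., an MLP)
    loss = self.loss_fn(y_hat, y)
    return loss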

Note that the fine-tuning process can be done in two ways:

  1. Fine-tuning the whole model, that is, backbone (encoder) and classifier, with the labeled data; or

  2. Fine-tuning only the classifier, with the labeled data.

The SSLDiscriminator class can handle both cases via the update_backbone parameter. If update_backbone is True, the whole model is fine-tuned (case 1 above); otherwise, only the classifier is fine-tuned (case 2 above).
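
Conceptually, update_backbone=False amounts to excluding the backbone from gradient updates, roughly as in this sketch (an illustration, not the framework's code):

# Rough equivalent of update_backbone=False: freeze the backbone so
# only the prediction head's parameters are trained.
for param in model.parameters():
    param.requires_grad = False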

Let’s create our prediction head and SSLDiscriminator model and train it with the labeled data. Prediction heads for the most popular tasks are already implemented in the ssl_tools.models.ssl.modules.heads module. In this notebook, we will use the CPCPredictionHead prediction head, which is an MLP with 3 hidden layers and dropout.

[5]:
from ssl_tools.models.ssl.classifier import SSLDiscriminator
from ssl_tools.models.ssl.modules.heads import CPCPredictionHead

number_of_classes = 6

prediction_head = CPCPredictionHead(
    input_dim=encoding_size,                # Size of the encoding (input)
    hidden_dim1=64,
    hidden_dim2=64,
    output_dim=number_of_classes            # Number of classes
)

prediction_head
[5]:
CPCPredictionHead(
  (layers): Sequential(
    (0): Linear(in_features=128, out_features=64, bias=True)
    (1): ReLU()
    (2): Linear(in_features=64, out_features=64, bias=True)
    (3): Sequential(
      (0): ReLU()
      (1): Dropout(p=0, inplace=False)
    )
    (4): Linear(in_features=64, out_features=6, bias=True)
    (5): Softmax(dim=1)
  )
)

Now we create the SSLDiscriminator model. This class requires the following parameters:

  • backbone: the backbone model, that is, the pre-trained model;

  • head: the prediction head model;

  • loss_fn: the loss function used to train the model.

Also, we can attach metrics that will be calculated for every batch of the validation and test sets. Metrics are passed using the metrics parameter of the SSLDiscriminator class, which receives a dictionary with the name of the metric as key and the torchmetrics.Metric as value.

Let’s create the SSLDiscriminator and attach the Accuracy metric to the model, to check the validation accuracy per epoch.

[6]:
from torchmetrics import Accuracy
from torch.nn import CrossEntropyLoss

acc_metric = Accuracy(
    task="multiclass",              # We are working with a multiclass
                                    #   classification, not a binary one.
    num_classes=number_of_classes   # Number of classes
)

ssl_discriminator = SSLDiscriminator(
    backbone=model,                 # The model we trained before (CPC)
    head=prediction_head,           # The prediction head we just created
    loss_fn=CrossEntropyLoss(),     # The loss function
    learning_rate=1e-3,
    update_backbone=False,          # We do not want to update the backbone
    metrics={"acc": acc_metric}     # We want to track the accuracy
)
ssl_discriminator
[6]:
SSLDiscriminator(
  (backbone): CPC(
    (encoder): GRUEncoder(
      (rnn): GRU(6, 100, bidirectional=True)
      (nn): Linear(in_features=200, out_features=128, bias=True)
    )
    (density_estimator): Linear(in_features=128, out_features=128, bias=True)
    (auto_regressor): GRU(128, 128, batch_first=True)
    (loss_func): CrossEntropyLoss()
  )
  (head): CPCPredictionHead(
    (layers): Sequential(
      (0): Linear(in_features=128, out_features=64, bias=True)
      (1): ReLU()
      (2): Linear(in_features=64, out_features=64, bias=True)
      (3): Sequential(
        (0): ReLU()
        (1): Dropout(p=0, inplace=False)
      )
      (4): Linear(in_features=64, out_features=6, bias=True)
      (5): Softmax(dim=1)
    )
  )
  (loss_fn): CrossEntropyLoss()
)

Then we can instantiate the Trainer and call the fit method to train the model.

[7]:
import lightning as L

max_epochs = 10
trainer = L.Trainer(max_epochs=max_epochs)
trainer.fit(ssl_discriminator, data_module)
Trainer will use only 1 of 2 GPUs because it is running inside an interactive / notebook environment. You may try to set `Trainer(devices=2)` but please note that multi-GPU inside interactive / notebook environments is considered experimental and unstable. Your mileage may vary.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name     | Type              | Params
-----------------------------------------------
0 | backbone | CPC               | 206 K
1 | head     | CPCPredictionHead | 12.8 K
2 | loss_fn  | CrossEntropyLoss  | 0
-----------------------------------------------
12.8 K    Trainable params
206 K     Non-trainable params
218 K     Total params
0.876     Total estimated model params size (MB)
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py:293: The number of training batches (22) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
`Trainer.fit` stopped: `max_epochs=10` reached.

Let’s evaluate the model using the test set. Since we added the Accuracy metric to the model, it will calculate the accuracy of the model on the test set. All logged metrics are returned by the .test() method as a list of dictionaries.

[8]:
results = trainer.test(ssl_discriminator, data_module)
results
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test_acc          │    0.5277777910232544     │
│         test_loss         │    1.4903016090393066     │
└───────────────────────────┴───────────────────────────┘
[8]:
[{'test_loss': 1.4903016090393066, 'test_acc': 0.5277777910232544}]

Finally, if we want to get the predictions of the model, we can:

  1. Call the forward method of the model, passing the input data (iterating over all batches of the dataloader), as sketched after this list; or

  2. Use the Trainer.predict method, passing the data module. In this case, the model is set to evaluation mode and the predictions are made using the predict_dataloader defined in the LightningDataModule, which is usually the test set (test_dataloader).
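
Option 1 would look roughly like the sketch below. It assumes the test dataloader yields (data, label) pairs and, per the note earlier, that the discriminator’s forward receives both:

# Sketch of option 1: manual iteration over the test dataloader.
ssl_discriminator.eval()
predictions = []
with torch.no_grad():
    for x, y in data_module.test_dataloader():
        predictions.append(ssl_discriminator(x, y))
y_hat_manual = torch.cat(predictions)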

[9]:
y_hat = trainer.predict(ssl_discriminator, data_module)
# y_hat is a list of tensors (one per batch). Let's concatenate them.
y_hat = torch.cat(y_hat)
y_hat.shape
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'predict_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
[9]:
torch.Size([144, 6])
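
Since the prediction head ends in a Softmax, each of the 144 rows holds per-class probabilities, so the predicted class of each window is the argmax:

predicted_classes = y_hat.argmax(dim=1)  # class index per test window
predicted_classes.shape                  # torch.Size([144])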

Next steps

This notebook comprises the whole process of training a self-supervised model and then using the learned representations to train a classifier for the downstream task.

We can standardize this process to facilitate the reproduction of the experiments, and then use it to train different models and evaluate them on different datasets.

Next, we will explore the Experiment API, which is designed to simplify the process of training and evaluating models, besides providing a standard way to log results, save and load models, and more.