3. Training a self-supervised model: Contrastive Predictive Coding (CPC)
In this notebook, we will train a self-supervised model using the Contrastive Predictive Coding (CPC) method. CPC learns representations by predicting, in latent space, future time-steps of a sequence from its past context, and it has been shown to learn representations that are useful for downstream tasks. This framework already provides an implementation of CPC, which we will use to train the model.
We will pre-train the model on the KuHar dataset, and then use the learned representations to train a classifier for the downstream task (fine-tuning). For both stages of training, as in the last notebook, we will:
1. Create a Dataset and then a LightningDataModule to load the data;
2. Instantiate the CPC model; and
3. Train the model using PyTorch Lightning.
Every SSL model in this framework can be instantiated in two ways:
1. Instantiating each element (such as the encoder and the autoregressive model) separately and then passing them to the CPC model; or
2. Using builder methods. In this case, we do not need to instantiate each element separately, but we can still customize the model by passing the desired parameters to the builder method. This is the approach we will use in this notebook.
In summary, the second approach encapsulates the first one, making it easier to use and more convenient for our purposes.
Pre-training the model
We will pre-train the model using the KuHar dataset. CPC is a self-supervised method, so we do not need labels to train the model. However, CPC assumes that the input data is sequential, that is, each input is a sequence of time-steps that may span different activities. Thus, for HAR, one sample usually corresponds to the whole multi-modal time-series of a single user.
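If the raw recordings come as one long table with a user column, per-user samples like these can be built by grouping rows by user and sorting them by time. The sketch below assumes rows arrive as (user, timestamp, features) tuples; group_by_user is a hypothetical helper for illustration, not part of ssl_tools.

```python
from collections import defaultdict

import numpy as np


def group_by_user(rows):
    """Hypothetical helper: build one (channels, time) series per user.

    rows: iterable of (user_id, timestamp, feature_list) tuples.
    """
    per_user = defaultdict(list)
    for user, timestamp, features in rows:
        per_user[user].append((timestamp, features))

    series = {}
    for user, items in per_user.items():
        items.sort(key=lambda item: item[0])  # chronological order
        # Stack rows as (time, channels), then transpose to (channels, time)
        series[user] = np.array([features for _, features in items]).T
    return series


rows = [
    ("user1", 1, [0.2, 0.5]),
    ("user2", 0, [1.0, 1.1]),
    ("user1", 0, [0.1, 0.4]),
]
series = group_by_user(rows)
print(series["user1"].shape)  # (2, 2): 2 channels, 2 time-steps
```

Each resulting array would then correspond to one per-user CSV file in the layout described next.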
Creating the LightningDataModule
Our dataset must be organized in the following way:
data/
    train/
        user1.csv
        user2.csv
        ...
    validation/
        user4.csv
        user5.csv
        ...
    test/
        user6.csv
        user7.csv
        ...
And the content of each CSV file should be something like:
timestamp | accel-x | accel-y | accel-z | gyro-x | gyro-y | gyro-z | activity
---|---|---|---|---|---|---|---|
0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0
1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0
… | … | … | … | … | … | … | …
Here, timestamp is the time-stamp of the sample; accel-x, accel-y, accel-z, gyro-x, gyro-y, and gyro-z are the features of the sample; and activity is the label of the time-step.
With the data organized this way, we use the SeriesFolderCSVDataset to load it. This creates a Dataset in which each CSV file is a sample, each row of the file is a time-step, and the columns are the features.
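To make the "one file per sample" idea concrete, here is a minimal sketch of such a dataset using only the standard library and NumPy. FolderCSVSeries is a hypothetical class, not the actual SeriesFolderCSVDataset implementation; it only illustrates the expected layout, one (features, time) array per CSV file. A class with __len__ and __getitem__ is all a map-style PyTorch DataLoader needs.

```python
import csv
from pathlib import Path

import numpy as np


class FolderCSVSeries:
    """Hypothetical sketch: each CSV file in `folder` is one sample."""

    def __init__(self, folder, features):
        self.files = sorted(Path(folder).glob("*.csv"))
        self.features = list(features)

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        with open(self.files[index], newline="") as handle:
            reader = csv.DictReader(handle)
            # One row per time-step, keeping only the requested feature columns
            rows = [[float(row[name]) for name in self.features] for row in reader]
        # (time, features) -> (features, time): features become channels
        return np.array(rows, dtype=np.float32).T
```

For example, FolderCSVSeries("data/train", features=("accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"))[0] would return a (6, T) array, where T is the number of rows in the first CSV file. Because T differs between users, samples have different lengths.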
If your data is organized as above, with one sub-folder per split (train/, validation/, and test/) inside the root folder (data/ in this case) and the CSV files inside each split folder, you can use the UserActivityFolderDataModule to create a LightningDataModule for you. This class creates a DataLoader of SeriesFolderCSVDataset for each split (train, validation, and test) and sets up the data correctly.
In this notebook, we will use the UserActivityFolderDataModule to create the LightningDataModule for us. This class requires the following parameters:
data_path: the root directory of the data;
features: the names of the feature columns;
pad: a boolean indicating whether the samples should be padded to the same length, that is, the length of the longest sample in the dataset. The padding scheme replicates each sample, from its beginning, until the length of the longest sample is reached.
NOTE: The samples may have different lengths and therefore cannot be stacked into a single batch, so, for this method, the batch_size must be 1.
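The padding scheme and the batch-size restriction can be illustrated with a small sketch. It assumes a sample is a (channels, time) array and that "replicating from the beginning" means cycling the time-steps; pad_by_repetition is a hypothetical helper, and the library's exact behavior may differ.

```python
import numpy as np


def pad_by_repetition(sample, target_length):
    """Repeat a (channels, time) sample from its start up to target_length."""
    length = sample.shape[-1]
    # Indices 0, 1, ..., length-1, 0, 1, ... cycled along the time axis
    indices = np.arange(target_length) % length
    return sample[..., indices]


short = np.array([[1, 2, 3]])
longer = np.array([[9, 8, 7, 6, 5, 4, 3]])
# np.stack([short, longer]) would raise a ValueError: lengths 3 and 7 differ
padded = pad_by_repetition(short, longer.shape[-1])
print(padded)  # [[1 2 3 1 2 3 1]]
batch = np.stack([padded, longer])  # shape (2, 1, 7)
```

Without padding, samples of different lengths cannot be stacked, which is why each batch holds a single sample.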
[1]:
import numpy as np
import torch
from ssl_tools.data.data_modules import UserActivityFolderDataModule
data_path = "/workspaces/hiaac-m4/data/view_concatenated/KuHar_cpc"
data_module = UserActivityFolderDataModule(
data_path,
features=("accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"),
batch_size=1, # We set to 1 for CPC
label=None, # We do not want to return the labels, only data.
pad=False # If you want padded data, set it to True.
# This guarantees that all data have the same length.
)
data_module
[1]:
UserActivityFolderDataModule(data_path=/workspaces/hiaac-m4/data/view_concatenated/KuHar_cpc, batch_size=1)
Pre-training the model
Here we will use the builder method build_cpc to instantiate the CPC model. This instantiates a CPC self-supervised model with the default encoder (ssl_tools.models.layers.gru.GRUEncoder), which is a GRU followed by a linear layer, and the default autoregressive model (torch.nn.GRU).
We can parametrize the creation of the model by passing the desired parameters to the builder method. The build_cpc method accepts the following parameters:
encoding_size: the size of the encoded representation;
in_channels: the number of input features;
gru_hidden_size: the number of features in the hidden state of the GRU;
gru_num_layers: the number of layers in the GRU;
learning_rate: the learning rate of the optimizer;
window_size: the size of the input windows (X_t) fed to the encoder (GRU).
All parameters are optional and have default values. You may want to consult the documentation of the method to see the default values and additional parameters.
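The window_size parameter suggests that each user's series is cut into windows X_t, each window is encoded into one vector of size encoding_size, and the autoregressive GRU then runs over that sequence of encodings. The sketch below shows one plausible way to cut the windows, assuming non-overlapping windows and dropping trailing time-steps that do not fill a whole window; split_into_windows is hypothetical, and the library may handle windowing differently.

```python
import numpy as np


def split_into_windows(series, window_size):
    """Split a (channels, time) series into non-overlapping windows.

    Returns an array of shape (n_windows, channels, window_size).
    Trailing time-steps that do not fill a whole window are dropped.
    """
    channels, length = series.shape
    n_windows = length // window_size
    trimmed = series[:, : n_windows * window_size]
    # (channels, n_windows * w) -> (channels, n_windows, w) -> (n_windows, channels, w)
    return trimmed.reshape(channels, n_windows, window_size).transpose(1, 0, 2)


series = np.arange(20).reshape(2, 10)  # 2 channels, 10 time-steps
windows = split_into_windows(series, window_size=4)
print(windows.shape)  # (2, 2, 4): the last 2 time-steps are dropped
```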
Note that the LightningModule returned by the build_cpc method is already configured to use the CPC loss and the Adam optimizer.
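For intuition about the CPC loss, below is a generic InfoNCE computation for a single prediction step, written in NumPy. It assumes a bilinear score z^T W c_t between a context vector c_t and candidate encodings, with exactly one candidate being the true future encoding; the loss is the cross-entropy of picking that candidate. This is a textbook formulation, not necessarily the exact computation inside the model returned by build_cpc (which, judging by the printed model, uses a Linear density_estimator and CrossEntropyLoss).

```python
import numpy as np


def info_nce_loss(context, candidates, positive_index, weight):
    """Generic InfoNCE loss for one prediction step.

    context:        (d,) context vector c_t from the autoregressive model
    candidates:     (n, d) encodings; exactly one is the true future encoding
    positive_index: row of `candidates` holding the true future encoding
    weight:         (d, d) matrix W of the bilinear score z^T W c_t
    """
    scores = candidates @ (weight @ context)  # one score per candidate
    scores = scores - scores.max()  # shift for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())  # log-softmax
    return -log_probs[positive_index]  # cross-entropy with the positive as target


d, n = 4, 5
# If all candidates score the same, the loss is log(n), i.e. a random guess
print(info_nce_loss(np.ones(d), np.zeros((n, d)), 0, np.eye(d)))  # log(5) ≈ 1.609
```

Minimizing this loss pushes the score of the true future encoding above the scores of the negatives, which is what forces the encoder and the autoregressive model to capture predictive structure.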
[2]:
from ssl_tools.models.ssl.cpc import build_cpc
encoding_size = 128
in_channels = 6
gru_hidden_size = 100
gru_num_layers = 1
learning_rate = 1e-3
model = build_cpc(
encoding_size=encoding_size,
in_channels=in_channels,
gru_hidden_size=gru_hidden_size,
gru_num_layers=gru_num_layers,
learning_rate=learning_rate
)
model
[2]:
CPC(
(encoder): GRUEncoder(
(rnn): GRU(6, 100, bidirectional=True)
(nn): Linear(in_features=200, out_features=128, bias=True)
)
(density_estimator): Linear(in_features=128, out_features=128, bias=True)
(auto_regressor): GRU(128, 128, batch_first=True)
(loss_func): CrossEntropyLoss()
)
We instantiate the Trainer and call the fit method to train the model.
[3]:
import lightning as L
max_epochs = 10
trainer = L.Trainer(max_epochs=max_epochs)
trainer.fit(model, data_module)
Trainer will use only 1 of 2 GPUs because it is running inside an interactive / notebook environment. You may try to set `Trainer(devices=2)` but please note that multi-GPU inside interactive / notebook environments is considered experimental and unstable. Your mileage may vary.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
| Name | Type | Params
-------------------------------------------------------
0 | encoder | GRUEncoder | 90.5 K
1 | density_estimator | Linear | 16.5 K
2 | auto_regressor | GRU | 99.1 K
3 | loss_func | CrossEntropyLoss | 0
-------------------------------------------------------
206 K Trainable params
0 Non-trainable params
206 K Total params
0.824 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=10` reached.
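The parameter counts in the summary can be checked by hand. The sketch below uses the standard parameter formulas for PyTorch's nn.GRU (three gates, each with an input-hidden and a hidden-hidden weight matrix plus two bias vectors) and nn.Linear (weights plus bias), with the sizes from the printed model, and assumes float32 weights (4 bytes each) for the size estimate.

```python
def gru_params(input_size, hidden_size, bidirectional=False):
    # Single-layer GRU: 3 gates x (W_ih + W_hh + b_ih + b_hh) per direction
    per_direction = 3 * (hidden_size * input_size
                         + hidden_size * hidden_size
                         + 2 * hidden_size)
    return per_direction * (2 if bidirectional else 1)


def linear_params(in_features, out_features):
    # Weight matrix plus bias vector
    return in_features * out_features + out_features


encoder = gru_params(6, 100, bidirectional=True) + linear_params(200, 128)
density_estimator = linear_params(128, 128)
auto_regressor = gru_params(128, 128)
total = encoder + density_estimator + auto_regressor
print(encoder, density_estimator, auto_regressor, total)  # 90528 16512 99072 206112
print(f"{total * 4 / 1e6:.3f} MB")  # 0.824 MB
```

These match the 90.5 K, 16.5 K, 99.1 K, and 206 K figures and the 0.824 MB estimate reported above.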
This finishes the pre-training stage.
To obtain the latent representations of the data, we call the model's forward() method on the data. In this framework, the forward method of the SSL models returns the latent representations of the input data. Usually this is the output of the encoder, as in this case, but it may vary depending on the model.
We will use the encoder to obtain the latent representations of the data, and then use these representations to train a classifier for the downstream task.
Fine-tuning the model
After pre-training the model, we will use the learned representations to train a classifier for the downstream task, in this case, the HAR task.
NOTE: It is important that the SSL models implement the forward method to return the latent representations of the input data, so that these representations can be used to train the classifier.
Creating the LightningDataModule
Human activity recognition is a supervised classification task that usually receives multi-modal windowed time-series as input, unlike the self-supervised task, which receives the whole time-series of a single user. Thus, we cannot use the same LightningDataModule to load the data for the downstream task.
In this notebook, we will use the windowed time-series version of the KuHar dataset, in which each split is a single CSV file containing the windowed time-series of the users. The directory should look like this:
KuHar/
    train.csv
    validation.csv
    test.csv
The CSV files may look like this:
accel-x-0 | accel-x-1 | accel-y-0 | accel-y-1 | class
---|---|---|---|---|
0.502123 | 0.02123 | 0.502123 | 0.502123 | 0
0.6820123 | 0.02123 | 0.502123 | 0.502123 | 1
0.498217 | 0.00001 | 1.414141 | 3.141592 | 1
As each CSV file contains time-window signals of two 3-axis sensors (accelerometer and gyroscope), we must use the MultiModalSeriesCSVDataset class.
As in the last notebook, we will use the MultiModalHARSeriesDataModule to facilitate the creation of the LightningDataModule. This class creates a DataLoader of MultiModalSeriesCSVDataset for each split (train, validation, and test) and sets up the data correctly.
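With features_as_channels=True (used in the next cell), the flat columns of each row are presumably regrouped so that each feature prefix becomes one channel over time. The sketch below shows that reshaping for a single row, assuming columns are named <prefix>-<time index> as in the example table; columns_to_channels is a hypothetical helper, not the MultiModalSeriesCSVDataset code.

```python
import numpy as np


def columns_to_channels(header, row, prefixes):
    """Hypothetical: group columns like 'accel-x-0', 'accel-x-1', ... into channels.

    Returns an array of shape (len(prefixes), time).
    """
    channels = []
    for prefix in prefixes:
        start = len(prefix) + 1  # skip '<prefix>-'
        # Keep columns named '<prefix>-<index>' and order them by index
        indexed = [
            (int(name[start:]), value)
            for name, value in zip(header, row)
            if name.startswith(prefix + "-") and name[start:].isdigit()
        ]
        indexed.sort()
        channels.append([value for _, value in indexed])
    return np.array(channels, dtype=np.float32)


header = ["accel-x-0", "accel-x-1", "accel-y-0", "accel-y-1", "class"]
row = [0.502123, 0.02123, 0.502123, 0.502123, 0]
x = columns_to_channels(header, row, ("accel-x", "accel-y"))
print(x.shape)  # (2, 2): 2 channels, 2 time-steps; the 'class' column is ignored
```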
[4]:
from ssl_tools.data.data_modules.har import MultiModalHARSeriesDataModule
data_path = "/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar/"
data_module = MultiModalHARSeriesDataModule(
data_path=data_path,
feature_prefixes=("accel-x", "accel-y", "accel-z", "gyro-x", "gyro-y", "gyro-z"),
label="standard activity code",
features_as_channels=True,
batch_size=64,
num_workers=0, # Sequential, for notebook compatibility
)
data_module
[4]:
MultiModalHARSeriesDataModule(data_path=/workspaces/hiaac-m4/ssl_tools/data/standartized_balanced/KuHar, batch_size=64)
Fine-tuning the model
A model for a downstream task is usually composed of two parts: the backbone, which generates the representations of the input data (i.e., the encoder), and the prediction head, which receives the representations and outputs the predictions (usually an MLP).
To handle the fine-tuning process, we can design a new model composed of the pre-trained backbone and the prediction head, and then train this new model with the labeled data. To facilitate this process, this framework provides the SSLDiscriminator class, which receives the backbone model and the prediction head and then trains the classifier with the labeled data.
In summary, the SSLDiscriminator class is a LightningModule that generates the representations of the input data using the forward method of the pre-trained backbone model, and then uses the prediction head to output the predictions, something like y_hat = prediction_head(backbone(sample)). The predictions and labels are then used to compute the loss and train the model. By default, the SSLDiscriminator is trained using the Adam optimizer with a parametrizable learning_rate.
It is worth mentioning that the forward method of the SSLDiscriminator class receives the input data and the labels, and returns the predictions. This differs from the forward method of the self-supervised models, which receives only the input data and returns its latent representations.
Note that the fine-tuning process can be done in two ways:
1. Fine-tuning the whole model, that is, the backbone (encoder) and the classifier, with the labeled data; or
2. Fine-tuning only the classifier with the labeled data.
The SSLDiscriminator class handles both cases through the update_backbone parameter. If update_backbone is True, the whole model is fine-tuned (case 1 above); otherwise, only the classifier is fine-tuned (case 2 above).
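To see what update_backbone changes in practice, the sketch below counts trainable and frozen parameters for both settings. It assumes the head printed below (Linear 128→64, 64→64, 64→6; ReLU, Dropout, and Softmax have no parameters) and the CPC backbone's 206,112 parameters (reported as 206 K in the pre-training summary). trainable_split is a hypothetical helper; with update_backbone=False its result is consistent with the 12.8 K trainable and 206 K non-trainable parameters reported when training the discriminator.

```python
def linear_params(in_features, out_features):
    # nn.Linear: weight matrix plus bias vector
    return in_features * out_features + out_features


# CPCPredictionHead: 128 -> 64 -> 64 -> 6 (activations and dropout have no parameters)
HEAD = linear_params(128, 64) + linear_params(64, 64) + linear_params(64, 6)
BACKBONE = 206_112  # CPC backbone: encoder + density_estimator + auto_regressor


def trainable_split(update_backbone):
    """Hypothetical helper: return (trainable, frozen) parameter counts."""
    if update_backbone:
        return HEAD + BACKBONE, 0
    return HEAD, BACKBONE


print(trainable_split(update_backbone=False))  # (12806, 206112)
print(trainable_split(update_backbone=True))   # (218918, 0)
```

Freezing the backbone means only about 6% of the parameters receive gradient updates, which makes fine-tuning cheap but relies entirely on the quality of the pre-trained representations.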
Let’s create our prediction head and the SSLDiscriminator model, and train it with the labeled data. Prediction heads for the most popular tasks are already implemented in the ssl_tools.models.ssl.modules.heads module. In this notebook, we will use the CPCPredictionHead, an MLP with two hidden layers (three linear layers in total) and dropout.
[5]:
from ssl_tools.models.ssl.classifier import SSLDiscriminator
from ssl_tools.models.ssl.modules.heads import CPCPredictionHead
number_of_classes = 6
prediction_head = CPCPredictionHead(
input_dim=encoding_size, # Size of the encoding (input)
hidden_dim1=64,
hidden_dim2=64,
output_dim=number_of_classes # Number of classes
)
prediction_head
[5]:
CPCPredictionHead(
(layers): Sequential(
(0): Linear(in_features=128, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
(3): Sequential(
(0): ReLU()
(1): Dropout(p=0, inplace=False)
)
(4): Linear(in_features=64, out_features=6, bias=True)
(5): Softmax(dim=1)
)
)
Now we create the SSLDiscriminator model. This class requires the following parameters:
backbone: the backbone model, that is, the pre-trained model;
head: the prediction head model;
loss_fn: the loss function used to train the model.
We can also attach metrics that will be calculated for every batch of the validation and test sets. The metrics are passed through the metrics parameter of the SSLDiscriminator class, which receives a dictionary with the name of each metric as key and a torchmetrics.Metric as value.
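The names in the test results later in this notebook (test_acc, test_loss) suggest that each metric key is prefixed with the stage name when logged. The sketch below imitates that bookkeeping with a plain dictionary of functions; evaluate_metrics and accuracy are hypothetical NumPy stand-ins for the torchmetrics objects, and the real SSLDiscriminator logging may work differently.

```python
import numpy as np


def accuracy(probs, targets):
    """Fraction of samples whose highest-scoring class equals the target."""
    return float((probs.argmax(axis=1) == targets).mean())


def evaluate_metrics(metrics, probs, targets, stage):
    """Hypothetical: compute each metric and prefix its name with the stage."""
    return {f"{stage}_{name}": fn(probs, targets) for name, fn in metrics.items()}


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])
targets = np.array([0, 1, 1, 0])
print(evaluate_metrics({"acc": accuracy}, probs, targets, stage="val"))
# {'val_acc': 0.75}
```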
Let’s create the SSLDiscriminator and attach the Accuracy metric to the model to check the validation accuracy per epoch.
[6]:
from torchmetrics import Accuracy
from torch.nn import CrossEntropyLoss
acc_metric = Accuracy(
task="multiclass", # We are working with a multiclass
# classification, not a binary one.
num_classes=number_of_classes # Number of classes
)
ssl_discriminator = SSLDiscriminator(
backbone=model, # The model we trained before (CPC)
head=prediction_head, # The prediction head we just created
loss_fn=CrossEntropyLoss(), # The loss function
learning_rate=1e-3,
update_backbone=False, # We do not want to update the backbone
metrics={"acc": acc_metric} # We want to track the accuracy
)
ssl_discriminator
[6]:
SSLDiscriminator(
(backbone): CPC(
(encoder): GRUEncoder(
(rnn): GRU(6, 100, bidirectional=True)
(nn): Linear(in_features=200, out_features=128, bias=True)
)
(density_estimator): Linear(in_features=128, out_features=128, bias=True)
(auto_regressor): GRU(128, 128, batch_first=True)
(loss_func): CrossEntropyLoss()
)
(head): CPCPredictionHead(
(layers): Sequential(
(0): Linear(in_features=128, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=64, bias=True)
(3): Sequential(
(0): ReLU()
(1): Dropout(p=0, inplace=False)
)
(4): Linear(in_features=64, out_features=6, bias=True)
(5): Softmax(dim=1)
)
)
(loss_fn): CrossEntropyLoss()
)
Then we can instantiate the Trainer and call the fit method to train the model.
[7]:
import lightning as L
max_epochs = 10
trainer = L.Trainer(max_epochs=max_epochs)
trainer.fit(ssl_discriminator, data_module)
Trainer will use only 1 of 2 GPUs because it is running inside an interactive / notebook environment. You may try to set `Trainer(devices=2)` but please note that multi-GPU inside interactive / notebook environments is considered experimental and unstable. Your mileage may vary.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
| Name | Type | Params
-----------------------------------------------
0 | backbone | CPC | 206 K
1 | head | CPCPredictionHead | 12.8 K
2 | loss_fn | CrossEntropyLoss | 0
-----------------------------------------------
12.8 K Trainable params
206 K Non-trainable params
218 K Total params
0.876 Total estimated model params size (MB)
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py:293: The number of training batches (22) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
`Trainer.fit` stopped: `max_epochs=10` reached.
Let’s evaluate the model on the test set. Since we added the Accuracy metric to the model, it will calculate the accuracy of the model on the test set. All logged metrics are returned by the .test() method as a list of dictionaries.
[8]:
results = trainer.test(ssl_discriminator, data_module)
results
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test_acc          │    0.5277777910232544     │
│         test_loss         │    1.4903016090393066     │
└───────────────────────────┴───────────────────────────┘
[8]:
[{'test_loss': 1.4903016090393066, 'test_acc': 0.5277777910232544}]
Finally, to get the predictions of the model, we can either:
1. Call the forward method of the model, passing the input data (iterating over all batches of the dataloader); or
2. Use the Trainer.predict method, passing the data module. In this case, the model is set to evaluation mode, and the predictions are made using the predict_dataloader defined in the LightningDataModule, which is usually the test set (test_dataloader).
[9]:
y_hat = trainer.predict(ssl_discriminator, data_module)
# trainer.predict returns a list of tensors (one per batch). Let's concatenate them.
y_hat = torch.cat(y_hat)
y_hat.shape
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'predict_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=47` in the `DataLoader` to improve performance.
[9]:
torch.Size([144, 6])
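The (144, 6) result holds one row of class scores per test sample (the head ends with a Softmax, so each row is a probability distribution over the 6 classes). To turn it into class labels, take the argmax over the class dimension; on a torch tensor that is y_hat.argmax(dim=1). The NumPy sketch below mirrors the flow on made-up batch outputs: concatenating the per-batch predictions, as torch.cat does above, and then taking the argmax.

```python
import numpy as np

# Made-up per-batch outputs (2 batches, 3 classes), standing in for trainer.predict
batches = [
    np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]),
    np.array([[0.2, 0.2, 0.6]]),
]
y_hat = np.concatenate(batches, axis=0)  # same role as torch.cat(y_hat)
labels = y_hat.argmax(axis=1)            # predicted class per sample
print(y_hat.shape, labels)  # (3, 3) [1 0 2]
```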
Next steps
This notebook covered the whole process of training a self-supervised model and then using the learned representations to train a classifier for the downstream task.
We can standardize this process to facilitate the reproduction of experiments, and then use it to train different models and evaluate them on different datasets.
Next, we will explore the Experiment API, which is designed to simplify training and evaluating models, besides providing a standard way to log results, save and load models, and more.