Multiple discrete targets with the TUH EEG Corpus

Welcome to this tutorial, where we demonstrate how to work with multiple discrete targets for each recording in the TUH EEG Corpus. We’ll guide you through the process step by step.

# Author: Lukas Gemein <l.gemein@gmail.com>
#
# License: BSD (3-clause)

import mne
from torch.utils.data import DataLoader

from braindecode.datasets import TUH
from braindecode.preprocessing import create_fixed_length_windows

# Setting Logging Level
# ----------------------
#
# We'll set the logging level to 'ERROR' to avoid excessive messages when
# extracting windows:

mne.set_log_level('ERROR')  # avoid messages every time a window is extracted

If you want to try this code with the actual data, please delete the next section. We have to mock some dataset functionality because the data is not available when this example is built.

from braindecode.datasets.tuh import _TUHMock as TUH  # noqa F811

Creating Temple University Hospital (TUH) EEG Corpus Dataset

We start by creating a TUH dataset. Instead of just a single str, we give it multiple strings as target names. Each of the strings has to exist as a column in the description DataFrame.

TUH_PATH = 'please insert actual path to data here'
tuh = TUH(
    path=TUH_PATH,
    recording_ids=None,
    target_name=('age', 'gender'),  # use both age and gender as decoding target
    preload=False,
    add_physician_reports=False,
)
print(tuh.description)

                                                path version  ...  age  gender
tuh_eeg/v1.1.0/edf/02_tcp_le/000/00000058/s001...  v1.1.0  ...    0       M
tuh_eeg/v1.1.0/edf/01_tcp_ar/099/00009932/s004...  v1.1.0  ...   53       F
tuh_eeg/v1.1.0/edf/03_tcp_ar_a/123/00012331/s0...  v1.1.0  ...   39       M
tuh_eeg/v1.1.0/edf/01_tcp_ar/000/00000000/s001...  v1.1.0  ...   37       M
tuh_eeg/v1.2.0/edf/03_tcp_ar_a/149/00014928/s0...  v1.2.0  ...   83       F

[5 rows x 10 columns]
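
The column requirement can be checked before the dataset is used. The sketch below is a hypothetical helper (`missing_targets` is not part of braindecode) that accepts any iterable of column names, such as `tuh.description.columns`. The column list here is an illustrative subset of the printed output, and `handedness` is a made-up name used only to show a failing case.

```python
def missing_targets(columns, target_name):
    """Return the target names that are not present in `columns`."""
    # Accept a single string as well as a tuple/list of strings.
    names = [target_name] if isinstance(target_name, str) else list(target_name)
    available = set(columns)
    return [name for name in names if name not in available]


# Illustrative subset of the description columns printed above.
columns = ['path', 'version', 'age', 'gender']

print(missing_targets(columns, ('age', 'gender')))      # []
print(missing_targets(columns, ('age', 'handedness')))  # ['handedness']
```

An empty result means every requested target can be read from the description.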

Exploring Data

Iterating through the dataset gives x as an ndarray with shape (n_channels x 1) and y as a list containing [age of the subject, gender of the subject]. Let’s look at the last example as it has more interesting age/gender labels (compare to the last row of the dataframe above).

x, y = tuh[-1]

print(f'{x=}\n{y=}')

x=array([[-0.92922518],
       [ 1.22477157],
       [ 0.12493701],
       [ 1.28417889],
       [ 0.13578887],
       [-1.56835277],
       [ 1.25153176],
       [-1.7638324 ],
       [ 0.14345092],
       [-0.30167542],
       [ 0.68056549],
       [ 1.70380673],
       [-0.70052629],
       [-1.21543634],
       [ 0.96330007],
       [ 0.02736802],
       [ 1.65902848],
       [-0.15756383],
       [ 0.38658058],
       [-0.04194231],
       [-0.15447331]])
y=[83, 'F']

Creating Windows

We will skip preprocessing steps for now, since they are not the aim of this example. Instead, we will directly create compute windows. We specify a mapping from the genders ‘M’ and ‘F’ to integers, since decoding requires numeric targets.
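
What such a mapping does to a multi-target label can be pictured with plain Python. This is only a sketch of the idea, not braindecode's implementation: `apply_mapping` is a hypothetical helper, and leaving unmapped values unchanged is an assumption.

```python
def apply_mapping(targets, mapping):
    """Replace each target found in `mapping`; leave all others unchanged."""
    return [mapping.get(target, target) for target in targets]


mapping = {'M': 0, 'F': 1}

print(apply_mapping([83, 'F'], mapping))  # [83, 1]
print(apply_mapping([39, 'M'], mapping))  # [39, 0]
```

Because the keys are strings, the integer age passes through untouched and only the gender entry is replaced.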

tuh_windows = create_fixed_length_windows(
    tuh,
    start_offset_samples=0,
    stop_offset_samples=None,
    window_size_samples=1000,
    window_stride_samples=1000,
    drop_last_window=False,
    mapping={'M': 0, 'F': 1},  # map non-digit targets
)
# store the number of windows required for loading later on
tuh_windows.set_description({
    "n_windows": [len(d) for d in tuh_windows.datasets]})

Exploring Windows

Iterating through the dataset gives x as an ndarray with shape (n_channels x 1000), y as [age, gender], and ind as [window index within the recording, start sample, stop sample]. Let’s look at the last example again.

x, y, ind = tuh_windows[-1]
print(f'{x=}\n{y=}\n{ind=}')

x=array([[ 0.16017832,  0.11754766,  0.6515615 , ..., -0.50407195,
         0.7429778 , -0.9292252 ],
       [ 0.86039335,  0.5169721 , -1.0308012 , ..., -1.1209397 ,
         1.5210943 ,  1.2247716 ],
       [-2.373041  , -0.14827406,  0.25829056, ...,  1.1486133 ,
         0.15352039,  0.12493701],
       ...,
       [ 0.30362263, -0.939559  ,  2.5686462 , ..., -0.5503989 ,
        -1.3299779 ,  0.3865806 ],
       [ 1.7443461 ,  1.1792846 , -0.25878426, ..., -0.5670986 ,
        -1.2997373 , -0.04194231],
       [ 1.0537782 ,  1.4429058 , -0.05458383, ...,  1.000525  ,
         0.66521484, -0.15447332]], dtype=float32)
y=[83, 1]
ind=[3, 2600, 3600]
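
The printed `ind` is consistent with a recording of 3600 samples; that length is inferred from the output, not stated in the example. With `drop_last_window=False`, the final window appears to be shifted back so that it ends at the last sample, which makes it overlap the previous window. The pure-Python sketch below reproduces that arithmetic. `fixed_length_window_bounds` is a hypothetical helper, not a braindecode function, and returning an empty list for too-short recordings is an assumption.

```python
def fixed_length_window_bounds(n_samples, size, stride, drop_last_window):
    """Return (index, start, stop) triples for fixed-length windows."""
    if n_samples < size:
        return []  # assumption: no window fits into the recording
    starts = list(range(0, n_samples - size + 1, stride))
    # If samples remain after the last regular window, optionally add one
    # more window that ends exactly at the final sample.
    if not drop_last_window and starts[-1] + size < n_samples:
        starts.append(n_samples - size)
    return [(i, start, start + size) for i, start in enumerate(starts)]


for bounds in fixed_length_window_bounds(3600, 1000, 1000,
                                         drop_last_window=False):
    print(bounds)
# (0, 0, 1000)
# (1, 1000, 2000)
# (2, 2000, 3000)
# (3, 2600, 3600)
```

The last triple matches `ind=[3, 2600, 3600]`: samples 2600 to 3000 appear in both the third and the fourth window.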

DataLoader for Model Training

We pass the dataset to a PyTorch DataLoader so that it can be used for model training.

dl = DataLoader(
    dataset=tuh_windows,
    batch_size=4,
)

Exploring DataLoader

When iterating through the DataLoader, we get batch_X as a tensor with shape (4 x n_channels x 1000), batch_y as [tensor([4 x age of subject]), tensor([4 x gender of subject])], and batch_ind. To view the last batch, simply iterate through the DataLoader until it is exhausted:

for batch_X, batch_y, batch_ind in dl:
    pass

print(f'{batch_X=}\n{batch_y=}\n{batch_ind=}')

batch_X=tensor([[[ 0.8024, -0.4564, -1.4433,  ..., -0.7404, -2.2522, -3.0508],
         [ 0.4418,  0.5201, -1.2037,  ...,  1.0273,  1.0506,  0.1677],
         [-2.1145,  0.7075,  0.1768,  ..., -0.0260, -0.7216,  0.2742],
         ...,
         [ 0.3986,  0.3912,  0.8000,  ...,  1.0593, -0.8097, -1.1729],
         [-0.6267, -0.8419, -1.3996,  ...,  0.6685,  2.9266, -0.1319],
         [ 1.1409,  1.6246, -0.6229,  ...,  0.5953, -0.3169, -1.4604]],

        [[-0.2885, -0.0168,  1.6279,  ...,  0.6533,  2.7673, -0.6671],
         [-0.0448, -1.1422, -0.5395,  ..., -0.4664, -0.1882,  0.2031],
         [ 0.6282, -0.4535, -0.8585,  ..., -0.2657, -0.9942, -0.3878],
         ...,
         [-0.1360,  0.9132,  0.9481,  ..., -0.5033, -1.3406,  0.1251],
         [ 0.6089, -0.9883, -0.2919,  ..., -1.3796,  0.8036,  0.7666],
         [ 0.8174, -0.3829, -1.1508,  ..., -0.8516,  2.0426,  1.1322]],

        [[-0.4122,  1.2526, -0.9711,  ..., -0.6890, -1.4660,  1.2500],
         [-0.5510,  0.3613,  1.0280,  ..., -1.1350, -0.2249, -0.9406],
         [ 0.7754,  0.7664, -1.1807,  ...,  1.8206,  1.1281,  0.1607],
         ...,
         [-0.9353,  0.0165, -0.6509,  ..., -0.7276,  0.2137, -0.5619],
         [-0.0408, -1.9825, -0.3648,  ..., -1.6978,  0.8943,  1.2374],
         [-1.1185, -0.4398, -0.8218,  ..., -1.5172,  2.2975,  1.0311]],

        [[ 0.1602,  0.1175,  0.6516,  ..., -0.5041,  0.7430, -0.9292],
         [ 0.8604,  0.5170, -1.0308,  ..., -1.1209,  1.5211,  1.2248],
         [-2.3730, -0.1483,  0.2583,  ...,  1.1486,  0.1535,  0.1249],
         ...,
         [ 0.3036, -0.9396,  2.5686,  ..., -0.5504, -1.3300,  0.3866],
         [ 1.7443,  1.1793, -0.2588,  ..., -0.5671, -1.2997, -0.0419],
         [ 1.0538,  1.4429, -0.0546,  ...,  1.0005,  0.6652, -0.1545]]])
batch_y=[tensor([83, 83, 83, 83]), tensor([1, 1, 1, 1])]
batch_ind=[tensor([0, 1, 2, 3]), tensor([   0, 1000, 2000, 2600]), tensor([1000, 2000, 3000, 3600])]
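
The shape of `batch_y` follows from how the DataLoader's default collation treats list-valued labels: the per-window `[age, gender]` pairs are regrouped into one sequence per target. Below is a plain-Python sketch of that regrouping, using lists in place of tensors. `collate_targets` is a hypothetical helper written for illustration only.

```python
def collate_targets(samples_y):
    """Turn [[age, gender], ...] into [[age, ...], [gender, ...]]."""
    return [list(column) for column in zip(*samples_y)]


# Labels of the four windows in the last batch shown above.
samples_y = [[83, 1], [83, 1], [83, 1], [83, 1]]

ages, genders = collate_targets(samples_y)
print(ages)     # [83, 83, 83, 83]
print(genders)  # [1, 1, 1, 1]
```

In a training loop, `batch_y` can be unpacked in the same way to obtain one tensor per target.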

Total running time of the script: (0 minutes 2.877 seconds)

Estimated memory usage: 10 MB
