Split Dataset Example#
This example shows several ways to split a dataset for training, testing, and evaluating your models.
# Authors: Lukas Gemein <l.gemein@gmail.com>
#
# License: BSD (3-clause)
from braindecode.datasets import MOABBDataset
from braindecode.preprocessing import create_windows_from_events
Loading the dataset#
First, we create a dataset using the braindecode MOABBDataset class, which loads data fetched from MOABB. In this example, we’re using Dataset 2a from BCI Competition IV.
dataset = MOABBDataset(dataset_name="BNCI2014_001", subject_ids=[1])
Splitting#
By description information#
The MOABBDataset class holds a pandas DataFrame that describes its internal datasets. This description can be used to split the data based on recording information, such as the subject, session, and run of each recording.
dataset.description
Here, we’re splitting the data by run. The split method returns a dictionary whose string keys are the unique values found in the chosen column of the description DataFrame.
splits = dataset.split("run")
print(splits)
splits["4"].description
{'0': <BaseConcatDataset | 2 RawDataset(s) | 193470 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run], '1': <BaseConcatDataset | 2 RawDataset(s) | 193470 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run], '2': <BaseConcatDataset | 2 RawDataset(s) | 193470 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run], '3': <BaseConcatDataset | 2 RawDataset(s) | 193470 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run], '4': <BaseConcatDataset | 2 RawDataset(s) | 193470 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run], '5': <BaseConcatDataset | 2 RawDataset(s) | 193470 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run]}
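To make the mapping from a description column to split keys concrete, here is a minimal sketch of the grouping logic in plain Python. It is not braindecode's implementation: the `description` rows are a hand-written stand-in for `dataset.description` (the session labels and row order are assumptions), and `group_rows_by_column` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written stand-in for dataset.description: 2 sessions x 6 runs.
# The session labels are placeholders, not taken from the real dataset.
description = [
    {"subject": 1, "session": session, "run": str(run)}
    for session in ("session_a", "session_b")
    for run in range(6)
]


def group_rows_by_column(rows, column):
    """Map each unique value in `column` (as a string) to its row indices."""
    groups = defaultdict(list)
    for row_index, row in enumerate(rows):
        groups[str(row[column])].append(row_index)
    return dict(groups)


run_groups = group_rows_by_column(description, "run")
print(sorted(run_groups))  # one key per unique run value
print(run_groups["4"])     # row indices of the recordings with run "4"
```

Under these assumptions each run value appears once per session, which is consistent with every split above containing 2 recordings.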
By row index#
We can also split the dataset by passing a list of integers that index rows of the description. In this case, the returned dictionary has ‘0’ as its only key.
splits = dataset.split([0, 1, 5])
print(splits)
splits["0"].description
{'0': <BaseConcatDataset | 3 RawDataset(s) | 290205 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 3 recordings × 3 columns [subject, session, run]}
If we want multiple splits based on indices, we can pass a list of lists of integers. In this case, each string key in the returned dictionary is the position of the corresponding inner list within the outer list.
splits = dataset.split([[0, 1, 5], [2, 3, 4], [6, 7, 8, 9, 10, 11]])
print(splits)
splits["2"].description
{'0': <BaseConcatDataset | 3 RawDataset(s) | 290205 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 3 recordings × 3 columns [subject, session, run], '1': <BaseConcatDataset | 3 RawDataset(s) | 290205 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 3 recordings × 3 columns [subject, session, run], '2': <BaseConcatDataset | 6 RawDataset(s) | 580410 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 6 recordings × 3 columns [subject, session, run]}
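The key naming for index-based splits can be sketched as follows. `name_index_splits` is a hypothetical helper written for illustration, not part of braindecode; it only mirrors the behaviour visible in the two outputs above, where a flat list yields the single key '0' and a list of lists yields one key per position.

```python
def name_index_splits(spec):
    """Return {key: row_indices} for a flat list or a list of lists."""
    if all(isinstance(item, int) for item in spec):
        spec = [spec]  # a flat list is treated as a single split
    return {
        str(position): list(indices)
        for position, indices in enumerate(spec)
    }


single = name_index_splits([0, 1, 5])
multiple = name_index_splits([[0, 1, 5], [2, 3, 4], [6, 7, 8, 9, 10, 11]])
print(single)
print(multiple)
```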
You can also name each split by passing a dictionary that maps each name to a list of indices. The returned dictionary uses the same keys:
splits = dataset.split(
    {"train": [0, 1, 5], "valid": [2, 3, 4], "test": [6, 7, 8, 9, 10, 11]}
)
print(splits)
splits["test"].description
{'train': <BaseConcatDataset | 3 RawDataset(s) | 290205 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 3 recordings × 3 columns [subject, session, run], 'valid': <BaseConcatDataset | 3 RawDataset(s) | 290205 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 3 recordings × 3 columns [subject, session, run], 'test': <BaseConcatDataset | 6 RawDataset(s) | 580410 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
Duration*: 386.9 s
(* from first recording)
Description: 6 recordings × 3 columns [subject, session, run]}
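When index lists are written by hand, the same recording can easily end up in two splits. The following is a minimal sketch of a sanity check in plain Python, assuming the 12 recordings of this example; `find_overlaps` is a hypothetical helper, not a braindecode function.

```python
from itertools import combinations

split_indices = {
    "train": [0, 1, 5],
    "valid": [2, 3, 4],
    "test": [6, 7, 8, 9, 10, 11],
}


def find_overlaps(named_indices):
    """Return {(name_a, name_b): shared_indices} for each overlapping pair."""
    overlaps = {}
    pairs = combinations(named_indices.items(), 2)
    for (name_a, idx_a), (name_b, idx_b) in pairs:
        shared = sorted(set(idx_a) & set(idx_b))
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Assumption: the description has 12 rows, as in this example.
n_recordings = 12
covered = sorted(i for indices in split_indices.values() for i in indices)

overlaps = find_overlaps(split_indices)
uses_every_row_once = covered == list(range(n_recordings))
print(overlaps)             # empty when the splits are disjoint
print(uses_every_row_once)  # True when each row appears exactly once
```

A check like this could be run before passing the dictionary to `dataset.split`, so that no recording leaks between training and evaluation sets.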
Observation#
The same splitting methods also work on a dataset after windows have been created from it.
windows = create_windows_from_events(
    dataset, trial_start_offset_samples=0, trial_stop_offset_samples=0
)
# Splitting by different runs
print("Using description info")
splits = windows.split("run")
print(splits)
print()
# Splitting by row index
print("Splitting by row index")
splits = windows.split([4, 8])
print(splits)
print()
print("Multiple row index split")
splits = windows.split([[4, 8], [5, 9, 11]])
print(splits)
print()
# Specifying output's keys
print("Specifying keys")
splits = windows.split(dict(train=[4, 8], test=[5, 9, 11]))
print(splits)
Using description info
{'0': <BaseConcatDataset | 2 EEGWindowsDataset(s) | 96 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 24, 1: 24, 2: 24, 3: 24}), '1': <BaseConcatDataset | 2 EEGWindowsDataset(s) | 96 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 24, 1: 24, 2: 24, 3: 24}), '2': <BaseConcatDataset | 2 EEGWindowsDataset(s) | 96 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 24, 1: 24, 2: 24, 3: 24}), '3': <BaseConcatDataset | 2 EEGWindowsDataset(s) | 96 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 24, 1: 24, 2: 24, 3: 24}), '4': <BaseConcatDataset | 2 EEGWindowsDataset(s) | 96 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 24, 1: 24, 2: 24, 3: 24}), '5': <BaseConcatDataset | 2 EEGWindowsDataset(s) | 96 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 24, 1: 24, 2: 24, 3: 24})}
Splitting by row index
{'0': <BaseConcatDataset | 2 EEGWindowsDataset(s) | 96 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 24, 1: 24, 2: 24, 3: 24})}
Multiple row index split
{'0': <BaseConcatDataset | 2 EEGWindowsDataset(s) | 96 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 24, 1: 24, 2: 24, 3: 24}), '1': <BaseConcatDataset | 3 EEGWindowsDataset(s) | 144 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 3 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 36, 1: 36, 2: 36, 3: 36})}
Specifying keys
{'train': <BaseConcatDataset | 2 EEGWindowsDataset(s) | 96 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 2 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 24, 1: 24, 2: 24, 3: 24}), 'test': <BaseConcatDataset | 3 EEGWindowsDataset(s) | 144 total samples>
Sfreq*: 250.0 Hz
Channels*: 26 (22 EEG, 3 EOG, 1 STIM)
Ch. names*: Fz, FC3, FC1, FCz, FC2, FC4, C5, C3, C1, Cz, ... (+16 more)
Montage*: head
(* from first recording)
Description: 3 recordings × 3 columns [subject, session, run]
Window: 1000 samples (4.000 s)
Targets: 4 unique ({0: 36, 1: 36, 2: 36, 3: 36})}
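The sample counts in these outputs follow from the per-recording figures. Here is a quick arithmetic check that uses only numbers read off the outputs above; the value of 48 windows per recording is inferred from 96 windows across 2 recordings, and `total_windows` is a hypothetical helper.

```python
sfreq_hz = 250.0       # "Sfreq" in the output
window_samples = 1000  # "Window" in the output

# Inferred from the output: 96 windows across 2 recordings.
windows_per_recording = 96 // 2


def total_windows(n_recordings):
    """Number of windows in a split that contains `n_recordings` recordings."""
    return n_recordings * windows_per_recording


window_seconds = window_samples / sfreq_hz
print(window_seconds)    # 4.0
print(total_windows(3))  # 144
```

In the outputs above, the splits are still described in terms of recordings, so the windows of one recording stay together in the same split.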