Split Dataset Example#

This example shows several ways to split a dataset for training, testing, and evaluating your models.

# Authors: Lukas Gemein <l.gemein@gmail.com>
#
# License: BSD (3-clause)

from braindecode.datasets import MOABBDataset
from braindecode.preprocessing import create_windows_from_events

Loading the dataset #

First, we create a dataset with the braindecode class <MOABBDataset>, which fetches the data from MOABB. In this example, we use Dataset 2a from BCI Competition IV.

dataset = MOABBDataset(dataset_name="BNCI2014001", subject_ids=[1])

BNCI2014001 has been renamed to BNCI2014_001. BNCI2014001 will be removed in version 1.1.
The dataset class name 'BNCI2014001' must be an abbreviation of its code 'BNCI2014-001'. See moabb.datasets.base.is_abbrev for more information.

The class <MOABBDataset> holds a pandas DataFrame with an additional description of its internal datasets. This description can be used to split the data based on recording information, such as the subject, session, and run of each trial.

dataset.description

	subject	session	run
0	1	0train	0
1	1	0train	1
2	1	0train	2
3	1	0train	3
4	1	0train	4
5	1	0train	5
6	1	1test	0
7	1	1test	1
8	1	1test	2
9	1	1test	3
10	1	1test	4
11	1	1test	5

Splitting #

By description information #

Here, we split the data by run. The method split accepts the name of a column of the description DataFrame and returns a dictionary with one entry per unique value in that column. The dictionary keys are strings.

splits = dataset.split("run")
print(splits)
splits["4"].description

{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f4215eef7c0>, '1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f41edd9b7c0>, '2': <braindecode.datasets.base.BaseConcatDataset object at 0x7f41ed644dc0>, '3': <braindecode.datasets.base.BaseConcatDataset object at 0x7f41ed644760>, '4': <braindecode.datasets.base.BaseConcatDataset object at 0x7f4218ff5670>, '5': <braindecode.datasets.base.BaseConcatDataset object at 0x7f4216e7fb50>}

	subject	session	run
0	1	0train	4
1	1	1test	4
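
To make the grouping concrete, here is a minimal plain-Python sketch of the idea. It is not braindecode's implementation. It rebuilds the description table shown earlier as a list of dictionaries and groups row indices by the unique values of one column. The helper name group_rows_by_column is hypothetical, and run values are assumed to be stored as strings.

```python
from collections import defaultdict

# Rows mirror the dataset.description table above:
# subject 1, sessions "0train" and "1test", six runs per session.
# Assumption: run values are strings, as the string keys of the splits suggest.
description = [
    {"subject": 1, "session": session, "run": str(run)}
    for session in ("0train", "1test")
    for run in range(6)
]


def group_rows_by_column(rows, column):
    """Map each unique value in `column` (as a string) to the row indices holding it."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[str(row[column])].append(index)
    return dict(groups)


by_run = group_rows_by_column(description, "run")
by_session = group_rows_by_column(description, "session")

print(by_run["4"])         # [4, 10]: run 4 of the train and of the test session
print(sorted(by_session))  # ['0train', '1test']
```

The same grouping applied to the session column would give one group per session, which is a natural basis for a train/test split with this dataset.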

By row index #

Another way to split the dataset is to pass a list of integers corresponding to rows in the description. In this case, the returned dictionary has '0' as its only key.

splits = dataset.split([0, 1, 5])
print(splits)
splits["0"].description

{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f41f52db2e0>}

	subject	session	run
0	1	0train	0
1	1	0train	1
2	1	0train	5

To get several splits from indices in one call, we can pass a list containing lists of integers. In this case, the keys of the returned dictionary are the positions of the inner lists as strings ('0', '1', and so on), in the order in which the lists were given.

splits = dataset.split([[0, 1, 5], [2, 3, 4], [6, 7, 8, 9, 10, 11]])
print(splits)
splits["2"].description

{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f41eda85670>, '1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f41f52db4f0>, '2': <braindecode.datasets.base.BaseConcatDataset object at 0x7f41f52db0a0>}

	subject	session	run
0	1	1test	0
1	1	1test	1
2	1	1test	2
3	1	1test	3
4	1	1test	4
5	1	1test	5
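
The key naming for index-based splits can be sketched in a few lines of plain Python. This only illustrates the behaviour described above (a flat list gives the single key '0', a list of lists gives one key per position). The function normalize_index_spec is a hypothetical name and does not reflect how braindecode implements it.

```python
def normalize_index_spec(spec):
    """Turn a flat list or a list of lists of row indices into a {key: indices} dict."""
    if spec and all(isinstance(item, int) for item in spec):
        # A flat list of integers is treated as one single split.
        spec = [spec]
    # Keys are the positions of the inner lists, converted to strings.
    return {str(position): list(indices) for position, indices in enumerate(spec)}


single = normalize_index_spec([0, 1, 5])
multiple = normalize_index_spec([[0, 1, 5], [2, 3, 4], [6, 7, 8, 9, 10, 11]])

print(single)          # {'0': [0, 1, 5]}
print(list(multiple))  # ['0', '1', '2']
```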

You can also name each split by passing a dictionary that maps the desired key to a list of row indices. The returned dictionary then uses the same keys:

splits = dataset.split(
    {"train": [0, 1, 5], "valid": [2, 3, 4], "test": [6, 7, 8, 9, 10, 11]}
)
print(splits)
splits["test"].description

{'train': <braindecode.datasets.base.BaseConcatDataset object at 0x7f41ed644760>, 'valid': <braindecode.datasets.base.BaseConcatDataset object at 0x7f4216e7ff70>, 'test': <braindecode.datasets.base.BaseConcatDataset object at 0x7f4218f23a60>}

	subject	session	run
0	1	1test	0
1	1	1test	1
2	1	1test	2
3	1	1test	3
4	1	1test	4
5	1	1test	5
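
Rather than typing the index lists by hand, they can be derived from the description. The sketch below is a plain-Python illustration under stated assumptions: the description is mirrored as a list of dictionaries, run values are strings, and the helper make_named_splits with its valid_runs parameter is hypothetical. It assigns the test session to "test" and holds out some runs of the training session for "valid".

```python
# Rows mirror the dataset.description table shown earlier.
description = [
    {"subject": 1, "session": session, "run": str(run)}
    for session in ("0train", "1test")
    for run in range(6)
]


def make_named_splits(rows, valid_runs):
    """Build {"train": [...], "valid": [...], "test": [...]} lists of row indices."""
    splits = {"train": [], "valid": [], "test": []}
    for index, row in enumerate(rows):
        if row["session"] == "1test":
            splits["test"].append(index)
        elif row["run"] in valid_runs:
            splits["valid"].append(index)
        else:
            splits["train"].append(index)
    return splits


named = make_named_splits(description, valid_runs={"2", "3", "4"})
print(named["train"])  # [0, 1, 5]
print(named["valid"])  # [2, 3, 4]
print(named["test"])   # [6, 7, 8, 9, 10, 11]
```

The resulting dictionary has the same shape as the one passed to dataset.split in the example above, so it could be used in its place.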

Observation #

Similarly, we can split datasets after creating windows using the same methods.

windows = create_windows_from_events(
    dataset, trial_start_offset_samples=0, trial_stop_offset_samples=0)

Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']

# Splitting by different runs
print("Using description info")
splits = windows.split("run")
print(splits)
print()

# Splitting by row index
print("Splitting by row index")
splits = windows.split([4, 8])
print(splits)
print()

print("Multiple row index split")
splits = windows.split([[4, 8], [5, 9, 11]])
print(splits)
print()

# Specifying output's keys
print("Specifying keys")
splits = windows.split(dict(train=[4, 8], test=[5, 9, 11]))
print(splits)

Using description info
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f42175d8970>, '1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f42175d8b80>, '2': <braindecode.datasets.base.BaseConcatDataset object at 0x7f42175d8d90>, '3': <braindecode.datasets.base.BaseConcatDataset object at 0x7f4217525250>, '4': <braindecode.datasets.base.BaseConcatDataset object at 0x7f4217525670>, '5': <braindecode.datasets.base.BaseConcatDataset object at 0x7f42174c5af0>}

Splitting by row index
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f421d2921f0>}

Multiple row index split
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f4217525250>, '1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f42175d8d90>}

Specifying keys
{'train': <braindecode.datasets.base.BaseConcatDataset object at 0x7f41ed644760>, 'test': <braindecode.datasets.base.BaseConcatDataset object at 0x7f42175d8b80>}
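
Index-based splits are easy to get wrong, so it can help to check them before calling split. The following is a minimal sketch, assuming the 12-row description shown earlier. The helper check_index_splits is hypothetical and not part of braindecode. It verifies that every index is in range and that no row is assigned twice.

```python
def check_index_splits(splits, n_rows):
    """Raise ValueError if an index is out of range or assigned more than once."""
    seen = {}
    for name, indices in splits.items():
        for index in indices:
            if not 0 <= index < n_rows:
                raise ValueError(
                    f"index {index} in split {name!r} is outside 0..{n_rows - 1}"
                )
            if index in seen:
                raise ValueError(
                    f"index {index} is used by {seen[index]!r} and again by {name!r}"
                )
            seen[index] = name
    return True


# The named split used for the windows above passes the check.
ok = check_index_splits({"train": [4, 8], "test": [5, 9, 11]}, n_rows=12)
print(ok)  # True
```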
