Split Dataset Example

In this example, we show several ways to split a dataset.

# Authors: Lukas Gemein <l.gemein@gmail.com>
#
# License: BSD (3-clause)

from braindecode.datasets import MOABBDataset
from braindecode.preprocessing import create_windows_from_events

First, we create a dataset based on BCIC IV 2a, fetched with MOABB:

dataset = MOABBDataset(dataset_name="BNCI2014001", subject_ids=[1])
48 events found
Event IDs: [1 2 3 4]
(the two lines above are repeated once per run, 12 times in total)
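
As a quick aside (not part of the original script), subject_ids accepts a list, so recordings of several subjects can be fetched in one call. A minimal sketch, assuming the additional subject is available for download:

# Hypothetical variation: fetch two subjects at once. Each (subject, session,
# run) combination then becomes one internal dataset of the concatenation.
multi_subject_dataset = MOABBDataset(dataset_name="BNCI2014001", subject_ids=[1, 2])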

The dataset has a pandas DataFrame that describes its internal datasets:

dataset.description
    subject    session    run
0         1  session_T  run_0
1         1  session_T  run_1
2         1  session_T  run_2
3         1  session_T  run_3
4         1  session_T  run_4
5         1  session_T  run_5
6         1  session_E  run_0
7         1  session_E  run_1
8         1  session_E  run_2
9         1  session_E  run_3
10        1  session_E  run_4
11        1  session_E  run_5
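
Since the description is a plain pandas DataFrame, ordinary pandas indexing can be used to derive row indices for the index-based splits shown further below. A minimal sketch (not part of the original example):

# Collect the row indices of all 'session_T' recordings via standard
# pandas filtering of the description shown above.
train_idx = dataset.description[
    dataset.description["session"] == "session_T"
].index.tolist()
print(train_idx)  # -> [0, 1, 2, 3, 4, 5]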


We can split the dataset based on the info in the description, for example into the different runs. The returned dictionary has string keys corresponding to the unique entries of the chosen description column.

splits = dataset.split("run")
print(splits)
splits["run_4"].description
{'run_0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b21c2d10>, 'run_1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b21400d0>, 'run_2': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b1f75590>, 'run_3': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b1f00390>, 'run_4': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b1f00510>, 'run_5': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b1fb3ed0>}
   subject    session    run
0        1  session_T  run_4
1        1  session_E  run_4
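
The same column-based splitting works for any other column of the description, e.g. the session column, which is a common way to separate the training session from the evaluation session of BCIC IV 2a. A sketch, not shown in the original output:

# Split by the 'session' column; the keys are the unique session names,
# i.e. 'session_T' and 'session_E' for this dataset.
session_splits = dataset.split("session")
print(session_splits.keys())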


We can also split the dataset based on a list of integers corresponding to rows of the description. In this case, the returned dictionary has '0' as its only key.

splits = dataset.split([0, 1, 5])
print(splits)
splits["0"].description
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b216ef50>}
   subject    session    run
0        1  session_T  run_0
1        1  session_T  run_1
2        1  session_T  run_5


If we want multiple splits based on indices, we can also specify a list of lists of integers. In this case, the dictionary has string keys ('0', '1', ...) reflecting the position of each sub-list in the given list.

splits = dataset.split([[0, 1, 5], [2, 3, 4], [6, 7, 8, 9, 10, 11]])
print(splits)
splits["2"].description
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b1fb3ed0>, '1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50d5718c50>, '2': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50d57183d0>}
   subject    session    run
0        1  session_E  run_0
1        1  session_E  run_1
2        1  session_E  run_2
3        1  session_E  run_3
4        1  session_E  run_4
5        1  session_E  run_5


If we want to split based on lists of indices and also specify the keys of the output dictionary, we can pass a dict:

splits = dataset.split(
    {"train": [0, 1, 5], "valid": [2, 3, 4], "test": [6, 7, 8, 9, 10, 11]}
)
print(splits)
splits["test"].description
{'train': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b21f3310>, 'valid': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50d5718b50>, 'test': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50d5718890>}
   subject    session    run
0        1  session_E  run_0
1        1  session_E  run_1
2        1  session_E  run_2
3        1  session_E  run_3
4        1  session_E  run_4
5        1  session_E  run_5
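
Each value of the returned dictionary is again a concatenated dataset, so the train/valid/test parts can be inspected or used directly. A small sketch (not part of the original example):

# Each split is a BaseConcatDataset; its .datasets attribute holds the
# individual recordings selected by the given indices.
for name, subset in splits.items():
    print(name, "->", len(subset.datasets), "recordings")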


Similarly, we can split the dataset after creating windows:

windows = create_windows_from_events(
    dataset, trial_start_offset_samples=0, trial_stop_offset_samples=0)
splits = windows.split("run")
splits
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Using data from preloaded Raw for 48 events and 1000 original time points ...
0 bad epochs dropped
(the three lines above are repeated once per run, 12 times in total)

{'run_0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b20628d0>, 'run_1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b21c2d10>, 'run_2': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50d572ee50>, 'run_3': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b201e7d0>, 'run_4': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b201ea90>, 'run_5': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b1f75590>}
splits = windows.split([4, 8])
splits
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b21808d0>}
splits = windows.split([[4, 8], [5, 9, 11]])
splits
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b21400d0>, '1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b1f43cd0>}
splits = windows.split(dict(train=[4, 8], test=[5, 9, 11]))
splits
{'train': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b21f3f90>, 'test': <braindecode.datasets.base.BaseConcatDataset object at 0x7f50b21c2d10>}
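
For BCIC IV 2a, a typical protocol trains on the recordings of session_T and evaluates on session_E; after windowing, this simply amounts to splitting by the session column. A sketch, not part of the original output:

# Session-based split of the windowed dataset: the keys come from the
# 'session' column of the description.
session_splits = windows.split("session")
train_set, eval_set = session_splits["session_T"], session_splits["session_E"]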

Total running time of the script: (0 minutes 3.982 seconds)

Estimated memory usage: 333 MB
