Split Dataset Example
In this example, we show several ways to split datasets.
# Authors: Lukas Gemein <l.gemein@gmail.com>
#
# License: BSD (3-clause)
from braindecode.datasets import MOABBDataset
from braindecode.preprocessing import create_windows_from_events
First, we create a dataset based on BCI Competition IV 2a (BNCI2014001), fetched with MOABB, using only subject 1:
dataset = MOABBDataset(dataset_name="BNCI2014001", subject_ids=[1])
Out:
48 events found
Event IDs: [1 2 3 4]
The dataset has a description attribute: a pandas DataFrame with one row of additional information per internal dataset.
dataset.description
We can split the dataset based on the information in the description, for example by run. The returned dictionary has string keys corresponding to the unique entries in the given description column.
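The grouping behind a column-based split can be sketched in plain Python. This is not braindecode's implementation: the list of dicts is a hypothetical stand-in for the description DataFrame, the session labels "session_T" and "session_E" are assumptions (two sessions of six runs each, matching the twelve recordings logged above), and the helper group_rows_by is illustrative only.

```python
from collections import defaultdict

# Hypothetical stand-in for dataset.description: one row per recording.
# The session labels are assumed; the "run" column mirrors the split above.
rows = [
    {"subject": 1, "session": session, "run": f"run_{run}"}
    for session in ("session_T", "session_E")
    for run in range(6)
]


def group_rows_by(rows, column):
    """Map each unique value of `column` (as a string) to its row indices."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[str(row[column])].append(index)
    # Sort the keys so the result does not depend on row order.
    return dict(sorted(groups.items()))


run_splits = group_rows_by(rows, "run")
print(run_splits)
```

Under these assumptions each run key selects two recordings, one per session; for example, "run_4" maps to rows 4 and 10.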
splits = dataset.split("run")
print(splits)
splits["run_4"].description
Out:
{'run_0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490b1f210>, 'run_1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490e93810>, 'run_2': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490b21890>, 'run_3': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490b25410>, 'run_4': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490bc2950>, 'run_5': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748c1a5f50>}
We can also split the dataset with a list of integers that correspond to rows of the description. In this case, the returned dictionary has '0' as its only key.
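To see which recordings an index list picks, here is a minimal sketch of the flat-list case. The rows are a hypothetical stand-in for the description (session labels assumed), and split_by_indices is an illustrative helper, not part of braindecode.

```python
# Hypothetical stand-in for dataset.description (session labels assumed).
rows = [
    {"session": session, "run": f"run_{run}"}
    for session in ("session_T", "session_E")
    for run in range(6)
]


def split_by_indices(rows, indices):
    """Mimic the flat-list case: a single split, keyed '0', with the chosen rows."""
    return {"0": [rows[i] for i in indices]}


selected = split_by_indices(rows, [0, 1, 5])
for row in selected["0"]:
    print(row["session"], row["run"])
```

With this assumed row order, [0, 1, 5] selects runs 0, 1 and 5 of the first session.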
splits = dataset.split([0, 1, 5])
print(splits)
splits["0"].description
Out:
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748c6bbb90>}
If we want multiple splits based on indices, we can specify a list of lists of integers. In this case, the dictionary has string keys ('0', '1', ...) giving the position of each split in the given list.
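When splits are given by hand, it is easy to overlap them or leave a recording out. The following sketch checks a list of index lists for both problems before it is passed to split. The helper check_splits is hypothetical, and n_rows=12 assumes the twelve-row description of this example.

```python
from itertools import combinations


def check_splits(split_ids, n_rows):
    """Return (overlapping key pairs, uncovered rows) for a dict of index lists."""
    overlaps = [
        (a, b)
        for (a, ids_a), (b, ids_b) in combinations(split_ids.items(), 2)
        if set(ids_a) & set(ids_b)
    ]
    used = set().union(*split_ids.values())
    missing = sorted(set(range(n_rows)) - used)
    return overlaps, missing


by = [[0, 1, 5], [2, 3, 4], [6, 7, 8, 9, 10, 11]]
# Keys follow the position of each inner list, converted to strings.
split_ids = {str(i): ids for i, ids in enumerate(by)}
overlaps, missing = check_splits(split_ids, n_rows=12)
print(list(split_ids), overlaps, missing)  # ['0', '1', '2'] [] []
```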
splits = dataset.split([[0, 1, 5], [2, 3, 4], [6, 7, 8, 9, 10, 11]])
print(splits)
splits["2"].description
Out:
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490b25410>, '1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490b49e90>, '2': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748ee50590>}
If we want to split based on lists of indices but also choose the keys of the output dictionary, we can pass a dict:
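Rather than typing the indices by hand, the dict can be derived from the description, for example to train on one session and test on the other. This sketch uses a hypothetical list of rows in place of the DataFrame; the session labels "session_T" and "session_E" are assumptions and may differ between MOABB versions.

```python
# Hypothetical stand-in for dataset.description; session labels are assumed.
rows = [
    {"session": session, "run": f"run_{run}"}
    for session in ("session_T", "session_E")
    for run in range(6)
]

# Train on the first session and test on the second, selected by row index.
split_ids = {
    "train": [i for i, row in enumerate(rows) if row["session"] == "session_T"],
    "test": [i for i, row in enumerate(rows) if row["session"] == "session_E"],
}
print(split_ids)
```

With the real DataFrame, pandas boolean indexing such as description.index[description["session"] == "session_T"].tolist() would give the same kind of index list, and the resulting dict could then be passed to dataset.split.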
splits = dataset.split(
    {"train": [0, 1, 5], "valid": [2, 3, 4], "test": [6, 7, 8, 9, 10, 11]}
)
print(splits)
splits["test"].description
Out:
{'train': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748c1a5f50>, 'valid': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748ee506d0>, 'test': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748ee50450>}
Similarly, we can split datasets after creating windows:
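Splitting after windowing still happens per recording, not per window. The arithmetic below estimates the window count per run key, assuming each of the twelve recordings yields 48 windows (the log reports 48 matching events and 0 bad epochs per recording, and with both offsets at 0 and no explicit window size each trial is expected to give one window) and that each run key groups two recordings. The session labels are hypothetical.

```python
# Assumed: 48 windows per recording (48 events found, 0 epochs dropped).
WINDOWS_PER_RECORDING = 48
sessions = ("session_T", "session_E")  # hypothetical labels
recordings = [(s, f"run_{r}") for s in sessions for r in range(6)]

# Accumulate the windows contributed by each recording to its run key.
windows_per_run = {}
for _, run in recordings:
    windows_per_run[run] = windows_per_run.get(run, 0) + WINDOWS_PER_RECORDING

total = WINDOWS_PER_RECORDING * len(recordings)
print(windows_per_run["run_0"], total)  # 96 576
```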
windows = create_windows_from_events(
    dataset, trial_start_offset_samples=0, trial_stop_offset_samples=0)
splits = windows.split("run")
splits
Out:
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Adding metadata with 4 columns
Replacing existing metadata with 4 columns
48 matching events found
No baseline correction applied
0 projection items activated
Loading data for 48 events and 1000 original time points ...
0 bad epochs dropped
{'run_0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490b8f110>, 'run_1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748c19ef10>, 'run_2': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748ee22910>, 'run_3': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748ee22350>, 'run_4': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490b853d0>, 'run_5': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490b40450>}
splits = windows.split([4, 8])
splits
Out:
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490e80d90>}
splits = windows.split([[4, 8], [5, 9, 11]])
splits
Out:
{'0': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490e93810>, '1': <braindecode.datasets.base.BaseConcatDataset object at 0x7f7490b1f210>}
splits = windows.split(dict(train=[4, 8], test=[5, 9, 11]))
splits
Out:
{'train': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748c19e6d0>, 'test': <braindecode.datasets.base.BaseConcatDataset object at 0x7f748c19eb10>}
Total running time of the script: ( 0 minutes 4.236 seconds)
Estimated memory usage: 354 MB