Load and save dataset example

In this example, we show how to load and save braindecode datasets.

# Authors: Lukas Gemein <l.gemein@gmail.com>
#
# License: BSD (3-clause)

import tempfile

from braindecode.datasets import MOABBDataset
from braindecode.preprocessing import preprocess, Preprocessor
from braindecode.datautil import load_concat_dataset
from braindecode.preprocessing import create_windows_from_events

First, we load some dataset using MOABB.

dataset = MOABBDataset(
    dataset_name='BNCI2014001',
    subject_ids=[1],
)
BNCI2014001 has been renamed to BNCI2014_001. BNCI2014001 will be removed in version 1.1.
The dataset class name 'BNCI2014001' must be an abbreviation of its code 'BNCI2014-001'. See moabb.datasets.base.is_abbrev for more information.

We can apply preprocessing steps to the dataset. It is also possible to skip this step and not apply any preprocessing.

preprocess(
    concat_ds=dataset,
    preprocessors=[Preprocessor(fn='resample', sfreq=10)]
)
48 events found
Event IDs: [1 2 3 4]
(the two lines above repeat once per recording; repeated output omitted)

<braindecode.datasets.moabb.MOABBDataset object at 0x7f45429de380>

We save the dataset to an existing directory. It will create a ‘.fif’ file for every dataset in the concat dataset. Additionally, it will create two JSON files: the first holds the description of the dataset, the second the name of the target. If you want to store to the same directory several times, for example when trying different preprocessing, you can choose to overwrite the existing files.

tmpdir = tempfile.mkdtemp()  # write in a temporary directory
dataset.save(
    path=tmpdir,
    overwrite=False,
)
Writing /tmp/tmpmcfvy2r0/0/0-raw.fif
Closing /tmp/tmpmcfvy2r0/0/0-raw.fif
[done]
... (output for subdirectories 1 to 10 omitted)
Writing /tmp/tmpmcfvy2r0/11/11-raw.fif
Closing /tmp/tmpmcfvy2r0/11/11-raw.fif
[done]
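The resulting directory layout can be sketched with plain `pathlib` — stand-in empty files here, not a real save, and the JSON file names are hypothetical placeholders for the two metadata files mentioned above:

```python
from pathlib import Path
import tempfile

# Recreate the layout `save` produces: one numbered subdirectory per
# recording, each holding a FIF file plus JSON metadata (stand-in files).
root = Path(tempfile.mkdtemp())
for i in range(3):  # the real example above writes 12 such subdirectories
    sub = root / str(i)
    sub.mkdir()
    (sub / f'{i}-raw.fif').touch()
    (sub / 'description.json').touch()   # hypothetical metadata file name
    (sub / 'target_name.json').touch()   # hypothetical metadata file name

print(sorted(p.relative_to(root).as_posix() for p in root.rglob('*-raw.fif')))
# ['0/0-raw.fif', '1/1-raw.fif', '2/2-raw.fif']
```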

We load the saved dataset from a directory. Signals can be preloaded in compliance with MNE. Optionally, only specific ‘.fif’ files can be loaded by specifying their ids. The target name can be changed if the dataset supports it (TUHAbnormal, for example, supports ‘pathological’, ‘age’, and ‘gender’; if you stored a preprocessed version with target ‘pathological’, it is possible to change the target upon loading).

dataset_loaded = load_concat_dataset(
    path=tmpdir,
    preload=True,
    ids_to_load=[1, 3],
    target_name=None,
)
Opening raw data file /tmp/tmpmcfvy2r0/1/1-raw.fif...
    Range : 0 ... 3868 =      0.000 ...   386.800 secs
Ready.
Reading 0 ... 3868  =      0.000 ...   386.800 secs...
Opening raw data file /tmp/tmpmcfvy2r0/3/3-raw.fif...
    Range : 0 ... 3868 =      0.000 ...   386.800 secs
Ready.
Reading 0 ... 3868  =      0.000 ...   386.800 secs...

The serialization utility also supports WindowsDatasets, so we create compute windows next.

windows_dataset = create_windows_from_events(
    concat_ds=dataset_loaded,
    trial_start_offset_samples=0,
    trial_stop_offset_samples=0,
)

windows_dataset.description
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
  subject session run
0       1  0train   1
1       1  0train   3
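Once windowed, the dataset yields one sample per window. The toy class below is a hypothetical stand-in (plain lists, not the braindecode implementation) that mimics the sample convention — signal `X`, target `y`, and crop indices — just to show how samples are consumed:

```python
# Toy stand-in mimicking how a windowed dataset yields samples:
# X (channels x times), y (target label), and the crop indices
# [i_window_in_trial, i_start_in_trial, i_stop_in_trial].
class ToyWindowsDataset:
    def __init__(self, windows, targets):
        self.windows = windows
        self.targets = targets

    def __len__(self):
        return len(self.windows)

    def __getitem__(self, i):
        n_times = len(self.windows[i][0])
        return self.windows[i], self.targets[i], [0, 0, n_times]


toy = ToyWindowsDataset(
    windows=[[[0.1, 0.2, 0.3]], [[0.4, 0.5, 0.6]]],  # 1 channel, 3 samples
    targets=['left_hand', 'feet'],
)
X, y, crop_inds = toy[0]
print(y, crop_inds)  # left_hand [0, 0, 3]
```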


Again, we save the dataset to an existing directory. It will create a ‘-epo.fif’ file for every dataset in the concat dataset. Additionally, it will create a JSON file holding the description of the dataset. If you want to store to the same directory several times, for example when trying different windowing parameters, you can choose to overwrite the existing files.

windows_dataset.save(
    path=tmpdir,
    overwrite=True,
)
Writing /tmp/tmpmcfvy2r0/0/0-raw.fif
Closing /tmp/tmpmcfvy2r0/0/0-raw.fif
[done]
Writing /tmp/tmpmcfvy2r0/1/1-raw.fif
Closing /tmp/tmpmcfvy2r0/1/1-raw.fif
[done]
/home/runner/work/braindecode/braindecode/braindecode/datasets/base.py:700: UserWarning: The number of saved datasets (2) does not match the number of existing subdirectories (12). You may now encounter a mix of differently preprocessed datasets!
  warnings.warn(f"The number of saved datasets ({i_ds+1+offset}) "
/home/runner/work/braindecode/braindecode/braindecode/datasets/base.py:708: UserWarning: Chosen directory /tmp/tmpmcfvy2r0 contains other subdirectories or files ['8', '7', '10', '5', '4', '9', '3', '2', '6', '11'].
  warnings.warn(f'Chosen directory {path} contains other '
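One way to avoid the mixed-contents warnings above (a sketch of one option, not the only one) is to give each preprocessing or windowing variant its own fresh directory instead of overwriting in place:

```python
import tempfile
from pathlib import Path

# A clean target directory per preprocessing/windowing variant keeps
# saved datasets from mixing with the results of earlier saves.
windows_dir = tempfile.mkdtemp(prefix='windows_')
# windows_dataset.save(path=windows_dir, overwrite=False)  # as above
print(Path(windows_dir).is_dir())  # True
```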

We load the saved dataset from a directory. Signals can be preloaded in compliance with MNE. Optionally, only specific ‘-epo.fif’ files can be loaded by specifying their ids.

windows_dataset_loaded = load_concat_dataset(
    path=tmpdir,
    preload=False,
    ids_to_load=[0],
    target_name=None,
)

windows_dataset_loaded.description
Opening raw data file /tmp/tmpmcfvy2r0/0/0-raw.fif...
    Range : 0 ... 3868 =      0.000 ...   386.800 secs
Ready.
  subject session run
0       1  0train   1


Total running time of the script: (0 minutes 4.689 seconds)

Estimated memory usage: 10 MB
