Load and save dataset example

In this example, we show how to load and save braindecode datasets.

# Authors: Lukas Gemein <l.gemein@gmail.com>
#
# License: BSD (3-clause)

import tempfile

from braindecode.datasets import MOABBDataset
from braindecode.preprocessing import preprocess, Preprocessor
from braindecode.datautil import load_concat_dataset
from braindecode.preprocessing import create_windows_from_events

First, we load a dataset using MOABB.

dataset = MOABBDataset(
    dataset_name="BNCI2014001",
    subject_ids=[1],
)
BNCI2014001 has been renamed to BNCI2014_001. BNCI2014001 will be removed in version 1.1.
The dataset class name 'BNCI2014001' must be an abbreviation of its code 'BNCI2014-001'. See moabb.datasets.base.is_abbrev for more information.
48 events found on stim channel stim
Event IDs: [1 2 3 4]

We can apply preprocessing steps to the dataset. This step is optional and can be skipped if no preprocessing is needed.

preprocess(concat_ds=dataset, preprocessors=[Preprocessor(fn="resample", sfreq=10)])
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]
48 events found on stim channel stim
Event IDs: [1 2 3 4]

<braindecode.datasets.moabb.MOABBDataset object at 0x7f38e7ede800>
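The preprocess utility applies each Preprocessor in order to every recording in the concat dataset. As a rough illustration of that pattern (a toy sketch of the idea, not braindecode's actual implementation; all names here are hypothetical):

```python
# Toy illustration of chained preprocessing: each step is a (function,
# kwargs) pair applied in order to every recording in a collection.
# This mimics the pattern of braindecode's preprocess/Preprocessor but
# is a hypothetical sketch, not the library's code.

def apply_preprocessors(recordings, preprocessors):
    for fn, kwargs in preprocessors:
        recordings = [fn(rec, **kwargs) for rec in recordings]
    return recordings

# "Recordings" are plain lists of samples; "resample" keeps every
# k-th sample, "scale" multiplies each sample by a gain factor.
def resample(rec, factor):
    return rec[::factor]

def scale(rec, gain):
    return [x * gain for x in rec]

recordings = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]]
out = apply_preprocessors(
    recordings,
    [(resample, {"factor": 2}), (scale, {"gain": 0.5})],
)
print(out)  # [[0.5, 1.5, 2.5], [5.0, 15.0, 25.0]]
```

Because the steps run in order, a resample followed by a filter can give different results than a filter followed by a resample, so the list order matters.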

We save the dataset to an existing directory. It will create a ‘.fif’ file for every dataset in the concat dataset. Additionally, it will create two JSON files, the first holding the description of the dataset and the second holding the name of the target. If you want to store to the same directory several times, for example when trying different preprocessing, you can choose to overwrite the existing files.

tmpdir = tempfile.mkdtemp()  # write in a temporary directory
dataset.save(
    path=tmpdir,
    overwrite=False,
)
Writing /tmp/tmpdz2kwg3z/0/0-raw.fif
Closing /tmp/tmpdz2kwg3z/0/0-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/1/1-raw.fif
Closing /tmp/tmpdz2kwg3z/1/1-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/2/2-raw.fif
Closing /tmp/tmpdz2kwg3z/2/2-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/3/3-raw.fif
Closing /tmp/tmpdz2kwg3z/3/3-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/4/4-raw.fif
Closing /tmp/tmpdz2kwg3z/4/4-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/5/5-raw.fif
Closing /tmp/tmpdz2kwg3z/5/5-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/6/6-raw.fif
Closing /tmp/tmpdz2kwg3z/6/6-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/7/7-raw.fif
Closing /tmp/tmpdz2kwg3z/7/7-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/8/8-raw.fif
Closing /tmp/tmpdz2kwg3z/8/8-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/9/9-raw.fif
Closing /tmp/tmpdz2kwg3z/9/9-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/10/10-raw.fif
Closing /tmp/tmpdz2kwg3z/10/10-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/11/11-raw.fif
Closing /tmp/tmpdz2kwg3z/11/11-raw.fif
[done]
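As the log above shows, the save call produces one numbered subdirectory per recording. A small pure-Python sketch that simulates and inspects that layout (the dummy files and the exact metadata file names here are assumptions for illustration, not braindecode's guaranteed on-disk format):

```python
import json
import tempfile
from pathlib import Path

# Simulate the directory layout the save call produces: one numbered
# subdirectory per recording, each holding the signal file plus JSON
# metadata. The files written here are empty stand-ins.
tmpdir = Path(tempfile.mkdtemp())
n_recordings = 12  # BNCI2014_001, subject 1: 12 runs
for i in range(n_recordings):
    sub = tmpdir / str(i)
    sub.mkdir()
    (sub / f"{i}-raw.fif").touch()
    (sub / "description.json").write_text(json.dumps({"subject": 1}))
    (sub / "target_name.json").write_text(json.dumps({"target_name": None}))

subdirs = sorted(tmpdir.iterdir(), key=lambda p: int(p.name))
print(len(subdirs))  # 12
```

Numbering the subdirectories by recording index is what later lets ids_to_load select specific recordings without touching the rest.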

We load the saved dataset from a directory. Signals can be preloaded, in line with MNE. Optionally, only specific ‘.fif’ files can be loaded by specifying their ids. The target name can be changed if the dataset supports it (TUHAbnormal, for example, supports ‘pathological’, ‘age’, and ‘gender’; if you stored a preprocessed version with target ‘pathological’, it is possible to change the target upon loading).

dataset_loaded = load_concat_dataset(
    path=tmpdir,
    preload=True,
    ids_to_load=[1, 3],
    target_name=None,
)
Opening raw data file /tmp/tmpdz2kwg3z/1/1-raw.fif...
    Range : 0 ... 3868 =      0.000 ...   386.800 secs
Ready.
Reading 0 ... 3868  =      0.000 ...   386.800 secs...
Opening raw data file /tmp/tmpdz2kwg3z/3/3-raw.fif...
    Range : 0 ... 3868 =      0.000 ...   386.800 secs
Ready.
Reading 0 ... 3868  =      0.000 ...   386.800 secs...

The serialization utility also supports WindowsDatasets, so we next create windows from events.

windows_dataset = create_windows_from_events(
    concat_ds=dataset_loaded,
    trial_start_offset_samples=0,
    trial_stop_offset_samples=0,
)

windows_dataset.description
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
Used Annotations descriptions: ['feet', 'left_hand', 'right_hand', 'tongue']
   subject session  run
0        1  0train    1
1        1  0train    3


Again, we save the dataset to an existing directory. It will create a ‘-epo.fif’ file for every dataset in the concat dataset. Additionally, it will create a JSON file holding the description of the dataset. If you want to store to the same directory several times, for example when trying different windowing parameters, you can choose to overwrite the existing files.

windows_dataset.save(
    path=tmpdir,
    overwrite=True,
)
Writing /tmp/tmpdz2kwg3z/0/0-raw.fif
Closing /tmp/tmpdz2kwg3z/0/0-raw.fif
[done]
Writing /tmp/tmpdz2kwg3z/1/1-raw.fif
Closing /tmp/tmpdz2kwg3z/1/1-raw.fif
[done]
/home/runner/work/braindecode/braindecode/braindecode/datasets/base.py:777: UserWarning: The number of saved datasets (2) does not match the number of existing subdirectories (12). You may now encounter a mix of differently preprocessed datasets!
  warnings.warn(
/home/runner/work/braindecode/braindecode/braindecode/datasets/base.py:788: UserWarning: Chosen directory /tmp/tmpdz2kwg3z contains other subdirectories or files ['5', '4', '2', '9', '3', '6', '7', '11', '10', '8'].
  warnings.warn(
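The warnings above appear because only 2 windows datasets were written into a directory that still holds 12 subdirectories from the earlier save, leaving a mix of differently processed data. One way to avoid that is to clear the stale numbered subdirectories before saving; the helper below is a hypothetical sketch, not part of braindecode:

```python
import shutil
import tempfile
from pathlib import Path

def clear_stale_subdirs(path, n_to_save):
    """Remove numbered subdirectories beyond the n_to_save datasets
    about to be written, so old and new results don't mix.
    (A hypothetical helper, not part of braindecode.)"""
    removed = []
    for sub in Path(path).iterdir():
        if sub.is_dir() and sub.name.isdigit() and int(sub.name) >= n_to_save:
            shutil.rmtree(sub)
            removed.append(sub.name)
    return sorted(removed, key=int)

# Demo on a simulated layout with 12 leftover subdirectories.
tmpdir = Path(tempfile.mkdtemp())
for i in range(12):
    (tmpdir / str(i)).mkdir()

removed = clear_stale_subdirs(tmpdir, n_to_save=2)
print(len(removed))  # 10
```

Clearing first keeps the directory consistent with the new save, at the cost of discarding the earlier preprocessed recordings, so only do this when those are no longer needed.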

We load the saved dataset from a directory. Signals can be preloaded, in line with MNE. Optionally, only specific ‘-epo.fif’ files can be loaded by specifying their ids.

windows_dataset_loaded = load_concat_dataset(
    path=tmpdir,
    preload=False,
    ids_to_load=[0],
    target_name=None,
)

windows_dataset_loaded.description
Opening raw data file /tmp/tmpdz2kwg3z/0/0-raw.fif...
    Range : 0 ... 3868 =      0.000 ...   386.800 secs
Ready.
   subject session  run
0        1  0train    1


Total running time of the script: (0 minutes 4.323 seconds)

Estimated memory usage: 1032 MB
