braindecode.datasets package#

Loader code for some datasets.

class braindecode.datasets.BCICompetitionIVDataset4(subject_ids: list[int] | int | None = None)[source]#

Bases: BaseConcatDataset

BCI competition IV dataset 4.

Contains ECoG recordings for three patients moving fingers during the experiment. Targets correspond to the time courses of the flexion of each of five fingers. See http://www.bbci.de/competition/iv/desc_4.pdf and http://www.bbci.de/competition/iv/ for the dataset and competition description. ECoG library containing the dataset: https://searchworks.stanford.edu/view/zk881ps0522

Notes

When using this dataset, please cite [1].

Parameters:

subject_ids (list(int) | int | None) – (list of) int of subject(s) to be loaded. If None, load all available subjects. Should be in range 1-3.

References

[1]

Miller, Kai J. “A library of human electrocorticographic data and analyses.” Nature human behaviour 3, no. 11 (2019): 1225-1235. https://doi.org/10.1038/s41562-019-0678-3

static download(path=None, force_update=False, verbose=None)[source]#

Download the dataset.

Parameters:
  • path (None | str) – Location of where to look for the data storing location. If None, the environment variable or config parameter MNE_DATASETS_(dataset)_PATH is used. If it doesn’t exist, the “~/mne_data” directory is used. If the dataset is not found under the given path, the data will be automatically downloaded to the specified folder.

  • force_update (bool) – Force update of the dataset even if a local copy exists.

  • verbose (bool, str, int, or None) – If not None, override default verbose level (see mne.verbose()).

possible_subjects = [1, 2, 3]#
class braindecode.datasets.BIDSDataset(root: ~pathlib.Path | str, subjects: str | list[str] | None = None, sessions: str | list[str] | None = None, tasks: str | list[str] | None = None, acquisitions: str | list[str] | None = None, runs: str | list[str] | None = None, processings: str | list[str] | None = None, recordings: str | list[str] | None = None, spaces: str | list[str] | None = None, splits: str | list[str] | None = None, descriptions: str | list[str] | None = None, suffixes: str | list[str] | None = None, extensions: str | list[str] | None = <factory>, datatypes: str | list[str] | None = None, check: bool = False, preload: bool = False, n_jobs: int = 1)[source]#

Bases: BaseConcatDataset

Dataset for loading BIDS.

This class has the same parameters as the mne_bids.find_matching_paths() function as it will be used to find the files to load. The default extensions parameter was changed.

More information on BIDS (Brain Imaging Data Structure) can be found at https://bids.neuroimaging.io

Note

For loading “unofficial” BIDS datasets containing epoched data, you can use BIDSEpochsDataset.

Parameters:
  • root (pathlib.Path | str) – The root of the BIDS path.

  • subjects (str | array-like of str | None) – The subject ID. Corresponds to “sub”.

  • sessions (str | array-like of str | None) – The acquisition session. Corresponds to “ses”.

  • tasks (str | array-like of str | None) – The experimental task. Corresponds to “task”.

  • acquisitions (str | array-like of str | None) – The acquisition parameters. Corresponds to “acq”.

  • runs (str | array-like of str | None) – The run number. Corresponds to “run”.

  • processings (str | array-like of str | None) – The processing label. Corresponds to “proc”.

  • recordings (str | array-like of str | None) – The recording name. Corresponds to “rec”.

  • spaces (str | array-like of str | None) – The coordinate space for anatomical and sensor location files (e.g., *_electrodes.tsv, *_markers.mrk). Corresponds to “space”. Note that valid values for space must come from a list of BIDS keywords as described in the BIDS specification.

  • splits (str | array-like of str | None) – The split of the continuous recording file for .fif data. Corresponds to “split”.

  • descriptions (str | array-like of str | None) – This corresponds to the BIDS entity desc. It is used to provide additional information for derivative data, e.g., preprocessed data may be assigned description='cleaned'.

  • suffixes (str | array-like of str | None) – The filename suffix. This is the entity after the last _ before the extension. E.g., 'channels'. The following filename suffixes are accepted: ‘meg’, ‘markers’, ‘eeg’, ‘ieeg’, ‘T1w’, ‘participants’, ‘scans’, ‘electrodes’, ‘coordsystem’, ‘channels’, ‘events’, ‘headshape’, ‘digitizer’, ‘beh’, ‘physio’, ‘stim’.

  • extensions (str | array-like of str | None) – The extension of the filename. E.g., '.json'. By default, uses the ones accepted by mne_bids.read_raw_bids().

  • datatypes (str | array-like of str | None) – The BIDS data type, e.g., 'anat', 'func', 'eeg', 'meg', 'ieeg'.

  • check (bool) – If True, only returns paths that conform to BIDS. If False (default), the .check attribute of the returned mne_bids.BIDSPath object will be set to True for paths that do conform to BIDS, and to False for those that don’t.

  • preload (bool) – If True, preload the data. Defaults to False.

  • n_jobs (int) – Number of jobs to run in parallel. Defaults to 1.
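The entity filters above (subjects, sessions, tasks, …) can be sketched with plain pathlib globbing over a toy BIDS-style tree. This only illustrates the matching idea; the real class delegates file discovery to mne_bids.find_matching_paths(), and the helper names below are hypothetical.

```python
# Sketch of BIDS entity filtering over a minimal on-disk layout.
# The real BIDSDataset delegates matching to mne_bids.find_matching_paths();
# this only illustrates how entities like "sub" and "task" select files.
import tempfile
from pathlib import Path

def make_toy_bids(root: Path) -> None:
    """Create a few empty BIDS-named EEG files."""
    for sub in ("01", "02"):
        for task in ("rest", "motor"):
            d = root / f"sub-{sub}" / "eeg"
            d.mkdir(parents=True, exist_ok=True)
            (d / f"sub-{sub}_task-{task}_eeg.fif").touch()

def find_paths(root: Path, subjects=None, tasks=None):
    """Return paths whose filename entities match the given filters."""
    matches = []
    for p in sorted(root.rglob("*_eeg.fif")):
        # Parse "key-value" entities out of the BIDS filename stem.
        entities = dict(part.split("-", 1) for part in p.stem.split("_") if "-" in part)
        if subjects is not None and entities.get("sub") not in subjects:
            continue
        if tasks is not None and entities.get("task") not in tasks:
            continue
        matches.append(p)
    return matches

root = Path(tempfile.mkdtemp())
make_toy_bids(root)
rest_01 = find_paths(root, subjects=["01"], tasks=["rest"])
print([p.name for p in rest_01])  # ['sub-01_task-rest_eeg.fif']
```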

acquisitions: str | list[str] | None = None#
check: bool = False#
datatypes: str | list[str] | None = None#
descriptions: str | list[str] | None = None#
extensions: str | list[str] | None#
n_jobs: int = 1#
preload: bool = False#
processings: str | list[str] | None = None#
recordings: str | list[str] | None = None#
root: Path | str#
runs: str | list[str] | None = None#
sessions: str | list[str] | None = None#
spaces: str | list[str] | None = None#
splits: str | list[str] | None = None#
subjects: str | list[str] | None = None#
suffixes: str | list[str] | None = None#
tasks: str | list[str] | None = None#
class braindecode.datasets.BIDSEpochsDataset(*args, **kwargs)[source]#

Bases: BIDSDataset

Experimental dataset for loading mne.Epochs organised in BIDS.

The files must end with _epo.fif.

Warning

Epoched data is not officially supported in BIDS.

Note

This class has the same parameters as BIDSDataset, except that the arguments datatypes, extensions, and check are fixed.

class braindecode.datasets.BNCI2014001(subject_ids)[source]#

Bases: MOABBDataset

BNCI 2014-001 Motor Imagery dataset.

PapersWithCode leaderboard: https://paperswithcode.com/dataset/bnci2014-001-moabb-1

Dataset summary

    #Subj  #Chan  #Classes  #Trials / class  Trial length(s)  Freq(Hz)  #Session  #Runs  Total_trials
    9      22     4         144              4                250       2         6      62208

Dataset IIa from BCI Competition 4 [1].

Dataset Description

This data set consists of EEG data from 9 subjects. The cue-based BCI paradigm consisted of four different motor imagery tasks, namely the imagination of movement of the left hand (class 1), right hand (class 2), both feet (class 3), and tongue (class 4). Two sessions on different days were recorded for each subject. Each session comprises 6 runs separated by short breaks. One run consists of 48 trials (12 for each of the four possible classes), yielding a total of 288 trials per session.

The subjects were sitting in a comfortable armchair in front of a computer screen. At the beginning of a trial (t = 0 s), a fixation cross appeared on the black screen. In addition, a short acoustic warning tone was presented. After two seconds (t = 2 s), a cue in the form of an arrow pointing either to the left, right, down or up (corresponding to one of the four classes left hand, right hand, foot or tongue) appeared and stayed on the screen for 1.25 s. This prompted the subjects to perform the desired motor imagery task. No feedback was provided. The subjects were asked to carry out the motor imagery task until the fixation cross disappeared from the screen at t = 6 s.

Twenty-two Ag/AgCl electrodes (with inter-electrode distances of 3.5 cm) were used to record the EEG; the montage is shown in Figure 3 left. All signals were recorded monopolarly with the left mastoid serving as reference and the right mastoid as ground. The signals were sampled at 250 Hz and bandpass-filtered between 0.5 Hz and 100 Hz. The sensitivity of the amplifier was set to 100 μV. An additional 50 Hz notch filter was enabled to suppress line noise.

Parameters:
subject_ids: list(int) | int | None

(list of) int of subject(s) to be fetched. If None, data of all subjects is fetched.

See moabb.datasets.bnci.BNCI2014001

class BNCI2014001(*args, **kwargs)[source]#

Bases: BNCI2014_001

BNCI 2014-001 Motor Imagery dataset.

PapersWithCode leaderboard: https://paperswithcode.com/dataset/bnci2014-001-moabb-1

Dataset summary

    #Subj  #Chan  #Classes  #Trials / class  Trial length(s)  Freq(Hz)  #Session  #Runs  Total_trials
    9      22     4         144              4                250       2         6      62208

Dataset IIa from BCI Competition 4 [1].

Dataset Description

This data set consists of EEG data from 9 subjects. The cue-based BCI paradigm consisted of four different motor imagery tasks, namely the imagination of movement of the left hand (class 1), right hand (class 2), both feet (class 3), and tongue (class 4). Two sessions on different days were recorded for each subject. Each session comprises 6 runs separated by short breaks. One run consists of 48 trials (12 for each of the four possible classes), yielding a total of 288 trials per session.

The subjects were sitting in a comfortable armchair in front of a computer screen. At the beginning of a trial (t = 0 s), a fixation cross appeared on the black screen. In addition, a short acoustic warning tone was presented. After two seconds (t = 2 s), a cue in the form of an arrow pointing either to the left, right, down or up (corresponding to one of the four classes left hand, right hand, foot or tongue) appeared and stayed on the screen for 1.25 s. This prompted the subjects to perform the desired motor imagery task. No feedback was provided. The subjects were asked to carry out the motor imagery task until the fixation cross disappeared from the screen at t = 6 s.

Twenty-two Ag/AgCl electrodes (with inter-electrode distances of 3.5 cm) were used to record the EEG; the montage is shown in Figure 3 left. All signals were recorded monopolarly with the left mastoid serving as reference and the right mastoid as ground. The signals were sampled at 250 Hz and bandpass-filtered between 0.5 Hz and 100 Hz. The sensitivity of the amplifier was set to 100 μV. An additional 50 Hz notch filter was enabled to suppress line noise.

References

[1]

Tangermann, M., Müller, K.R., Aertsen, A., Birbaumer, N., Braun, C., Brunner, C., Leeb, R., Mehring, C., Miller, K.J., Mueller-Putz, G. and Nolte, G., 2012. Review of the BCI competition IV. Frontiers in neuroscience, 6, p.55.

doc = 'See moabb.datasets.bnci.BNCI2014001\n\n    Parameters\n    ----------\n    subject_ids: list(int) | int | None\n        (list of) int of subject(s) to be fetched. If None, data of all\n        subjects is fetched.\n    '#
class braindecode.datasets.BaseConcatDataset(list_of_ds: list[BaseDataset | BaseConcatDataset | WindowsDataset] | None = None, target_transform: Callable | None = None)[source]#

Bases: ConcatDataset

A base class for concatenated datasets.

Holds either mne.Raw or mne.Epoch in self.datasets and has a pandas DataFrame with additional description.

Parameters:
  • list_of_ds (list) – list of BaseDataset, BaseConcatDataset or WindowsDataset

  • target_transform (callable | None) – Optional function to call on targets before returning them.

property description: DataFrame#
get_metadata() DataFrame[source]#

Concatenate the metadata and description of the wrapped Epochs.

Returns:

metadata – DataFrame containing as many rows as there are windows in the BaseConcatDataset, with the metadata and description information for each window.

Return type:

pd.DataFrame
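The concatenation that get_metadata() performs can be sketched with pandas alone; the column names below (i_window_in_trial, target, subject) are illustrative, not the exhaustive set the real method returns.

```python
# Sketch of what get_metadata() returns: per-window metadata rows joined
# with the per-dataset description columns (column names are illustrative).
import pandas as pd

# Two datasets, each with window-level metadata from its mne.Epochs object.
metadata_per_ds = [
    pd.DataFrame({"i_window_in_trial": [0, 1], "target": [0, 0]}),
    pd.DataFrame({"i_window_in_trial": [0, 1], "target": [1, 1]}),
]
# One description row per dataset (e.g. subject id).
description = pd.DataFrame({"subject": [1, 2]})

parts = []
for i, md in enumerate(metadata_per_ds):
    md = md.copy()
    for col in description.columns:
        md[col] = description.loc[i, col]  # broadcast description to every window
    parts.append(md)
metadata = pd.concat(parts, ignore_index=True)
print(len(metadata))  # 4 rows: one per window across both datasets
```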

save(path: str, overwrite: bool = False, offset: int = 0)[source]#

Save datasets to files by creating one subdirectory for each dataset:

path/
    0/
        0-raw.fif | 0-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
    1/
        1-raw.fif | 1-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)

Parameters:
  • path (str) – Directory in which subdirectories are created to store the -raw.fif | -epo.fif and .json files.

  • overwrite (bool) – Whether to delete old subdirectories that will be saved to in this call.

  • offset (int) – If provided, the integer is added to the id of the dataset in the concat. This is useful in the setting of very large datasets, where one dataset has to be processed and saved at a time to account for its original position.
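The directory layout above can be sketched with pathlib; this writes empty placeholder files only and assumes raw (not epoched) datasets, so it is an illustration of the naming scheme rather than a substitute for save() itself.

```python
# Sketch of the directory layout save() creates, assuming raw (not epoched)
# datasets and the default offset of 0. Only empty placeholder files are made.
import tempfile
from pathlib import Path

def sketch_save(path: Path, n_datasets: int, offset: int = 0) -> list[Path]:
    """Create one numbered subdirectory per dataset with placeholder files."""
    written = []
    for i in range(offset, offset + n_datasets):
        sub = path / str(i)          # subdirectory name is the dataset id
        sub.mkdir(parents=True)
        for name in (f"{i}-raw.fif", "description.json"):
            f = sub / name
            f.touch()
            written.append(f)
    return written

out = Path(tempfile.mkdtemp())
files = sketch_save(out, n_datasets=2)
print(sorted(p.relative_to(out).as_posix() for p in files))
# ['0/0-raw.fif', '0/description.json', '1/1-raw.fif', '1/description.json']
```

The offset parameter shifts the subdirectory ids, which is how large collections can be processed and saved one dataset at a time while preserving each dataset's original position.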

set_description(description: dict | DataFrame, overwrite: bool = False)[source]#

Update (add or overwrite) the dataset description.

Parameters:
  • description (dict | pd.DataFrame) – Description in the form key: value where the length of the value has to match the number of datasets.

  • overwrite (bool) – Has to be True if a key in description already exists in the dataset description.

split(by: str | list[int] | list[list[int]] | dict[str, list[int]] | None = None, property: str | None = None, split_ids: list[int] | list[list[int]] | dict[str, list[int]] | None = None) dict[str, BaseConcatDataset][source]#

Split the dataset based on information listed in its description.

The format could be based on a DataFrame or based on indices.

Parameters:
  • by (str | list | dict) – If by is a string, splitting is performed based on the description DataFrame column with this name. If by is a (list of) list of integers, the position in the first list corresponds to the split id and the integers to the datapoints of that split. If a dict then each key will be used in the returned splits dict and each value should be a list of int.

  • property (str) – A property (column name) listed in the description DataFrame.

  • split_ids (list | dict) – List of indices to be combined in a subset. It can be a list of int or a list of list of int.

Returns:

splits – A dictionary with the name of the split (a string) as key and the dataset as value.

Return type:

dict
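The by-column splitting semantics can be sketched with a plain pandas groupby over a description DataFrame (the subject column is illustrative; the real method returns BaseConcatDataset instances rather than index lists):

```python
# Sketch of split-by-column semantics: group dataset indices by the value of
# a description column, as split(by="subject") would conceptually do.
import pandas as pd

description = pd.DataFrame({"subject": [1, 1, 2, 3, 3]})

# One split per distinct column value; keys are the stringified values,
# values are the positions of the matching datasets in the concat.
splits = {
    str(value): group.index.tolist()
    for value, group in description.groupby("subject")
}
print(splits)  # {'1': [0, 1], '2': [2], '3': [3, 4]}
```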

property target_transform#
property transform#
class braindecode.datasets.BaseDataset(raw: BaseRaw, description: dict | Series | None = None, target_name: str | tuple[str, ...] | None = None, transform: Callable | None = None)[source]#

Bases: Dataset

Returns samples from an mne.io.Raw object along with a target.

Dataset which serves samples from an mne.io.Raw object along with a target. The target is unique for the dataset, and is obtained through the description attribute.

Parameters:
  • raw (mne.io.Raw) – Continuous data.

  • description (dict | pandas.Series | None) – Holds additional description about the continuous signal / subject.

  • target_name (str | tuple | None) – Name(s) of the index in description that should be used to provide the target (e.g., to be used in a prediction task later on).

  • transform (callable | None) – On-the-fly transform applied to the example before it is returned.
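How a target would be looked up from the description under target_name can be sketched as follows; the helper below is hypothetical, not the class's actual method, and the description keys are illustrative.

```python
# Sketch of target lookup: the target comes from the description Series
# under the key(s) given by target_name (single key, tuple of keys, or None).
import pandas as pd

description = pd.Series({"subject": 4, "age": 31, "gender": "F"})

def get_target(description: pd.Series, target_name):
    """Resolve a target value (or tuple of values) from the description."""
    if target_name is None:
        return None
    if isinstance(target_name, tuple):
        return tuple(description[name] for name in target_name)
    return description[target_name]

print(get_target(description, "age"))              # 31
print(get_target(description, ("age", "gender")))  # (31, 'F')
```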

property description: Series#
set_description(description: dict | Series, overwrite: bool = False)[source]#

Update (add or overwrite) the dataset description.

Parameters:
  • description (dict | pd.Series) – Description in the form key: value.

  • overwrite (bool) – Has to be True if a key in description already exists in the dataset description.

property transform#
class braindecode.datasets.HGD(subject_ids)[source]#

Bases: MOABBDataset

High-gamma dataset described in Schirrmeister et al. 2017.

PapersWithCode leaderboard: https://paperswithcode.com/dataset/schirrmeister2017-moabb

Dataset summary

    #Subj  #Chan  #Classes  #Trials / class  Trial length(s)  Freq(Hz)  #Session  #Runs  Total_trials
    14     128    4         120              4                500       1         2      13440

Dataset from [1].

Our “High-Gamma Dataset” is a 128-electrode dataset (of which we later only use 44 sensors covering the motor cortex; see Section 2.7.1), obtained from 14 healthy subjects (6 female, 2 left-handed, age 27.2 ± 3.6 (mean ± std)) with roughly 1000 (963.1 ± 150.9, mean ± std) four-second trials of executed movements divided into 13 runs per subject. The four classes of movements were movements of either the left hand, the right hand, both feet, and rest (no movement, but same type of visual cue as for the other classes). The training set consists of the approx. 880 trials of all runs except the last two runs, the test set of the approx. 160 trials of the last 2 runs. This dataset was acquired in an EEG lab optimized for non-invasive detection of high-frequency movement-related EEG components (Ball et al., 2008; Darvas et al., 2010).

Depending on the direction of a gray arrow that was shown on black background, the subjects had to repetitively clench their toes (downward arrow), perform sequential finger-tapping of their left (leftward arrow) or right (rightward arrow) hand, or relax (upward arrow). The movements were selected to require little proximal muscular activity while still being complex enough to keep subjects involved. Within the 4-s trials, the subjects performed the repetitive movements at their own pace, which had to be maintained as long as the arrow was showing. Per run, 80 arrows were displayed for 4 s each, with 3 to 4 s of continuous random inter-trial interval. The order of presentation was pseudo-randomized, with all four arrows being shown every four trials. Ideally 13 runs were performed to collect 260 trials of each movement and rest. The stimuli were presented and the data recorded with BCI2000 (Schalk et al., 2004). The experiment was approved by the ethical committee of the University of Freiburg.

Parameters:
subject_ids: list(int) | int | None

(list of) int of subject(s) to be fetched. If None, data of all subjects is fetched.

See moabb.datasets.schirrmeister2017.Schirrmeister2017

class Schirrmeister2017[source]#

Bases: BaseDataset

High-gamma dataset described in Schirrmeister et al. 2017.

PapersWithCode leaderboard: https://paperswithcode.com/dataset/schirrmeister2017-moabb

Dataset summary

    #Subj  #Chan  #Classes  #Trials / class  Trial length(s)  Freq(Hz)  #Session  #Runs  Total_trials
    14     128    4         120              4                500       1         2      13440

Dataset from [1]

Our “High-Gamma Dataset” is a 128-electrode dataset (of which we later only use 44 sensors covering the motor cortex; see Section 2.7.1), obtained from 14 healthy subjects (6 female, 2 left-handed, age 27.2 ± 3.6 (mean ± std)) with roughly 1000 (963.1 ± 150.9, mean ± std) four-second trials of executed movements divided into 13 runs per subject. The four classes of movements were movements of either the left hand, the right hand, both feet, and rest (no movement, but same type of visual cue as for the other classes). The training set consists of the approx. 880 trials of all runs except the last two runs, the test set of the approx. 160 trials of the last 2 runs. This dataset was acquired in an EEG lab optimized for non-invasive detection of high-frequency movement-related EEG components (Ball et al., 2008; Darvas et al., 2010).

Depending on the direction of a gray arrow that was shown on black background, the subjects had to repetitively clench their toes (downward arrow), perform sequential finger-tapping of their left (leftward arrow) or right (rightward arrow) hand, or relax (upward arrow). The movements were selected to require little proximal muscular activity while still being complex enough to keep subjects involved. Within the 4-s trials, the subjects performed the repetitive movements at their own pace, which had to be maintained as long as the arrow was showing. Per run, 80 arrows were displayed for 4 s each, with 3 to 4 s of continuous random inter-trial interval. The order of presentation was pseudo-randomized, with all four arrows being shown every four trials. Ideally 13 runs were performed to collect 260 trials of each movement and rest. The stimuli were presented and the data recorded with BCI2000 (Schalk et al., 2004). The experiment was approved by the ethical committee of the University of Freiburg.

References

[1]

Schirrmeister, Robin Tibor, et al. “Deep learning with convolutional neural networks for EEG decoding and visualization.” Human brain mapping 38.11 (2017): 5391-5420.

data_path(subject, path=None, force_update=False, update_path=None, verbose=None)[source]#

Get path to local copy of a subject data.

Parameters:
  • subject (int) – Number of subject to use

  • path (None | str) – Location of where to look for the data storing location. If None, the environment variable or config parameter MNE_DATASETS_(dataset)_PATH is used. If it doesn’t exist, the “~/mne_data” directory is used. If the dataset is not found under the given path, the data will be automatically downloaded to the specified folder.

  • force_update (bool) – Force update of the dataset even if a local copy exists.

  • update_path (bool | None) – Deprecated. If True, set the MNE_DATASETS_(dataset)_PATH in mne-python config to the given path. If None, the user is prompted.

  • verbose (bool, str, int, or None) – If not None, override default verbose level (see mne.verbose()).

Returns:

path – Local path to the given data file. This path is contained inside a list of length one, for compatibility.

Return type:

list of str

doc = 'See moabb.datasets.schirrmeister2017.Schirrmeister2017\n\n    Parameters\n    ----------\n    subject_ids: list(int) | int | None\n        (list of) int of subject(s) to be fetched. If None, data of all\n        subjects is fetched.\n    '#
class braindecode.datasets.MOABBDataset(dataset_name: str, subject_ids: list[int] | int | None = None, dataset_kwargs: dict[str, Any] | None = None, dataset_load_kwargs: dict[str, Any] | None = None)[source]#

Bases: BaseConcatDataset

A class for moabb datasets.

Parameters:
  • dataset_name (str) – name of dataset included in moabb to be fetched

  • subject_ids (list(int) | int | None) – (list of) int of subject(s) to be fetched. If None, data of all subjects is fetched.

  • dataset_kwargs (dict, optional) – optional dictionary containing keyword arguments to pass to the moabb dataset when instantiating it.

  • dataset_load_kwargs (dict, optional) – optional dictionary containing keyword arguments to pass to the moabb dataset’s load_data method. Allows using the moabb cache_config=None and process_pipeline=None.

class braindecode.datasets.NMT(path=None, target_name='pathological', recording_ids=None, preload=False, n_jobs=1)[source]#

Bases: BaseConcatDataset

The NMT Scalp EEG Dataset.

An Open-Source Annotated Dataset of Healthy and Pathological EEG Recordings for Predictive Modeling.

This dataset contains 2,417 recordings from unique participants spanning almost 625 h.

The dataset can be used for three tasks: brain-age prediction, gender prediction, and abnormality detection.

The dataset is described in [Khan2022].

Added in version 0.9.

Parameters:
  • path (str) – Parent directory of the dataset.

  • recording_ids (list(int) | int) – A (list of) int of recording id(s) to be read (order matters and will overwrite default chronological order, e.g. if recording_ids=[1,0], then the first recording returned by this class will be chronologically later than the second recording. Provide recording_ids in ascending order to preserve chronological order.).

  • target_name (str) – Can be “pathological”, “gender”, or “age”.

  • preload (bool) – If True, preload the data of the Raw objects.

References

[Khan2022]

Khan, H.A.,Ul Ain, R., Kamboh, A.M., Butt, H.T.,Shafait,S., Alamgir, W., Stricker, D. and Shafait, F., 2022. The NMT scalp EEG dataset: an open-source annotated dataset of healthy and pathological EEG recordings for predictive modeling. Frontiers in neuroscience, 15, p.755817.

class braindecode.datasets.SleepPhysionet(subject_ids: list[int] | int | None = None, recording_ids: list[int] | None = None, preload=False, load_eeg_only=True, crop_wake_mins=30, crop=None)[source]#

Bases: BaseConcatDataset

Sleep Physionet dataset.

Sleep dataset from https://physionet.org/content/sleep-edfx/1.0.0/. Contains overnight recordings from 78 healthy subjects.

See the MNE example: https://mne.tools/stable/auto_tutorials/clinical/60_sleep.html

Parameters:
  • subject_ids (list(int) | int | None) – (list of) int of subject(s) to be loaded. If None, load all available subjects.

  • recording_ids (list(int) | None) – Recordings to load per subject (each subject except 13 has two recordings). Can be [1], [2] or [1, 2] (same as None).

  • preload (bool) – If True, preload the data of the Raw objects.

  • load_eeg_only (bool) – If True, only load the EEG channels and discard the others (EOG, EMG, temperature, respiration) to avoid resampling the other signals.

  • crop_wake_mins (float) – Number of minutes of wake time to keep before the first sleep event and after the last sleep event. Used to reduce the imbalance in this dataset. Default of 30 mins.

  • crop (None | tuple) – If not None crop the raw files (e.g. to use only the first 3h). Example: crop=(0, 3600*3) to keep only the first 3h.

class braindecode.datasets.SleepPhysionetChallenge2018(subject_ids='training', path=None, load_eeg_only=True, preproc=None, n_jobs=1)[source]#

Bases: BaseConcatDataset

Physionet Challenge 2018 polysomnography dataset.

Sleep dataset from https://physionet.org/content/challenge-2018/1.0.0/. Contains overnight recordings from 1983 healthy subjects.

The total size is 266 GB, so make sure you have enough space before downloading.

See fetch_pc18_data for a more complete description.

Parameters:
  • subject_ids (list(int) | str | None) – (list of) int of subject(s) to be loaded. If None, loads all subjects (both the training and test sets; the test set has no labels). If “training”, loads only the training set subjects. If “test”, loads only the test set subjects (no labels associated). Otherwise, expects an iterable of subject IDs.

  • path (None | str) – Location of where to look for the PC18 data storing location. If None, the environment variable or config parameter MNE_DATASETS_PC18_PATH is used. If it doesn’t exist, the “~/mne_data” directory is used. If the dataset is not found under the given path, the data will be automatically downloaded to the specified folder.

  • load_eeg_only (bool) – If True, only load the EEG channels and discard the others (EOG, EMG, temperature, respiration) to avoid resampling the other signals.

  • preproc (list(Preprocessor) | None) – List of preprocessors to apply to each file individually. This way the data can e.g., be downsampled (temporally and spatially) to limit the memory usage of the entire Dataset object. This also enables applying preprocessing in parallel over the recordings.

  • n_jobs (int) – Number of parallel processes.
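The fallback order described for path can be sketched as follows. This is a simplification: the real lookup via MNE also consults the mne-python configuration file, not only the environment.

```python
# Sketch of the path-resolution rule: explicit path, else the
# MNE_DATASETS_PC18_PATH environment variable, else "~/mne_data".
import os

def resolve_data_path(path=None, env=os.environ):
    """Pick the data directory following the documented fallback order."""
    if path is not None:
        return path
    configured = env.get("MNE_DATASETS_PC18_PATH")
    if configured:
        return configured
    return os.path.join(os.path.expanduser("~"), "mne_data")

print(resolve_data_path("/data/pc18"))  # '/data/pc18'
print(resolve_data_path(env={"MNE_DATASETS_PC18_PATH": "/scratch"}))  # '/scratch'
```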

class braindecode.datasets.TUH(path: str, recording_ids: list[int] | None = None, target_name: str | tuple[str, ...] | None = None, preload: bool = False, add_physician_reports: bool = False, rename_channels: bool = False, set_montage: bool = False, n_jobs: int = 1)[source]#

Bases: BaseConcatDataset

Temple University Hospital (TUH) EEG Corpus (www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml#c_tueg).

Parameters:
  • path (str) – Parent directory of the dataset.

  • recording_ids (list(int) | int) – A (list of) int of recording id(s) to be read (order matters and will overwrite default chronological order, e.g. if recording_ids=[1,0], then the first recording returned by this class will be chronologically later than the second recording. Provide recording_ids in ascending order to preserve chronological order.).

  • target_name (str) – Can be ‘gender’, or ‘age’.

  • preload (bool) – If True, preload the data of the Raw objects.

  • add_physician_reports (bool) – If True, the physician reports will be read from disk and added to the description.

  • rename_channels (bool) – If True, rename the EEG channels to the standard 10-05 system.

  • set_montage (bool) – If True, set the montage to the standard 10-05 system.

  • n_jobs (int) – Number of jobs to be used to read files in parallel.

class braindecode.datasets.TUHAbnormal(path: str, recording_ids: list[int] | None = None, target_name: str | tuple[str, ...] | None = 'pathological', preload: bool = False, add_physician_reports: bool = False, rename_channels: bool = False, set_montage: bool = False, n_jobs: int = 1)[source]#

Bases: TUH

Temple University Hospital (TUH) Abnormal EEG Corpus (www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml#c_tuab).

Parameters:
  • path (str) – Parent directory of the dataset.

  • recording_ids (list(int) | int) – A (list of) int of recording id(s) to be read (order matters and will overwrite default chronological order, e.g. if recording_ids=[1,0], then the first recording returned by this class will be chronologically later than the second recording. Provide recording_ids in ascending order to preserve chronological order.).

  • target_name (str) – Can be ‘pathological’, ‘gender’, or ‘age’.

  • preload (bool) – If True, preload the data of the Raw objects.

  • add_physician_reports (bool) – If True, the physician reports will be read from disk and added to the description.

  • rename_channels (bool) – If True, rename the EEG channels to the standard 10-05 system.

  • set_montage (bool) – If True, set the montage to the standard 10-05 system.

  • n_jobs (int) – Number of jobs to be used to read files in parallel.

class braindecode.datasets.WindowsDataset(windows: BaseEpochs, description: dict | Series | None = None, transform: Callable | None = None, targets_from: str = 'metadata', last_target_only: bool = True)[source]#

Bases: BaseDataset

Returns windows from an mne.Epochs object along with a target.

Dataset which serves windows from an mne.Epochs object along with their target and additional information. The metadata attribute of the Epochs object must contain a column called target, which will be used to return the target that corresponds to a window. Additional columns i_window_in_trial, i_start_in_trial, i_stop_in_trial are also required to serve information about the windowing (e.g., useful for cropped training). See braindecode.datautil.windowers to directly create a WindowsDataset from a BaseDataset object.

Parameters:
  • windows (mne.Epochs) – Windows obtained through the application of a windower to a BaseDataset (see braindecode.datautil.windowers).

  • description (dict | pandas.Series | None) – Holds additional info about the windows.

  • transform (callable | None) – On-the-fly transform applied to a window before it is returned.

  • targets_from (str) – Defines whether targets will be extracted from mne.Epochs metadata or mne.Epochs misc channels (time series targets). It can be metadata (default) or channels.

property description: Series#
set_description(description: dict | Series, overwrite: bool = False)[source]#

Update (add or overwrite) the dataset description.

Parameters:
  • description (dict | pd.Series) – Description in the form key: value.

  • overwrite (bool) – Has to be True if a key in description already exists in the dataset description.

property transform#
braindecode.datasets.create_from_X_y(X: ndarray[Any, dtype[_ScalarType_co]], y: Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], drop_last_window: bool, sfreq: float, ch_names: Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] = None, window_size_samples: int | None = None, window_stride_samples: int | None = None) BaseConcatDataset[source]#

Create a BaseConcatDataset of WindowsDatasets from X and y to be used for decoding with skorch and braindecode, where X is a list of pre-cut trials and y are corresponding targets.

Parameters:
  • X (array-like) – list of pre-cut trials as n_trials x n_channels x n_times

  • y (array-like) – targets corresponding to the trials

  • drop_last_window (bool) – Whether to drop the last, potentially overlapping, window when the windows do not evenly divide the continuous signal.

  • sfreq (float) – Sampling frequency of signals.

  • ch_names (array-like) – Names of the channels.

  • window_size_samples (int) – window size

  • window_stride_samples (int) – stride between windows

Returns:

windows_datasets – X and y transformed to a dataset format that is compatible with skorch and braindecode

Return type:

BaseConcatDataset

braindecode.datasets.create_from_mne_epochs(list_of_epochs: list[BaseEpochs], window_size_samples: int, window_stride_samples: int, drop_last_window: bool) BaseConcatDataset[source]#

Create WindowsDatasets from mne.Epochs

Parameters:
  • list_of_epochs (array-like) – list of mne.Epochs

  • window_size_samples (int) – window size

  • window_stride_samples (int) – stride between windows

  • drop_last_window (bool) – Whether to drop the last, potentially overlapping, window when the windows do not evenly divide the continuous signal.

Returns:

windows_datasets – X and y transformed to a dataset format that is compatible with skorch and braindecode

Return type:

BaseConcatDataset

braindecode.datasets.create_from_mne_raw(raws: list[BaseRaw], trial_start_offset_samples: int, trial_stop_offset_samples: int, window_size_samples: int, window_stride_samples: int, drop_last_window: bool, descriptions: list[dict | Series] | None = None, mapping: dict[str, int] | None = None, preload: bool = False, drop_bad_windows: bool = True, accepted_bads_ratio: float = 0.0) BaseConcatDataset[source]#

Create WindowsDatasets from mne.RawArrays

Parameters:
  • raws (array-like) – list of mne.RawArrays

  • trial_start_offset_samples (int) – start offset from original trial onsets in samples

  • trial_stop_offset_samples (int) – stop offset from original trial stop in samples

  • window_size_samples (int) – window size

  • window_stride_samples (int) – stride between windows

  • drop_last_window (bool) – Whether to drop the last, potentially overlapping, window when the windows do not evenly divide the continuous signal.

  • descriptions (array-like) – list of dicts or pandas.Series with additional information about the raws

  • mapping (dict(str: int)) – mapping from event description to target value

  • preload (bool) – if True, preload the data of the Epochs objects.

  • drop_bad_windows (bool) – If True, call .drop_bad() on the resulting mne.Epochs object. This step allows identifying e.g., windows that fall outside of the continuous recording. It is suggested to run this step here as otherwise the BaseConcatDataset has to be updated as well.

  • accepted_bads_ratio (float, optional) – Acceptable proportion of trials with inconsistent length in a raw. If the proportion of trials whose length is exceeded by the window size is smaller than this, only the corresponding trials are dropped and the computation continues. Otherwise, an error is raised. Defaults to 0.0 (raise an error).

Returns:

windows_datasets – X and y transformed to a dataset format that is compatible with skorch and braindecode

Return type:

BaseConcatDataset

Submodules#

braindecode.datasets.base module#

Dataset classes.

class braindecode.datasets.base.BaseConcatDataset(list_of_ds: list[BaseDataset | BaseConcatDataset | WindowsDataset] | None = None, target_transform: Callable | None = None)[source]#

Bases: ConcatDataset

A base class for concatenated datasets.

Holds either mne.Raw or mne.Epoch in self.datasets and has a pandas DataFrame with additional description.

Parameters:
  • list_of_ds (list) – list of BaseDataset, BaseConcatDataset or WindowsDataset

  • target_transform (callable | None) – Optional function to call on targets before returning them.

property description: DataFrame#
get_metadata() DataFrame[source]#

Concatenate the metadata and description of the wrapped Epochs.

Returns:

metadata – DataFrame containing as many rows as there are windows in the BaseConcatDataset, with the metadata and description information for each window.

Return type:

pd.DataFrame

save(path: str, overwrite: bool = False, offset: int = 0)[source]#

Save datasets to files by creating one subdirectory for each dataset:

path/
    0/
        0-raw.fif | 0-epo.fif
        description.json
        raw_preproc_kwargs.json (if raws were preprocessed)
        window_kwargs.json (if this is a windowed dataset)
        window_preproc_kwargs.json (if windows were preprocessed)
        target_name.json (if target_name is not None and dataset is raw)
    1/
        1-raw.fif | 1-epo.fif
        ...

Parameters:
  • path (str) –

    Directory in which subdirectories are created to store

    -raw.fif | -epo.fif and .json files to.

  • overwrite (bool) – Whether to delete old subdirectories that will be saved to in this call.

  • offset (int) – If provided, the integer is added to the id of the dataset in the concat. This is useful in the setting of very large datasets, where one dataset has to be processed and saved at a time to account for its original position.

set_description(description: dict | DataFrame, overwrite: bool = False)[source]#

Update (add or overwrite) the dataset description.

Parameters:
  • description (dict | pd.DataFrame) – Description in the form key: value where the length of the value has to match the number of datasets.

  • overwrite (bool) – Has to be True if a key in description already exists in the dataset description.

split(by: str | list[int] | list[list[int]] | dict[str, list[int]] | None = None, property: str | None = None, split_ids: list[int] | list[list[int]] | dict[str, list[int]] | None = None) dict[str, BaseConcatDataset][source]#

Split the dataset based on information listed in its description.

The format could be based on a DataFrame or based on indices.

Parameters:
  • by (str | list | dict) – If by is a string, splitting is performed based on the description DataFrame column with this name. If by is a (list of) list of integers, the position in the first list corresponds to the split id and the integers to the datapoints of that split. If a dict then each key will be used in the returned splits dict and each value should be a list of int.

  • property (str) – Some property which is listed in the description DataFrame.

  • split_ids (list | dict) – List of indices to be combined in a subset. It can be a list of int or a list of list of int.

Returns:

splits – A dictionary with the name of the split (a string) as key and the dataset as value.

Return type:

dict

property target_transform#
property transform#
class braindecode.datasets.base.BaseDataset(raw: BaseRaw, description: dict | Series | None = None, target_name: str | tuple[str, ...] | None = None, transform: Callable | None = None)[source]#

Bases: Dataset

Returns samples from an mne.io.Raw object along with a target.

Dataset which serves samples from an mne.io.Raw object along with a target. The target is unique for the dataset, and is obtained through the description attribute.

Parameters:
  • raw (mne.io.Raw) – Continuous data.

  • description (dict | pandas.Series | None) – Holds additional description about the continuous signal / subject.

  • target_name (str | tuple | None) – Name(s) of the index in description that should be used to provide the target (e.g., to be used in a prediction task later on).

  • transform (callable | None) – On-the-fly transform applied to the example before it is returned.

property description: Series#
set_description(description: dict | Series, overwrite: bool = False)[source]#

Update (add or overwrite) the dataset description.

Parameters:
  • description (dict | pd.Series) – Description in the form key: value.

  • overwrite (bool) – Has to be True if a key in description already exists in the dataset description.

property transform#
class braindecode.datasets.base.EEGWindowsDataset(raw: BaseRaw | BaseEpochs, metadata: DataFrame, description: dict | Series | None = None, transform: Callable | None = None, targets_from: str = 'metadata', last_target_only: bool = True)[source]#

Bases: BaseDataset

Returns windows from an mne.Raw object, its window indices, along with a target.

Dataset which serves windows from an mne.Epochs object along with their target and additional information. The metadata attribute of the Epochs object must contain a column called target, which will be used to return the target that corresponds to a window. Additional columns i_window_in_trial, i_start_in_trial, i_stop_in_trial are also required to serve information about the windowing (e.g., useful for cropped training). See braindecode.datautil.windowers to directly create a WindowsDataset from a BaseDataset object.

Parameters:
  • windows (mne.io.Raw or mne.Epochs; Epochs is outdated) – Windows obtained through the application of a windower to a BaseDataset (see braindecode.datautil.windowers).

  • description (dict | pandas.Series | None) – Holds additional info about the windows.

  • transform (callable | None) – On-the-fly transform applied to a window before it is returned.

  • targets_from (str) – Defines whether targets will be extracted from metadata or from misc channels (time series targets). It can be metadata (default) or channels.

  • last_target_only (bool) – If targets are obtained from misc channels, whether to return only the last target in the window or all targets of the entire (compute) window.

  • metadata (pandas.DataFrame) – Dataframe with crop indices, so i_window_in_trial, i_start_in_trial, i_stop_in_trial as well as targets.

property description: Series#
set_description(description: dict | Series, overwrite: bool = False)[source]#

Update (add or overwrite) the dataset description.

Parameters:
  • description (dict | pd.Series) – Description in the form key: value.

  • overwrite (bool) – Has to be True if a key in description already exists in the dataset description.

property transform#
class braindecode.datasets.base.WindowsDataset(windows: BaseEpochs, description: dict | Series | None = None, transform: Callable | None = None, targets_from: str = 'metadata', last_target_only: bool = True)[source]#

Bases: BaseDataset

Returns windows from an mne.Epochs object along with a target.

Dataset which serves windows from an mne.Epochs object along with their target and additional information. The metadata attribute of the Epochs object must contain a column called target, which will be used to return the target that corresponds to a window. Additional columns i_window_in_trial, i_start_in_trial, i_stop_in_trial are also required to serve information about the windowing (e.g., useful for cropped training). See braindecode.datautil.windowers to directly create a WindowsDataset from a BaseDataset object.

Parameters:
  • windows (mne.Epochs) – Windows obtained through the application of a windower to a BaseDataset (see braindecode.datautil.windowers).

  • description (dict | pandas.Series | None) – Holds additional info about the windows.

  • transform (callable | None) – On-the-fly transform applied to a window before it is returned.

  • targets_from (str) – Defines whether targets will be extracted from mne.Epochs metadata or mne.Epochs misc channels (time series targets). It can be metadata (default) or channels.

property description: Series#
set_description(description: dict | Series, overwrite: bool = False)[source]#

Update (add or overwrite) the dataset description.

Parameters:
  • description (dict | pd.Series) – Description in the form key: value.

  • overwrite (bool) – Has to be True if a key in description already exists in the dataset description.

property transform#

braindecode.datasets.bbci module#

class braindecode.datasets.bbci.BBCIDataset(filename: str, load_sensor_names: list[str] | None = None, check_class_names: bool = False)[source]#

Bases: object

BBCIDataset.

Loader class for files created by saving BBCI files in matlab (make sure to save with ‘-v7.3’ in matlab, see https://de.mathworks.com/help/matlab/import_export/mat-file-versions.html#buk6i87 )

Parameters:
  • filename (str)

  • load_sensor_names (list of str, optional) – Also speeds up loading if you only load some sensors. None means load all sensors.

  • check_class_names (bool, optional) – check if the class names are part of some known class names at Translational NeuroTechnology Lab, AG Ball, Freiburg, Germany.

static get_all_sensors(filename: str, pattern: str | None = None) list[str][source]#

Get all sensors that exist in the given file.

Parameters:
  • filename (str)

  • pattern (str, optional) – Only return those sensor names that match the given pattern.

Returns:

sensor_names – Sensor names that match the pattern or all sensor names in the file.

Return type:

list of str

load() RawArray[source]#
braindecode.datasets.bbci.load_bbci_sets_from_folder(folder: str, runs: list[int] | str = 'all') list[RawArray][source]#

Load bbci datasets from files in given folder.

Parameters:
  • folder (str) – Folder with .BBCI.mat files inside

  • runs (list of int | 'all') – Specific runs to load, or 'all' (default) to load every run. Assumes filenames contain a part such as S001R02 for run 2; tries to match the regex 'S[0-9]{3,3}R[0-9]{2,2}_'.

braindecode.datasets.bcicomp module#

class braindecode.datasets.bcicomp.BCICompetitionIVDataset4(subject_ids: list[int] | int | None = None)[source]#

Bases: BaseConcatDataset

BCI competition IV dataset 4.

Contains ECoG recordings for three patients moving fingers during the experiment. Targets correspond to the time courses of the flexion of each of five fingers. See http://www.bbci.de/competition/iv/desc_4.pdf and http://www.bbci.de/competition/iv/ for the dataset and competition description. ECoG library containing the dataset: https://searchworks.stanford.edu/view/zk881ps0522

Notes

When using this dataset please cite [1].

Parameters:

subject_ids (list(int) | int | None) – (list of) int of subject(s) to be loaded. If None, load all available subjects. Should be in range 1-3.

References

[1]

Miller, Kai J. “A library of human electrocorticographic data and analyses.” Nature human behaviour 3, no. 11 (2019): 1225-1235. https://doi.org/10.1038/s41562-019-0678-3

cumulative_sizes: list[int]#
datasets: list[Dataset[_T_co]]#
static download(path=None, force_update=False, verbose=None)[source]#

Download the dataset.

Parameters:
  • path (None | str) – Location of where to look for the data storing location. If None, the environment variable or config parameter MNE_DATASETS_(dataset)_PATH is used. If it doesn’t exist, the “~/mne_data” directory is used. If the dataset is not found under the given path, the data will be automatically downloaded to the specified folder.

  • force_update (bool) – Force update of the dataset even if a local copy exists.

  • verbose (bool, str, int, or None) – If not None, override default verbose level (see mne.verbose()).

possible_subjects = [1, 2, 3]#

braindecode.datasets.bids module#

Dataset for loading BIDS.

More information on BIDS (Brain Imaging Data Structure) can be found at https://bids.neuroimaging.io

class braindecode.datasets.bids.BIDSDataset(root: ~pathlib.Path | str, subjects: str | list[str] | None = None, sessions: str | list[str] | None = None, tasks: str | list[str] | None = None, acquisitions: str | list[str] | None = None, runs: str | list[str] | None = None, processings: str | list[str] | None = None, recordings: str | list[str] | None = None, spaces: str | list[str] | None = None, splits: str | list[str] | None = None, descriptions: str | list[str] | None = None, suffixes: str | list[str] | None = None, extensions: str | list[str] | None = <factory>, datatypes: str | list[str] | None = None, check: bool = False, preload: bool = False, n_jobs: int = 1)[source]#

Bases: BaseConcatDataset

Dataset for loading BIDS.

This class has the same parameters as the mne_bids.find_matching_paths() function as it will be used to find the files to load. The default extensions parameter was changed.

More information on BIDS (Brain Imaging Data Structure) can be found at https://bids.neuroimaging.io

Note

For loading “unofficial” BIDS datasets containing epoched data, you can use BIDSEpochsDataset.

Parameters:
  • root (pathlib.Path | str) – The root of the BIDS path.

  • subjects (str | array-like of str | None) – The subject ID. Corresponds to “sub”.

  • sessions (str | array-like of str | None) – The acquisition session. Corresponds to “ses”.

  • tasks (str | array-like of str | None) – The experimental task. Corresponds to “task”.

  • acquisitions (str | array-like of str | None) – The acquisition parameters. Corresponds to “acq”.

  • runs (str | array-like of str | None) – The run number. Corresponds to “run”.

  • processings (str | array-like of str | None) – The processing label. Corresponds to “proc”.

  • recordings (str | array-like of str | None) – The recording name. Corresponds to “rec”.

  • spaces (str | array-like of str | None) – The coordinate space for anatomical and sensor location files (e.g., *_electrodes.tsv, *_markers.mrk). Corresponds to “space”. Note that valid values for space must come from a list of BIDS keywords as described in the BIDS specification.

  • splits (str | array-like of str | None) – The split of the continuous recording file for .fif data. Corresponds to “split”.

  • descriptions (str | array-like of str | None) – This corresponds to the BIDS entity desc. It is used to provide additional information for derivative data, e.g., preprocessed data may be assigned description='cleaned'.

  • suffixes (str | array-like of str | None) – The filename suffix. This is the entity after the last _ before the extension. E.g., 'channels'. The following filename suffix’s are accepted: ‘meg’, ‘markers’, ‘eeg’, ‘ieeg’, ‘T1w’, ‘participants’, ‘scans’, ‘electrodes’, ‘coordsystem’, ‘channels’, ‘events’, ‘headshape’, ‘digitizer’, ‘beh’, ‘physio’, ‘stim’

  • extensions (str | array-like of str | None) – The extension of the filename. E.g., '.json'. By default, uses the ones accepted by mne_bids.read_raw_bids().

  • datatypes (str | array-like of str | None) – The BIDS data type, e.g., 'anat', 'func', 'eeg', 'meg', 'ieeg'.

  • check (bool) – If True, only returns paths that conform to BIDS. If False (default), the .check attribute of the returned mne_bids.BIDSPath object will be set to True for paths that do conform to BIDS, and to False for those that don’t.

  • preload (bool) – If True, preload the data. Defaults to False.

  • n_jobs (int) – Number of jobs to run in parallel. Defaults to 1.

acquisitions: str | list[str] | None = None#
check: bool = False#
cumulative_sizes: list[int]#
datasets: list[Dataset[_T_co]]#
datatypes: str | list[str] | None = None#
descriptions: str | list[str] | None = None#
extensions: str | list[str] | None#
n_jobs: int = 1#
preload: bool = False#
processings: str | list[str] | None = None#
recordings: str | list[str] | None = None#
root: Path | str#
runs: str | list[str] | None = None#
sessions: str | list[str] | None = None#
spaces: str | list[str] | None = None#
splits: str | list[str] | None = None#
subjects: str | list[str] | None = None#
suffixes: str | list[str] | None = None#
tasks: str | list[str] | None = None#
class braindecode.datasets.bids.BIDSEpochsDataset(*args, **kwargs)[source]#

Bases: BIDSDataset

Experimental dataset for loading mne.Epochs organised in BIDS.

The files must end with _epo.fif.

Warning

Epoched data is not officially supported in BIDS.

Note

This class has the same parameters as BIDSDataset, except that the arguments datatypes, extensions and check are fixed.

braindecode.datasets.mne module#

braindecode.datasets.mne.create_from_mne_epochs(list_of_epochs: list[BaseEpochs], window_size_samples: int, window_stride_samples: int, drop_last_window: bool) BaseConcatDataset[source]#

Create WindowsDatasets from mne.Epochs

Parameters:
  • list_of_epochs (array-like) – list of mne.Epochs

  • window_size_samples (int) – window size

  • window_stride_samples (int) – stride between windows

  • drop_last_window (bool) – Whether to drop the last, potentially overlapping, window when the windows do not evenly divide the continuous signal.

Returns:

windows_datasets – X and y transformed to a dataset format that is compatible with skorch and braindecode

Return type:

BaseConcatDataset

braindecode.datasets.mne.create_from_mne_raw(raws: list[BaseRaw], trial_start_offset_samples: int, trial_stop_offset_samples: int, window_size_samples: int, window_stride_samples: int, drop_last_window: bool, descriptions: list[dict | Series] | None = None, mapping: dict[str, int] | None = None, preload: bool = False, drop_bad_windows: bool = True, accepted_bads_ratio: float = 0.0) BaseConcatDataset[source]#

Create WindowsDatasets from mne.RawArrays

Parameters:
  • raws (array-like) – list of mne.RawArrays

  • trial_start_offset_samples (int) – start offset from original trial onsets in samples

  • trial_stop_offset_samples (int) – stop offset from original trial stop in samples

  • window_size_samples (int) – window size

  • window_stride_samples (int) – stride between windows

  • drop_last_window (bool) – Whether to drop the last, potentially overlapping, window when the windows do not evenly divide the continuous signal.

  • descriptions (array-like) – list of dicts or pandas.Series with additional information about the raws

  • mapping (dict(str: int)) – mapping from event description to target value

  • preload (bool) – if True, preload the data of the Epochs objects.

  • drop_bad_windows (bool) – If True, call .drop_bad() on the resulting mne.Epochs object. This step allows identifying e.g., windows that fall outside of the continuous recording. It is suggested to run this step here as otherwise the BaseConcatDataset has to be updated as well.

  • accepted_bads_ratio (float, optional) – Acceptable proportion of trials with inconsistent length in a raw. If the proportion of trials whose length is exceeded by the window size is smaller than this, only the corresponding trials are dropped and the computation continues. Otherwise, an error is raised. Defaults to 0.0 (raise an error).

Returns:

windows_datasets – X and y transformed to a dataset format that is compatible with skorch and braindecode

Return type:

BaseConcatDataset

braindecode.datasets.moabb module#

Dataset objects for some public datasets.

class braindecode.datasets.moabb.BNCI2014001(subject_ids)[source]#

Bases: MOABBDataset

BNCI 2014-001 Motor Imagery dataset.

PapersWithCode leaderboard: https://paperswithcode.com/dataset/bnci2014-001-moabb-1

Dataset summary

#Subj: 9 | #Chan: 22 | #Classes: 4 | #Trials/class: 144 | Trial length (s): 4 | Freq (Hz): 250 | #Sessions: 2 | #Runs: 6 | Total trials: 62208

Dataset IIa from BCI Competition 4 [1].

Dataset Description

This data set consists of EEG data from 9 subjects. The cue-based BCI paradigm consisted of four different motor imagery tasks, namely the imagination of movement of the left hand (class 1), right hand (class 2), both feet (class 3), and tongue (class 4). Two sessions on different days were recorded for each subject. Each session is comprised of 6 runs separated by short breaks. One run consists of 48 trials (12 for each of the four possible classes), yielding a total of 288 trials per session.

The subjects were sitting in a comfortable armchair in front of a computer screen. At the beginning of a trial (t = 0 s), a fixation cross appeared on the black screen. In addition, a short acoustic warning tone was presented. After two seconds (t = 2 s), a cue in the form of an arrow pointing either to the left, right, down or up (corresponding to one of the four classes left hand, right hand, foot or tongue) appeared and stayed on the screen for 1.25 s. This prompted the subjects to perform the desired motor imagery task. No feedback was provided. The subjects were asked to carry out the motor imagery task until the fixation cross disappeared from the screen at t = 6 s.

Twenty-two Ag/AgCl electrodes (with inter-electrode distances of 3.5 cm) were used to record the EEG; the montage is shown in Figure 3 left. All signals were recorded monopolarly with the left mastoid serving as reference and the right mastoid as ground. The signals were sampled at 250 Hz and bandpass-filtered between 0.5 Hz and 100 Hz. The sensitivity of the amplifier was set to 100 μV. An additional 50 Hz notch filter was enabled to suppress line noise.

Parameters:
subject_ids: list(int) | int | None

(list of) int of subject(s) to be fetched. If None, data of all subjects is fetched.

See moabb.datasets.bnci.BNCI2014001

class BNCI2014001(*args, **kwargs)[source]#

Bases: BNCI2014_001

BNCI 2014-001 Motor Imagery dataset.

PapersWithCode leaderboard: https://paperswithcode.com/dataset/bnci2014-001-moabb-1

Dataset summary

#Subj: 9 | #Chan: 22 | #Classes: 4 | #Trials/class: 144 | Trial length (s): 4 | Freq (Hz): 250 | #Sessions: 2 | #Runs: 6 | Total trials: 62208

Dataset IIa from BCI Competition 4 [1].

Dataset Description

This data set consists of EEG data from 9 subjects. The cue-based BCI paradigm consisted of four different motor imagery tasks, namely the imagination of movement of the left hand (class 1), right hand (class 2), both feet (class 3), and tongue (class 4). Two sessions on different days were recorded for each subject. Each session is comprised of 6 runs separated by short breaks. One run consists of 48 trials (12 for each of the four possible classes), yielding a total of 288 trials per session.

The subjects were sitting in a comfortable armchair in front of a computer screen. At the beginning of a trial (t = 0 s), a fixation cross appeared on the black screen. In addition, a short acoustic warning tone was presented. After two seconds (t = 2 s), a cue in the form of an arrow pointing either to the left, right, down or up (corresponding to one of the four classes left hand, right hand, foot or tongue) appeared and stayed on the screen for 1.25 s. This prompted the subjects to perform the desired motor imagery task. No feedback was provided. The subjects were asked to carry out the motor imagery task until the fixation cross disappeared from the screen at t = 6 s.

Twenty-two Ag/AgCl electrodes (with inter-electrode distances of 3.5 cm) were used to record the EEG; the montage is shown in Figure 3 left. All signals were recorded monopolarly with the left mastoid serving as reference and the right mastoid as ground. The signals were sampled at 250 Hz and bandpass-filtered between 0.5 Hz and 100 Hz. The sensitivity of the amplifier was set to 100 μV. An additional 50 Hz notch filter was enabled to suppress line noise.

References

[1]

Tangermann, M., Müller, K.R., Aertsen, A., Birbaumer, N., Braun, C., Brunner, C., Leeb, R., Mehring, C., Miller, K.J., Mueller-Putz, G. and Nolte, G., 2012. Review of the BCI competition IV. Frontiers in neuroscience, 6, p.55.

cumulative_sizes: list[int]#
datasets: list[Dataset[_T_co]]#
doc = 'See moabb.datasets.bnci.BNCI2014001\n\n    Parameters\n    ----------\n    subject_ids: list(int) | int | None\n        (list of) int of subject(s) to be fetched. If None, data of all\n        subjects is fetched.\n    '#
class braindecode.datasets.moabb.HGD(subject_ids)[source]#

Bases: MOABBDataset

High-gamma dataset described in Schirrmeister et al. 2017.

PapersWithCode leaderboard: https://paperswithcode.com/dataset/schirrmeister2017-moabb

Dataset summary

#Subj: 14 | #Chan: 128 | #Classes: 4 | #Trials/class: 120 | Trial length (s): 4 | Freq (Hz): 500 | #Sessions: 1 | #Runs: 2 | Total trials: 13440

Dataset from [1].

Our “High-Gamma Dataset” is a 128-electrode dataset (of which we later only use 44 sensors covering the motor cortex; see Section 2.7.1), obtained from 14 healthy subjects (6 female, 2 left-handed, age 27.2 ± 3.6 (mean ± std)) with roughly 1000 (963.1 ± 150.9, mean ± std) four-second trials of executed movements divided into 13 runs per subject. The four classes of movements were movements of either the left hand, the right hand, both feet, and rest (no movement, but same type of visual cue as for the other classes). The training set consists of the approx. 880 trials of all runs except the last two runs, the test set of the approx. 160 trials of the last 2 runs. This dataset was acquired in an EEG lab optimized for non-invasive detection of high-frequency movement-related EEG components (Ball et al., 2008; Darvas et al., 2010).

Depending on the direction of a gray arrow that was shown on black background, the subjects had to repetitively clench their toes (downward arrow), perform sequential finger-tapping of their left (leftward arrow) or right (rightward arrow) hand, or relax (upward arrow). The movements were selected to require little proximal muscular activity while still being complex enough to keep subjects involved. Within the 4-s trials, the subjects performed the repetitive movements at their own pace, which had to be maintained as long as the arrow was showing. Per run, 80 arrows were displayed for 4 s each, with 3 to 4 s of continuous random inter-trial interval. The order of presentation was pseudo-randomized, with all four arrows being shown every four trials. Ideally 13 runs were performed to collect 260 trials of each movement and rest. The stimuli were presented and the data recorded with BCI2000 (Schalk et al., 2004). The experiment was approved by the ethical committee of the University of Freiburg.

Parameters:
subject_ids: list(int) | int | None

(list of) int of subject(s) to be fetched. If None, data of all subjects is fetched.

See moabb.datasets.schirrmeister2017.Schirrmeister2017

class Schirrmeister2017[source]#

Bases: BaseDataset

High-gamma dataset described in Schirrmeister et al. 2017.

PapersWithCode leaderboard: https://paperswithcode.com/dataset/schirrmeister2017-moabb

Dataset summary

#Subj: 14 | #Chan: 128 | #Classes: 4 | #Trials/class: 120 | Trial length (s): 4 | Freq (Hz): 500 | #Sessions: 1 | #Runs: 2 | Total trials: 13440

Dataset from [1]

Our “High-Gamma Dataset” is a 128-electrode dataset (of which we later only use 44 sensors covering the motor cortex; see Section 2.7.1), obtained from 14 healthy subjects (6 female, 2 left-handed, age 27.2 ± 3.6 (mean ± std)) with roughly 1000 (963.1 ± 150.9, mean ± std) four-second trials of executed movements divided into 13 runs per subject. The four classes of movements were movements of either the left hand, the right hand, both feet, and rest (no movement, but same type of visual cue as for the other classes). The training set consists of the approx. 880 trials of all runs except the last two runs, the test set of the approx. 160 trials of the last 2 runs. This dataset was acquired in an EEG lab optimized for non-invasive detection of high-frequency movement-related EEG components (Ball et al., 2008; Darvas et al., 2010).

Depending on the direction of a gray arrow that was shown on black background, the subjects had to repetitively clench their toes (downward arrow), perform sequential finger-tapping of their left (leftward arrow) or right (rightward arrow) hand, or relax (upward arrow). The movements were selected to require little proximal muscular activity while still being complex enough to keep subjects involved. Within the 4-s trials, the subjects performed the repetitive movements at their own pace, which had to be maintained as long as the arrow was showing. Per run, 80 arrows were displayed for 4 s each, with 3 to 4 s of continuous random inter-trial interval. The order of presentation was pseudo-randomized, with all four arrows being shown every four trials. Ideally 13 runs were performed to collect 260 trials of each movement and rest. The stimuli were presented and the data recorded with BCI2000 (Schalk et al., 2004). The experiment was approved by the ethical committee of the University of Freiburg.

References

[1]

Schirrmeister, Robin Tibor, et al. “Deep learning with convolutional neural networks for EEG decoding and visualization.” Human brain mapping 38.11 (2017): 5391-5420.

data_path(subject, path=None, force_update=False, update_path=None, verbose=None)[source]#

Get path to local copy of a subject data.

Parameters:
  • subject (int) – Number of the subject to use.

  • path (None | str) – Location of where to look for the data storing location. If None, the environment variable or config parameter MNE_DATASETS_(dataset)_PATH is used. If it doesn’t exist, the “~/mne_data” directory is used. If the dataset is not found under the given path, the data will be automatically downloaded to the specified folder.

  • force_update (bool) – Force update of the dataset even if a local copy exists.

  • update_path (bool | None) – Deprecated. If True, set the MNE_DATASETS_(dataset)_PATH in the mne-python config to the given path. If None, the user is prompted.

  • verbose (bool, str, int, or None) – If not None, override default verbose level (see mne.verbose()).

Returns:

path – Local path to the given data file. This path is contained inside a list of length one, for compatibility.

Return type:

list of str

cumulative_sizes: list[int]#
datasets: list[Dataset[_T_co]]#
doc = 'See moabb.datasets.schirrmeister2017.Schirrmeister2017\n\n    Parameters\n    ----------\n    subject_ids: list(int) | int | None\n        (list of) int of subject(s) to be fetched. If None, data of all\n        subjects is fetched.\n    '#
class braindecode.datasets.moabb.MOABBDataset(dataset_name: str, subject_ids: list[int] | int | None = None, dataset_kwargs: dict[str, Any] | None = None, dataset_load_kwargs: dict[str, Any] | None = None)[source]#

Bases: BaseConcatDataset

A class for moabb datasets.

Parameters:
  • dataset_name (str) – name of dataset included in moabb to be fetched

  • subject_ids (list(int) | int | None) – (list of) int of subject(s) to be fetched. If None, data of all subjects is fetched.

  • dataset_kwargs (dict, optional) – optional dictionary containing keyword arguments to pass to the moabb dataset when instantiating it.

  • dataset_load_kwargs (dict, optional) – optional dictionary containing keyword arguments to pass to the moabb dataset’s load_data method. Allows, e.g., passing moabb’s cache_config and process_pipeline arguments.

braindecode.datasets.moabb.fetch_data_with_moabb(dataset_name: str, subject_ids: list[int] | int | None = None, dataset_kwargs: dict[str, Any] | None = None, dataset_load_kwargs: dict[str, Any] | None = None) tuple[list[Raw], DataFrame][source]#

Fetch data using moabb.

Parameters:
  • dataset_name (str | moabb.datasets.base.BaseDataset) – the name of a dataset included in moabb

  • subject_ids (list(int) | int) – (list of) int of subject(s) to be fetched

  • dataset_kwargs (dict, optional) – optional dictionary containing keyword arguments to pass to the moabb dataset when instantiating it.

  • dataset_load_kwargs (dict, optional) – optional dictionary containing keyword arguments to pass to the moabb dataset’s load_data method. Allows, e.g., passing moabb’s cache_config and process_pipeline arguments.

Returns:

  • raws (list of mne.io.Raw)

  • info (pandas.DataFrame)

braindecode.datasets.nmt module#

Dataset classes for the NMT EEG Corpus dataset.

The NMT Scalp EEG Dataset is an open-source annotated dataset of healthy and pathological EEG recordings for predictive modeling. This dataset contains 2,417 recordings from unique participants spanning almost 625 h.

Note

  • The signal unit may not be µV; further examination is required.

  • The spectrum suggests the signal may have been band-pass filtered to roughly 2–33 Hz; this also needs further verification.

class braindecode.datasets.nmt.NMT(path=None, target_name='pathological', recording_ids=None, preload=False, n_jobs=1)[source]#

Bases: BaseConcatDataset

The NMT Scalp EEG Dataset.

An Open-Source Annotated Dataset of Healthy and Pathological EEG Recordings for Predictive Modeling.

This dataset contains 2,417 recordings from unique participants spanning almost 625 h.

Here, the dataset can be used for three tasks: brain-age prediction, gender prediction, and abnormality detection.

The dataset is described in [Khan2022].

Added in version 0.9.

Parameters:
  • path (str) – Parent directory of the dataset.

  • recording_ids (list(int) | int) – A (list of) int of recording id(s) to be read. Order matters and overrides the default chronological order: e.g., with recording_ids=[1, 0], the first recording returned by this class is chronologically later than the second. Provide recording_ids in ascending order to preserve chronological order.

  • target_name (str) – Can be “pathological”, “gender”, or “age”.

  • preload (bool) – If True, preload the data of the Raw objects.

References

[Khan2022]

Khan, H.A., Ul Ain, R., Kamboh, A.M., Butt, H.T., Shafait, S., Alamgir, W., Stricker, D. and Shafait, F., 2022. The NMT scalp EEG dataset: an open-source annotated dataset of healthy and pathological EEG recordings for predictive modeling. Frontiers in Neuroscience, 15, p.755817.

braindecode.datasets.sleep_physio_challe_18 module#

PhysioNet Challenge 2018 dataset.

class braindecode.datasets.sleep_physio_challe_18.SleepPhysionetChallenge2018(subject_ids='training', path=None, load_eeg_only=True, preproc=None, n_jobs=1)[source]#

Bases: BaseConcatDataset

Physionet Challenge 2018 polysomnography dataset.

Sleep dataset from https://physionet.org/content/challenge-2018/1.0.0/. Contains overnight recordings from 1983 subjects with (suspected) sleep apnea.

The total size is 266 GB, so make sure you have enough space before downloading.

See fetch_pc18_data for a more complete description.

Parameters:
  • subject_ids (list(int) | str | None) – (list of) int of subject(s) to be loaded. If None, loads all subjects (both training and test sets; the test set has no labels). If “training”, loads only the training-set subjects. If “test”, loads only the test-set subjects (no labels associated). Otherwise, expects an iterable of subject IDs.

  • path (None | str) – Location of where to look for the PC18 data storing location. If None, the environment variable or config parameter MNE_DATASETS_PC18_PATH is used. If it doesn’t exist, the “~/mne_data” directory is used. If the dataset is not found under the given path, the data will be automatically downloaded to the specified folder.

  • load_eeg_only (bool) – If True, only load the EEG channels and discard the others (EOG, EMG, temperature, respiration) to avoid resampling the other signals.

  • preproc (list(Preprocessor) | None) – List of preprocessors to apply to each file individually. This way the data can e.g., be downsampled (temporally and spatially) to limit the memory usage of the entire Dataset object. This also enables applying preprocessing in parallel over the recordings.

  • n_jobs (int) – Number of parallel processes.

braindecode.datasets.sleep_physio_challe_18.ensure_metafiles_exist()[source]#
braindecode.datasets.sleep_physio_challe_18.fetch_pc18_data(subjects, path=None, force_update=False, base_url='https://physionet.org/files/challenge-2018/1.0.0/')[source]#

Get paths to local copies of PhysioNet Challenge 2018 dataset files.

This will fetch data from the publicly available PhysioNet Computing in Cardiology Challenge 2018 dataset on sleep arousal detection [1] [2]. This corresponds to 1983 recordings from individual subjects with (suspected) sleep apnea. The dataset is separated into a training set with 994 recordings for which arousal annotation are available and a test set with 989 recordings for which the labels have not been revealed. Across the entire dataset, mean age is 55 years old and 65% of recordings are from male subjects.

More information can be found on the physionet website.

Parameters:
  • subjects (list of int) – The subjects to use. Can be in the range of 0-1982 (inclusive). Test recordings are 0-988, while training recordings are 989-1982.

  • path (None | str) – Location of where to look for the PC18 data storing location. If None, the environment variable or config parameter PC18_DATASET_PATH is used. If it doesn’t exist, the “~/mne_data” directory is used. If the dataset is not found under the given path, the data will be automatically downloaded to the specified folder.

  • force_update (bool) – Force update of the dataset even if a local copy exists.

  • update_path (bool | None) – If True, set the PC18_DATASET_PATH in mne-python config to the given path. If None, the user is prompted.

  • base_url (str) – The URL root.

  • verbose (bool, str, int, or None) – If not None, override default verbose level (see mne.verbose()).

Returns:

paths – List of local data paths of the given type.

Return type:

list

References

[1]

Mohammad M Ghassemi, Benjamin E Moody, Li-wei H Lehman, Christopher Song, Qiao Li, Haoqi Sun, Roger G Mark, M Brandon Westover, Gari D Clifford. You Snooze, You Win: the PhysioNet/Computing in Cardiology Challenge 2018.

[2]

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., … & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

braindecode.datasets.sleep_physionet module#

class braindecode.datasets.sleep_physionet.SleepPhysionet(subject_ids: list[int] | int | None = None, recording_ids: list[int] | None = None, preload=False, load_eeg_only=True, crop_wake_mins=30, crop=None)[source]#

Bases: BaseConcatDataset

Sleep Physionet dataset.

Sleep dataset from https://physionet.org/content/sleep-edfx/1.0.0/. Contains overnight recordings from 78 healthy subjects.

See the MNE sleep analysis tutorial: https://mne.tools/stable/auto_tutorials/clinical/60_sleep.html

Parameters:
  • subject_ids (list(int) | int | None) – (list of) int of subject(s) to be loaded. If None, load all available subjects.

  • recording_ids (list(int) | None) – Recordings to load per subject (each subject except 13 has two recordings). Can be [1], [2] or [1, 2] (same as None).

  • preload (bool) – If True, preload the data of the Raw objects.

  • load_eeg_only (bool) – If True, only load the EEG channels and discard the others (EOG, EMG, temperature, respiration) to avoid resampling the other signals.

  • crop_wake_mins (float) – Number of minutes of wake time to keep before the first sleep event and after the last sleep event. Used to reduce the imbalance in this dataset. Default of 30 mins.

  • crop (None | tuple) – If not None crop the raw files (e.g. to use only the first 3h). Example: crop=(0, 3600*3) to keep only the first 3h.

braindecode.datasets.tuh module#

Dataset classes for the Temple University Hospital (TUH) EEG Corpus and the TUH Abnormal EEG Corpus.

class braindecode.datasets.tuh.TUH(path: str, recording_ids: list[int] | None = None, target_name: str | tuple[str, ...] | None = None, preload: bool = False, add_physician_reports: bool = False, rename_channels: bool = False, set_montage: bool = False, n_jobs: int = 1)[source]#

Bases: BaseConcatDataset

Temple University Hospital (TUH) EEG Corpus (www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml#c_tueg).

Parameters:
  • path (str) – Parent directory of the dataset.

  • recording_ids (list(int) | int) – A (list of) int of recording id(s) to be read. Order matters and overrides the default chronological order: e.g., with recording_ids=[1, 0], the first recording returned by this class is chronologically later than the second. Provide recording_ids in ascending order to preserve chronological order.

  • target_name (str) – Can be ‘gender’, or ‘age’.

  • preload (bool) – If True, preload the data of the Raw objects.

  • add_physician_reports (bool) – If True, the physician reports will be read from disk and added to the description.

  • rename_channels (bool) – If True, rename the EEG channels to the standard 10-05 system.

  • set_montage (bool) – If True, set the montage to the standard 10-05 system.

  • n_jobs (int) – Number of jobs to be used to read files in parallel.

class braindecode.datasets.tuh.TUHAbnormal(path: str, recording_ids: list[int] | None = None, target_name: str | tuple[str, ...] | None = 'pathological', preload: bool = False, add_physician_reports: bool = False, rename_channels: bool = False, set_montage: bool = False, n_jobs: int = 1)[source]#

Bases: TUH

Temple University Hospital (TUH) Abnormal EEG Corpus. see www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml#c_tuab

Parameters:
  • path (str) – Parent directory of the dataset.

  • recording_ids (list(int) | int) – A (list of) int of recording id(s) to be read. Order matters and overrides the default chronological order: e.g., with recording_ids=[1, 0], the first recording returned by this class is chronologically later than the second. Provide recording_ids in ascending order to preserve chronological order.

  • target_name (str) – Can be ‘pathological’, ‘gender’, or ‘age’.

  • preload (bool) – If True, preload the data of the Raw objects.

  • add_physician_reports (bool) – If True, the physician reports will be read from disk and added to the description.

  • rename_channels (bool) – If True, rename the EEG channels to the standard 10-05 system.

  • set_montage (bool) – If True, set the montage to the standard 10-05 system.

  • n_jobs (int) – Number of jobs to be used to read files in parallel.

braindecode.datasets.xy module#

braindecode.datasets.xy.create_from_X_y(X: ndarray[Any, dtype[_ScalarType_co]], y: Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], drop_last_window: bool, sfreq: float, ch_names: Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] = None, window_size_samples: int | None = None, window_stride_samples: int | None = None) BaseConcatDataset[source]#

Create a BaseConcatDataset of WindowsDatasets from X and y to be used for decoding with skorch and braindecode, where X is a list of pre-cut trials and y are corresponding targets.

Parameters:
  • X (array-like) – list of pre-cut trials as n_trials x n_channels x n_times

  • y (array-like) – targets corresponding to the trials

  • drop_last_window (bool) – whether to drop the last window when the windows do not evenly divide the continuous signal; if False, a final overlapping window is kept instead

  • sfreq (float) – Sampling frequency of signals.

  • ch_names (array-like) – Names of the channels.

  • window_size_samples (int) – window size

  • window_stride_samples (int) – stride between windows

Returns:

windows_datasets – X and y transformed to a dataset format that is compatible with skorch and braindecode

Return type:

BaseConcatDataset