braindecode.datasets.bids.HubDatasetMixin#

class braindecode.datasets.bids.HubDatasetMixin[source]#

Mixin class for Hugging Face Hub integration with EEG datasets.

This class adds push_to_hub() and pull_from_hub() methods to BaseConcatDataset, enabling easy upload and download of datasets to/from the Hugging Face Hub.

Examples

>>> # Push dataset to Hub
>>> dataset = NMT(path=path, preload=True)
>>> dataset.push_to_hub(
...     repo_id="username/nmt-dataset",
...     commit_message="Add NMT dataset"
... )
>>>
>>> # Load dataset from Hub
>>> dataset = BaseConcatDataset.pull_from_hub("username/nmt-dataset")
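
Both methods fall back to the cached Hugging Face token (typically the one stored by a prior huggingface-cli login) when no explicit token is passed. A minimal round-trip sketch, assuming the repository ID is illustrative:

>>> # assumes `huggingface-cli login` has been run, or pass token=...
>>> url = dataset.push_to_hub(repo_id="username/nmt-dataset")
>>> restored = BaseConcatDataset.pull_from_hub("username/nmt-dataset")
>>> assert len(restored) == len(dataset)  # same number of windows after the round trip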

Methods

classmethod pull_from_hub(repo_id, preload=True, token=None, cache_dir=None, force_download=False, **kwargs)[source]#

Load a dataset from the Hugging Face Hub.

Parameters:
  • repo_id (str) – Repository ID on the Hugging Face Hub (e.g., “username/dataset-name”).

  • preload (bool, default=True) – Whether to preload the data into memory. If False, uses lazy loading (when supported by the format).

  • token (str | None) – Hugging Face API token. If None, uses cached token.

  • cache_dir (str | Path | None) – Directory to cache the downloaded dataset. If None, uses default cache directory (~/.cache/huggingface/datasets).

  • force_download (bool, default=False) – Whether to force re-download even if cached.

  • **kwargs – Additional arguments (currently unused).

Returns:

The loaded dataset.

Return type:

BaseConcatDataset

Examples

>>> from braindecode.datasets import BaseConcatDataset
>>> dataset = BaseConcatDataset.pull_from_hub("username/nmt-dataset")
>>> print(f"Loaded {len(dataset)} windows")
>>>
>>> # Use with PyTorch
>>> from torch.utils.data import DataLoader
>>> loader = DataLoader(dataset, batch_size=32, shuffle=True)
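When fetching large corpora, the caching-related parameters can be combined to control where data lands and whether it is re-downloaded; a minimal sketch using the documented arguments (the repository ID is illustrative):

>>> from pathlib import Path
>>> dataset = BaseConcatDataset.pull_from_hub(
...     "username/nmt-dataset",
...     preload=False,           # lazy loading, when the stored format supports it
...     cache_dir=Path.home() / ".cache" / "huggingface" / "datasets",
...     force_download=False,    # reuse a cached copy if one exists
... )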

push_to_hub(repo_id, commit_message=None, private=False, token=None, create_pr=False, compression='blosc', compression_level=5, pipeline_name='braindecode')[source]#

Upload the dataset to the Hugging Face Hub in BIDS-like Zarr format.

The dataset is converted to Zarr format with blosc compression by default (configurable via the compression argument), which provides efficient random access for PyTorch training. The data is stored in a BIDS sourcedata-like structure with events.tsv, channels.tsv, and participants.tsv sidecar files.

Parameters:
  • repo_id (str) – Repository ID on the Hugging Face Hub (e.g., “username/dataset-name”).

  • commit_message (str | None) – Commit message. If None, a default message is generated.

  • private (bool, default=False) – Whether to create a private repository.

  • token (str | None) – Hugging Face API token. If None, uses cached token.

  • create_pr (bool, default=False) – Whether to create a Pull Request instead of directly committing.

  • compression (str, default="blosc") – Compression algorithm for Zarr. Options: “blosc”, “zstd”, “gzip”, None.

  • compression_level (int, default=5) – Compression level (0-9). Level 5 offers a good balance between compression ratio and speed.

  • pipeline_name (str, default="braindecode") – Name of the processing pipeline for BIDS sourcedata.

Returns:

URL of the uploaded dataset on the Hub.

Return type:

str

Raises:
  • ImportError – If huggingface-hub is not installed.

  • ValueError – If the dataset is empty or format is invalid.

Examples

>>> dataset = NMT(path=path, preload=True)
>>> # Upload with BIDS-like structure
>>> url = dataset.push_to_hub(
...     repo_id="myusername/nmt-dataset",
...     commit_message="Upload NMT EEG dataset"
... )
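
When upload size matters more than write speed, the documented compression options can be tuned; a minimal sketch (the repository ID is illustrative, and "zstd" is one of the documented alternatives to the default "blosc"):

>>> url = dataset.push_to_hub(
...     repo_id="myusername/nmt-dataset",
...     private=True,            # create the repository as private
...     compression="zstd",      # documented alternative to "blosc"
...     compression_level=9,     # strongest level; smaller files, slower writes
... )
>>> print(url)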

Examples using braindecode.datasets.bids.HubDatasetMixin#

Cleaning EEG Data with EEGPrep for Trialwise Decoding

Cropped Decoding on BCIC IV 2a Dataset

Basic Brain Decoding on EEG Data

How to train, test and tune your model?

Hyperparameter tuning with scikit-learn

Comprehensive Preprocessing with MNE-based Classes

Convolutional neural network regression model on fake data.

Training a Braindecode model in PyTorch

Benchmarking preprocessing with parallelization and serialization

BIDS Dataset Example

Uploading and downloading datasets to Hugging Face Hub

Load and save dataset example

MOABB Dataset Example

Split Dataset Example

Multiple discrete targets with the TUH EEG Corpus

Data Augmentation on BCIC IV 2a Dataset

Searching the best data augmentation on BCIC IV 2a Dataset

Experiment configuration with Pydantic and Exca

Fine-tuning a Foundation Model (Signal-JEPA)

Self-supervised learning on EEG with relative positioning

Sleep staging on the Sleep Physionet dataset using Chambon2018 network

Sleep staging on the Sleep Physionet dataset using Eldele2021

Sleep staging on the Sleep Physionet dataset using U-Sleep network

Process a big data EEG resource (TUH EEG Corpus)