braindecode.datasets.bids.HubDatasetMixin#

class braindecode.datasets.bids.HubDatasetMixin[source]#

Mixin class for Hugging Face Hub integration with EEG datasets.

This class adds push_to_hub() and pull_from_hub() methods to BaseConcatDataset, enabling easy upload and download of datasets to/from the Hugging Face Hub.

Examples

>>> # Push dataset to Hub
>>> dataset = NMT(path=path, preload=True)
>>> dataset.push_to_hub(
...     repo_id="username/nmt-dataset",
... )
>>>
>>> # Load dataset from Hub
>>> dataset = BaseConcatDataset.pull_from_hub("username/nmt-dataset")

Methods

classmethod pull_from_hub(repo_id, preload=True, token=None, cache_dir=None, force_download=False, **kwargs)[source]#

Load a dataset from the Hugging Face Hub.

Parameters:
  • repo_id (str) – Repository ID on the Hugging Face Hub (e.g., “username/dataset-name”).

  • preload (bool, default=True) – Whether to preload the data into memory. If False, uses lazy loading (when supported by the format).

  • token (str | None) – Hugging Face API token. If None, uses cached token.

  • cache_dir (str | Path | None) – Directory to cache the downloaded dataset. If None, uses default cache directory (~/.cache/huggingface/datasets).

  • force_download (bool, default=False) – Whether to force re-download even if cached.

  • **kwargs – Additional arguments (currently unused).

Returns:

The loaded dataset.

Return type:

BaseConcatDataset

Raises:

Examples

>>> from braindecode.datasets import BaseConcatDataset
>>> dataset = BaseConcatDataset.pull_from_hub("username/nmt-dataset")
>>> print(f"Loaded {len(dataset)} windows")
>>>
>>> # Use with PyTorch
>>> from torch.utils.data import DataLoader
>>> loader = DataLoader(dataset, batch_size=32, shuffle=True)
push_to_hub(repo_id, private=False, token=None, compression='blosc', compression_level=5, pipeline_name='braindecode', chunk_size=5000000, local_cache_dir=None, **kwargs)[source]#

Upload the dataset to the Hugging Face Hub in BIDS-like Zarr format.

The dataset is converted to Zarr format with blosc compression, which provides optimal random access performance for PyTorch training. The data is stored in a BIDS sourcedata-like structure with events.tsv, channels.tsv, and participants.tsv sidecar files.

Parameters:
  • repo_id (str) – Repository ID on the Hugging Face Hub (e.g., “username/dataset-name”).

  • private (bool, default=False) – Whether to create a private repository.

  • token (str | None) – Hugging Face API token. If None, uses cached token.

  • compression (str, default="blosc") – Compression algorithm for Zarr. Options: “blosc”, “zstd”, “gzip”, None.

  • compression_level (int, default=5) – Compression level (0-9). Level 5 provides optimal balance.

  • pipeline_name (str, default="braindecode") – Name of the processing pipeline for BIDS sourcedata.

  • chunk_size (int, default=5_000_000) – Number of samples per chunk in Zarr along the time/window dimension. Larger chunk sizes create fewer but larger chunks/files. This parameter is used for both continuous data (e.g., RawDataset, EEGWindowsDataset) and pre-cut windows (WindowsDataset). For WindowsDataset, multiple windows may be stored in a single chunk depending on their duration and the chosen chunk_size.

  • local_cache_dir (str | Path | None) –

    Local directory to use for temporary files during upload. If None, uses the system temp directory and cleans it up after upload. If provided, the directory is used as a persistent cache:

    • If the directory is empty (or does not exist), the cache is built there and a lock file (format_info.json) is written once the cache is complete, before the upload starts. The file contains the zarr conversion parameters as JSON.

    • If the lock file is present and its JSON parameters match the current call, cache creation is skipped and the upload resumes directly (useful for retrying interrupted uploads).

    • If the lock file is present but its JSON parameters differ from the current call, a ValueError is raised.

    • If the directory is non-empty but the lock file is absent, a ValueError is raised listing the files found.

  • **kwargs – Additional arguments passed to huggingface_hub.upload_large_folder().

Returns:

URL of the uploaded dataset on the Hub.

Return type:

str

Raises:
  • ImportError – If huggingface-hub is not installed.

  • ValueError – If the dataset is empty or format is invalid.

Examples

>>> dataset = NMT(path=path, preload=True)
>>> # Upload with BIDS-like structure
>>> url = dataset.push_to_hub(
...     repo_id="myusername/nmt-dataset",
... )

Examples using braindecode.datasets.bids.HubDatasetMixin#

Cleaning EEG Data with EEGPrep for Trialwise Decoding

Cleaning EEG Data with EEGPrep for Trialwise Decoding