braindecode.datasets.bids.HubDatasetMixin
- class braindecode.datasets.bids.HubDatasetMixin
Mixin class for Hugging Face Hub integration with EEG datasets.
This class adds push_to_hub() and pull_from_hub() methods to BaseConcatDataset, enabling easy upload and download of datasets to/from the Hugging Face Hub.
Examples
>>> from braindecode.datasets import NMT, BaseConcatDataset
>>>
>>> # Push dataset to Hub
>>> dataset = NMT(path=path, preload=True)
>>> dataset.push_to_hub(
...     repo_id="username/nmt-dataset",
...     commit_message="Add NMT dataset"
... )
>>>
>>> # Load dataset from Hub
>>> dataset = BaseConcatDataset.pull_from_hub("username/nmt-dataset")
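The mixin pattern described above can be illustrated with a toy sketch. Everything here is hypothetical (HubMixinSketch, ToyConcatDataset, push_to_store, pull_from_store, and the in-memory dict standing in for the Hub); it is not braindecode's implementation. It only shows why the upload method is an instance method while the download method is a classmethod.

```python
class HubMixinSketch:
    """Toy stand-in for a Hub mixin; not braindecode's implementation."""

    def push_to_store(self, store, repo_id):
        # Instance method: serialises the existing object under repo_id.
        store[repo_id] = list(self.items)
        return f"store://{repo_id}"

    @classmethod
    def pull_from_store(cls, store, repo_id):
        # Classmethod: builds a new instance of whichever class it is called on.
        if repo_id not in store:
            raise FileNotFoundError(repo_id)
        return cls(store[repo_id])


class ToyConcatDataset(HubMixinSketch):
    """Hypothetical dataset class that gains both methods by inheritance."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


store = {}  # in-memory dict standing in for a remote repository
url = ToyConcatDataset([1, 2, 3]).push_to_store(store, "username/toy")
restored = ToyConcatDataset.pull_from_store(store, "username/toy")
```

Because the download side is a classmethod, no dataset instance is needed beforehand, which matches the BaseConcatDataset.pull_from_hub(...) call style in the example above.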
Methods
- classmethod pull_from_hub(repo_id, preload=True, token=None, cache_dir=None, force_download=False, **kwargs)
Load a dataset from the Hugging Face Hub.
- Parameters:
repo_id (str) – Repository ID on the Hugging Face Hub (e.g., “username/dataset-name”).
preload (bool, default=True) – Whether to preload the data into memory. If False, uses lazy loading (when supported by the format).
token (str | None) – Hugging Face API token. If None, uses cached token.
cache_dir (str | Path | None) – Directory to cache the downloaded dataset. If None, uses default cache directory (~/.cache/huggingface/datasets).
force_download (bool, default=False) – Whether to force re-download even if cached.
**kwargs – Additional arguments (currently unused).
- Returns:
The loaded dataset.
- Return type:
BaseConcatDataset
- Raises:
ImportError – If huggingface-hub is not installed.
FileNotFoundError – If the repository or dataset files are not found.
Examples
>>> from braindecode.datasets import BaseConcatDataset
>>> dataset = BaseConcatDataset.pull_from_hub("username/nmt-dataset")
>>> print(f"Loaded {len(dataset)} windows")
>>>
>>> # Use with PyTorch
>>> from torch.utils.data import DataLoader
>>> loader = DataLoader(dataset, batch_size=32, shuffle=True)
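The two exceptions listed under Raises can be handled in one place. The following is a minimal sketch, assuming only the documented behaviour (ImportError when huggingface-hub is missing, FileNotFoundError when the repository or files are missing). The helper pull_or_none and the stand-in fake_pull are hypothetical names introduced so the sketch runs without network access.

```python
def pull_or_none(pull, repo_id, **kwargs):
    """Call a pull_from_hub-style callable; return None on the documented errors."""
    try:
        return pull(repo_id, **kwargs)
    except ImportError:
        # Documented case: huggingface-hub is not installed.
        print("huggingface-hub is not installed")
    except FileNotFoundError:
        # Documented case: repository or dataset files not found.
        print(f"No dataset files found for {repo_id!r}")
    return None


def fake_pull(repo_id, **kwargs):
    # Hypothetical stand-in that always fails like a missing repository.
    raise FileNotFoundError(repo_id)


result = pull_or_none(fake_pull, "username/does-not-exist")

# With braindecode installed, the real classmethod would be passed instead:
# dataset = pull_or_none(
#     BaseConcatDataset.pull_from_hub, "username/nmt-dataset", preload=False
# )
```

Returning None keeps the calling code simple; a caller that prefers to fail loudly can re-raise instead.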
- push_to_hub(repo_id, commit_message=None, private=False, token=None, create_pr=False, compression='blosc', compression_level=5, pipeline_name='braindecode')
Upload the dataset to the Hugging Face Hub in BIDS-like Zarr format.
The dataset is converted to Zarr format, compressed with blosc by default, a combination chosen for fast random access during PyTorch training. The data is stored in a BIDS sourcedata-like structure with events.tsv, channels.tsv, and participants.tsv sidecar files.
- Parameters:
repo_id (str) – Repository ID on the Hugging Face Hub (e.g., “username/dataset-name”).
commit_message (str | None) – Commit message. If None, a default message is generated.
private (bool, default=False) – Whether to create a private repository.
token (str | None) – Hugging Face API token. If None, uses cached token.
create_pr (bool, default=False) – Whether to create a Pull Request instead of directly committing.
compression (str, default="blosc") – Compression algorithm for Zarr. Options: “blosc”, “zstd”, “gzip”, None.
compression_level (int, default=5) – Compression level (0-9). The default of 5 is a middle setting that trades off compression ratio against speed.
pipeline_name (str, default="braindecode") – Name of the processing pipeline for BIDS sourcedata.
- Returns:
URL of the uploaded dataset on the Hub.
- Return type:
str
- Raises:
ImportError – If huggingface-hub is not installed.
ValueError – If the dataset is empty or format is invalid.
Examples
>>> from braindecode.datasets import NMT
>>> dataset = NMT(path=path, preload=True)
>>> # Upload with BIDS-like structure
>>> url = dataset.push_to_hub(
...     repo_id="myusername/nmt-dataset",
...     commit_message="Upload NMT EEG dataset"
... )
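Before starting a potentially long upload, it can help to check the compression settings against the documented options. This is a sketch under assumptions: the helper check_push_options is hypothetical, and the text above does not say whether push_to_hub performs the same validation itself. The allowed values are taken from the parameter list ("blosc", "zstd", "gzip", None; levels 0-9).

```python
ALLOWED_COMPRESSION = ("blosc", "zstd", "gzip", None)


def check_push_options(compression="blosc", compression_level=5):
    """Validate compression settings against the documented options."""
    if compression not in ALLOWED_COMPRESSION:
        raise ValueError(f"Unsupported compression: {compression!r}")
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if isinstance(compression_level, bool) or not isinstance(compression_level, int):
        raise ValueError("compression_level must be an int")
    if not 0 <= compression_level <= 9:
        raise ValueError("compression_level must be between 0 and 9")
    return {"compression": compression, "compression_level": compression_level}


options = check_push_options(compression="zstd", compression_level=3)

# With a real dataset, the validated options would be forwarded as keywords:
# url = dataset.push_to_hub(repo_id="myusername/nmt-dataset", **options)
```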
Examples using braindecode.datasets.bids.HubDatasetMixin
Cleaning EEG Data with EEGPrep for Trialwise Decoding
Comprehensive Preprocessing with MNE-based Classes
Convolutional neural network regression model on fake data.
Benchmarking preprocessing with parallelization and serialization
Uploading and downloading datasets to Hugging Face Hub
Searching the best data augmentation on BCIC IV 2a Dataset
Self-supervised learning on EEG with relative positioning
Sleep staging on the Sleep Physionet dataset using Chambon2018 network
Sleep staging on the Sleep Physionet dataset using Eldele2021
Sleep staging on the Sleep Physionet dataset using U-Sleep network