braindecode.models.REVE#

class braindecode.models.REVE(n_outputs=None, n_chans=None, chs_info=None, n_times=None, input_window_seconds=None, sfreq=None, embed_dim=512, depth=22, heads=8, head_dim=64, mlp_dim_ratio=2.66, use_geglu=True, freqs=4, patch_size=200, patch_overlap=20, attention_pooling=False)[source]#

Representation for EEG with Versatile Embeddings (REVE) from El Ouahidi et al. (2025) [reve].

Foundation Model Attention/Transformer

REVE Training pipeline overview

Foundation models have transformed machine learning by reducing reliance on task-specific data and inductive biases through large-scale pretraining. While successful in language and vision, their adoption in EEG has lagged because EEG recordings vary widely in electrode layout, signal length, and acquisition setup. Existing EEG foundation models struggle to generalize across these variations, often restricting pretraining to a single setup and resulting in suboptimal performance, particularly under linear probing.

REVE is a pretrained model explicitly designed to generalize across diverse EEG signals. It introduces a 4D positional encoding scheme that enables processing signals of arbitrary length and electrode arrangement. Using a masked autoencoding objective, REVE was pretrained on over 60,000 hours of EEG data from 92 datasets spanning 25,000 subjects, the largest EEG pretraining effort to date.

Channels Invariant Positional Encoding

Prior EEG foundation models (Labram, BIOT) rely on fixed positional embeddings, making direct transfer to unseen electrode layouts infeasible. CBraMod uses convolution-based positional encoding that requires fine-tuning when adapting to new configurations. As noted in the CBraMod paper: "fixing the pre-trained parameters during training on downstream datasets will lead to a very large performance decline."

REVE’s 4D positional encoding jointly encodes spatial \((x, y, z)\) and temporal \((t)\) positions using Fourier embeddings, enabling true cross-configuration transfer without retraining. The Fourier embedding is inspired by the brain decoding model of Défossez et al. [brainmodule], generalized to 4D for EEG using the channels' spatial coordinates and the temporal patch index.

Linear Probing Performance

A key advantage of REVE is that it produces useful latent representations without heavy fine-tuning. Under linear probing (frozen encoder), the model retains strong downstream performance. This enables practical deployment in low-data scenarios where extensive fine-tuning is not feasible.

Architecture

The model adopts modern Transformer components validated through ablation studies:

  • Normalization: RMSNorm outperforms LayerNorm;

  • Activation: GEGLU outperforms GELU;

  • Attention: Flash Attention via PyTorch’s SDPA;

  • Masking ratio: 55% is optimal for spatio-temporal block masking.

These choices align with best practices from large language models and were empirically validated on EEG data.
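To make these choices concrete, below is a minimal sketch of a pre-norm feed-forward block combining RMSNorm and GEGLU in PyTorch. The class and variable names are illustrative assumptions (not REVE's internal modules), and the snippet assumes a recent PyTorch release that ships torch.nn.RMSNorm.

import torch
import torch.nn as nn


class GEGLUFeedForward(nn.Module):
    """Sketch of a pre-norm feed-forward block with RMSNorm and GEGLU."""

    def __init__(self, embed_dim=512, mlp_dim_ratio=2.66):
        super().__init__()
        hidden = int(embed_dim * mlp_dim_ratio)
        self.norm = nn.RMSNorm(embed_dim)                # RMSNorm instead of LayerNorm
        self.proj_in = nn.Linear(embed_dim, 2 * hidden)  # produces value and gate for GEGLU
        self.proj_out = nn.Linear(hidden, embed_dim)

    def forward(self, x):
        h = self.norm(x)                                  # pre-norm
        value, gate = self.proj_in(h).chunk(2, dim=-1)
        h = value * torch.nn.functional.gelu(gate)        # GEGLU gating
        return x + self.proj_out(h)                       # residual connection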

Secondary Loss

A secondary reconstruction objective using attention pooling across layers prevents over-specialization in the final layer. This pooling acts as an information bottleneck, forcing the model to distill key information from the entire sequence. Ablations show this loss is crucial for linear probing quality: removing it drops average performance by roughly 10% under frozen (linear-probing) evaluation.
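The pooling mechanism can be sketched as a single learnable query attending to all encoder tokens, as in the hedged example below. Names and hyperparameters are illustrative and this is the generic mechanism, not REVE's exact pooling head; the same idea underlies the attention_pooling classification mode described under Macro Components.

import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Sketch of attention pooling with a single learnable query token."""

    def __init__(self, embed_dim=512, num_heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, tokens):
        # tokens: (batch, n_tokens, embed_dim) from the encoder
        query = self.query.expand(tokens.shape[0], -1, -1)
        pooled, _ = self.attn(query, tokens, tokens)  # the query attends to all tokens
        return pooled.squeeze(1)                      # (batch, embed_dim)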

Macro Components

  • REVE.to_patch_embedding Patch Tokenization

    The EEG signal is split into overlapping patches along the time dimension, generating \(p = \left\lfloor \frac{T - w}{w - o} \right\rfloor + 1 + \mathbf{1}\left[(T - w) \bmod (w - o) \neq 0\right]\) patches of size \(w\) with overlap \(o\), where \(T\) is the signal length. Each patch is linearly projected to the embedding dimension (see the sketch after this list for the default configuration).

  • REVE.fourier4d + REVE.mlp4d 4D Positional Embedding (4DPE)

    The 4DPE encodes each token’s 4D coordinates \((x, y, z, t)\) where \((x, y, z)\) are the 3D spatial coordinates from a standardized electrode position bank, and \(t\) is the temporal patch index. The encoding combines:

    1. Fourier embedding: Sinusoidal encoding across multiple frequencies for smooth interpolation to unseen positions

    2. MLP embedding: Linear (4 → embed_dim) → GELU → LayerNorm for learnable refinement

    Both components are summed and normalized. The 4DPE adds negligible computational overhead, scaling linearly with the number of tokens.

  • REVE.transformer Transformer Encoder

    Pre-norm Transformer with multi-head self-attention, RMSNorm normalization, feed-forward networks with GEGLU activation, and residual connections. Default configuration: 22 layers, 8 heads, 512 embedding dimension (~72M parameters).

  • REVE.final_layer Classification Head

    Two modes (controlled by the attention_pooling parameter):

    • When attention_pooling is disabled (e.g., None or False): flatten all tokens → LayerNorm → Linear

    • When attention_pooling is enabled: attention pooling with a learnable query token attending to all encoder outputs
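As referenced above, the snippet below sketches the patch-count arithmetic for the default configuration and a toy version of the Fourier part of the 4DPE. The frequency bands, scaling, and function names are illustrative assumptions, not the pretrained model's exact embedding.

import math

import torch

# Patch-count arithmetic (a sketch of the formula above, not braindecode internals).
T, w, o = 1000, 200, 20                  # n_times, patch_size, patch_overlap
stride = w - o                           # 180 samples between patch starts
p = (T - w) // stride + 1 + int((T - w) % stride != 0)
print(p)                                 # 6; the indicator term accounts for a padded tail patch


def fourier_features(coords, freqs=4):
    """Toy Fourier embedding of (x, y, z, t) coordinates; bands are illustrative."""
    bands = 2.0 ** torch.arange(freqs)               # (freqs,) frequency bands
    angles = coords[..., None] * bands * math.pi     # (..., 4, freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)


token_coords = torch.tensor([0.1, -0.3, 0.8, 2.0])   # (x, y, z, t) of one token
print(fourier_features(token_coords).shape)          # torch.Size([32]) = 4 * 2 * freqs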

Known Limitations

  • Sparse electrode setups: Performance degrades with very few channels. On motor imagery, accuracy drops from 0.824 (64 channels) to 0.660 (1 channel). For tasks requiring broad spatial coverage (e.g., imagined speech), performance with <4 channels approaches chance level.

  • Demographic bias: The pretraining corpus aggregates publicly available datasets, most originating from North America and Europe, resulting in limited demographic diversity. More details about the datasets used for pretraining can be found in the REVE paper [reve].

Pretrained Weights

Weights are available on HuggingFace, but you must agree to the data usage terms before downloading:

  • brain-bzh/reve-base: 72M parameters, 512 embedding dim, 22 layers (~260 A100 GPU hours)

  • brain-bzh/reve-large: ~400M parameters, 1250 embedding dim

Important

Pre-trained Weights Available

This model has pre-trained weights available on the Hugging Face Hub. You can load them using:

from braindecode.models import REVE

# Load pre-trained model from Hugging Face Hub
model = REVE.from_pretrained("brain-bzh/reve-base")

To push your own trained model to the Hub:

# After training your model
model.push_to_hub(
    repo_id="username/my-reve-model", commit_message="Upload trained REVE model"
)

Requires installing braindecode[hug] for Hub integration.

Usage

from braindecode.models import REVE

model = REVE(
    n_outputs=4,  # e.g., 4-class motor imagery
    n_chans=22,
    n_times=1000,  # 5 seconds at 200 Hz
    sfreq=200,
    chs_info=[{"ch_name": "C3"}, {"ch_name": "C4"}, ...],
)
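
# The channel positions passed to forward() can be fetched from the model's
# position bank via get_positions (documented below). Shapes follow the
# documented API; the batch size and dummy data below are an illustrative sketch.
import torch

ch_names = ["C3", "C4", ...]  # the same n_chans channel names used to build the model
positions = model.get_positions(ch_names)                   # (n_chans, 3)
channel_positions = positions.unsqueeze(0).repeat(8, 1, 1)   # (batch, n_chans, 3)
eeg_data = torch.randn(8, 22, 1000)                          # dummy batch: (batch, n_chans, n_times)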

# Forward pass: (batch, n_chans, n_times) -> (batch, n_outputs)
output = model(eeg_data, pos=channel_positions)

Warning

Input data must be sampled at 200 Hz to match pretraining. The model applies z-score normalization followed by clipping at 15 standard deviations internally during pretraining; users should apply similar preprocessing.
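A hedged sketch of such preprocessing is shown below. The per-channel normalization axis is an assumption (the warning above does not specify it), and recordings should be resampled to 200 Hz beforehand, e.g. with mne.io.Raw.resample.

import numpy as np


def preprocess(eeg, clip_sd=15.0, eps=1e-8):
    # eeg: array of shape (n_chans, n_times), already resampled to 200 Hz
    mean = eeg.mean(axis=-1, keepdims=True)
    std = eeg.std(axis=-1, keepdims=True)
    z = (eeg - mean) / (std + eps)          # z-score normalization per channel
    return np.clip(z, -clip_sd, clip_sd)    # clip at 15 standard deviations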

Parameters:
  • embed_dim (int, default=512) – Embedding dimension. Use 512 for REVE-Base, 1250 for REVE-Large.

  • depth (int, default=22) – Number of Transformer layers.

  • heads (int, default=8) – Number of attention heads.

  • head_dim (int, default=64) – Dimension per attention head.

  • mlp_dim_ratio (float, default=2.66) – FFN hidden dimension ratio: mlp_dim = embed_dim × mlp_dim_ratio.

  • use_geglu (bool, default=True) – Use GEGLU activation (recommended) or standard GELU.

  • freqs (int, default=4) – Number of frequencies for Fourier positional embedding.

  • patch_size (int, default=200) – Temporal patch size in samples (200 samples = 1 second at 200 Hz).

  • patch_overlap (int, default=20) – Overlap between patches in samples.

  • attention_pooling (bool, default=False) – Pooling strategy for aggregating transformer outputs before classification. If False (default), all tokens are flattened into a single vector of size (n_chans x n_patches x embed_dim), which is then passed through LayerNorm and a linear classifier. If True, uses attention-based pooling with a learnable query token that attends to all encoder outputs, producing a single embedding of size embed_dim. Attention pooling is more parameter-efficient for long sequences and variable-length inputs.

References

[reve]

El Ouahidi, Y., Lys, J., Thölke, P., Farrugia, N., Pasdeloup, B., Gripon, V., Jerbi, K. & Lioi, G. (2025). REVE: A Foundation Model for EEG - Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects. The Thirty-Ninth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=ZeFMtRBy4Z

[brainmodule]

Défossez, A., Caucheteux, C., Rapin, J., Kabeli, O., & King, J. R. (2023). Decoding speech perception from non-invasive brain recordings. Nature Machine Intelligence, 5(10), 1097-1107.

Notes

The position bank is downloaded from HuggingFace on first initialization, mapping standard 10-20/10-10/10-05 electrode names to 3D coordinates. This enables the 4D positional encoding to generalize across electrode configurations without requiring matched layouts between pretraining and downstream tasks.

Methods

forward(eeg, pos=None, return_output=False)[source]#

Forward pass of the model.

Goes through the following steps:

  1. Patch extraction from the EEG signal.

  2. 4D positional embedding computation.

  3. Transformer encoding.

  4. Final layer processing (if return_output is False).

Parameters:
  • eeg (torch.Tensor) – Input EEG tensor of shape (batch_size, channels, sequence_length).

  • pos (torch.Tensor, optional) – Position tensor of shape (batch_size, channels, 3) representing (x, y, z) coordinates. Default is None.

  • return_output (bool, optional) – If True, returns the output from the transformer directly. If False, applies the final layer and returns the processed output. Default is False.

Returns:

  • If return_output is False: a single torch.Tensor containing the output after the final layer.

  • If return_output is True: a list[torch.Tensor] containing the outputs from the transformer layers.

Return type:

Union[torch.Tensor, list[torch.Tensor]]

get_positions(channel_names)[source]#

Fetch channel positions from the position bank. The position bank is downloaded when the model is instantiated.

Parameters:

channel_names (list[str]) – List of channel names for which to fetch positions.

Returns:

Tensor of shape (num_channels, 3) containing the (x, y, z) positions of the channels.

Return type:

torch.Tensor