braindecode.models.CodeBrain#

class braindecode.models.CodeBrain(n_outputs=None, n_chans=None, chs_info=None, n_times=None, input_window_seconds=None, sfreq=None, patch_size=200, res_channels=200, skip_channels=200, out_channels=200, num_res_layers=8, drop_prob=0.1, s4_bidirectional=True, s4_layernorm=False, s4_lmax=570, s4_d_state=64, conv_out_chans=25, conv_groups=5, proj_kernel_size=49, proj_padding=24, proj_refine_kernel=3, pos_kernel=(19, 7), spectral_dropout=0.1, mlp_hidden_multiplier=4, swa_window_size=1, codebook_size_t=4096, codebook_size_f=4096, pretrain_mode=False, activation=<class 'torch.nn.modules.activation.ReLU'>)[source]#

CodeBrain: Scalable Code EEG Pre-Training for Unified Downstream BCI Tasks.

Categories: Foundation Model, Attention/Transformer

Figure: CodeBrain pre-training overview.

CodeBrain is a foundation model for EEG that pre-trains on large unlabelled corpora using a two-stage vector-quantised masking strategy, then fine-tunes on downstream BCI tasks. It segments EEG signals into fixed-size patches, embeds them with convolutional and spectral projections, and processes them through stacked residual blocks that combine a multi-scale convolutional structured state-space model (_GConv) with sliding-window self-attention.

Stage 2: EEGSSM Backbone (this implementation)

This class implements Stage 2 of CodeBrain, the EEGSSM backbone described in Section 3.3 of [codebrain]. Following LaBraM, CodeBrain discretises EEG patches into codebook tokens via a VQ-VAE (Stage 1, not implemented here), then trains the backbone to predict the masked token indices with a cross-entropy loss. CodeBrain extends this with a dual tokenizer that decouples temporal and frequency representations, as stated in the paper: “the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens to enhance discriminative power.”
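The Stage-2 objective above can be sketched in plain PyTorch. This is an illustrative sketch, not the braindecode implementation: shapes are arbitrary and the codebook sizes are the documented defaults (4096 each).

```python
import torch
import torch.nn.functional as F

# Sketch of masked-token prediction with a dual codebook: for each masked
# patch the backbone emits logits over a temporal and a frequency codebook;
# the Stage-1 tokenizer supplies the target indices, and training minimises
# the sum of the two cross-entropy losses.
n_masked = 16
codebook_size_t = codebook_size_f = 4096

logits_t = torch.randn(n_masked, codebook_size_t)        # backbone predictions
logits_f = torch.randn(n_masked, codebook_size_f)
targets_t = torch.randint(codebook_size_t, (n_masked,))  # tokenizer output
targets_f = torch.randint(codebook_size_f, (n_masked,))

loss = F.cross_entropy(logits_t, targets_t) + F.cross_entropy(logits_f, targets_f)
```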

Macro Components

  • PatchEmbedding: Splits (batch, n_chans, n_times) into (batch, n_chans, seq_len, patch_size) patches, projects each patch with a 2-D convolutional stack, adds FFT-based spectral embeddings, and applies depth-wise convolutional positional encoding.

  • Residual blocks (ResidualGroup): Each block applies RMSNorm, a _GConv SSM layer, and sliding-window multi-head attention, with gated activation and separate residual/skip paths.

  • Classification head (final_layer): Flattens the output and maps to n_outputs classes.
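The patching step performed by PatchEmbedding can be sketched with a plain tensor reshape. This is a minimal sketch assuming the default patch_size of 200; the real module additionally applies the convolutional projection, spectral embedding, and positional encoding described above.

```python
import torch

batch, n_chans, n_times, patch_size = 8, 22, 1050, 200
x = torch.randn(batch, n_chans, n_times)

# Trim to the nearest multiple of patch_size (the trailing 50 samples are
# dropped), then split each channel into fixed-size patches.
seq_len = n_times // patch_size                        # 5 patches per channel
x = x[..., : seq_len * patch_size]                     # (8, 22, 1000)
patches = x.view(batch, n_chans, seq_len, patch_size)  # (8, 22, 5, 200)
```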

Important

Pre-trained Weights Available

This model has pre-trained weights available on the Hugging Face Hub. You can load them using:

from braindecode.models import CodeBrain

# Load pre-trained model from Hugging Face Hub
model = CodeBrain.from_pretrained("braindecode/codebrain-pretrained")

To push your own trained model to the Hub:

model.push_to_hub("my-username/my-codebrain")
Parameters:
  • n_outputs (int) – Number of outputs of the model. This is the number of classes in the case of classification.

  • n_chans (int) – Number of EEG channels.

  • chs_info (list of dict) – Information about each individual EEG channel. This should be filled with info["chs"]. Refer to mne.Info for more details.

  • n_times (int) – Number of time samples of the input window.

  • input_window_seconds (float) – Length of the input window in seconds.

  • sfreq (float) – Sampling frequency of the EEG recordings.

  • patch_size (int) – Number of time samples per patch. Input length is trimmed to the nearest multiple of patch_size.

  • res_channels (int) – Width of the residual stream inside each ResidualBlock.

  • skip_channels (int) – Width of the skip-connection stream aggregated across blocks.

  • out_channels (int) – Output channels of final_conv before the classification head.

  • num_res_layers (int) – Number of stacked ResidualBlock modules.

  • drop_prob (float) – Dropout rate used inside the _GConv SSM and attention layers.

  • s4_bidirectional (bool) – Whether the _GConv SSM processes the sequence bidirectionally.

  • s4_layernorm (bool) – Whether to apply layer normalisation inside the _GConv SSM. Set to False to match the released pretrained checkpoint.

  • s4_lmax (int) – Maximum sequence length for the _GConv SSM kernel. Also determines the patch embedding dimension as s4_lmax // n_chans.

  • s4_d_state (int) – State dimension of the _GConv SSM.

  • conv_out_chans (int) – Number of output channels in the patch projection convolutions.

  • conv_groups (int) – Number of groups for GroupNorm in the patch projection.

  • proj_kernel_size (int) – Kernel size of the projection convolution in the patch embedding.

  • proj_padding (int) – Padding of the projection convolution in the patch embedding.

  • proj_refine_kernel (int) – Kernel size of the refinement convolution applied after the patch projection.

  • pos_kernel (tuple of int) – Kernel size of the depth-wise 2-D convolutional positional encoding.

  • spectral_dropout (float) – Dropout rate applied to the FFT-based spectral embeddings.

  • mlp_hidden_multiplier (int) – Expansion factor for the hidden dimension of the feed-forward (MLP) sub-layers.

  • swa_window_size (int) – Window size of the sliding-window self-attention.

  • codebook_size_t (int) – Size of the temporal codebook predicted in pre-training mode.

  • codebook_size_f (int) – Size of the frequency codebook predicted in pre-training mode.

  • pretrain_mode (bool) – If True, the model outputs logits over the temporal and frequency codebooks for masked-token prediction instead of class logits.

  • activation (type[Module]) – Non-linear activation class used in init_conv and final_conv.

Raises:

ValueError – If some input signal-related parameters are not specified and cannot be inferred.

Notes

If some input signal-related parameters are not specified, there will be an attempt to infer them from the other parameters.
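For example, n_times follows from the window length and sampling rate when both are given. A sketch of the inference rule only:

```python
# Any two of n_times, input_window_seconds and sfreq determine the third,
# so a model can be built from e.g. input_window_seconds=4.0 and sfreq=250.0
# without passing n_times explicitly.
input_window_seconds, sfreq = 4.0, 250.0
n_times = int(input_window_seconds * sfreq)  # 1000
```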

References

[codebrain]

Yi Ding, Xuyang Chen, Yong Li, Rui Yan, Tao Wang, Le Wu (2025). CodeBrain: Scalable Code EEG Pre-Training for Unified Downstream BCI Tasks. https://arxiv.org/abs/2506.09110

Hugging Face Hub integration

When the optional huggingface_hub package is installed, all models automatically gain the ability to be pushed to and loaded from the Hugging Face Hub. Install with:

pip install braindecode[hub]

Pushing a model to the Hub:

from braindecode.models import CodeBrain

# Train your model
model = CodeBrain(n_chans=22, n_outputs=4, n_times=1000)
# ... training code ...

# Push to the Hub
model.push_to_hub(
    repo_id="username/my-codebrain-model",
    commit_message="Initial model upload",
)

Loading a model from the Hub:

from braindecode.models import CodeBrain

# Load pretrained model
model = CodeBrain.from_pretrained("username/my-codebrain-model")

# Load with a different number of outputs (head is rebuilt automatically)
model = CodeBrain.from_pretrained("username/my-codebrain-model", n_outputs=4)

Extracting features and replacing the head:

import torch

x = torch.randn(1, model.n_chans, model.n_times)
# Extract encoder features (consistent dict across all models)
out = model(x, return_features=True)
features = out["features"]

# Replace the classification head
model.reset_head(n_outputs=10)

Saving and restoring full configuration:

import json

config = model.get_config()            # all __init__ params
with open("config.json", "w") as f:
    json.dump(config, f)

model2 = CodeBrain.from_config(config)    # reconstruct (no weights)

All model parameters (both EEG-specific and model-specific such as dropout rates, activation functions, number of filters) are automatically saved to the Hub and restored when loading.

See Loading and Adapting Pretrained Foundation Models for a complete tutorial.

Methods

forward(inputs, mask=None, return_features=False)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined inside this function, you should call the Module instance itself rather than this method, since the instance call runs the registered hooks while calling forward directly silently ignores them.

Parameters:
  • inputs – Input EEG tensor of shape (batch, n_chans, n_times).

  • mask – Optional boolean mask over patch positions, used for masked-token prediction during pre-training.

  • return_features – If True, also return the encoder features alongside the model output.

load_state_dict(state_dict, *args, **kwargs)[source]#

Remap upstream checkpoint keys to braindecode attribute names.

Handles four kinds of key differences from the original CodeBrain checkpoint (YjMajy/CodeBrain on HuggingFace):

  1. module. prefix from DataParallel saving.

  2. Attribute renames: S41 -> sgconv, sn -> rms_norm, .D -> .skip_weight.

  3. KernelModule wrapper: .kernel_list.N.kernel -> .kernel_list.N.

  4. Old weight_norm API: weight_g/weight_v -> parametrizations.weight.original0/original1.
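The four rules above can be sketched as a key-rewriting function. This is a hypothetical illustration of the remapping, not the actual implementation:

```python
import re

def remap_key(key: str) -> str:
    # 1. Strip the DataParallel "module." prefix.
    if key.startswith("module."):
        key = key[len("module."):]
    # 2. Attribute renames from the upstream checkpoint.
    key = key.replace("S41", "sgconv").replace(".sn.", ".rms_norm.")
    if key.endswith(".D"):
        key = key[: -len(".D")] + ".skip_weight"
    # 3. Drop the KernelModule wrapper level.
    key = re.sub(r"(kernel_list\.\d+)\.kernel", r"\1", key)
    # 4. Old weight_norm parameters -> new parametrize API names.
    key = key.replace("weight_g", "parametrizations.weight.original0")
    key = key.replace("weight_v", "parametrizations.weight.original1")
    return key
```

For instance, remap_key("module.S41.D") yields "sgconv.skip_weight".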

Parameters:
  • state_dict – Mapping of parameter names to tensors, possibly using the upstream CodeBrain key names.

  • *args – Passed through to torch.nn.Module.load_state_dict().

  • **kwargs – Passed through to torch.nn.Module.load_state_dict().

Examples using braindecode.models.CodeBrain#

Loading and Adapting Pretrained Foundation Models