braindecode.functional.rescale_parameter
- braindecode.functional.rescale_parameter(param, layer_id)
Rescale a parameter of the l-th transformer layer.
Scales the parameter tensor in place by the factor \(\frac{1}{\sqrt{2 \cdot \text{layer\_id}}}\), that is, by the inverse square root of twice the layer id [Beit2022].
In LaBraM, this is used to rescale the output matrices (i.e., the last linear projection within each sub-layer) of the self-attention module.
- Parameters:
param (torch.Tensor) – tensor to be rescaled
layer_id (int) – layer id in the neural network
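The rescaling described above can be illustrated with a minimal sketch. It does not call braindecode: `rescale_parameter_sketch` is a hypothetical stand-in that assumes the behaviour is an in-place division by \(\sqrt{2 \cdot \text{layer\_id}}\). The stack of `torch.nn.MultiheadAttention` modules, the all-ones initialisation, and the 1-based layer ids are likewise assumptions made only for illustration.

```python
import math

import torch
from torch import nn


def rescale_parameter_sketch(param: torch.Tensor, layer_id: int) -> None:
    """Hypothetical stand-in: divide `param` in place by sqrt(2 * layer_id)."""
    param.div_(math.sqrt(2.0 * layer_id))


# Hypothetical stack of attention modules; layer ids are assumed to start at 1.
layers = [nn.MultiheadAttention(embed_dim=8, num_heads=2) for _ in range(3)]

for layer_id, attn in enumerate(layers, start=1):
    # Fill the output projection with ones so the scale factor is easy to read off.
    nn.init.ones_(attn.out_proj.weight)
    # Operate on .data so the in-place update is not tracked by autograd.
    rescale_parameter_sketch(attn.out_proj.weight.data, layer_id)

# Deeper layers end up with smaller output-projection weights.
scales = [attn.out_proj.weight[0, 0].item() for attn in layers]
```

Because the update is in place, the function returns nothing and the module's own weight tensor is modified directly. In this sketch, deeper layers receive a smaller factor, so their output projections start with smaller weights.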
References
[Beit2022] Hangbo Bao, Li Dong, Songhao Piao, Furu Wei (2022). BEiT: BERT Pre-Training of Image Transformers.