.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/benchmark_lazy_eager_loading.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_benchmark_lazy_eager_loading.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_benchmark_lazy_eager_loading.py:

Benchmarking eager and lazy loading
======================================

In this example, we compare the execution time and memory requirements of 1)
eager loading, i.e., preloading the entire dataset into memory, and 2) lazy
loading, i.e., only loading examples from disk when they are required. For the
sake of completeness, we also vary some other experiment parameters in the
comparison (e.g., `num_workers`, `cuda`, `batch_size`).

While eager loading might be required for preprocessing steps that need
continuous data (e.g., temporal filtering, resampling), it also allows fast
access to the data during training. However, this can come at the expense of
large memory usage, and ultimately becomes impossible if the dataset does not
fit into memory (e.g., the TUH EEG dataset's >1.5 TB of recordings will not
fit in the memory of most machines).

Lazy loading avoids this potential memory issue by only loading examples from
disk when they are required. This means large datasets can be used for
training; however, it introduces some file-reading overhead every time an
example must be extracted. Some preprocessing steps that require continuous
data also have to be implemented differently to accommodate the windowed
nature of the data. Overall though, the impact of lazy loading can be reduced
by using the `num_workers` parameter of PyTorch's `DataLoader` class, which
dispatches the data loading to multiple processes.
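
As a minimal illustration of the `num_workers` mechanism (a sketch using a
hypothetical stand-in dataset, not the braindecode dataset built later in
this example):

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Hypothetical stand-in for a windowed EEG dataset: 256 examples of
    # shape (21 channels, 400 time samples) with dummy integer labels.
    dummy_ds = TensorDataset(
        torch.randn(256, 21, 400), torch.zeros(256, dtype=torch.long))

    # With num_workers > 0, batches are assembled in worker subprocesses,
    # which hides most of the per-example loading overhead.
    loader = DataLoader(dummy_ds, batch_size=64, num_workers=4)
    for X, y in loader:
        pass  # a training step would go here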

To enable lazy loading in braindecode, data files must be saved in an
MNE-compatible format (e.g., 'fif', 'edf'), and the `Dataset` object must
have been instantiated with the parameter `preload=False`.
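
For instance, a lazily-loaded version of the dataset used in this example can
be instantiated as sketched below (`TUH_PATH` is only defined further down and
must point to a local copy of the corpus):

.. code-block:: python

    from braindecode.datasets import TUHAbnormal

    # preload=False keeps the recordings on disk; the data is read
    # on demand whenever a window is requested.
    lazy_ds = TUHAbnormal(
        TUH_PATH, subject_ids=[0, 1], target_name='pathological',
        preload=False)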

.. GENERATED FROM PYTHON SOURCE LINES 30-54

.. code-block:: default


    # Authors: Hubert Banville <hubert.jbanville@gmail.com>
    #
    # License: BSD (3-clause)

    from itertools import product
    import time

    import torch
    from torch import nn, optim
    from torch.utils.data import DataLoader

    import mne
    import numpy as np
    import pandas as pd
    import seaborn as sns

    from braindecode.datasets import TUHAbnormal
    from braindecode.preprocessing import create_fixed_length_windows
    from braindecode.models import ShallowFBCSPNet, Deep4Net


    mne.set_log_level('WARNING')  # avoid messages every time a window is extracted


.. GENERATED FROM PYTHON SOURCE LINES 55-57

We start by setting two PyTorch internal parameters that can affect the
comparison:

.. GENERATED FROM PYTHON SOURCE LINES 57-62

.. code-block:: default

    N_JOBS = 8
    torch.backends.cudnn.benchmark = True  # Let cuDNN benchmark convolution algorithms and pick the fastest
    torch.set_num_threads(N_JOBS)  # Number of CPU threads used for intra-op parallelism



.. GENERATED FROM PYTHON SOURCE LINES 63-70

Next, we define a few functions to automate the benchmarking.
For the purpose of this example, we load some recordings from the TUH Abnormal
corpus, extract sliding windows, and bundle them in a braindecode Dataset.
We then train a neural network for a few epochs.

Each one of these steps will be timed, so we can report the total time taken
to prepare the data and train the model.

.. GENERATED FROM PYTHON SOURCE LINES 70-215

.. code-block:: default


    def load_example_data(preload, window_len_s, n_subjects=10):
        """Create windowed dataset from subjects of the TUH Abnormal dataset.

        Parameters
        ----------
        preload : bool
            If True, use eager loading, otherwise use lazy loading.
        window_len_s : int
            Window length, in seconds.
        n_subjects : int
            Number of subjects to load.

        Returns
        -------
        windows_ds: BaseConcatDataset
            Windowed data.

        .. warning::
            The recordings from the TUH Abnormal corpus do not all share the same
            sampling rate. The following assumes that the files have already been
            resampled to a common sampling rate.
        """
        subject_ids = list(range(n_subjects))
        ds = TUHAbnormal(
            TUH_PATH, subject_ids=subject_ids, target_name='pathological',
            preload=preload)

        fs = ds.datasets[0].raw.info['sfreq']
        window_len_samples = int(fs * window_len_s)
        window_stride_samples = int(fs * 4)
        # window_stride_samples = int(fs * window_len_s)
        windows_ds = create_fixed_length_windows(
            ds, start_offset_samples=0, stop_offset_samples=None,
            window_size_samples=window_len_samples,
            window_stride_samples=window_stride_samples, drop_last_window=True,
            preload=preload, drop_bad_windows=True)

        # Drop bad epochs
        # XXX: This could be parallelized.
        # XXX: Also, this could be implemented in the Dataset object itself.
        for ds in windows_ds.datasets:
            ds.windows.drop_bad()
            assert ds.windows.preload == preload

        return windows_ds


    def create_example_model(n_channels, n_classes, window_len_samples,
                             kind='shallow', cuda=False):
        """Create model, loss and optimizer.

        Parameters
        ----------
        n_channels : int
            Number of channels in the input.
        n_classes : int
            Number of classes in the output.
        window_len_samples : int
            Window length of the input, in samples.
        kind : str
            'shallow' or 'deep'.
        cuda : bool
            If True, move the model to a CUDA device.

        Returns
        -------
        model : torch.nn.Module
            Model to train.
        loss :
            Loss function
        optimizer :
            Optimizer
        """
        if kind == 'shallow':
            model = ShallowFBCSPNet(
                n_channels, n_classes, input_window_samples=window_len_samples,
                n_filters_time=40, filter_time_length=25, n_filters_spat=40,
                pool_time_length=75, pool_time_stride=15, final_conv_length='auto',
                split_first_layer=True, batch_norm=True, batch_norm_alpha=0.1,
                drop_prob=0.5)
        elif kind == 'deep':
            model = Deep4Net(
                n_channels, n_classes, input_window_samples=window_len_samples,
                final_conv_length='auto', n_filters_time=25, n_filters_spat=25,
                filter_time_length=10, pool_time_length=3, pool_time_stride=3,
                n_filters_2=50, filter_length_2=10, n_filters_3=100,
                filter_length_3=10, n_filters_4=200, filter_length_4=10,
                first_pool_mode="max", later_pool_mode="max", drop_prob=0.5,
                double_time_convs=False, split_first_layer=True, batch_norm=True,
                batch_norm_alpha=0.1, stride_before_pool=False)
        else:
            raise ValueError(f"Unknown model kind: '{kind}'. Expected 'shallow' or 'deep'.")

        if cuda:
            model.cuda()

        optimizer = optim.Adam(model.parameters())
        loss = nn.NLLLoss()

        return model, loss, optimizer


    def run_training(model, dataloader, loss, optimizer, n_epochs=1, cuda=False):
        """Run training loop.

        Parameters
        ----------
        model : torch.nn.Module
            Model to train.
        dataloader : torch.utils.data.DataLoader
            Data loader which will serve examples to the model during training.
        loss :
            Loss function.
        optimizer :
            Optimizer.
        n_epochs : int
            Number of epochs to train the model for.
        cuda : bool
            If True, move X and y to CUDA device.

        Returns
        -------
        model : torch.nn.Module
            Trained model.
        """
        for i in range(n_epochs):
            loss_vals = list()
            for X, y, _ in dataloader:
                model.train()
                model.zero_grad()

                y = y.long()
                if cuda:
                    X, y = X.cuda(), y.cuda()

                loss_val = loss(model(X), y)
                loss_vals.append(loss_val.item())

                loss_val.backward()
                optimizer.step()

            print(f'Epoch {i + 1} - mean training loss: {np.mean(loss_vals)}')

        return model



.. GENERATED FROM PYTHON SOURCE LINES 216-217

Next, we define the different hyperparameters that we want to compare:

.. GENERATED FROM PYTHON SOURCE LINES 217-231

.. code-block:: default


    PRELOAD = [True, False]  # True -> eager loading; False -> lazy loading
    N_SUBJECTS = [10]  # Number of subjects to load from the TUH Abnormal corpus
    WINDOW_LEN_S = [2, 4, 15]  # Window length, in seconds
    N_EPOCHS = [2]  # Number of epochs to train the model for
    BATCH_SIZE = [64, 256]  # Training minibatch size
    MODEL = ['shallow', 'deep']

    NUM_WORKERS = [8, 0]  # Number of worker processes used by PyTorch's DataLoader
    PIN_MEMORY = [False]  # Whether to use pinned (page-locked) memory
    CUDA = [True, False] if torch.cuda.is_available() else [False]  # Whether to use a CUDA device

    N_REPETITIONS = 3  # Number of times to repeat the experiment (to get better time estimates)


.. GENERATED FROM PYTHON SOURCE LINES 232-234

The following path needs to be changed to your local folder containing the
TUH Abnormal corpus:

.. GENERATED FROM PYTHON SOURCE LINES 234-237

.. code-block:: default

    TUH_PATH = ('/storage/store/data/tuh_eeg/www.isip.piconepress.com/projects/'
                'tuh_eeg/downloads/tuh_eeg_abnormal/v2.0.0/edf/')


.. GENERATED FROM PYTHON SOURCE LINES 238-240

We can finally loop through all the combinations of the parameters set above
and measure the execution time of each configuration:

.. GENERATED FROM PYTHON SOURCE LINES 240-294

.. code-block:: default


    all_results = list()
    for (i, preload, n_subjects, win_len_s, n_epochs, batch_size, model_kind,
            num_workers, pin_memory, cuda) in product(
                range(N_REPETITIONS), PRELOAD, N_SUBJECTS, WINDOW_LEN_S, N_EPOCHS,
                BATCH_SIZE, MODEL, NUM_WORKERS, PIN_MEMORY, CUDA):

        results = {
            'repetition': i,
            'preload': preload,
            'n_subjects': n_subjects,
            'win_len_s': win_len_s,
            'n_epochs': n_epochs,
            'batch_size': batch_size,
            'model_kind': model_kind,
            'num_workers': num_workers,
            'pin_memory': pin_memory,
            'cuda': cuda
        }
        print(f'\nRepetition {i + 1}/{N_REPETITIONS}:\n{results}')

        # Load the dataset
        data_loading_start = time.time()
        dataset = load_example_data(preload, win_len_s, n_subjects=n_subjects)
        data_loading_end = time.time()

        # Create the data loader
        training_setup_start = time.time()
        dataloader = DataLoader(
            dataset, batch_size=batch_size, shuffle=False, pin_memory=pin_memory,
            num_workers=num_workers, worker_init_fn=None)

        # Instantiate model and optimizer
        n_channels = len(dataset.datasets[0].windows.ch_names)
        n_times = len(dataset.datasets[0].windows.times)
        n_classes = 2
        model, loss, optimizer = create_example_model(
            n_channels, n_classes, n_times, kind=model_kind, cuda=cuda)
        training_setup_end = time.time()

        # Start training loop
        model_training_start = time.time()
        trained_model = run_training(
            model, dataloader, loss, optimizer, n_epochs=n_epochs, cuda=cuda)
        model_training_end = time.time()

        del dataset, model, loss, optimizer, trained_model

        # Record timing results
        results['data_preparation'] = data_loading_end - data_loading_start
        results['training_setup'] = training_setup_end - training_setup_start
        results['model_training'] = model_training_end - model_training_start
        all_results.append(results)


.. GENERATED FROM PYTHON SOURCE LINES 295-297

The results are formatted into a pandas DataFrame and saved locally as a CSV
file.

.. GENERATED FROM PYTHON SOURCE LINES 297-303

.. code-block:: default


    results_df = pd.DataFrame(all_results)
    fname = 'lazy_vs_eager_loading_results.csv'
    results_df.to_csv(fname)
    print(f'Results saved under {fname}.')


.. GENERATED FROM PYTHON SOURCE LINES 304-305

We can finally summarize this information in the following plot:

.. GENERATED FROM PYTHON SOURCE LINES 305-310

.. code-block:: default


    sns.catplot(
        data=results_df, row='cuda', x='model_kind', y='model_training',
        hue='num_workers', col='preload', kind='strip')


.. GENERATED FROM PYTHON SOURCE LINES 311-314

.. warning::
  The results of this comparison depend on the hyperparameters set above and
  on the hardware being used.

.. GENERATED FROM PYTHON SOURCE LINES 316-320

Generally speaking, we expect lazy loading to be slower than eager loading
during model training, but to become competitive when multiple workers are
enabled (i.e., `num_workers > 0`). Training on a CUDA device should also yield
substantial speedups.
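
As a possible follow-up (not part of the script above), the timing results can
be averaged over repetitions to compare configurations more directly, e.g.:

.. code-block:: python

    # Mean model-training time per configuration, assuming `results_df`
    # from the benchmark loop above.
    summary = (
        results_df
        .groupby(['preload', 'num_workers', 'model_kind', 'cuda'])
        ['model_training']
        .mean()
        .reset_index()
        .sort_values('model_training'))
    print(summary)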


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.000 seconds)

**Estimated memory usage:**  0 MB


.. _sphx_glr_download_auto_examples_benchmark_lazy_eager_loading.py:


.. only:: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: benchmark_lazy_eager_loading.py <benchmark_lazy_eager_loading.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: benchmark_lazy_eager_loading.ipynb <benchmark_lazy_eager_loading.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_