2. Preprocessing#

Preprocessing is the first step in EEG data analysis. It usually involves a series of steps aimed at removing non-brain-related noise and artifacts from the data. Unlike the following steps (e.g., epoching and averaging), it leaves the data in a continuous format (EEG channels × timepoints).

Goals

  • Loading raw EEG data

  • Plotting the raw data

  • Filtering the data to remove low and high frequency noise

  • Correcting eye artifacts using independent component analysis (ICA)

  • Re-referencing the data to an average reference

2.1. Load Python modules#

We will use the following Python modules:

Note that on Google Colab, you will need to install these modules first. You can uncomment and run the following cell to do so.

# %pip install mne hu-neuro-pipeline
from mne import set_bipolar_reference
from mne.io import read_raw
from mne.preprocessing import ICA
from pipeline.datasets import get_erpcore

2.2. Download example data#

We’ll use data from the ERP CORE dataset (Kappenman et al, 2021). This dataset contains EEG data from 40 participants and 6 different experiments. Each experiment was designed to elicit one or two commonly studied ERP components.

https://ars.els-cdn.com/content/image/1-s2.0-S1053811920309502-gr1.jpg

Fig. 2.1 The six different ERP CORE experiments. Source: Kappenman et al (2021)#

In this example, we’ll use the data from the fourth participant in the face perception (N170) experiment.

files_dict = get_erpcore('N170', participants='sub-004', path='data')
files_dict
Downloading file 'erpcore/N170/README.txt' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//612e8fb3af610c00b9dff048' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/task-N170_events.json' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//60077b01ba010908a78927da' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/participants.tsv' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//60061136df7ff00036478ff8' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/participants.json' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//60061134df7ff00036478fed' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/dataset_description.json' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//60061132df7ff000314746ab' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/LICENSE' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//6006112bba010907d18928e0' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/CHANGES' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//60061128d77bf1000c677b6f' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/sub-004/eeg/sub-004_task-N170_events.tsv' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//60075c87ba01090877890ad5' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/sub-004/eeg/sub-004_task-N170_coordsystem.json' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//61021546c7a976029b9e5be8' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/sub-004/eeg/sub-004_task-N170_electrodes.tsv' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//61021543c7a976029a9e200c' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/sub-004/eeg/sub-004_task-N170_eeg.set' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//60075c8586541a090014bc12' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/sub-004/eeg/sub-004_task-N170_channels.tsv' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//60075c77ba01090877890ac5' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/sub-004/eeg/sub-004_task-N170_eeg.fdt' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//60075c7f86541a08f914ab5d' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
Downloading file 'erpcore/N170/sub-004/eeg/sub-004_task-N170_eeg.json' from 'https://files.de-1.osf.io/v1/resources/pfde9/providers/osfstorage//60075c81e80d3708caa5832c' to '/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data'.
{'log_files': ['/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data/erpcore/N170/sub-004/eeg/sub-004_task-N170_events.tsv'],
 'raw_files': ['/home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data/erpcore/N170/sub-004/eeg/sub-004_task-N170_eeg.set']}

2.3. Load raw data#

We read the actual EEG data files (eeg.set/eeg.fdt) into MNE-Python. The result is a Raw object, which contains the continuous EEG data and some metadata.

raw_file = files_dict['raw_files'][0]
raw = read_raw(raw_file, preload=True)
raw
Reading /home/runner/work/intro-to-eeg/intro-to-eeg/ipynb/data/erpcore/N170/sub-004/eeg/sub-004_task-N170_eeg.fdt
Reading 0 ... 649215  =      0.000 ...   633.999 secs...
General
Measurement date Unknown
Experimenter Unknown
Participant Unknown
Channels
Digitized points 33 points
Good channels 33 EEG
Bad channels None
EOG channels Not available
ECG channels Not available
Data
Sampling frequency 1024.00 Hz
Highpass 0.00 Hz
Lowpass 512.00 Hz
Filenames sub-004_task-N170_eeg.fdt
Duration 00:10:34 (HH:MM:SS)

We can access the actual data array (a Numpy array) using the get_data() method.

raw.get_data()
array([[-0.01593189, -0.0159683 , -0.01600371, ..., -0.01595349,
        -0.01595786, -0.01596058],
       [-0.03090565, -0.03089662, -0.03088733, ..., -0.03030712,
        -0.03030746, -0.03030712],
       [ 0.00585654,  0.00586507,  0.00585632, ...,  0.0053501 ,
         0.00534794,  0.00534897],
       ...,
       [ 0.01395562,  0.01396296,  0.01395765, ...,  0.0130488 ,
         0.01304721,  0.0130503 ],
       [-0.00940747, -0.00941859, -0.00942844, ..., -0.00855003,
        -0.00854672, -0.00854356],
       [-0.00327154, -0.00327973, -0.00327407, ..., -0.0036367 ,
        -0.00363201, -0.00362738]])

Let’s check the size (number of dimensions and their length) of this array:

raw.get_data().shape
(33, 649216)

We see that it has two dimensions (EEG channels × timepoints).

2.4. Plot raw data#

We can plot the raw data using the plot() method. We specify which time segment of the data to plot using the start and duration arguments. Here we plot 5 seconds of data, starting at 60 seconds.

_ = raw.plot(start=60.0, duration=5.0)
Using matplotlib as 2D backend.
../_images/71db13a121359c8b6513c00d0fbd0d964de52aa1c54bac8de2f1858dafc3ac33.png

2.5. Add channel information#

Right now, MNE thinks that all channels are EEG channels. However, we know that some of them are actually EOG channels that record eye movements and blinks. We’ll use these to create new “virtual” EOG channels that pick up strong eye signals (vertical EOG [VEOG] = difference between above and below the eyes; horizontal EOG [HEOG] = difference between left and right side of the eyes). We explicitly set their channel type to 'eog' and drop the original channels, so that we are left with 30 EEG channels and 2 EOG channels.

raw = set_bipolar_reference(raw, anode='FP1', cathode='VEOG_lower',
                            ch_name='VEOG', drop_refs=False)
raw = set_bipolar_reference(raw, anode='HEOG_right', cathode='HEOG_left',
                            ch_name='HEOG', drop_refs=False)
raw = raw.set_channel_types({'VEOG': 'eog', 'HEOG': 'eog'})
raw = raw.drop_channels(['VEOG_lower', 'HEOG_right', 'HEOG_left'])
EEG channel type selected for re-referencing
Creating RawArray with float64 data, n_channels=1, n_times=649216
    Range : 0 ... 649215 =      0.000 ...   633.999 secs
Ready.
Added the following bipolar channels:
VEOG
EEG channel type selected for re-referencing
Creating RawArray with float64 data, n_channels=1, n_times=649216
    Range : 0 ... 649215 =      0.000 ...   633.999 secs
Ready.
Added the following bipolar channels:
HEOG

Then we load the locations of the EEG electrodes as provided by the manufacturer of the EEG system. Many of these standard EEG montages are shipped with MNE-Python.

raw = raw.set_montage('biosemi64', match_case=False)
_ = raw.plot_sensors(show_names=True)
../_images/c70d36c4cc8597f0aab020c81a144a209b05adc03ccf4bf553820a460b520480.png

2.6. Filter data#

Filtering is a common preprocessing step that is used to remove parts of the EEG signal that are unlikely to contain brain activity of interest. There are four different types of filters:

  • A high-pass filter removes low-frequency noise (e.g., slow drifts due to sweat or breathing)

  • A low-pass filter removes high-frequency noise (e.g., muscle activity)

  • A band-pass filter combines a high-pass and a low-pass filter

  • A band-stop filter removes a narrow band of frequencies (e.g., 50 Hz line noise)

We first apply a high-pass filter at 0.1 Hz to remove slow drifts and plot the filtered data.

raw = raw.filter(l_freq=0.1, h_freq=None)
_ = raw.plot(start=60.0, duration=5.0)
Filtering raw data in 1 contiguous segment
Setting up high-pass filter at 0.1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 0.10
- Lower transition bandwidth: 0.10 Hz (-6 dB cutoff frequency: 0.05 Hz)
- Filter length: 33793 samples (33.001 s)
[Parallel(n_jobs=1)]: Done  17 tasks      | elapsed:    0.4s
../_images/dd07ec2617514c93d64d3d73f63f6a7fad4f90ff5159ab70db021b9a521bc9d8.png

Next, we apply a low-pass filter at 30 Hz to remove high-frequency noise and plot the data again.

raw = raw.filter(l_freq=None, h_freq=30.0)
_ = raw.plot(start=60.0, duration=5.0)
Filtering raw data in 1 contiguous segment
Setting up low-pass filter at 30 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal lowpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Upper passband edge: 30.00 Hz
- Upper transition bandwidth: 7.50 Hz (-6 dB cutoff frequency: 33.75 Hz)
- Filter length: 451 samples (0.440 s)
[Parallel(n_jobs=1)]: Done  17 tasks      | elapsed:    0.2s
../_images/68101aab48384094b27c6c984766705b06ded11b3995830fb2ed9ec7bc12f90d.png

Note that we’ve performed these two filters separately for demonstration purposes, but we could have also applied a single band-pass filter.

2.7. Correct eye artifacts#

Eye blinks and eye movements are the most prominent source of artifacts in EEG data. They are approximately 10 times larger than the brain signals we are interested in and affect especially frontal electrodes.

There are multiple ways to remove eye artifacts from EEG data. The most common one is a machine learning technique called independent component analysis (ICA). ICA decomposes the EEG data into a set of independent components, each of which represents a different source of EEG activity.

Each component is characterized by a topography (i.e., a spatial pattern of activity across electrodes) and a time course (i.e., a pattern of activity over time). We can use these to identify components that we think reflect eye artifacts, and remove them from the data.

ICA is typically computed based on a high-pass filtered copy of the data (cutoff = 1 Hz). We ask the algorithm to identify 15 components and plot their scalp topographies.

raw_copy = raw.copy().filter(l_freq=1.0, h_freq=None)
ica = ICA(n_components=15)
ica = ica.fit(raw_copy)
_ = ica.plot_components()
Filtering raw data in 1 contiguous segment
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 3381 samples (3.302 s)

Fitting ICA to data using 30 channels (please be patient, this may take a while)
Selecting by number: 15 components
Fitting ICA took 6.6s.
[Parallel(n_jobs=1)]: Done  17 tasks      | elapsed:    0.2s
../_images/511ce546db3a13916247455f26f0489b3f9551cab4a4f9aef2279d6176e4ee41.png

Then we can use a clever method that automatically identifies components that are likely to reflect eye artifacts (based on the correlation of the component’s time course with our two VEOG and HEOG channels).

eog_indices, eog_scores = ica.find_bads_eog(raw, ch_name=['HEOG', 'VEOG'],
                                            verbose=False)
ica.exclude = eog_indices
_ = ica.plot_scores(eog_scores)
/tmp/ipykernel_4300/1872092502.py:1: FutureWarning: The default for pick_channels will change from ordered=False to ordered=True in 1.5 and this will result in a change of behavior because the resulting channel order will not match. Either use a channel order that matches your instance or pass ordered=False.
  eog_indices, eog_scores = ica.find_bads_eog(raw, ch_name=['HEOG', 'VEOG'],
../_images/19a0b92d8fbbc80c38cbf84b7009137d77cbadaf97b5f1c716bf8723da94147a.png

Finally, by “applying” the ICA to the data (formally, back-projecting the non-artifact components from component space to channel space), we can remove the eye artifacts from the data.

raw = ica.apply(raw)
_ = raw.plot(start=60.0, duration=5.0)
Applying ICA to Raw instance
    Transforming to ICA space (15 components)
    Zeroing out 2 ICA components
    Projecting back using 30 PCA components
../_images/359d4363765ef28d18cb017da02116ca22c86f086dcd54db8ae5ea562e0fd2ee.png

2.8. Re-reference data#

Re-referencing is our final preprocessing step. Since the EEG signal is measured as the difference in voltage between two electrodes, the signal at any given electrode depends strongly on the “online” reference electrode (typically placed on the mastoid bone behind the ear or on the forehead).

During preprocessing (“offline”), we typically want to re-reference the data to a more neutral (and less noisy) reference, such as the average of all channels.

raw = raw.set_eeg_reference('average')
_ = raw.plot(start=60.0, duration=5.0)
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.
../_images/c7f73a1faac604b71b1a015667cb920dca2e2e93e029752b5b496a0c4b5fd8b2.png

2.9. Exercises#

  1. Re-run the above analysis for a different experiment. For this, you can simply reuse the code cells above, changing only the second cell. Valid experiment names are 'N170', 'MMN', 'N2pc', 'N400', 'P3', or 'ERN'.

  2. Below, try out the effect of different filter settings such as a higher high-pass cutoff or a lower low-pass cutoff. For this, write your own code that achieves the following: (a) read the raw data from one participant, (b) apply your own custom high-pass, low-pass, or band-pass filter, (c) plot the filtered data, and (d) repeat for different filter settings.

# Your code goes here
...

2.10. Further reading#

2.11. References#

Delorme, 2023

Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13(1), 2372. doi:10.1038/s41598-023-27528-0

Gramfort et al., 2013

Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., … Hämäläinen, M. (2013). MEG and EEG data analysis with MNE-Python. Frontiers in Neuroscience, 7, 267. doi:10.3389/fnins.2013.00267

Kappenman et al., 2021(1,2)

Kappenman, E. S., Farrens, J. L., Zhang, W., Stewart, A. X., & Luck, S. J. (2021). ERP CORE: An open resource for human event-related potential research. NeuroImage, 225, 117465. doi:10.1016/j.neuroimage.2020.117465

Luck, 2014

Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique. 2 ed. Cambridge, Massachusetts: MIT Press.