EEG Based Multimodal Emotion Recognition
Paper Link
I. 🌟 Core Focus and Contributions
- Topic: A systematic review of EEG-based Multimodal Emotion Recognition (EMER)
- Focus: Centers on EEG as the primary modality, combined with additional physiological or behavioral signals
- Perspective: Machine Learning-focused, with emphasis on how ML methods tackle multimodal challenges
- Gap Addressed: Most prior reviews treat all modalities equally or cover them only broadly; this review positions EEG as the central modality and examines ML techniques for EMER in depth
- Structure: Organized around three key ML challenges:
  - Multimodal Feature Representation Learning
  - Multimodal Physiological Signal Fusion
  - Incomplete Multimodal Learning
II. 🔍 Introduction & Motivation
- Importance: Emotion is fundamental to human experience and crucial for AI applications in health, education, driving, etc.
- Why physiological signals?: Less susceptible to voluntary manipulation than speech or facial expressions
- Why EEG?
  - Directly reflects central nervous system (CNS) activity
  - High temporal resolution
  - Portable, cost-effective, and non-invasive
- EMER Hypothesis: EEG + other modalities → improved accuracy and robustness over unimodal EEG
- Fig. 1: Compares unimodal EEG (EER) pipeline with multimodal (EMER) workflow
III. 📡 Overview of Multimodal Signals
A. EEG Signals
- Nature: Encodes cognitive and emotional processes
- Properties: High temporal resolution, distinct frequency bands (δ, θ, α, β, γ)
- Feature Extraction (see the sketch after this list):
  - Handcrafted:
    - Time domain: ERPs, HOC, Hjorth parameters, FD
    - Frequency domain: PSD, DE, ERD/ERS
    - Time-frequency: STFT, wavelet transform, Hilbert-Huang transform
  - Deep features:
    - CNNs (require converting EEG into 2-D maps)
    - GNNs (learn spatial relationships between electrodes)
    - RNNs / LSTMs (model temporal patterns)
    - Attention mechanisms
    - Domain adaptation (e.g., DANs)
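As a concrete example of the handcrafted frequency-domain route, here is a minimal sketch (not taken from the paper) that computes differential entropy (DE) features per channel and band from a raw EEG segment, assuming a NumPy array of shape (channels, samples) and the common Gaussian approximation in which DE reduces to a log of the band power:

```python
import numpy as np
from scipy.signal import welch

# Standard EEG bands (Hz); the band edges are a common convention, not from the paper
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def de_features(eeg, fs=128):
    """Differential entropy per channel and band.

    eeg: array of shape (n_channels, n_samples)
    Under a Gaussian assumption, DE = 0.5 * log(2 * pi * e * variance),
    with the variance approximated here by the band power.
    """
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2, axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        idx = (freqs >= lo) & (freqs < hi)
        band_power = np.trapz(psd[:, idx], freqs[idx], axis=-1)  # per channel
        feats.append(0.5 * np.log(2 * np.pi * np.e * band_power))
    return np.stack(feats, axis=-1)  # shape: (n_channels, n_bands)

# Example: a 32-channel, 4-second segment at 128 Hz
segment = np.random.randn(32, 4 * 128)
print(de_features(segment).shape)  # (32, 5)
```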
B. Other Physiological & Behavioral Signals
- Peripheral: ECG, BVP, EMG, EOG, GSR, BP, SKT
- Behavioral: Facial expressions, eye movements
- Feature Extraction: Handcrafted stats, PSD, DE, and deep learning features
  - Facial: LBP, HOG, AUs
  - Eye: Pupil diameter, fixations
- Table III: Summarizes modalities and features
IV. 🧪 Datasets & Experimental Design
A. Emotion Models
- Discrete: Ekman’s six basic emotions, Plutchik’s eight
- Dimensional: Valence-Arousal (VA), Valence-Arousal-Dominance (VAD); a quadrant-mapping sketch follows
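Dimensional labels are often discretized before classification. A minimal, hypothetical sketch that maps continuous valence/arousal ratings (e.g., the 1–9 self-assessment scales used in DEAP) onto the four quadrants of the VA plane, thresholding at the scale midpoint:

```python
def va_quadrant(valence, arousal, threshold=5.0):
    """Map continuous valence/arousal ratings (e.g., 1-9 self-assessment
    scales) to one of the four quadrants of the VA plane."""
    high_v = valence >= threshold
    high_a = arousal >= threshold
    if high_v and high_a:
        return "HVHA"   # e.g., excited / happy
    if high_v and not high_a:
        return "HVLA"   # e.g., calm / content
    if not high_v and high_a:
        return "LVHA"   # e.g., angry / fearful
    return "LVLA"       # e.g., sad / bored

print(va_quadrant(7.2, 3.1))  # HVLA
```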
B. Public Datasets (See Table IV)
Dataset | Modalities | Labels | Stimuli |
---|---|---|---|
DEAP | EEG + video + peripheral | VAD, liking, familiarity | Music videos |
DREAMER | EEG + ECG | VAD | Movie clips |
SEED / SEED-IV | EEG + Eye | Discrete (Happy, Sad…) | Movie clips |
MAHNOB-HCI | EEG + multimodal | VAD | Videos |
ASCERTAIN | EEG + GSR + facial | VAD + personality | Movie clips |
AMIGOS | EEG + video + GSR | VAD + 7 emotions | Short/long videos |
C. Experimental Strategies
- Subject-dependent (train and test on the same person)
- Subject-independent (train on some subjects, test on unseen ones); a split sketch follows
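The two protocols differ mainly in how trials are split. A minimal sketch, assuming scikit-learn and a per-trial subject ID array (all names and sizes are illustrative):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, train_test_split

# Toy data: 200 trials x 160 features, 4 subjects, binary valence labels
X = np.random.randn(200, 160)
y = np.random.randint(0, 2, size=200)
subjects = np.repeat([0, 1, 2, 3], 50)

# Subject-dependent: trials from the same subjects appear in both train and test
Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(X, y, test_size=0.2, stratify=subjects)

# Subject-independent: each fold holds out all trials of one unseen subject
logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    held_out = np.unique(subjects[test_idx])
    print("held-out subject:", held_out)  # train and evaluate a model here
```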
D. Evaluation Metrics (Table V)
- Performance: ACC, STD, Precision, Recall, F1, p-value, ROC, AUC (a metric sketch follows this list)
- Efficiency: Inference time, model size, energy consumption
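For reference, a short sketch computing the main performance metrics with scikit-learn on hypothetical binary predictions (AUC uses predicted probabilities, the others use hard labels):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy predictions for a binary valence task (hypothetical numbers)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.8, 0.4, 0.3, 0.1, 0.7])
y_pred = (y_prob >= 0.5).astype(int)

print("ACC      :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_prob))
```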
V. 🧠 Representation Learning (Challenge 1)
A. Joint Representation (Fig 4a)
- Concept: Map all modalities into a shared space
- Methods (a shared-autoencoder sketch follows this list):
  - RBMs: Learn joint distributions over the combined modalities
  - BDAEs / AEs: Compress multimodal inputs into a shared latent representation
  - DBNs: Hierarchical learning, possibly with modality-specific layers merged at a higher level
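To make the joint-representation idea concrete, a minimal PyTorch sketch of a shared-space autoencoder over concatenated EEG and peripheral feature vectors; it is an illustrative stand-in with hypothetical dimensions, not the exact BDAE architecture from the reviewed studies:

```python
import torch
import torch.nn as nn

class JointAutoencoder(nn.Module):
    """Toy joint (shared-space) autoencoder for two modalities.

    Dimensions are hypothetical: 160-D EEG features (e.g., 32 channels x 5 DE
    bands) and 32-D peripheral features, compressed into one 64-D latent code.
    """
    def __init__(self, eeg_dim=160, per_dim=32, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(eeg_dim + per_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, eeg_dim + per_dim),
        )

    def forward(self, eeg, per):
        x = torch.cat([eeg, per], dim=-1)  # concatenate both modalities
        z = self.encoder(x)                # shared latent representation
        return self.decoder(z), z

# Reconstruction training pulls both modalities into the same latent code z,
# which can then be fed to an emotion classifier
model = JointAutoencoder()
eeg, per = torch.randn(8, 160), torch.randn(8, 32)
recon, z = model(eeg, per)
loss = nn.functional.mse_loss(recon, torch.cat([eeg, per], dim=-1))
loss.backward()
```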
B. Coordinated Representation (Fig 4b)
- Concept: Keep modality-specific spaces, enforce cross-modal constraints
- Methods (a CCA sketch follows this list):
  - Similarity constraints: Maximize similarity (e.g., cosine) between modality-specific representations
  - Structured space / correlation:
    - CCA: Linear projections maximizing inter-modality correlation
    - DCCA: Deep variant capturing nonlinear relationships
    - Discriminative CCA: Adds class information to improve discriminative power
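A minimal scikit-learn sketch of the CCA route, projecting EEG and eye-movement features (hypothetical sizes) into a coordinated space and checking how correlated the paired components are:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Toy features (hypothetical sizes): EEG (n_trials x 160), eye movements (n_trials x 30)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((200, 160))
eye = rng.standard_normal((200, 30))

# Project both modalities into a 10-D coordinated space where corresponding
# components are maximally correlated
cca = CCA(n_components=10)
eeg_c, eye_c = cca.fit_transform(eeg, eye)

# Per-component correlation between the two projected views
corrs = [np.corrcoef(eeg_c[:, k], eye_c[:, k])[0, 1] for k in range(10)]
print(np.round(corrs, 2))

# The projected (or concatenated) views can then feed an emotion classifier
fused = np.concatenate([eeg_c, eye_c], axis=1)
```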
C. Interpretability Discussion
- Limited but growing
- Use t-SNE and topographic maps to visualize learned features
- Suggested tools: LIME, CAVs, attention weights (a t-SNE sketch follows)
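A typical visualization step, sketched with scikit-learn and matplotlib on hypothetical learned representations; in practice the features would come from the trained multimodal model:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Hypothetical learned multimodal representations (n_trials x latent_dim)
# with binary valence labels
rng = np.random.default_rng(0)
features = rng.standard_normal((300, 64))
labels = rng.integers(0, 2, size=300)

# Embed the latent codes in 2-D to inspect whether emotion classes separate
embedded = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(features)

plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="coolwarm", s=10)
plt.title("t-SNE of learned multimodal features")
plt.show()
```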
VI. 🔗 Multimodal Fusion (Challenge 2)
A. Feature-Level Fusion (Fig 5a)
- Modality features are combined before classification
- Methods (a sketch follows this list):
  - Simple concatenation
  - Weighted selection (e.g., ReliefF)
  - Learned fusion: attention, MMK, MFSAE, HDC-MER
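A minimal sketch of learned feature-level fusion, using a generic attention-weighted concatenation (hypothetical dimensions; not a specific method from the survey):

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Weight each modality's feature vector by a learned attention score,
    then concatenate the weighted features for a shared classifier."""
    def __init__(self, dims=(160, 32), n_classes=2):
        super().__init__()
        self.scorers = nn.ModuleList([nn.Linear(d, 1) for d in dims])
        self.classifier = nn.Linear(sum(dims), n_classes)

    def forward(self, feats):                       # feats: list of (B, d_m) tensors
        scores = torch.cat([s(f) for s, f in zip(self.scorers, feats)], dim=-1)
        weights = torch.softmax(scores, dim=-1)     # (B, n_modalities)
        weighted = [w.unsqueeze(-1) * f for w, f in zip(weights.unbind(-1), feats)]
        return self.classifier(torch.cat(weighted, dim=-1))

model = AttentionFusion()
logits = model([torch.randn(8, 160), torch.randn(8, 32)])  # EEG + peripheral
print(logits.shape)  # torch.Size([8, 2])
```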
B. Decision-Level Fusion (Fig 5b)
- Each modality is classified separately; the per-modality decisions are then combined
- Methods (a sketch follows this list):
  - Rule-based (max, sum, majority voting)
  - Weighted averaging (fixed or learned weights)
  - Advanced: Bayesian fusion, stacking
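A short NumPy sketch contrasting the rule-based and weighted variants on hypothetical per-modality class probabilities:

```python
import numpy as np

# Hypothetical per-modality class probabilities for 4 trials, 2 classes
p_eeg = np.array([[0.8, 0.2], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
p_eye = np.array([[0.6, 0.4], [0.3, 0.7], [0.4, 0.6], [0.1, 0.9]])

# Sum rule: average the probabilities, then take the argmax
sum_rule = np.argmax((p_eeg + p_eye) / 2, axis=1)

# Weighted averaging: trust EEG more (weights are illustrative, or tuned on validation data)
w_eeg, w_eye = 0.7, 0.3
weighted = np.argmax(w_eeg * p_eeg + w_eye * p_eye, axis=1)

# Majority voting over hard per-modality decisions (ties broken toward class 0 here)
votes = np.stack([p_eeg.argmax(1), p_eye.argmax(1)])
majority = (votes.mean(axis=0) > 0.5).astype(int)

print(sum_rule, weighted, majority)
```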
C. Hybrid Fusion (Fig 5c)
- Combines feature-level and decision-level stages
- Flexible but more complex
VII. ⚠️ Incomplete Multimodal Learning (Challenge 3)
A. Missing Data (Fig 6a)
- Caused by noise and motion artifacts
- Solutions (an interpolation sketch follows this list):
  - Discard the affected samples
  - Impute (interpolation, RBMs, SiMVAE)
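The simplest imputation option, sketched with pandas on a hypothetical single-channel segment whose artifact-corrupted samples were set to NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical single-channel EEG segment with corrupted samples marked as NaN
signal = np.array([1.2, 1.4, np.nan, np.nan, 1.1, 0.9, np.nan, 1.0])

# Linear interpolation over the missing samples (one of many imputation options)
imputed = pd.Series(signal).interpolate(method="linear", limit_direction="both").to_numpy()
print(imputed)
```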
B. Missing Modality (Fig 6b)
- Some sensors are unavailable at test time
- Solutions (a nearest-neighbor sketch follows this list):
  - Use only the available modalities (e.g., PoE)
  - Generate the missing modality with GANs
  - Project the available modalities into a shared space via CCA/DCCA
  - Find the nearest complete sample using a distance metric (e.g., Minkowski)
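A minimal SciPy sketch of the nearest-complete-sample idea: measure the Minkowski distance in the modality that is still available, then borrow the missing modality's features from the closest complete training sample (all arrays are hypothetical):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical training set with both modalities, and a test sample missing its eye features
train_eeg = np.random.randn(100, 160)
train_eye = np.random.randn(100, 30)
test_eeg = np.random.randn(1, 160)

# Minkowski distance (p=2 is Euclidean) computed in the available EEG space
dists = cdist(test_eeg, train_eeg, metric="minkowski", p=2)
nearest = int(np.argmin(dists))

# Borrow the missing eye-movement features from the most similar complete sample
test_eye_estimate = train_eye[nearest]
print(nearest, test_eye_estimate.shape)  # e.g., 42 (30,)
```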
VIII. 🎯 Applications, Challenges, and Future Directions
A. Applications
- Healthcare: Consciousness disorders, therapy monitoring
- Driving: Driver emotion/stress detection
- Education: Engagement monitoring
- Gaming: Emotion-aware BCI control
B. Challenges
- Signal inconsistency
- Deep model interpretability (“black box”)
- Missing data/modality
- Lack of standardized benchmarks
C. Future Directions
- More robust fusion/representation techniques
- Domain-specific interpretability tools
- Stronger missing data handling
- Open datasets, reproducible code, benchmarking protocols
IX. ✅ Conclusion
- Provides a systematic, ML-focused review of EEG-based EMER
- Details representation learning, fusion, and handling missing data
- Covers datasets, methods, metrics, and future directions
- Serves as an in-depth guide for researchers entering EMER
X. 📊 Key Figures and Tables
# | Content |
---|---|
Fig 1 | EER vs EMER workflows |
Fig 2 | Sensor placement |
Fig 3 | Emotion models (VA/VAD) |
Fig 4 | Representation learning (Joint vs Coordinated) |
Fig 5 | Fusion strategies |
Fig 6 | Missing data vs missing modality |
Table I | Summary of existing EMER studies |
Table II | Signal abbreviations |
Table III | Features of each modality |
Table IV | Public EMER datasets |
Table V | Evaluation metrics |
Table VI | Representation learning studies |
Table VII | Fusion strategy studies |