EEG Based Multimodal Emotion Recognition
Paper Link
I. 🌟 Core Focus and Contributions
- Topic: A systematic review of EEG-based Multimodal Emotion Recognition (EMER)
- Focus: Centers on EEG as the primary modality, combined with additional physiological or behavioral signals
- Perspective: Machine Learning-focused, with emphasis on how ML methods tackle multimodal challenges
- Gap Addressed: Most prior reviews treat all modalities equally or cover them only broadly; this review positions EEG as the central modality and examines ML techniques for EMER in depth
- Structure: Organized around three key ML challenges:
  - Multimodal Feature Representation Learning
  - Multimodal Physiological Signal Fusion
  - Incomplete Multimodal Learning
II. 🔍 Introduction & Motivation
- Importance: Emotion is fundamental to human experience and crucial for AI applications in health, education, driving, etc.
- Why physiological signals?: Less susceptible to voluntary manipulation than speech or facial expressions
- Why EEG?
  - Directly reflects central nervous system (CNS) activity
  - High temporal resolution
  - Portable, cost-effective, and non-invasive
- EMER Hypothesis: EEG + other modalities → improved accuracy and robustness over unimodal EEG
- Fig. 1: Compares unimodal EEG (EER) pipeline with multimodal (EMER) workflow
III. 📡 Overview of Multimodal Signals
A. EEG Signals
- Nature: Encodes cognitive and emotional processes
- Properties: High temporal resolution, distinct frequency bands (δ, θ, α, β, γ)
- Feature Extraction (see the sketch after this list):
  - Handcrafted:
    - Time domain: ERPs, HOC, Hjorth parameters, FD
    - Frequency domain: PSD, DE, ERD/ERS
    - Time-frequency: STFT, wavelet transform, Hilbert-Huang transform
  - Deep features:
    - CNNs (require converting EEG into 2-D maps)
    - GNNs (learn spatial relationships between electrodes)
    - RNNs / LSTMs (model temporal patterns)
    - Attention mechanisms
    - Domain adaptation (e.g., DANs)
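As a concrete example of the handcrafted frequency-domain route, here is a minimal sketch (not taken from the paper) that computes differential entropy (DE) features per channel and band from a raw EEG segment, assuming a NumPy array of shape (channels, samples) and the common Gaussian approximation in which DE reduces to a log of the band power:

```python
import numpy as np
from scipy.signal import welch

# Standard EEG bands (Hz); the band edges are a common convention, not from the paper
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def de_features(eeg, fs=128):
    """Differential entropy per channel and band.

    eeg: array of shape (n_channels, n_samples)
    Under a Gaussian assumption, DE = 0.5 * log(2 * pi * e * variance),
    with the variance approximated here by the band power.
    """
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2, axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        idx = (freqs >= lo) & (freqs < hi)
        band_power = np.trapz(psd[:, idx], freqs[idx], axis=-1)  # per channel
        feats.append(0.5 * np.log(2 * np.pi * np.e * band_power))
    return np.stack(feats, axis=-1)  # shape: (n_channels, n_bands)

# Example: a 32-channel, 4-second segment at 128 Hz
segment = np.random.randn(32, 4 * 128)
print(de_features(segment).shape)  # (32, 5)
```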
B. Other Physiological & Behavioral Signals
- Peripheral: ECG, BVP, EMG, EOG, GSR, BP, SKT
- Behavioral: Facial expressions, eye movements
- Feature Extraction: Handcrafted stats, PSD, DE, and deep learning features
  - Facial: LBP, HOG, AUs
  - Eye: Pupil diameter, fixations
- Table III: Summarizes modalities and features
IV. 🧪 Datasets & Experimental Design
A. Emotion Models
- Discrete: Ekman’s six basic emotions, Plutchik’s eight
- Dimensional: Valence-Arousal (VA), Valence-Arousal-Dominance (VAD); a quadrant-mapping sketch follows
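Dimensional labels are often discretized before classification. A minimal, hypothetical sketch that maps continuous valence/arousal ratings (e.g., the 1–9 self-assessment scales used in DEAP) onto the four quadrants of the VA plane, thresholding at the scale midpoint:

```python
def va_quadrant(valence, arousal, threshold=5.0):
    """Map continuous valence/arousal ratings (e.g., 1-9 self-assessment
    scales) to one of the four quadrants of the VA plane."""
    high_v = valence >= threshold
    high_a = arousal >= threshold
    if high_v and high_a:
        return "HVHA"   # e.g., excited / happy
    if high_v and not high_a:
        return "HVLA"   # e.g., calm / content
    if not high_v and high_a:
        return "LVHA"   # e.g., angry / fearful
    return "LVLA"       # e.g., sad / bored

print(va_quadrant(7.2, 3.1))  # HVLA
```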
B. Public Datasets (See Table IV)
Dataset | Modalities | Labels | Stimuli |
---|---|---|---|
DEAP | EEG + video + peripheral | VAD, liking, familiarity | Music videos |
DREAMER | EEG + ECG | VAD | Movie clips |
SEED / SEED-IV | EEG + Eye | Discrete (Happy, Sad…) | Movie clips |
MAHNOB-HCI | EEG + multimodal | VAD | Videos |
ASCERTAIN | EEG + GSR + facial | VAD + personality | Movie clips |
AMIGOS | EEG + video + GSR | VAD + 7 emotions | Short/long videos |
C. Experimental Strategies
- Subject-dependent (train and test on the same person)
- Subject-independent (train on some subjects, test on unseen ones); a split sketch follows
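The two protocols differ mainly in how trials are split. A minimal sketch, assuming scikit-learn and a per-trial subject ID array (all names and sizes are illustrative):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, train_test_split

# Toy data: 200 trials x 160 features, 4 subjects, binary valence labels
X = np.random.randn(200, 160)
y = np.random.randint(0, 2, size=200)
subjects = np.repeat([0, 1, 2, 3], 50)

# Subject-dependent: trials from the same subjects appear in both train and test
Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(X, y, test_size=0.2, stratify=subjects)

# Subject-independent: each fold holds out all trials of one unseen subject
logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    held_out = np.unique(subjects[test_idx])
    print("held-out subject:", held_out)  # train and evaluate a model here
```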
D. Evaluation Metrics (Table V)
- Performance: ACC, STD, Precision, Recall, F1, p-value, ROC, AUC (a metric sketch follows this list)
- Efficiency: Inference time, model size, energy consumption
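For reference, a short sketch computing the main performance metrics with scikit-learn on hypothetical binary predictions (AUC uses predicted probabilities, the others use hard labels):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy predictions for a binary valence task (hypothetical numbers)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.8, 0.4, 0.3, 0.1, 0.7])
y_pred = (y_prob >= 0.5).astype(int)

print("ACC      :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_prob))
```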
V. 🧠 Representation Learning (Challenge 1)
A. Joint Representation (Fig 4a)
- Concept: Map all modalities into a shared space
- Methods (a shared-autoencoder sketch follows this list):
  - RBMs: Learn joint distributions over the combined modalities
  - BDAEs / AEs: Compress multimodal inputs into a shared latent representation
  - DBNs: Hierarchical learning, possibly with modality-specific layers merged at a higher level
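To make the joint-representation idea concrete, a minimal PyTorch sketch of a shared-space autoencoder over concatenated EEG and peripheral feature vectors; it is an illustrative stand-in with hypothetical dimensions, not the exact BDAE architecture from the reviewed studies:

```python
import torch
import torch.nn as nn

class JointAutoencoder(nn.Module):
    """Toy joint (shared-space) autoencoder for two modalities.

    Dimensions are hypothetical: 160-D EEG features (e.g., 32 channels x 5 DE
    bands) and 32-D peripheral features, compressed into one 64-D latent code.
    """
    def __init__(self, eeg_dim=160, per_dim=32, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(eeg_dim + per_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, eeg_dim + per_dim),
        )

    def forward(self, eeg, per):
        x = torch.cat([eeg, per], dim=-1)  # concatenate both modalities
        z = self.encoder(x)                # shared latent representation
        return self.decoder(z), z

# Reconstruction training pulls both modalities into the same latent code z,
# which can then be fed to an emotion classifier
model = JointAutoencoder()
eeg, per = torch.randn(8, 160), torch.randn(8, 32)
recon, z = model(eeg, per)
loss = nn.functional.mse_loss(recon, torch.cat([eeg, per], dim=-1))
loss.backward()
```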
B. Coordinated Representation (Fig 4b)
- Concept: Keep modality-specific spaces, enforce cross-modal constraints
- Methods (a CCA sketch follows this list):
  - Similarity constraints: Maximize similarity (e.g., cosine) between modality-specific representations
  - Structured space / correlation:
    - CCA: Linear projections maximizing inter-modality correlation
    - DCCA: Deep variant capturing nonlinear relationships
    - Discriminative CCA: Adds class information to improve discriminative power
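A minimal scikit-learn sketch of the CCA route, projecting EEG and eye-movement features (hypothetical sizes) into a coordinated space and checking how correlated the paired components are:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Toy features (hypothetical sizes): EEG (n_trials x 160), eye movements (n_trials x 30)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((200, 160))
eye = rng.standard_normal((200, 30))

# Project both modalities into a 10-D coordinated space where corresponding
# components are maximally correlated
cca = CCA(n_components=10)
eeg_c, eye_c = cca.fit_transform(eeg, eye)

# Per-component correlation between the two projected views
corrs = [np.corrcoef(eeg_c[:, k], eye_c[:, k])[0, 1] for k in range(10)]
print(np.round(corrs, 2))

# The projected (or concatenated) views can then feed an emotion classifier
fused = np.concatenate([eeg_c, eye_c], axis=1)
```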
C. Interpretability Discussion
- Limited but growing
- Use t-SNE and topographic maps to visualize learned features
- Suggested tools: LIME, CAVs, attention weights (a t-SNE sketch follows)
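A typical visualization step, sketched with scikit-learn and matplotlib on hypothetical learned representations; in practice the features would come from the trained multimodal model:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Hypothetical learned multimodal representations (n_trials x latent_dim)
# with binary valence labels
rng = np.random.default_rng(0)
features = rng.standard_normal((300, 64))
labels = rng.integers(0, 2, size=300)

# Embed the latent codes in 2-D to inspect whether emotion classes separate
embedded = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(features)

plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="coolwarm", s=10)
plt.title("t-SNE of learned multimodal features")
plt.show()
```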
VI. 🔗 Multimodal Fusion (Challenge 2)
A. Feature-Level Fusion (Fig 5a)
- Modality features are combined before classification
- Methods (a sketch follows this list):
  - Simple concatenation
  - Weighted selection (e.g., ReliefF)
  - Learned fusion: attention, MMK, MFSAE, HDC-MER
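A minimal sketch of learned feature-level fusion, using a generic attention-weighted concatenation (hypothetical dimensions; not a specific method from the survey):

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Weight each modality's feature vector by a learned attention score,
    then concatenate the weighted features for a shared classifier."""
    def __init__(self, dims=(160, 32), n_classes=2):
        super().__init__()
        self.scorers = nn.ModuleList([nn.Linear(d, 1) for d in dims])
        self.classifier = nn.Linear(sum(dims), n_classes)

    def forward(self, feats):                       # feats: list of (B, d_m) tensors
        scores = torch.cat([s(f) for s, f in zip(self.scorers, feats)], dim=-1)
        weights = torch.softmax(scores, dim=-1)     # (B, n_modalities)
        weighted = [w.unsqueeze(-1) * f for w, f in zip(weights.unbind(-1), feats)]
        return self.classifier(torch.cat(weighted, dim=-1))

model = AttentionFusion()
logits = model([torch.randn(8, 160), torch.randn(8, 32)])  # EEG + peripheral
print(logits.shape)  # torch.Size([8, 2])
```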
B. Decision-Level Fusion (Fig 5b)
- Each modality is classified separately; the per-modality decisions are then combined
- Methods (a sketch follows this list):
  - Rule-based (max, sum, majority voting)
  - Weighted averaging (fixed or learned weights)
  - Advanced: Bayesian fusion, stacking
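A short NumPy sketch contrasting the rule-based and weighted variants on hypothetical per-modality class probabilities:

```python
import numpy as np

# Hypothetical per-modality class probabilities for 4 trials, 2 classes
p_eeg = np.array([[0.8, 0.2], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
p_eye = np.array([[0.6, 0.4], [0.3, 0.7], [0.4, 0.6], [0.1, 0.9]])

# Sum rule: average the probabilities, then take the argmax
sum_rule = np.argmax((p_eeg + p_eye) / 2, axis=1)

# Weighted averaging: trust EEG more (weights are illustrative, or tuned on validation data)
w_eeg, w_eye = 0.7, 0.3
weighted = np.argmax(w_eeg * p_eeg + w_eye * p_eye, axis=1)

# Majority voting over hard per-modality decisions (ties broken toward class 0 here)
votes = np.stack([p_eeg.argmax(1), p_eye.argmax(1)])
majority = (votes.mean(axis=0) > 0.5).astype(int)

print(sum_rule, weighted, majority)
```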
C. Hybrid Fusion (Fig 5c)
- Combines feature-level and decision-level stages
- Flexible but more complex
VII. ⚠️ Incomplete Multimodal Learning (Challenge 3)
A. Missing Data (Fig 6a)
- Caused by noise and motion artifacts
- Solutions (an interpolation sketch follows this list):
  - Discard the affected samples
  - Impute (interpolation, RBMs, SiMVAE)
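The simplest imputation option, sketched with pandas on a hypothetical single-channel segment whose artifact-corrupted samples were set to NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical single-channel EEG segment with corrupted samples marked as NaN
signal = np.array([1.2, 1.4, np.nan, np.nan, 1.1, 0.9, np.nan, 1.0])

# Linear interpolation over the missing samples (one of many imputation options)
imputed = pd.Series(signal).interpolate(method="linear", limit_direction="both").to_numpy()
print(imputed)
```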
B. Missing Modality (Fig 6b)
- Some sensors are unavailable at test time
- Solutions (a nearest-neighbor sketch follows this list):
  - Use only the available modalities (e.g., PoE)
  - Generate the missing modality with GANs
  - Project the available modalities into a shared space via CCA/DCCA
  - Find the nearest complete sample using a distance metric (e.g., Minkowski)
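A minimal SciPy sketch of the nearest-complete-sample idea: measure the Minkowski distance in the modality that is still available, then borrow the missing modality's features from the closest complete training sample (all arrays are hypothetical):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical training set with both modalities, and a test sample missing its eye features
train_eeg = np.random.randn(100, 160)
train_eye = np.random.randn(100, 30)
test_eeg = np.random.randn(1, 160)

# Minkowski distance (p=2 is Euclidean) computed in the available EEG space
dists = cdist(test_eeg, train_eeg, metric="minkowski", p=2)
nearest = int(np.argmin(dists))

# Borrow the missing eye-movement features from the most similar complete sample
test_eye_estimate = train_eye[nearest]
print(nearest, test_eye_estimate.shape)  # e.g., 42 (30,)
```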
VIII. 🎯 Applications, Challenges, and Future Directions
A. Applications
- Healthcare: Consciousness disorders, therapy monitoring
- Driving: Driver emotion/stress detection
- Education: Engagement monitoring
- Gaming: Emotion-aware BCI control
B. Challenges
- Signal inconsistency
- Deep model interpretability (“black box”)
- Missing data/modality
- Lack of standardized benchmarks
C. Future Directions
- More robust fusion/representation techniques
- Domain-specific interpretability tools
- Stronger missing data handling
- Open datasets, reproducible code, benchmarking protocols
IX. ✅ Conclusion
- Provides a systematic, ML-focused review of EEG-based EMER
- Details representation learning, fusion, and handling missing data
- Covers datasets, methods, metrics, and future directions
- Serves as an in-depth guide for researchers entering EMER
X. 📊 Key Figures and Tables
# | Content |
---|---|
Fig 1 | EER vs EMER workflows |
Fig 2 | Sensor placement |
Fig 3 | Emotion models (VA/VAD) |
Fig 4 | Representation learning (Joint vs Coordinated) |
Fig 5 | Fusion strategies |
Fig 6 | Missing data vs missing modality |
Table I | Summary of existing EMER studies |
Table II | Signal abbreviations |
Table III | Features of each modality |
Table IV | Public EMER datasets |
Table V | Evaluation metrics |
Table VI | Representation learning studies |
Table VII | Fusion strategy studies |