
EEG-Based Multimodal Emotion Recognition

Paper Link

I. 🌟 Core Focus and Contributions

  • Topic: A systematic review of EEG-based Multimodal Emotion Recognition (EMER)
  • Focus: Centers on EEG as the primary modality, combined with additional physiological or behavioral signals
  • Perspective: Machine Learning-focused, with emphasis on how ML methods tackle multimodal challenges
  • Gap Addressed: Most prior reviews treat all modalities equally or broadly. This review uniquely positions EEG as central and deeply explores ML techniques for EMER
  • Structure: Organized around three key ML challenges:
    1. Multimodal Feature Representation Learning
    2. Multimodal Physiological Signal Fusion
    3. Incomplete Multimodal Learning

II. 🔍 Introduction & Motivation

  • Importance: Emotion is fundamental to human experience and crucial for AI applications in healthcare, education, driving, and beyond
  • Why physiological signals?: Less susceptible to voluntary manipulation compared to speech/facial expressions
  • Why EEG?
    • Directly reflects central nervous system (CNS) activity
    • High temporal resolution
    • Portable, cost-effective, and non-invasive
  • EMER Hypothesis: EEG + other modalities → improved accuracy and robustness over unimodal EEG
  • Fig. 1: Compares unimodal EEG (EER) pipeline with multimodal (EMER) workflow

III. 📡 Overview of Multimodal Signals

A. EEG Signals

  • Nature: Encodes cognitive and emotional processes
  • Properties: High temporal resolution, unique frequency bands (δ, θ, α, β, γ)
  • Feature Extraction:
    • Handcrafted:
      • Time domain: ERPs, HOC, Hjorth, FD
      • Frequency: PSD, DE, ERD/ERS (DE is sketched in code after this list)
      • Time-frequency: STFT, Wavelet, Hilbert-Huang
    • Deep Features:
      • CNNs (require converting EEG into image-like 2D maps)
      • GNNs (learn spatial relationships)
      • RNNs / LSTMs (model temporal patterns)
      • Attention mechanisms
      • Domain Adaptation (e.g., DANs)
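
To make the handcrafted frequency-domain features above concrete, here is a minimal sketch of band-wise differential entropy (DE) under the common Gaussian assumption, where DE reduces to 0.5·log(2πe·σ²). The sampling rate, filter order, and band boundaries are illustrative assumptions, not the paper's exact configuration.

```python
# Band-wise differential entropy (DE) for one EEG channel, assuming the
# band-passed signal is approximately Gaussian: DE = 0.5 * log(2*pi*e*var).
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128  # assumed sampling rate in Hz (illustrative)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (14, 31), "gamma": (31, 50)}

def differential_entropy(signal, fs=FS):
    """Return one DE value per frequency band."""
    features = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, signal)                        # band-pass filter
        features[name] = 0.5 * np.log(2 * np.pi * np.e * np.var(band))
    return features

eeg_channel = np.random.randn(4 * FS)                        # 4 s of synthetic data
print(differential_entropy(eeg_channel))
```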

B. Other Physiological & Behavioral Signals

  • Peripheral: ECG, BVP, EMG, EOG, GSR, BP, SKT
  • Behavioral: Facial expressions, eye movements
  • Feature Extraction: Handcrafted stats, PSD, DE, and deep learning features
  • Facial: LBP, HOG, AUs
  • Eye: Pupil diameter, fixations
  • Table III: Summarizes modalities and features

IV. 🧪 Datasets & Experimental Design

A. Emotion Models

  • Discrete: Ekman’s 6, Plutchik’s 8
  • Dimensional: Valence-Arousal (VA), Valence-Arousal-Dominance (VAD)

B. Public Datasets (See Table IV)

| Dataset | Modalities | Labels | Stimuli |
|---|---|---|---|
| DEAP | EEG + video + peripheral | VAD, liking, familiarity | Music videos |
| DREAMER | EEG + ECG | VAD | Movie clips |
| SEED / SEED-IV | EEG + eye movements | Discrete (happy, sad, …) | Movie clips |
| MAHNOB-HCI | EEG + multimodal | VAD | Videos |
| ASCERTAIN | EEG + GSR + facial | VAD + personality | Movie clips |
| AMIGOS | EEG + video + GSR | VAD + 7 emotions | Short/long videos |

C. Experimental Strategies

  • Subject-dependent (within-person)
  • Subject-independent (cross-person)
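
A minimal sketch of the subject-independent protocol, assuming a per-trial feature matrix `X`, labels `y`, and a subject ID per trial (all synthetic here): leave-one-subject-out cross-validation via scikit-learn's `LeaveOneGroupOut`.

```python
# Leave-one-subject-out (LOSO) evaluation: train on all subjects but one,
# test on the held-out subject, and average accuracy across folds.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X = np.random.randn(120, 32)             # 120 trials x 32 features (synthetic)
y = np.random.randint(0, 2, 120)         # binary valence labels (synthetic)
subjects = np.repeat(np.arange(6), 20)   # 6 subjects, 20 trials each

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    clf = SVC().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
print(f"LOSO mean accuracy: {np.mean(scores):.3f}")
```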

D. Evaluation Metrics (Table V)

  • Performance: ACC, STD, Precision, Recall, F1, p-value, ROC, AUC
  • Efficiency: Inference time, model size, energy consumption
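
The performance metrics listed above map directly onto scikit-learn calls; the sketch below uses placeholder labels and scores purely for illustration.

```python
# Common EMER performance metrics computed on placeholder predictions.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])
y_score = np.array([0.2, 0.9, 0.4, 0.1, 0.8, 0.6])  # positive-class probabilities

print("ACC :", accuracy_score(y_true, y_pred))
print("Prec:", precision_score(y_true, y_pred))
print("Rec :", recall_score(y_true, y_pred))
print("F1  :", f1_score(y_true, y_pred))
print("AUC :", roc_auc_score(y_true, y_score))
```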

V. 🧠 Representation Learning (Challenge 1)

A. Joint Representation (Fig 4a)

  • Concept: Map all modalities into a shared space
  • Methods:
    • RBMs: Learn joint distributions
    • BDAEs / AEs: Compress multimodal inputs into latent representations
    • DBNs: Hierarchical learning, possibly modality-specific then merged
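
A minimal PyTorch sketch of the joint-representation idea (not the paper's BDAE/DBN code): EEG and peripheral feature vectors are encoded separately, concatenated, and compressed into one shared latent code from which a decoder reconstructs both modalities. All dimensions are illustrative assumptions (310 roughly matches 62 channels × 5 DE bands).

```python
# Joint (shared-space) representation via a bimodal autoencoder.
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    def __init__(self, eeg_dim=310, peri_dim=32, latent_dim=64):  # illustrative sizes
        super().__init__()
        self.enc_eeg = nn.Sequential(nn.Linear(eeg_dim, 128), nn.ReLU())
        self.enc_peri = nn.Sequential(nn.Linear(peri_dim, 128), nn.ReLU())
        self.to_latent = nn.Linear(256, latent_dim)               # joint code
        self.decoder = nn.Linear(latent_dim, eeg_dim + peri_dim)

    def forward(self, eeg, peri):
        z = self.to_latent(torch.cat([self.enc_eeg(eeg), self.enc_peri(peri)], dim=1))
        return z, self.decoder(z)

model = BimodalAutoencoder()
eeg, peri = torch.randn(8, 310), torch.randn(8, 32)               # synthetic batch
z, recon = model(eeg, peri)
loss = nn.functional.mse_loss(recon, torch.cat([eeg, peri], dim=1))
```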

B. Coordinated Representation (Fig 4b)

  • Concept: Keep modality-specific spaces, enforce cross-modal constraints
  • Methods:
    • Similarity constraints: Maximize similarity (e.g., cosine) between modality-specific embeddings
    • Structured space/correlation:
      • CCA: Linear projection maximizing inter-modality correlation
      • DCCA: Deep variant capturing nonlinear relationships
      • Discriminative CCA: Adds class information to improve discriminative power
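
A minimal sketch of coordinated representation via CCA, projecting EEG and eye-movement features into maximally correlated low-dimensional views; the feature dimensions and component count are assumptions for illustration only.

```python
# Coordinated representation via canonical correlation analysis (CCA).
import numpy as np
from sklearn.cross_decomposition import CCA

X_eeg = np.random.randn(200, 50)   # synthetic EEG features per sample
X_eye = np.random.randn(200, 20)   # synthetic eye-movement features per sample

cca = CCA(n_components=5)
Z_eeg, Z_eye = cca.fit_transform(X_eeg, X_eye)   # coordinated projections
corr = [np.corrcoef(Z_eeg[:, i], Z_eye[:, i])[0, 1] for i in range(5)]
print("Canonical correlations:", np.round(corr, 3))
```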

C. Interpretability Discussion

  • Limited but growing
  • Use t-SNE and topographic maps to visualize learned features
  • Suggested tools: LIME, CAVs, Attention
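
For the visualization route mentioned above, a minimal sketch of a t-SNE plot of learned features; `features` and `labels` are placeholders for whatever representations a trained model produces.

```python
# Visualize learned multimodal features with t-SNE, colored by emotion class.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.randn(300, 64)      # placeholder learned representations
labels = np.random.randint(0, 3, 300)    # placeholder emotion labels

embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=10)
plt.title("t-SNE of learned multimodal features")
plt.show()
```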

VI. 🔗 Multimodal Fusion (Challenge 2)

A. Feature-Level Fusion (Fig 5a)

  • Before classification
  • Methods:
    • Simple concatenation
    • Weighted selection (e.g., ReliefF)
    • Learned fusion: attention, MMK, MFSAE, HDC-MER
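
A minimal sketch of the simplest feature-level strategy (plain concatenation ahead of one classifier); feature names and dimensions are illustrative, and the learned-fusion variants above would replace the concatenation step.

```python
# Feature-level (early) fusion: concatenate per-modality features, then classify.
import numpy as np
from sklearn.svm import SVC

eeg_feat = np.random.randn(100, 160)   # synthetic EEG features per trial
eye_feat = np.random.randn(100, 33)    # synthetic eye-movement features per trial
y = np.random.randint(0, 3, 100)       # synthetic emotion labels

fused = np.concatenate([eeg_feat, eye_feat], axis=1)
clf = SVC().fit(fused, y)
print(clf.predict(fused[:5]))
```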

B. Decision-Level Fusion (Fig 5b)

  • After classification per modality
  • Methods:
    • Rule-based (max, sum, voting)
    • Weighted averaging (fixed or learned)
    • Advanced: Bayesian, stacking
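
A minimal sketch of decision-level fusion with fixed weights: each modality gets its own classifier, and their class probabilities are combined by a weighted sum. The weights and classifiers here are assumptions; learned or Bayesian weighting would replace them.

```python
# Decision-level (late) fusion: weighted average of per-modality probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

eeg_feat = np.random.randn(100, 160)   # synthetic per-modality features
eye_feat = np.random.randn(100, 33)
y = np.random.randint(0, 3, 100)

clf_eeg = LogisticRegression(max_iter=1000).fit(eeg_feat, y)
clf_eye = LogisticRegression(max_iter=1000).fit(eye_feat, y)

w_eeg, w_eye = 0.7, 0.3                # fixed modality weights (assumed)
probs = w_eeg * clf_eeg.predict_proba(eeg_feat) + w_eye * clf_eye.predict_proba(eye_feat)
pred = probs.argmax(axis=1)
```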

C. Hybrid Fusion (Fig 5c)

  • Combines feature-level and decision-level stages
  • Flexible but more complex

VII. ⚠️ Incomplete Multimodal Learning (Challenge 3)

A. Missing Data (Fig 6a)

  • Causes: noise, motion artifacts
  • Solutions:
    • Discard
    • Impute (interpolation, RBMs, SiMVAE)
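
A minimal sketch of the imputation route for short corrupted segments: mark the rejected samples as NaN and fill them by linear interpolation (RBM- or SiMVAE-based imputation would replace this step).

```python
# Simple gap imputation: linearly interpolate over artifact-rejected samples.
import numpy as np
import pandas as pd

signal = np.sin(np.linspace(0, 10, 100))   # synthetic physiological channel
signal[40:45] = np.nan                     # pretend a motion artifact was rejected
repaired = pd.Series(signal).interpolate(method="linear").to_numpy()
```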

B. Missing Modality (Fig 6b)

  • Some sensors unavailable at test time
  • Solutions:
    • Use only available modalities (PoE)
    • Generate missing modality using GANs
    • CCA/DCCA projection from available to shared space
    • Find nearest complete sample using distance metric (e.g., Minkowski)
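
A minimal sketch of the nearest-complete-sample idea: when a test trial lacks one modality, borrow that modality from the training trial whose available-modality features are closest under a Minkowski distance (the order p and the feature dimensions are assumptions).

```python
# Substitute a missing modality with that of the nearest complete training trial.
import numpy as np
from scipy.spatial.distance import cdist

train_eeg = np.random.randn(50, 160)   # complete training set: EEG features
train_eye = np.random.randn(50, 33)    # complete training set: eye features
test_eeg = np.random.randn(5, 160)     # test trials missing the eye modality

d = cdist(test_eeg, train_eeg, metric="minkowski", p=2)   # pairwise distances
nearest = d.argmin(axis=1)
eye_substitute = train_eye[nearest]    # borrowed eye-movement features
```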

VIII. 🎯 Applications, Challenges, and Future Directions

A. Applications

  • Healthcare: Consciousness disorders, therapy monitoring
  • Driving: Driver emotion/stress detection
  • Education: Engagement monitoring
  • Gaming: Emotion-aware BCI control

B. Challenges

  • Signal inconsistency
  • Deep model interpretability (“black box”)
  • Missing data/modality
  • Lack of standardized benchmarks

C. Future Directions

  • More robust fusion/representation techniques
  • Domain-specific interpretability tools
  • Stronger missing data handling
  • Open datasets, reproducible code, benchmarking protocols

IX. ✅ Conclusion

  • Provides a systematic, ML-focused review of EEG-based EMER
  • Details representation learning, fusion, and handling missing data
  • Covers datasets, methods, metrics, and future directions
  • Serves as an in-depth guide for researchers entering EMER

X. 📊 Key Figures and Tables

| # | Content |
|---|---|
| Fig. 1 | EER vs. EMER workflows |
| Fig. 2 | Sensor placement |
| Fig. 3 | Emotion models (VA/VAD) |
| Fig. 4 | Representation learning (joint vs. coordinated) |
| Fig. 5 | Fusion strategies |
| Fig. 6 | Missing data vs. missing modality |
| Table I | Summary of existing EMER studies |
| Table II | Signal abbreviations |
| Table III | Features of each modality |
| Table IV | Public EMER datasets |
| Table V | Evaluation metrics |
| Table VI | Representation learning studies |
| Table VII | Fusion strategy studies |