Completed · Classification · 2026-04-01

Metaphor vs. Literal Classification

Can predicted brain responses distinguish metaphorical from literal language? Figurative language engages additional right-hemisphere and temporal regions — this experiment tests whether TRIBE v2 encodes that distinction.

language · metaphor · figurative · classification · logistic regression

Summary

TRIBE v2's predicted brain responses distinguish metaphorical from literal language with 95% accuracy (19 of 20 correct, AUC 1.000) on a held-out test set. The classifier was trained on 80 stimuli using a Pipeline evaluated with 5-fold StratifiedKFold cross-validation (78.8% CV accuracy, 0.901 AUC), then tested on 20 stimuli it never saw during training: 10 metaphors and 10 literal statements. The single holdout error was a literal sentence predicted as metaphor. These results suggest that TRIBE v2 encodes a distinction between figurative and literal language processing in its predicted brain patterns, and that this signal generalizes to unseen text.
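The training and evaluation procedure described in the summary could look roughly like the sketch below. It assumes scikit-learn and uses random synthetic data in place of TRIBE v2 predictions, so the printed scores mean nothing. The feature count (200 instead of 20,484), `max_iter`, the fold shuffling and seed, and the helper `make_split` are all assumptions made for illustration, not details taken from the experiment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_features = 200  # stand-in; the experiment reports 20,484 cortical vertices


def make_split(n_per_class):
    """Hypothetical synthetic data: label 0 = literal, label 1 = metaphor."""
    y = np.repeat([0, 1], n_per_class)
    X = rng.normal(size=(2 * n_per_class, n_features))
    X[y == 1, :10] += 1.0  # inject a weak class signal into 10 features
    return X, y


X_train, y_train = make_split(40)  # 40 metaphor + 40 literal
X_hold, y_hold = make_split(10)    # 10 metaphor + 10 literal, never used in training

# The scaler lives inside the pipeline, so it is refit on each training fold.
clf = Pipeline([
    ("scale", StandardScaler()),
    ("logreg", LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000)),
])

# 5-fold stratified cross-validation on the training stimuli only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_proba = cross_val_predict(clf, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
cv_acc = accuracy_score(y_train, (cv_proba >= 0.5).astype(int))
cv_auc = roc_auc_score(y_train, cv_proba)

# Refit on all training stimuli, then score the holdout set once.
clf.fit(X_train, y_train)
hold_proba = clf.predict_proba(X_hold)[:, 1]
hold_acc = accuracy_score(y_hold, (hold_proba >= 0.5).astype(int))
hold_auc = roc_auc_score(y_hold, hold_proba)

print(f"CV accuracy={cv_acc:.3f} AUC={cv_auc:.3f}")
print(f"Holdout accuracy={hold_acc:.3f} AUC={hold_auc:.3f}")
```

Keeping the holdout stimuli out of every cross-validation fold is what makes the holdout score an independent estimate rather than a second view of the training data.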

Key metrics

Holdout Acc: 95.0% (accuracy on 20 held-out stimuli, 10 + 10)
Holdout AUC: 1.000 (ROC AUC on the holdout set; the classifier scores rank the two classes perfectly)
CV Accuracy: 78.8% (5-fold StratifiedKFold on 80 training stimuli)
CV AUC: 0.901 (ROC AUC on Pipeline CV)
Train: 80 stimuli (40 metaphor + 40 literal)
Holdout: 20 stimuli (10 metaphor + 10 literal held out)
Features: 20,484 (cortical vertices per sample)
Regularization: L1, C=1.0 (SAGA solver, lasso penalty)
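A holdout AUC of 1.0 alongside 95% accuracy can look contradictory. AUC depends only on how the scores rank the two classes, while accuracy also depends on where the decision threshold sits. The sketch below illustrates this with hypothetical scores (not the experiment's data) and a pairwise AUC computed by hand. The 0.5 threshold is an assumption about how predictions were made.

```python
def roc_auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of "metaphor"; not the experiment's data.
metaphor_scores = [0.91, 0.84, 0.77, 0.62]
literal_scores = [0.55, 0.31, 0.22, 0.08]

# Every metaphor scores above every literal, so the ranking is perfect.
auc = roc_auc(metaphor_scores, literal_scores)

# At a 0.5 threshold, the literal item scored 0.55 is labeled "metaphor".
threshold = 0.5
correct = sum(s >= threshold for s in metaphor_scores) + sum(s < threshold for s in literal_scores)
accuracy = correct / (len(metaphor_scores) + len(literal_scores))

print(auc, accuracy)  # 1.0 0.875
```

In this toy case, one literal item lands above the threshold even though it still scores below every metaphor, which is the same pattern as a single literal-as-metaphor error with a perfect AUC.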

Confusion matrix

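Counts are from 5-fold cross-validation on the 80 training stimuli. Rows show the actual class and columns show the predicted class.

| Actual \ Predicted | Literal | Metaphor |
| --- | --- | --- |
| Literal | 34 | 6 |
| Metaphor | 11 | 29 |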
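As a worked check, the accuracy and per-class precision, recall and F1 can be derived directly from the four confusion-matrix counts above (34, 6, 11, 29). The sketch assumes rows are actual classes and columns are predicted classes, in the order literal, metaphor.

```python
# Confusion-matrix counts from this page: rows = actual, columns = predicted,
# class order (literal, metaphor).
labels = ["literal", "metaphor"]
cm = [[34, 6],
      [11, 29]]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / total

report = {}
for i, name in enumerate(labels):
    tp = cm[i][i]
    predicted = sum(cm[r][i] for r in range(len(cm)))  # column sum
    actual = sum(cm[i])                                # row sum
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    report[name] = (precision, recall, f1)

print(f"accuracy = {accuracy:.4f}")
for name, (p, r, f) in report.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

The accuracy comes out to 63/80 = 0.7875, which matches the reported 78.8% CV accuracy, and the off-diagonal cells sum to the 17 misclassifications cited for cross-validation.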

Classification report

| Class | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| Literal | 0.90 | 1.00 | 0.95 | 10 |
| Metaphor | 1.00 | 0.90 | 0.95 | 10 |
| Macro avg | 0.95 | 0.95 | 0.95 | 20 |

PCA projection

[Scatter plot. Axes: PC1 (28.1% var) and PC2 (21.0% var). Legend: metaphor, literal.]
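A two-component PCA projection of this kind can be sketched with a singular value decomposition of the mean-centered response matrix. The example assumes NumPy and uses random stand-in data with a reduced feature count; how the page's projection was actually computed is not stated, so treat this as one plausible route rather than the method used.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the response matrix: 100 stimuli x 50 features.
X = rng.normal(size=(100, 50))

# PCA via SVD: center each feature, then decompose.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of total variance captured by each principal component.
explained = S**2 / np.sum(S**2)

# Coordinates of each stimulus on the first two components.
coords = Xc @ Vt[:2].T

print(f"PC1: {explained[0]:.1%} var, PC2: {explained[1]:.1%} var")
```

The percentages on the axes (28.1% and 21.0% on this page) correspond to the first two entries of `explained` in such a decomposition.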

Discriminative brain regions

Sparsity: 98.3%
Active vertices: 353 / 20,484

Metaphor-predictive

RH Orbital Gyrus: 8.8%
LH Inferior Temporal Gyrus: 8.3%
RH Central Sulcus: 8.0%
LH Orbital Gyrus: 7.5%
LH Medial Orbital-Olfactory Sulcus: 7.1%
RH Gyrus Rectus: 6.3%
LH Superior Frontal Gyrus: 6.0%
LH Temporal Pole: 5.0%

Literal-predictive

LH Occipital Pole: 26.2%
RH Occipital Pole: 16.9%
LH Postcentral Gyrus: 9.3%
RH Superior Parietal Gyrus: 8.4%
RH Orbital Gyrus: 5.8%
LH Gyrus Rectus: 5.1%
LH Parahippocampal Gyrus: 5.0%
LH Transverse Frontopolar Gyrus: 4.6%

Regions identified via L1-regularized logistic regression weights mapped onto the Destrieux cortical atlas (fsaverage5). Percentages reflect each region's share of total cortical weight for its category (excluding medial wall vertices).
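The sparsity figure and the per-region shares can be illustrated with a short sketch. The sparsity arithmetic uses the counts reported on this page. The region-share part uses hypothetical toy weights and labels, and it assumes that positive weights are metaphor-predictive, negative weights are literal-predictive, and shares are computed from absolute weight. None of those conventions is confirmed by the page, and `region_shares` is an illustrative helper, not part of any library.

```python
from collections import defaultdict

# Sparsity check using the counts reported on this page.
n_vertices = 20_484
n_active = 353
sparsity = 1 - n_active / n_vertices  # fraction of weights that are exactly zero
print(f"sparsity = {sparsity:.1%}")   # sparsity = 98.3%

# Hypothetical toy data: one classifier weight and one atlas label per vertex.
weights = [0.0, 0.4, 0.1, 0.0, -0.3, -0.1, 0.5, 0.0]
regions = ["A", "A", "B", "B", "C", "C", "B", "medial_wall"]


def region_shares(weights, regions, sign):
    """Share of the total weight of one sign (+1 or -1) carried by each region."""
    totals = defaultdict(float)
    for w, region in zip(weights, regions):
        # Skip medial wall vertices and weights of the other sign (or zero).
        if region != "medial_wall" and w * sign > 0:
            totals[region] += abs(w)
    grand_total = sum(totals.values())
    return {r: t / grand_total for r, t in totals.items()}


metaphor_shares = region_shares(weights, regions, +1)  # assumed: positive = metaphor
literal_shares = region_shares(weights, regions, -1)   # assumed: negative = literal
print(metaphor_shares)
print(literal_shares)
```

Because the shares are normalized within each category, the metaphor-predictive and literal-predictive percentages each sum to 100% on their own, and the same region can appear in both lists.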

Interested in what these regions mean? Read the full discussion — a region-by-region analysis comparing these findings to published neuroscience literature.

Figures

Holdout confusion matrix — Unseen stimuli

Test on 20 held-out stimuli (10 metaphor + 10 literal): 9/10 literal and 10/10 metaphor correct. The single error was a literal sentence misclassified as metaphor.

CV confusion matrix — Training stimuli (StratifiedKFold)

5-fold StratifiedKFold cross-validation on 80 training stimuli with a Pipeline, so the scaler is fit inside each fold rather than on the full training set. Accuracy is 78.8%, with 17 misclassifications out of 80.

PCA scatter — Brain response patterns

First two principal components of brain activation vectors. Metaphor and literal clusters show separation consistent with the classification performance.

Classifier weights — Left lateral

Logistic regression weights projected onto the left lateral brain surface. Warm regions are metaphor-predictive.

Classifier weights — Left medial

Logistic regression weights on the left medial surface.

Classifier weights — Right lateral

Logistic regression weights on the right lateral surface.

Classifier weights — Right medial

Logistic regression weights on the right medial surface.

Experiment data excludes raw stimuli and large prediction arrays.
