WifeBench
Muse Spark (Meta): 47
Gemini 3.5 Flash (Google): 44
DeepSeek V4 Flash (DeepSeek): 41
Llama 4 Scout (Meta): 40
Mistral Small 4 (Mistral): 38
Grok Build 0.1 (xAI): 35
Gemma 4 26B-A4B (Google): 33
MiniMax M3 (MiniMax): 31
GPT-5.6 Luna (OpenAI): 29
Mistral Medium 3.5 (Mistral): 27

Methodology

How does it work?

Whenever a new model drops, my wife asks it 10 questions that only she knows the answers to, then scores how close the model's answers are to hers on a scale of 1–100.
No rubric. No committee. No peer review.
Just one honest verdict from the person whose opinion actually matters.

10 questions · 1 honest score · 0 MMLU

Benchmarks you can trust. My wife said so. 💍