WifeBench: LLMs, rated by my wife

Current Top 10

Muse Spark

Gemini 3.5 Flash

DeepSeek V4 Flash

Llama 4 Scout

Mistral Small 4

Grok Build 0.1

Gemma 4 26B-A4B

MiniMax M3

GPT-5.6 Luna

Mistral Medium 3.5

Methodology

How does it work?

Whenever a new model drops, my wife asks it 10 questions that only she knows the answers to, then scores how close the model's answers are to hers on a scale of 1–100.
No rubric. No committee. No peer review.
Just one honest verdict from the person whose opinion actually matters.

10 questions. 1 honest score. 0 MMLU.