Retrieval Evaluation

How well the gnome is finding what you actually meant.

Evaluation data is stale. Last run was 2026-06-10 04:42:06. Queue a lexical smoke eval before trusting regression status.

Hit@1 (8 runs)

60.0%

Hit@5 (8 runs)

85.0%

Hit@10 (8 runs)

85.0%

MRR (8 runs)

0.717

Mode: hybrid | 20 eval cases | Last run: 2026-06-10 04:42:06 | Run #7

Run Eval

Lexical is the safe smoke check; hybrid runs the full retrieval path.

Max cases

Run	Mode	Cases	Hit@1	Hit@5	Hit@10	MRR
2026-06-10 04:42:06	hybrid	20	60.0%	85.0%	85.0%	0.717
2026-06-10 02:28:08	lexical	5	40.0%	40.0%	40.0%	0.400
2026-06-10 02:18:43	semantic	5	20.0%	20.0%	20.0%	0.200
2026-06-10 02:13:08	hybrid	5	40.0%	60.0%	60.0%	0.500
2026-06-10 02:12:59	lexical	5	40.0%	40.0%	40.0%	0.400
2026-06-02 01:46:09	lexical	20	75.0%	75.0%	80.0%	0.756
2026-06-02 01:43:25	lexical	1	0.0%	0.0%	0.0%	0.000
2026-05-29 20:51:19	lexical	150	52.7%	54.0%	54.7%	0.534

Hit@1 / Hit@5 / Hit@10 — fraction of queries where the labeled-correct chunk appears in the top N results.
MRR (Mean Reciprocal Rank) — average of 1/rank across queries. 1.0 means the right chunk is always first.
Regression alert — triggered when MRR drops by more than 0.05 versus the previous run.
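
The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's actual implementation. The function names, the 1-based `ranks` list (with `None` for a query whose correct chunk was never retrieved), and the choice to count such misses as 0 in the MRR average are all assumptions made for the example.

```python
from typing import Optional, Sequence

# Hypothetical input format: one entry per query, holding the 1-based rank of
# the labeled-correct chunk, or None if it was not retrieved at all.
Ranks = Sequence[Optional[int]]


def hit_at_k(ranks: Ranks, k: int) -> float:
    """Fraction of queries whose correct chunk appears in the top k results."""
    if not ranks:
        return 0.0
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: Ranks) -> float:
    """Average of 1/rank across queries; a miss contributes 0 (assumption)."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def is_regression(previous_mrr: float, current_mrr: float,
                  threshold: float = 0.05) -> bool:
    """True when MRR dropped by more than `threshold` versus the previous run."""
    return (previous_mrr - current_mrr) > threshold


# Made-up ranks for five queries; the fourth query never found its chunk.
example_ranks = [1, 1, 3, None, 2]
print(hit_at_k(example_ranks, 1))           # 2 of 5 queries hit at rank 1
print(hit_at_k(example_ranks, 5))           # 4 of 5 queries hit within top 5
print(mean_reciprocal_rank(example_ranks))  # (1 + 1 + 1/3 + 0 + 1/2) / 5
```

Under this sketch, a run that goes from MRR 0.756 to 0.400 would trip the alert, while a run that goes from 0.400 to 0.717 would not.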