Retrieval Evaluation

How well the gnome is finding what you actually meant.

Hit@1 (8 runs): 60.0%
Hit@5 (8 runs): 85.0%
Hit@10 (8 runs): 85.0%
MRR (8 runs): 0.717

Mode: hybrid · 20 eval cases · Last run: 2026-06-10 04:42:06 · Run #7

Run Eval

Lexical is the safe smoke check; hybrid runs the full retrieval path.

History

Run                  Mode      Cases  Hit@1  Hit@5  Hit@10  MRR
2026-06-10 04:42:06  hybrid       20  60.0%  85.0%  85.0%   0.717
2026-06-10 02:28:08  lexical       5  40.0%  40.0%  40.0%   0.400
2026-06-10 02:18:43  semantic      5  20.0%  20.0%  20.0%   0.200
2026-06-10 02:13:08  hybrid        5  40.0%  60.0%  60.0%   0.500
2026-06-10 02:12:59  lexical       5  40.0%  40.0%  40.0%   0.400
2026-06-02 01:46:09  lexical      20  75.0%  75.0%  80.0%   0.756
2026-06-02 01:43:25  lexical       1   0.0%   0.0%   0.0%   0.000
2026-05-29 20:51:19  lexical     150  52.7%  54.0%  54.7%   0.534

About these metrics

  • Hit@1 / Hit@5 / Hit@10: fraction of queries where the labeled-correct chunk appears in the top N results (N = 1, 5, or 10).
  • MRR (Mean Reciprocal Rank): average across queries of 1/rank, where rank is the position of the labeled-correct chunk in the results. A query whose correct chunk is not retrieved contributes 0. 1.0 means the right chunk is always first.
  • Regression alert: triggered when MRR drops by more than 0.05 versus the previous run.
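
As a rough illustration of the definitions above, here is a minimal Python sketch. It is not the dashboard's actual implementation: the function names (hit_at_k, mean_reciprocal_rank, is_regression) and the sample ranks are hypothetical. It assumes each query yields the 1-based rank of its labeled-correct chunk, or None when that chunk was not retrieved, and that a miss contributes 0 to MRR.

```python
from typing import Optional, Sequence


def hit_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of queries whose correct chunk is ranked within the top k."""
    if not ranks:
        return 0.0
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank across queries; a miss (None) contributes 0 (assumption)."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def is_regression(previous_mrr: float, current_mrr: float,
                  threshold: float = 0.05) -> bool:
    """True when MRR dropped by more than `threshold` versus the previous run."""
    return (previous_mrr - current_mrr) > threshold


# Hypothetical ranks for five eval cases (None = correct chunk not retrieved).
ranks = [1, 3, None, 1, 7]

print(hit_at_k(ranks, 1))                      # 0.4
print(hit_at_k(ranks, 5))                      # 0.6
print(hit_at_k(ranks, 10))                     # 0.8
print(round(mean_reciprocal_rank(ranks), 3))   # 0.495
print(is_regression(0.756, 0.400))             # True
```

Whether the real alert compares runs across different modes or case counts is not stated here; the sketch simply compares two MRR values.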