Retrieval Evaluation
How well the gnome is finding what you actually meant.
Hit@1 (8 runs): 60.0%
Hit@5 (8 runs): 85.0%
Hit@10 (8 runs): 85.0%
MRR (8 runs): 0.717
Run Eval
Lexical is the safe smoke check; hybrid runs the full retrieval path.
History
| Run | Mode | Cases | Hit@1 | Hit@5 | Hit@10 | MRR |
|---|---|---|---|---|---|---|
| 2026-06-10 04:42:06 | hybrid | 20 | 60.0% | 85.0% | 85.0% | 0.717 |
| 2026-06-10 02:28:08 | lexical | 5 | 40.0% | 40.0% | 40.0% | 0.400 |
| 2026-06-10 02:18:43 | semantic | 5 | 20.0% | 20.0% | 20.0% | 0.200 |
| 2026-06-10 02:13:08 | hybrid | 5 | 40.0% | 60.0% | 60.0% | 0.500 |
| 2026-06-10 02:12:59 | lexical | 5 | 40.0% | 40.0% | 40.0% | 0.400 |
| 2026-06-02 01:46:09 | lexical | 20 | 75.0% | 75.0% | 80.0% | 0.756 |
| 2026-06-02 01:43:25 | lexical | 1 | 0.0% | 0.0% | 0.0% | 0.000 |
| 2026-05-29 20:51:19 | lexical | 150 | 52.7% | 54.0% | 54.7% | 0.534 |
About these metrics
- Hit@1 / Hit@5 / Hit@10: fraction of queries where the labeled-correct chunk appears in the top N results (N = 1, 5, or 10).
- MRR (Mean Reciprocal Rank): average of 1/rank across queries, where rank is the position of the labeled-correct chunk. A query whose correct chunk is not retrieved contributes 0. 1.0 means the right chunk is always first.
- Regression alert: triggered when MRR drops by more than 0.05 versus the previous run.
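
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's actual implementation: the function names, the `ranks` representation (1-based rank of the correct chunk per query, or `None` when it was not retrieved), and the sample values are all assumptions made for the example. Treating a miss as a reciprocal rank of 0 is also an assumption, though it agrees with the history rows where Hit@1 through Hit@10 and MRR are all equal.

```python
from typing import Optional, Sequence

# Hypothetical representation: one entry per query, holding the 1-based rank
# of the labeled-correct chunk, or None if it was not retrieved at all.
Ranks = Sequence[Optional[int]]


def hit_at_k(ranks: Ranks, k: int) -> float:
    """Fraction of queries whose correct chunk is ranked within the top k."""
    if not ranks:
        return 0.0
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: Ranks) -> float:
    """Average of 1/rank across queries; a miss contributes 0 (assumption)."""
    if not ranks:
        return 0.0
    total = sum(1.0 / r for r in ranks if r is not None)
    return total / len(ranks)


def is_regression(previous_mrr: float, current_mrr: float,
                  threshold: float = 0.05) -> bool:
    """True when MRR dropped by more than `threshold` versus the previous run."""
    return (previous_mrr - current_mrr) > threshold


# Made-up ranks for five queries (not taken from the history table).
sample_ranks = [1, 2, None, 1, 4]

print(f"Hit@1:  {hit_at_k(sample_ranks, 1):.1%}")
print(f"Hit@5:  {hit_at_k(sample_ranks, 5):.1%}")
print(f"MRR:    {mean_reciprocal_rank(sample_ranks):.3f}")
print(f"Alert:  {is_regression(previous_mrr=0.60, current_mrr=0.50)}")
```

With these sample ranks, two of five queries have the correct chunk first and four of five have it within the top five, so Hit@1 is 40.0%, Hit@5 is 80.0%, and MRR is (1 + 1/2 + 0 + 1 + 1/4) / 5 = 0.550.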