Eval harness
Design an eval harness
Production-ready eval: golden set construction, metrics, regression gates.
RLHF / DPO preference data eval design
Design the preference data and the eval that decides whether a preference-tuning run actually improved the model.