Eval harness

Design an eval harness
Production-ready eval: golden set construction, metrics, regression gates.
RLHF / DPO preference data eval design
Design the preference data and eval that decides whether a preference-tuning run actually improved the model.