# Evaluation Rubric

Score each case on a 0-100 scale using weighted criteria:

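- Expected content coverage: adds weight when the expected content is present
- Forbidden content violations: subtracts weight when forbidden content appears
- Regex/format compliance: adds weight when the output matches the required pattern or format
- Output length sanity: adds weight when the length is within bounds, subtracts it otherwise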
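
The criteria above could be combined as in the following minimal sketch. The rubric does not specify weight values, matching rules, or length bounds, so the `Case` fields, the numbers in `WEIGHTS`, the case-insensitive substring matching, and the clamping to 0-100 are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Case:
    """Hypothetical container for one evaluation case."""
    output: str
    expected: list = field(default_factory=list)   # phrases that should appear
    forbidden: list = field(default_factory=list)  # phrases that must not appear
    pattern: Optional[str] = None                  # optional regex the output should match
    min_len: int = 0                               # assumed length bounds, in characters
    max_len: int = 10_000


# Assumed weights; the rubric leaves the actual values open.
WEIGHTS = {"coverage": 60, "forbidden": 30, "format": 25, "length": 15}


def score_case(case: Case, weights: dict = WEIGHTS) -> float:
    text = case.output.lower()
    score = 0.0

    # Expected content coverage: add weight in proportion to phrases found.
    if case.expected:
        hits = sum(1 for phrase in case.expected if phrase.lower() in text)
        score += weights["coverage"] * hits / len(case.expected)
    else:
        score += weights["coverage"]

    # Forbidden content violations: subtract weight per forbidden phrase found.
    violations = sum(1 for phrase in case.forbidden if phrase.lower() in text)
    score -= weights["forbidden"] * violations

    # Regex/format compliance: add weight if the pattern matches (or none is set).
    if case.pattern is None or re.search(case.pattern, case.output):
        score += weights["format"]

    # Output length sanity: add weight inside the bounds, subtract outside.
    if case.min_len <= len(case.output) <= case.max_len:
        score += weights["length"]
    else:
        score -= weights["length"]

    # Clamp to the rubric's 0-100 range.
    return max(0.0, min(100.0, score))
```

With these assumed weights, an output that covers every expected phrase, matches its pattern, and stays within the length bounds scores 100, and each forbidden hit lowers the score by 30.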

Recommended acceptance gates:

- Average score >= 85
- No individual case scores below 70
- Zero critical forbidden-content hits
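
As a rough illustration, the three gates can be checked together once per-case scores are available. The function name `passes_gates`, its parameters, and the choice to fail an empty run are hypothetical; only the thresholds come from the list above.

```python
from statistics import mean


def passes_gates(scores, critical_hits, min_average=85.0, min_case=70.0):
    """Return True only if all three acceptance gates hold.

    scores: per-case scores on the 0-100 scale.
    critical_hits: count of critical forbidden-content hits across all cases.
    """
    if not scores:
        # Assumption: a run with no cases cannot be accepted.
        return False
    return (
        mean(scores) >= min_average    # average score >= 85
        and min(scores) >= min_case    # no case below 70
        and critical_hits == 0         # zero critical forbidden-content hits
    )


if __name__ == "__main__":
    print(passes_gates([90, 85, 80], critical_hits=0))
    print(passes_gates([100, 100, 69], critical_hits=0))
```

Treating the gates as a single conjunction means one weak case or one critical hit fails the run even when the average is high.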
