RAG/LLM evaluation skill — golden sets, LLM-as-judge, scorecards.
Evaluate a RAG/LLM app end to end: build a golden set from your docs, score answers with an LLM-as-judge, compute retrieval metrics, produce a shareable scorecard, and gate CI on the results.
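
As a rough illustration of the retrieval-metrics, scorecard, and CI-gate pieces, here is a minimal sketch. Every name in it (`GoldenItem`, `recall_at_k`, `reciprocal_rank`, `scorecard`, `ci_gate`) is hypothetical and not taken from this skill, and the choice of recall@k and MRR as the metrics is an assumption. The LLM-as-judge step is left out because it depends on whichever model API you use.

```python
from dataclasses import dataclass


@dataclass
class GoldenItem:
    """One golden-set entry (hypothetical shape, for illustration only)."""
    question: str
    relevant_ids: set       # doc ids a correct retrieval should surface
    retrieved_ids: list     # ranked doc ids returned by the retriever under test


def recall_at_k(item: GoldenItem, k: int) -> float:
    """Fraction of the relevant docs that appear in the top k results."""
    if not item.relevant_ids:
        return 0.0
    hits = item.relevant_ids & set(item.retrieved_ids[:k])
    return len(hits) / len(item.relevant_ids)


def reciprocal_rank(item: GoldenItem) -> float:
    """1 / rank of the first relevant doc, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(item.retrieved_ids, start=1):
        if doc_id in item.relevant_ids:
            return 1.0 / rank
    return 0.0


def scorecard(items: list, k: int = 3) -> dict:
    """Average each metric over the golden set."""
    n = len(items)
    return {
        f"recall@{k}": sum(recall_at_k(i, k) for i in items) / n,
        "mrr": sum(reciprocal_rank(i) for i in items) / n,
    }


def ci_gate(card: dict, thresholds: dict) -> list:
    """Return one message per metric that falls below its minimum."""
    return [
        f"{name}={card[name]:.3f} < {minimum:.3f}"
        for name, minimum in thresholds.items()
        if card[name] < minimum
    ]


if __name__ == "__main__":
    golden = [
        GoldenItem("q1", {"a", "b"}, ["a", "x", "b"]),
        GoldenItem("q2", {"c"}, ["x", "c", "y"]),
    ]
    card = scorecard(golden, k=3)
    failures = ci_gate(card, {"recall@3": 0.9, "mrr": 0.7})
    print(card, failures)
    # A non-zero exit code is what makes a CI job fail.
    raise SystemExit(1 if failures else 0)
```

In CI, the script's exit code is the gate: any metric below its threshold fails the job, and the printed scorecard dictionary can be saved as a build artifact to share.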