LLM benchmarking and evaluation including lm-evaluation-harness, BigCode Evaluation Harness, and NeMo Evaluator. Use when benchmarking models or measuring performance.
Run these commands in Claude Code to add this plugin to your environment. The marketplace must be added before you can install its plugins.

Add the marketplace:
/plugin marketplace add tianhao909/AI-Research-SKILLs-cn

Install plugins:
/plugin
Plugin: evaluation
Marketplace: ai-research-skills
Author: @tianhao909