AI Evaluation Framework

skillsmp • testing • 👍 820 upvotes

Methodologies for evaluating AI agent performance with rubrics, benchmarks, and automated testing.

evaluation, benchmarks, testing, metrics

📦 Installation

Copy evaluation.md to .claude/skills/
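
The copy step above can be scripted. This is a minimal sketch, not an official installer: the helper name `install_skill` is hypothetical, and it assumes `.claude/skills/` is resolved relative to a project root that you pass in (the listing does not say which directory the path is relative to).

```python
import shutil
from pathlib import Path


def install_skill(skill_file: str, project_root: str = ".") -> Path:
    """Copy a skill file into <project_root>/.claude/skills/ and return the new path.

    Hypothetical helper for illustration; not part of any official tooling.
    """
    source = Path(skill_file)
    if not source.is_file():
        raise FileNotFoundError(f"skill file not found: {source}")

    # Assumption: the skills directory lives under the given project root.
    target_dir = Path(project_root) / ".claude" / "skills"
    # Create the directory (and parents) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)

    # copy2 preserves file metadata and returns the destination path.
    return Path(shutil.copy2(source, target_dir / source.name))
```

Calling `install_skill("evaluation.md")` from the project directory would place the file at `.claude/skills/evaluation.md`, creating the folder first if needed.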

📖 About This Skill

Features

Works with claude-code, codex
Part of the skillsmp skills collection
Category: testing

Getting Started

After installing the skill, mention it in your conversation or use the relevant command. The AI then applies the skill's instructions to your task.

Supported Platforms

💻 claude-code

⚡ codex

Source

skillsmp

Skills Collection