AI Agent Skills: What They Are and How They Work

Luc K. SEGBEDZI · May 26, 2026 · 18 min read

LLMs know facts. What they lack is procedural knowledge — the specific, ordered, reasoned steps that describe how work actually gets done in a real system. A release workflow. An incident triage protocol. The exact sequence to deploy a smart contract to mainnet.

OpenAI measured the impact on its own SDK repos. After skills were added to automate verification, release review, and PR handoff, merged PRs rose 44% in three months (457 versus 316) with the same team and the same codebase.

Agent skills are the solution the industry has converged on. On December 18, 2025, Anthropic launched the SKILL.md format as an open standard. Two months later, OpenAI adopted it in Codex and ChatGPT. As of May 2026, 26+ platforms read the same format, including Claude Code, OpenAI Codex, GitHub Copilot, Gemini CLI, Cursor, and VS Code.

This article covers the architecture, compares Anthropic and OpenAI implementations, and provides production patterns. Primary sources are listed — read them before the summaries.

The Number That Justifies Everything

OpenAI maintains their Agents SDK repos (Python + TypeScript) using Codex with repo-local skills. Documented results between December 2025 and February 2026:

Metric	Before (Sept–Nov 2025)	After (Dec 2025–Feb 2026)	Delta
PRs merged — Python	182	226	+24%
PRs merged — TypeScript	134	231	+72%
Total	316	457	+44%

Same team. Same codebase. Skills added for: automated verification, release review, PR draft generation.

Source: developers.openai.com/blog/skills-agents-sdk
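
The deltas can be rechecked from the raw counts. The snippet below is only an arithmetic sketch over the table's figures; `percent_change` is an illustrative name, not something taken from OpenAI's post.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# PR counts copied from the table above.
counts = {
    "Python": (182, 226),
    "TypeScript": (134, 231),
    "Total": (316, 457),
}

for label, (before, after) in counts.items():
    print(f"{label}: {percent_change(before, after):+.1f}%")
```

Computed this way, the total comes out near 44.6%, so the published +44% looks truncated rather than rounded.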

Official Sources

Platform	Official Documentation
Agent Skills Standard	agentskills.io — Apache 2.0
Full Specification	agentskills.io/specification
Claude / Claude Code	docs.claude.com → agents-and-tools → agent-skills
Claude Code CLI	code.claude.com → docs → skills
OpenAI Codex skills	developers.openai.com/codex/skills
OpenAI blog — real case	developers.openai.com/blog/skills-agents-sdk
OpenAI skills repo	github.com/openai/skills
Official validator	github.com/agentskills/agentskills → skills-ref

Read these first. This article adds what the docs don't have: production tradeoffs, architecture decisions, and the differences that actually matter between platforms.

What a Skill Is

A skill is a directory containing a SKILL.md file at its root.

bash

my-skill/
├── SKILL.md         # Required — instructions + metadata
├── scripts/         # Optional — executable JS, Python, or bash
├── references/      # Optional — supplementary docs, loaded on demand
└── assets/          # Optional — templates, data files

The SKILL.md frontmatter has exactly two required fields. Constraints from the official spec:

Field	Required	Rules
`name`	Yes	Max 64 chars · lowercase letters, numbers, hyphens only · must match the parent directory name
`description`	Yes	Max 1024 chars · describe what to do AND when to use it
`license`	No	License name or reference to a bundled file
`compatibility`	No	Max 500 chars · system requirements (git, docker, etc.)
`metadata`	No	Key-value map — this is where version goes, not at the root
`allowed-tools`	No	Experimental — space-separated pre-approved tools

Minimal valid example:

yaml

---
name: pr-summary-for-stakeholders
description: >
  Use when asked to summarize a GitHub pull request for a non-technical
  audience, product manager, or executive. Translates technical changes
  into business impact. Do not use for technical code reviews.
---

Critical rule on name: the name must exactly match the parent directory name. A skill at pr-summary-for-stakeholders/SKILL.md must have name: pr-summary-for-stakeholders. Uppercase letters and consecutive hyphens (--) are invalid and will fail validation.
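
The naming rules are mechanical enough to check before a skill ever reaches the official validator. The sketch below encodes only the constraints listed in this section; `check_frontmatter` is a hypothetical helper, and rejecting leading or trailing hyphens is an assumption of the sketch rather than a rule quoted from the spec.

```python
import re

# Lowercase letters and digits, separated by single hyphens.
# Disallowing leading/trailing hyphens is an assumption of this sketch.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def check_frontmatter(name: str, description: str, directory_name: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(name) > 64:
        problems.append("name exceeds 64 characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use lowercase letters, numbers, and single hyphens")
    if name != directory_name:
        problems.append("name does not match the parent directory name")
    if not description.strip():
        problems.append("description is empty")
    elif len(description) > 1024:
        problems.append("description exceeds 1024 characters")
    return problems
```

A pre-commit hook could call this with the directory name and the two parsed fields, then fail the commit if the list is not empty. It complements the official validator and does not replace it.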

To version a skill, use metadata — there is no version field at the frontmatter root in the official spec:

yaml

---
name: financial-report-generator
description: >
  Generates a compliant financial report in IFRS, GAAP, or OHADA format.
  Use when the user requests a financial report or a regulated compliance
  document. Do not use for exploratory analysis.
license: Apache-2.0
metadata:
  version: "1.2.0"
  changelog: "1.2.0 - OHADA (TG/CI/SN) · 1.1.0 - cache fallback · 1.0.0 - IFRS+GAAP"
  author: "Luc K. SEGBEDZI"
---

The body is raw markdown: step-by-step instructions, conditional logic, and error handling.

markdown

## Step 1 — Fetch PR data
 
Run: `node scripts/fetch-pr-diff.js --pr $PR_URL --output /tmp/pr-data.json`
 
If script exits with code 401: GITHUB_TOKEN missing. Stop.
If script exits with code 404: incorrect PR URL. Ask the user to verify.
 
## Step 2 — Write the summary
 
Structure:
**What changed:** [1-2 sentences, no technical jargon]
**Why it matters:** [Business impact]
**Risk level:** Low / Medium / High — [one-sentence justification]
**Monitor after deploy:** [Specific metrics, or "None"]
 
Do not include: file names, function names, line numbers.

Validate your skill before using it in production:

bash

# Install the official validator
npm install -g skills-ref
 
# Validate
skills-ref validate ./pr-summary-for-stakeholders

This is procedural memory made explicit, versioned, and executable.

Progressive Disclosure: How 100 Skills Don't Saturate the Context Window

The standard solves the context window problem with progressive disclosure. From the official agentskills.io spec:

bash

Level 1 — Startup (~100 tokens per skill)
  name + description only
  → Lightweight index, even with 100 skills installed
 
Level 2 — On activation (<5000 tokens recommended)
  Full SKILL.md body — loaded when description matches
  → Practical limit: 500 lines maximum per SKILL.md
  → Beyond that: move content to references/
 
Level 3 — On demand (cost on use)
  scripts/    — executed when a step calls them
  references/ — loaded when a step references them
  assets/     — templates and data files

Matching is done by the LLM itself, not by keyword search. At startup the agent sees only each skill's name and description, so the description is what decides whether a skill activates. That makes it the most important part of the file, more important than the instructions.

Practical implications for production:

A dense 800-line SKILL.md → split into references/
Happy path in the body, edge cases in references/edge-cases.md
Heavy scripts in scripts/ — never inline them in the body
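
A rough budget calculation makes the three levels concrete. This is a back-of-the-envelope sketch using the approximate figures quoted above (about 100 tokens per indexed skill, up to 5000 tokens per body, 500 lines per file). The function names are illustrative, and "eager loading" is a hypothetical baseline, not something any platform is claimed to do.

```python
def startup_index_tokens(skill_count: int, tokens_per_skill: int = 100) -> int:
    """Level 1 cost: each installed skill contributes only name + description."""
    return skill_count * tokens_per_skill


def eager_load_tokens(skill_count: int, body_tokens: int = 5000) -> int:
    """Hypothetical baseline: every full SKILL.md body loaded at startup."""
    return skill_count * body_tokens


def body_needs_split(body: str, max_lines: int = 500) -> bool:
    """True when a SKILL.md body exceeds the 500-line practical limit."""
    return len(body.splitlines()) > max_lines


print(startup_index_tokens(100))          # index cost for 100 installed skills
print(eager_load_tokens(100))             # cost if all bodies were loaded up front
print(body_needs_split("step\n" * 800))   # the dense 800-line case
```

With these defaults the index costs about 10,000 tokens, against 500,000 if every body were loaded up front.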

How Anthropic Implements Skills: Two Distinct Surfaces

1. Claude Code CLI Skills

Claude Code loads skills in this priority order:

Scope	Path
User global	`~/.claude/skills/`
Project	`.claude/skills/` (relative to repo root)

The format follows the open standard exactly. Claude Code-specific extensions:

Invocation control: configurable via Claude Code settings — allow_implicit_invocation: false requires explicit invocation via /skill-name. This parameter is in the Claude Code configuration, not in the SKILL.md frontmatter
Bundled skills: /claude-api, /code-review, /batch, /debug, /loop
Sub-agents: skills can spawn sub-agents for parallel work

2. Pre-built Skills via the Claude API

A separate surface: skills hosted by Anthropic, activated via the container parameter in the Messages API. Available skills: pptx, xlsx, docx, pdf.

python

import anthropic
 
client = anthropic.Anthropic()
 
response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    betas=["code-execution-2025-08-25", "skills-2025-10-02"],
    container={
        "skills": [{"type": "anthropic", "skill_id": "xlsx", "version": "latest"}]
    },
    messages=[{
        "role": "user",
        "content": "Create an Excel file with a quarterly budget"
    }],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
)

⚠️ Both beta headers required: code-execution-2025-08-25 AND skills-2025-10-02. Missing either one causes a silent failure with no clear error message.

List available skills:

bash

curl "https://api.anthropic.com/v1/skills?source=anthropic" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: skills-2025-10-02"

Note: Beta headers contain a date in their name and may change. Verify the current header at platform.claude.com/docs/en/build-with-claude/skills-guide at integration time.

How OpenAI Codex Implements Skills

Official Paths

Scope	Path	Use Case
User global	`~/.codex/skills/`	Personal skills across all projects
Project	`.codex/skills/`	Team skills versioned in the repo
Cross-platform standard	`~/.agents/skills/`	agentskills.io standard path — also read by Codex, Gemini CLI, and others

bash

~/.codex/skills/
├── code-reviewer/
│   └── SKILL.md
├── git-commit-writer/
│   └── SKILL.md
└── env-doctor/
    ├── SKILL.md
    └── references/
        └── common-issues.md

Which path to use?

~/.codex/skills/ for Codex-first workflows
~/.agents/skills/ for a single folder shared across Claude Code + Codex + Gemini CLI
.codex/skills/ inside the repo for team skills versioned in git

Full Path Comparison by Platform

Agent	User skills	Project skills
Claude Code	`~/.claude/skills/`	`.claude/skills/`
Codex CLI	`~/.codex/skills/`	`.codex/skills/`
Gemini CLI / Open standard	`~/.agents/skills/`	`.agents/skills/`
Cursor	N/A	`.cursor/skills/`

VS Code / GitHub Copilot: see code.visualstudio.com/docs/copilot/customization/agent-skills — structure varies by version.

The SKILL.md file is identical across all these platforms. Only the installation path changes.
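
Because only the path differs, an install script can treat the comparison table as data. The sketch below copies the directories from that table. The agent keys (`"claude-code"`, `"codex"`, and so on) and the function name are hypothetical labels chosen for this example, and actual locations should be checked against each platform's documentation.

```python
from pathlib import Path
from typing import Optional

# Directories copied from the comparison table; None means the table lists N/A.
SKILL_DIRS = {
    "claude-code": {"user": ".claude/skills", "project": ".claude/skills"},
    "codex": {"user": ".codex/skills", "project": ".codex/skills"},
    "open-standard": {"user": ".agents/skills", "project": ".agents/skills"},
    "cursor": {"user": None, "project": ".cursor/skills"},
}


def skill_install_path(agent: str, scope: str, skill_name: str,
                       home: Path, repo_root: Path) -> Optional[Path]:
    """Where `skill_name` would live for `agent`, or None when no path is listed."""
    relative = SKILL_DIRS[agent][scope]
    if relative is None:
        return None
    base = home if scope == "user" else repo_root
    return base / relative / skill_name
```

Passing `home` and `repo_root` explicitly keeps the function free of filesystem access, which makes it easy to test.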

`openai.yaml` — Codex-only Extension

Codex supports an optional openai.yaml file inside the skill folder. Other agents ignore it.

bash

~/.codex/skills/my-skill/
├── SKILL.md          ← Open standard — portable everywhere
└── openai.yaml       ← Codex only — ignored by Claude Code and Gemini CLI

yaml

# openai.yaml — Codex-specific metadata
policy:
  allow_implicit_invocation: false   # false = explicit $skill-name invocation required
 
dependencies:
  tools:
    - type: "mcp"
      value: "github"
      description: "GitHub MCP server"
      transport: "streamable_http"
      url: "https://api.github.com/mcp"

Installation

bash

# Official OpenAI skills (typed in a Codex session: $skill-installer is a skill invocation, not a shell command)
$skill-installer install https://github.com/openai/skills/tree/main/skills/.experimental/<skill-name>
 
# Community skills — manual install
mkdir -p ~/.codex/skills
git clone <repo-url> ~/.codex/skills/<skill-name>
 
# Verify installation
ls ~/.codex/skills/<skill-name>/SKILL.md

AGENTS.md + Skills: OpenAI's Production Pattern

A critical pattern that introductory articles miss: skills alone are not enough in production. They become powerful when combined with AGENTS.md.

AGENTS.md is a file at the root of a repo. It tells Codex which rules to always follow before starting any work. This is where you make skills mandatory.

markdown

# AGENTS.md
 
## Mandatory skill usage
 
- Use `$implementation-strategy` before editing runtime or API changes.
- Run `$code-change-verification` when runtime code, tests, or build behavior changes.
- Use `$openai-knowledge` for OpenAI API or platform work.
- Use `$pr-draft-summary` when work is ready for review.
 
## Build and test commands
 
- Python: `make format`, `make lint`, `make typecheck`, `make tests`
- TypeScript: `pnpm i`, `pnpm build`, `pnpm lint`, `pnpm test`
 
## Compatibility rules
 
- Preserve positional compatibility for public constructors and dataclass fields.

Skills that OpenAI actually uses in production in their SDK repos (public, verifiable on GitHub):

Skill	Role
`code-change-verification`	Runs the full verification stack (format, lint, typecheck, tests)
`docs-sync`	Audits docs against the codebase — finds gaps and stale content
`examples-auto-run`	Runs examples in auto mode with structured logs
`final-release-review`	Compares previous tag vs current RC — GREEN / BLOCKED decision
`implementation-strategy`	Decides approach and compatibility boundary before touching code
`openai-knowledge`	Pulls OpenAI docs via Docs MCP — prevents hallucination
`pr-draft-summary`	Generates PR title + description at handoff time
`test-coverage-improver`	Identifies coverage gaps and proposes high-value tests

The model is clear:

AGENTS.md defines when to call a skill
The skill defines how to do the work
The separation is intentional — one without the other is incomplete

Documented result on both OpenAI Agents SDK repos: +44% throughput in 3 months.
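
The split only holds if every skill that AGENTS.md names is actually installed. A minimal consistency check might look like the sketch below. It assumes skills are referenced as `$skill-name` inside backticks, as in the example above. The function names are hypothetical and this is not a tool shipped by OpenAI.

```python
import re
from pathlib import Path

# Matches `$skill-name` inside backticks, the style used in the AGENTS.md example.
SKILL_REF = re.compile(r"`\$([a-z0-9]+(?:-[a-z0-9]+)*)`")


def referenced_skills(agents_md: str) -> set[str]:
    """Names of every skill that AGENTS.md mentions as `$skill-name`."""
    return set(SKILL_REF.findall(agents_md))


def missing_skills(agents_md: str, skills_dir: Path) -> set[str]:
    """Referenced skills with no <skills_dir>/<name>/SKILL.md on disk."""
    return {
        name
        for name in referenced_skills(agents_md)
        if not (skills_dir / name / "SKILL.md").is_file()
    }
```

Run in CI against `.codex/skills/`, a non-empty result means AGENTS.md mandates a skill the repo does not ship.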

The Ecosystem: 26+ Platforms, One Format

As of May 2026, the SKILL.md standard is implemented by 26+ platforms:

Claude Code · OpenAI Codex · GitHub Copilot · VS Code · Gemini CLI · Cursor · Amp · Junie (JetBrains) · OpenHands · Goose · Letta · Firebender · OpenCode · OpenClaw · Autohand · Mux · Piebald · and more...

A marketplace ecosystem has emerged. Platforms like Agensi.io security-scan skills before publishing them, which sets them apart from raw GitHub skills. That distinction matters in production.

Skills vs RAG vs MCP vs Fine-Tuning

These four approaches handle different types of knowledge. They are not interchangeable.

Approach	What it gives the agent	What it doesn't give
Skills	Procedural knowledge — how to do tasks, in what order	Factual lookup, external API access
MCP	Tool access — calling external APIs and services	When to call them, how to handle results
RAG	Factual knowledge — relevant chunks from a knowledge base	How to do things, procedural judgment
Fine-tuning	Permanent weight changes — baked-in knowledge	Flexibility; expensive to redo when procedures change

Cognitive science's taxonomy of memory maps cleanly onto these approaches:

Semantic memory (facts about the world) → RAG, knowledge bases
Episodic memory (past experiences) → context windows, memory systems
Procedural memory (how to do things) → skill files

In production, these complement each other. A skill provides the judgment for when and how to use an MCP tool. RAG provides the reference material a skill may need during execution.

Real OpenAI example: the openai-knowledge skill is a wrapper around OpenAI's Docs MCP. The skill says when to call the MCP and what to do with it. MCP provides access. Skill provides judgment.

The Security Constraint Nobody Mentions

A skill with scripts has access to the filesystem, environment variables, and API keys present in the shell. That is what makes it powerful. That is also its attack surface.

The three vectors to audit before installing any third-party skill:

1. Prompt injection in SKILL.md. The SKILL.md body is loaded directly into the agent's context. An attacker can inject instructions that hijack behavior: data exfiltration, bypassing system rules.

2. Tool poisoning in scripts. A scripts/setup.sh can do anything: exfiltrate ~/.ssh/, call an external URL, silently modify a config file. The agent runs it on instruction from the SKILL.md, without asking for confirmation.

3. Credential harvesting in assets. An assets/config.json that asks for an API key "for testing" and sends it to a remote endpoint.

The distinction that matters in production:

Source	Security scan	Recommendation
Agensi.io	✅ Before publication	OK with a quick audit
`github.com/openai/skills` official	✅ Maintained by OpenAI	Trustworthy
Third-party / community GitHub	❌ None	Read every file before executing

bash

# Audit checklist — non-negotiable before production install
 
# 1. Read SKILL.md in full — look for out-of-scope instructions
cat ~/.codex/skills/<skill-name>/SKILL.md
 
# 2. Read every script — look for network calls and filesystem access
cat ~/.codex/skills/<skill-name>/scripts/*.sh
cat ~/.codex/skills/<skill-name>/scripts/*.py
 
# 3. Search for hardcoded URLs across all files
grep -r "http" ~/.codex/skills/<skill-name>/
 
# 4. Search for credential and sensitive variable access
grep -r "API_KEY\|TOKEN\|SECRET\|PASSWORD\|~/.ssh\|/etc/passwd" \
  ~/.codex/skills/<skill-name>/
 
# 5. Test in a sandbox repo before production
# 6. Pin to a specific commit SHA — never "latest" from an unaudited source

Treat skill installation like installing an npm package with scripts.postinstall: you read before you run.
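
The grep steps of the checklist can also be run as one pass that reports file and line number, which is easier to attach to a review. The sketch below reuses the same patterns. `audit_skill` is a hypothetical helper that only flags lines for a human to read; it is not a malware detector.

```python
import re
from pathlib import Path

# Patterns mirror the grep checklist above; a match means "read this line",
# not "this skill is malicious".
SUSPICIOUS = {
    "url": re.compile(r"https?://"),
    "credential": re.compile(r"API_KEY|TOKEN|SECRET|PASSWORD|~/\.ssh|/etc/passwd"),
}


def audit_skill(skill_dir: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, label) for every flagged line."""
    findings = []
    for path in sorted(skill_dir.rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        relative = path.relative_to(skill_dir).as_posix()
        for number, line in enumerate(text.splitlines(), start=1):
            for label, pattern in SUSPICIOUS.items():
                if pattern.search(line):
                    findings.append((relative, number, label))
    return findings
```

An empty result does not make a skill safe. Prompt injection in the SKILL.md body, for instance, needs no URL or credential keyword at all, so the manual read in step 1 still applies.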

What Makes a Production-Grade Skill

Four things consistently determine whether a skill works reliably in production. These are the patterns OpenAI documented from their own repos.

1. The Description Is the Most Critical Line

The description is routing metadata, not a summary. It must state: when the skill applies, what type of changes trigger it, and explicit exclusions.

yaml

# Too vague — fires on too many things
description: Run the mandatory verification stack.
 
# Production-grade — says when AND on what (OpenAI Agents SDK pattern)
description: >
  Run the mandatory verification stack when changes affect runtime code,
  tests, or build/test behavior. Do not trigger for docs-only changes.

OpenAI's lesson: if routing is unreliable, fix the description before adding more code to the body.
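
A crude lint can catch descriptions that state neither a trigger nor an exclusion. The sketch below is a keyword heuristic invented for this article, not a rule from the spec or from OpenAI. It will miss well-written descriptions that use different wording, so treat a hit as a prompt to reread the description rather than a verdict.

```python
# Keyword lists are assumptions of this sketch; extend them for your own style.
TRIGGER_WORDS = ("when",)
EXCLUSION_WORDS = ("do not", "don't", "not for")


def description_gaps(description: str) -> list[str]:
    """Return which routing elements a description appears to be missing."""
    text = " ".join(description.lower().split())  # normalise folded YAML whitespace
    gaps = []
    if not any(word in text for word in TRIGGER_WORDS):
        gaps.append("no trigger condition")
    if not any(word in text for word in EXCLUSION_WORDS):
        gaps.append("no explicit exclusion")
    return gaps
```

Applied to the two descriptions above, the vague one is flagged on both counts and the production-grade one passes.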

2. Model vs Scripts: The Right Split

What belongs in the model	What belongs in scripts
Interpretation, comparison, reporting	Deterministic, repeated shell work
Decisions requiring context	Fixed command sequences
Explaining results	Parsing, validation, formatting

Scripts behave like mini-CLIs: deterministic stdout output, explicit error codes, outputs to known file paths.

markdown

## Step 2 — Verification
 
Run: `scripts/verify.sh --mode strict --output /tmp/report.json`
 
If exit code 0: proceed to Step 3
If exit code 1: read /tmp/report.json, report errors, stop
If exit code 2: timeout — rerun with --mode fast, document the result
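
On the script side, the mini-CLI contract comes down to three things: a stable report format, a known output path, and one exit code per outcome. The sketch below shows the reporting half of a hypothetical Python equivalent of `verify.sh`. The report schema (`errors`, `timed_out`) is an assumption made for illustration.

```python
import json
from pathlib import Path

# Same meanings as the exit codes in the step above.
EXIT_OK, EXIT_FAILED, EXIT_TIMEOUT = 0, 1, 2


def write_report(errors: list[str], timed_out: bool, output: Path) -> int:
    """Write a JSON report to a known path and return the exit code to use."""
    report = {"errors": sorted(errors), "timed_out": timed_out}
    # sort_keys and sorted errors keep the file identical across identical runs.
    output.write_text(json.dumps(report, indent=2, sort_keys=True), encoding="utf-8")
    if timed_out:
        return EXIT_TIMEOUT
    return EXIT_FAILED if errors else EXIT_OK
```

The script's entry point would end with `sys.exit(write_report(errors, timed_out, Path(args.output)))`, so the agent branches on the exit code and reads the file only when it needs the details.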

3. Failure Modes Must Be Explicit

markdown

## Error handling
 
If validation script exits with code > 0:
- Do NOT proceed to the next step
- Return errors in this exact format:
  Validation failed. [N] critical errors:
  - [Error] in [location]
  Fix these errors before continuing.
 
If external API returns 503:
- Use the cached fallback in assets/fallback-rules.json
- Add a visible warning: "Cached data from [date]. Verify before submission."

4. Version Your Skills Like Code

yaml

---
name: financial-report-generator
metadata:
  version: "1.2.0"
  changelog: "1.2.0 - OHADA (TG/CI/SN) · 1.1.0 - cache fallback · 1.0.0 - IFRS+GAAP"
---

A skill that silently changes in production is a silent behavior change in your agent. Version like code, revert like code.
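
A version string only helps if someone remembers to bump it. One way to catch silent changes is to record a content fingerprint of the skill directory next to `metadata.version` and compare it in CI. The sketch below is one possible approach, not part of the standard, and `skill_fingerprint` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def skill_fingerprint(skill_dir: Path) -> str:
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in skill_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(skill_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator so path and content cannot blur together
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

If the fingerprint changes while the version stays the same, the skill was edited without a version bump.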

Complete Example: Release Review Skill

A complete skill inspired by final-release-review from the OpenAI Agents SDK repos, which are public and verifiable on GitHub.

Structure:

bash

release-review/
├── SKILL.md
├── scripts/
│   └── fetch-diff.sh
└── references/
    └── release-criteria.md

SKILL.md

yaml

---
name: release-review
license: Apache-2.0
compatibility: Requires git
metadata:
  version: "1.0.0"
  author: "Luc K. SEGBEDZI"
  repo: "github.com/Komluc/agent-skills-production"
description: >
  Prepares a complete release review by comparing the previous git tag with
  main. Use when work is complete and ready for a release candidate.
  Do not use for intermediate reviews during active development.
---
 
## Step 1 — Fetch the diff
 
Run: `scripts/fetch-diff.sh --output /tmp/release-diff.txt`
 
Exit codes:
- exit 0: diff generated — continue to Step 2
- exit 1: no previous git tag found — ask the user to create a tag first
- exit 2: git not available in PATH — stop and inform the user
- exit 3: missing argument — skill configuration error
 
## Step 2 — Analyze the diff
 
Read `/tmp/release-diff.txt`. Inspect for:
- Breaking changes in public APIs
- Behavioral regressions
- Missing migration notes for breaking changes
- Feature removals without prior deprecation
 
See `references/release-criteria.md` for complete criteria by change type.
 
## Step 3 — Release decision
 
Start from "safe to release". Move to "blocked" only on concrete evidence in the diff.
 
Required output format:
 
Release readiness review
 
Release call: GREEN — safe to release
OR
Release call: BLOCKED
 
[If BLOCKED only]
Unblock checklist:
- [ ] [Specific action — cite the exact diff line as evidence]
 
Every blocked item must point to evidence in the diff. A BLOCKED call without evidence is invalid.

scripts/fetch-diff.sh

bash

#!/usr/bin/env bash
# fetch-diff.sh
# Usage: fetch-diff.sh --output <path>
#
# Exit codes:
#   0 — success, diff written to <path>
#   1 — no previous git tag found
#   2 — git not available in PATH
#   3 — missing --output argument
 
set -euo pipefail
 
# ── Check git availability ───────────────────────────────────────────────────
if ! command -v git &>/dev/null; then
  echo "ERROR: git not found in PATH" >&2
  exit 2
fi
 
# ── Parse arguments ──────────────────────────────────────────────────────────
OUTPUT=""
while [[ $# -gt 0 ]]; do
  case "$1" in
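    --output)
      # A missing value is a usage error (exit 3), not an unbound-variable crash
      # that would exit 1 and be mistaken for "no previous tag".
      if [[ $# -lt 2 ]]; then
        echo "ERROR: --output requires a path" >&2
        exit 3
      fi
      OUTPUT="$2"
      shift 2
      ;;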
    *)
      echo "ERROR: unknown argument: $1" >&2
      echo "Usage: $0 --output <path>" >&2
      exit 3
      ;;
  esac
done
 
if [[ -z "$OUTPUT" ]]; then
  echo "ERROR: --output is required" >&2
  echo "Usage: $0 --output <path>" >&2
  exit 3
fi
 
# ── Find previous tag ────────────────────────────────────────────────────────
PREV_TAG=$(git describe --tags --abbrev=0 HEAD~1 2>/dev/null || echo "")
 
if [[ -z "$PREV_TAG" ]]; then
  echo "ERROR: no previous git tag found." >&2
  echo "Create a tag first: git tag v0.0.0 <commit-sha>" >&2
  exit 1
fi
 
# ── Generate diff ────────────────────────────────────────────────────────────
echo "Comparing: $PREV_TAG → HEAD"
git diff "$PREV_TAG"..HEAD --stat --no-color >  "$OUTPUT"
git diff "$PREV_TAG"..HEAD --no-color        >> "$OUTPUT"
 
echo "Diff written to: $OUTPUT"
echo "Lines: $(wc -l < "$OUTPUT")"

references/release-criteria.md

markdown

# Release Criteria by Change Type
 
## Public APIs
- Removing an existing parameter → BLOCKED (breaking)
- Changing a return type → BLOCKED (breaking)
- Adding an optional parameter → OK
- Deprecation with notice → OK if migration note is present
 
## Runtime compatibility
- Dropping support for a version (e.g. Python 3.9) → MODERATE
  Verify release notes explicitly mention it
- Major dependency change → MODERATE
 
## General rule
Start from GREEN. Every move to BLOCKED must cite a line from the diff.
A BLOCKED call without evidence in the diff is invalid.
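
The evidence rule can also be enforced mechanically if the agent is asked to emit its blockers as structured data before writing the review. The sketch below assumes each blocker is a dict with `action` and `evidence` keys. That shape is hypothetical and not part of the skill above.

```python
def release_call(blockers: list[dict]) -> str:
    """Return "GREEN" or "BLOCKED"; reject any blocker that cites no diff evidence."""
    for blocker in blockers:
        evidence = (blocker.get("evidence") or "").strip()
        if not evidence:
            action = blocker.get("action", "<unnamed>")
            raise ValueError(f"blocker without diff evidence: {action}")
    # Start from GREEN; only evidenced blockers can move the call to BLOCKED.
    return "BLOCKED" if blockers else "GREEN"
```

Raising on a missing citation, rather than quietly downgrading to GREEN, keeps an unsupported BLOCKED call visible to the reviewer.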

Validation before use

bash

# Install the official validator
npm install -g skills-ref
 
# Validate structure and frontmatter
skills-ref validate ./release-review
 
# Make script executable
chmod +x release-review/scripts/fetch-diff.sh
 
# Dry run in a repo with at least one tag
cd your-test-repo
git tag v0.0.0 HEAD~5   # create a tag if needed
bash ../release-review/scripts/fetch-diff.sh --output /tmp/test-diff.txt
cat /tmp/test-diff.txt

Official Resources

Agent Skills Standard: agentskills.io — Apache 2.0 spec
Full Specification: agentskills.io/specification
Claude Agent Skills overview: docs.claude.com
Claude Code skills guide: code.claude.com
Skills in the Claude API: platform.claude.com/docs/en/build-with-claude/skills-guide
OpenAI — Skills in production (real case): developers.openai.com/blog/skills-agents-sdk
OpenAI Codex skills: developers.openai.com/codex/skills
OpenAI skills catalog: github.com/openai/skills
Codex CLI paths: agensi.io/learn/where-are-codex-cli-skills-stored
Companion repo: github.com/Komluc/agent-skills-production

Luc K. SEGBEDZI, AI Systems Engineer & Blockchain Architect · Founder, MGS (MA Group Solutions) · Lomé, Togo · Builder: every article has a GitHub repo
