// agent skills for Claude Code · Cursor · Codex
Wildcard IAM. Fan-out that double-bills on retry. Loops with no spend ceiling. Approval steps the agent grants itself. Every one runs fine in the demo and fails expensively in production. These skills make your agent write the safe pattern by construction — and ship the evals to prove it.
13 skills · before/after fixtures · scored evals · one-time purchase · individual & team licenses
A skill is a portable SKILL.md your coding agent loads contextually.
When your task matches — "give the agent permission to run tasks", "summarize this page" —
the skill kicks in and the agent produces the guarded pattern instead of the dangerous default.
No new platform, no SDK, no runtime dependency.
The expertise itself: core rules, a workflow, and the governing question
("if this agent were compromised right now, what's the worst it could do?").
Drops into .claude/skills/, or your Cursor rules / AGENTS.md.
The actual dangerous default the agent produces for a fixed prompt, next to the corrected output with the skill on. Read the exact diff before you buy anything.
A fixed prompt, point-scored rubric with auto-fail conditions, and an LLM-as-judge template. Run it skill-off then skill-on — the gap is the measured value, on your model. CI-ready.
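As a rough illustration of how a point-scored rubric with auto-fail conditions can behave, here is a minimal sketch. It is not the shipped harness: `Criterion`, `score`, the 80% pass mark, and the example criterion names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric line (hypothetical shape): a name, its point value, and whether the output met it."""
    name: str
    points: int
    passed: bool


def score(criteria, auto_fails, pass_percent=80):
    """Return (earned, possible, verdict).

    auto_fails maps a condition name to True if the output triggered it.
    Any triggered auto-fail forces FAIL, no matter how many points were earned.
    pass_percent is an assumed threshold, not a value taken from the product.
    """
    earned = sum(c.points for c in criteria if c.passed)
    possible = sum(c.points for c in criteria)
    triggered = [name for name, hit in auto_fails.items() if hit]
    # Integer comparison avoids floating-point edge cases at the threshold.
    meets_mark = earned * 100 >= pass_percent * possible
    verdict = "PASS" if (not triggered and meets_mark) else "FAIL"
    return earned, possible, verdict


criteria = [
    Criterion("resource ARNs are scoped", 4, True),
    Criterion("PassRole is restricted", 4, True),
    Criterion("blast radius is explained", 2, False),
]
print(score(criteria, {"wildcard action present": False}))
print(score(criteria, {"wildcard action present": True}))
```

The point of the auto-fail list is that a high point total cannot paper over a single disqualifying pattern.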
Eight skills for teams building multi-agent and self-improving systems on AWS. Discipline keeps your fleet from falling over.
| # | Skill | Stops | |
|---|---|---|---|
| 01 | agent-iam-least-privilege | Wildcard IAM / PassRole escalation in agent roles | FREE |
| 02 | idempotent-agent-fan-out | Double-processing and dropped tasks on retry | |
| 03 | agent-cost-ceilings | Unbounded token/dollar burn in agent loops | |
| 04 | coordinated-agent-rollout | Deploys that silently break peer agents | |
| 05 | sandboxed-agent-execution | Agent-written code reaching the host / network / creds | |
| 06 | self-healing-agent-retries | Retry storms and cascading dependency failures | |
| 07 | agent-observability-scaffolding | Unreconstructable async multi-agent failures | |
| 08 | bounded-agent-recursion | Self-improving agents that never terminate | |
Designed as a system — provision → dispatch → spend → deploy → execute → fail → observe → recurse — the skills cross-reference where the controls compound. Each also works standalone.
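To make one of these controls concrete, the sketch below shows what a spend ceiling on an agent loop might look like. It is an assumption-laden illustration, not the content of the agent-cost-ceilings skill: `SpendCeiling`, `BudgetExceeded`, `run_loop`, and the use of per-step cost estimates in cents are all hypothetical.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a step would push the loop past its ceiling."""


class SpendCeiling:
    """Caps both total spend (in cents) and the number of steps (hypothetical helper)."""

    def __init__(self, max_cents: int, max_steps: int):
        self.max_cents = max_cents
        self.max_steps = max_steps
        self.spent_cents = 0
        self.steps = 0

    def charge(self, cents: int) -> None:
        # Check before committing, so a refused step is never recorded as spend.
        if self.steps >= self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.spent_cents + cents > self.max_cents:
            raise BudgetExceeded("spend limit reached")
        self.spent_cents += cents
        self.steps += 1


def run_loop(estimated_costs, ceiling):
    """Run steps in order until they finish or the ceiling trips; return steps completed."""
    completed = 0
    for cost in estimated_costs:
        try:
            ceiling.charge(cost)
        except BudgetExceeded:
            break  # stop cleanly instead of burning more budget
        completed += 1
    return completed


ceiling = SpendCeiling(max_cents=100, max_steps=10)
print(run_loop([30, 30, 30, 30], ceiling), ceiling.spent_cents)
```

Bounding steps as well as dollars matters because a loop of very cheap calls can still run indefinitely.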
Five skills for agents that read untrusted content, hold credentials, and take consequential actions. Guardrails keeps your fleet from being turned against you. Stack-agnostic — several evals are red-team cases the guarded agent must refuse.
| # | Skill | Stops |
|---|---|---|
| 01 | tool-output-validation | Acting on a tool / API / sub-agent result that's malformed or hostile |
| 02 | prompt-injection-boundaries | Untrusted content hijacking the agent's tools |
| 03 | agent-secrets-handling | Credentials leaking via prompt, tool args, logs, or memory |
| 04 | human-in-the-loop-gates | "Approval" the agent can grant itself |
| 05 | agent-memory-hygiene | Memory poisoning, cross-tenant bleed, stale re-injected state |
The skills follow the path untrusted input takes through an agent: what it reads, what it holds, what it's allowed to do, and what it remembers.
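One way to picture an approval gate the agent cannot grant itself is sketched below. It assumes the signing key lives with the approval service and the executor, never with the agent. `sign_approval`, `is_approved`, and the identifiers in the example are hypothetical, and the actual human-in-the-loop-gates skill may take a different approach.

```python
import hashlib
import hmac


def sign_approval(key: bytes, action_id: str, approver: str) -> str:
    """Issued by the approval service: binds one approver to one specific action."""
    message = f"{action_id}:{approver}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_approved(key: bytes, action_id: str, approver: str, token: str, agent_id: str) -> bool:
    """Checked by the executor before a consequential action runs."""
    # The requesting agent can never count as its own approver.
    if approver == agent_id:
        return False
    expected = sign_approval(key, action_id, approver)
    # Constant-time comparison; a token for a different action will not match.
    return hmac.compare_digest(expected, token)


key = b"held-by-approval-service-only"  # assumption: the agent never sees this key
token = sign_approval(key, "deploy-42", "alice")
print(is_approved(key, "deploy-42", "alice", token, agent_id="agent-7"))
print(is_approved(key, "delete-db", "alice", token, agent_id="agent-7"))
```

Binding the token to the action ID means an approval for one step cannot be replayed to authorize another.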
Every skill ships a fixed prompt and a scored rubric with auto-fail conditions. The method is three steps:
$ run eval --skill off # dangerous default → FAIL
$ run eval --skill on # guarded output → PASS
$ # the gap is the skill's measured value — on your model, your setup.
The harness is automatable in CI, so when the next model version lands you re-check with a button instead of a debate. The free skill includes its complete eval — run the loop end-to-end today, no purchase required.
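For a CI job, the skill-off versus skill-on comparison can reduce to a small gate like the one below. This is a sketch under assumptions: `eval_gate` and its `pass_mark` parameter are hypothetical names, and the scores would come from whatever the eval run reports.

```python
def eval_gate(score_off: int, score_on: int, pass_mark: int) -> tuple[int, bool]:
    """Return (gap, ok).

    gap is the skill-on score minus the skill-off score.
    ok requires the guarded run to reach pass_mark and to beat the baseline.
    """
    gap = score_on - score_off
    ok = score_on >= pass_mark and gap > 0
    return gap, ok


gap, ok = eval_gate(score_off=3, score_on=9, pass_mark=8)
print(gap, ok)
```

A CI step could turn `ok` into the process exit code, so a new model version that closes the gap or drops below the pass mark fails the build.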
Individual — you, across every project and client. Team — one price, every engineer and repo in your org.
One-time purchase, no subscription. Lifetime updates included — evals refreshed for every new model version. Building solo means there's no security review behind you to catch the wildcard role or the loop with no spend ceiling before it ships. The full bundle costs less than one billable hour — and a fraction of any one of these mistakes.
One-time purchase, licensed org-wide — no seats, no subscription, lifetime updates including eval refreshes for new model versions. $179 covers every engineer and every repo you have, forever: on a ten-person team that's $18 an engineer, and one prevented incident pays for all of it many times over.
| Pack | Individual | Team |
|---|---|---|
| security & control | $49 | $99 |
| discipline + guardrails (best value) | $99 ($118 separately) | $179 ($228 separately) |
| reliability & operations | $69 | $129 |
$ build it yourself # 13 skills + before/after fixtures + scored rubrics + red-team cases → ~2–3 weeks
$ buy the bundle # same coverage, measured; evals refreshed every model release → $99 individual or $179 for the whole org, today
Not sure yet? Install the free skill — full fixtures and eval harness, no email required.
description frontmatter: the skill activates only when your task matches. On Cursor and Codex the same prose installs as always-on rules (.cursor/rules/ or AGENTS.md); the content is fully portable, the auto-triggering is Claude Code-specific. Each pack's INSTALL.md covers all three.
SKILL.md, references and pre-merge checklists, before/after fixtures, and the full eval harness (prompts, rubrics, LLM-as-judge template).