// agent skills for Claude Code · Cursor · Codex

Your coding agent writes dangerous defaults.

Wildcard IAM. Fan-out that double-bills on retry. Loops with no spend ceiling. Approval steps the agent grants itself. Every one runs fine in the demo and fails expensively in production. These skills make your agent write the safe pattern by construction — and ship the evals to prove it.


13 skills · before/after fixtures · scored evals · one-time purchase · individual & team licenses

"Action": "*" · charge() ran twice · $4,100 overnight loop · exec(agent_code) · key in the prompt · self-approved refund · poisoned memory · retry storm · silent peer break · cid=???

01 How a skill works

A skill is a portable SKILL.md file your coding agent loads as context (automatically on Claude Code; as an always-on rule in Cursor and Codex). When your task matches, for example "give the agent permission to run tasks" or "summarize this page", the skill kicks in and the agent produces the guarded pattern instead of the dangerous default. No new platform, no SDK, no runtime dependency.

SKILL.md

The expertise itself: core rules, a workflow, and the governing question ("if this agent were compromised right now, what's the worst it could do?"). It drops into .claude/skills/, .cursor/rules/, or AGENTS.md.

fixtures/ before → after

The actual dangerous default the agent produces for a fixed prompt, next to the corrected output with the skill on. Read the exact diff before you buy anything.

evals/ scored rubric

A fixed prompt, a point-scored rubric with auto-fail conditions, and an LLM-as-judge template. Run it with the skill off, then with it on: the gap is the measured value, on your model. CI-ready.
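To show how a point-scored rubric with auto-fail conditions collapses into one comparable number, here is a minimal Python sketch. The criteria, point values, `score` function, and judge verdicts are all hypothetical, not the pack's actual rubric or harness; in a real run the verdicts would come from the LLM-as-judge reading each output.

```python
from __future__ import annotations

# Hypothetical rubric for an IAM-style prompt; not the pack's real criteria.
RUBRIC_POINTS = {
    "resources scoped to specific ARNs": 3,
    "iam:PassRole limited to named roles": 3,
    "each permission justified": 2,
}
# Any one of these in the output fails the run outright, whatever else it scores.
AUTO_FAIL = {"wildcard Action", "iam:PassRole on Resource *"}
MAX_SCORE = sum(RUBRIC_POINTS.values())


def score(met: set[str], violations: set[str]) -> int:
    """Score one judged output: zero on any auto-fail, else the sum of met criteria."""
    if violations & AUTO_FAIL:
        return 0
    return sum(points for crit, points in RUBRIC_POINTS.items() if crit in met)


# Illustrative (made-up) judge verdicts for the same prompt, skill off vs. skill on.
off = score(met=set(), violations={"wildcard Action"})
on = score(met={"resources scoped to specific ARNs",
                "iam:PassRole limited to named roles"}, violations=set())
print(f"skill off: {off}/{MAX_SCORE}  skill on: {on}/{MAX_SCORE}  gap: {on - off}")
```

Making auto-fail conditions zero the score, rather than subtract points, keeps a dangerous output from being rescued by good style elsewhere in the same answer.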

02 Agent Discipline · reliability & operations

Eight skills for teams building multi-agent and self-improving systems on AWS. Discipline keeps your fleet from falling over.

# | Skill | Stops
01 | agent-iam-least-privilege | Wildcard IAM / PassRole escalation in agent roles (FREE)
02 | idempotent-agent-fan-out | Double-processing and dropped tasks on retry
03 | agent-cost-ceilings | Unbounded token/dollar burn in agent loops
04 | coordinated-agent-rollout | Deploys that silently break peer agents
05 | sandboxed-agent-execution | Agent-written code reaching the host / network / creds
06 | self-healing-agent-retries | Retry storms and cascading dependency failures
07 | agent-observability-scaffolding | Unreconstructable async multi-agent failures
08 | bounded-agent-recursion | Self-improving agents that never terminate

The skills are designed as a system that follows an agent through provision → dispatch → spend → deploy → execute → fail → observe → recurse, and they cross-reference each other where their controls compound. Each also works standalone.
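To make the agent-iam-least-privilege row concrete, here is a minimal Python sketch of the kind of check a reviewer or CI step might run over an agent role's IAM policy document. It flags Allow statements with a wildcard Action and any `iam:PassRole` granted on `Resource: "*"`. The function names, findings format, and example policy (including the queue ARN) are hypothetical; this is not the skill's own checker, and it deliberately ignores `NotAction`, conditions, and service wildcards such as `iam:*`, which a real review would also need to handle.

```python
import json


def _as_list(value):
    """IAM accepts Action and Resource as a single string or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def find_dangerous_grants(policy: dict) -> list:
    """Flag wildcard Actions and iam:PassRole on Resource '*' in Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be an object, not a list
        statements = [statements]
    findings = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive, so compare in lowercase.
        actions = [a.lower() for a in _as_list(stmt.get("Action"))]
        resources = _as_list(stmt.get("Resource"))
        if "*" in actions:
            findings.append(f"statement {i}: wildcard Action")
        if "iam:passrole" in actions and "*" in resources:
            findings.append(f"statement {i}: iam:PassRole on Resource '*'")
    return findings


policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "*", "Resource": "*"},
    {"Effect": "Allow", "Action": ["iam:PassRole"], "Resource": "*"},
    {"Effect": "Allow", "Action": "sqs:SendMessage",
     "Resource": "arn:aws:sqs:us-east-1:123456789012:agent-tasks"}
  ]
}
""")
for finding in find_dangerous_grants(policy):
    print(finding)
```

The third statement passes because it names one action on one resource ARN, which is the shape a least-privilege agent role is aiming for.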

03 Agent Guardrails · security & control

Five skills for agents that read untrusted content, hold credentials, and take consequential actions. Guardrails keeps your fleet from being turned against you. Stack-agnostic — several evals are red-team cases the guarded agent must refuse.

# | Skill | Stops
01 | tool-output-validation | Acting on a tool / API / sub-agent result that's malformed or hostile
02 | prompt-injection-boundaries | Untrusted content hijacking the agent's tools
03 | agent-secrets-handling | Credentials leaking via prompt, tool args, logs, or memory
04 | human-in-the-loop-gates | "Approval" the agent can grant itself
05 | agent-memory-hygiene | Memory poisoning, cross-tenant bleed, stale re-injected state

The skills follow the path untrusted input takes through an agent: what it reads, what it holds, what it's allowed to do, and what it remembers.
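One way to read the human-in-the-loop-gates row: the approval check has to live outside the agent's reach. Below is a minimal Python sketch, assuming a hypothetical `HUMAN_APPROVERS` allowlist held by a system the agent cannot write to. The gate rejects self-approval, rejects approvers who are not known humans, and binds each approval to a SHA-256 digest of the exact action, so the agent cannot get one payload approved and then execute a different one. All names and data shapes here are illustrative, not the skill's implementation.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist. In a real system it lives somewhere the agent
# cannot write (an identity provider, not the agent's own config or memory).
HUMAN_APPROVERS = frozenset({"alice@example.com", "bob@example.com"})


@dataclass(frozen=True)
class Approval:
    approver: str
    action_digest: str  # the approval covers exactly one action payload


def digest(action: dict) -> str:
    """Stable SHA-256 of the action, so any change to it voids the approval."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_approved(action: dict, requester: str, approval: Optional[Approval]) -> bool:
    if approval is None:
        return False
    if approval.approver == requester:  # the agent cannot approve its own request
        return False
    if approval.approver not in HUMAN_APPROVERS:  # nor can another agent or unknown principal
        return False
    return approval.action_digest == digest(action)


refund = {"type": "refund", "order": "A-1001", "amount_usd": 250}
agent = "agent:billing-bot"
print(is_approved(refund, agent, Approval(agent, digest(refund))))                # False
print(is_approved(refund, agent, Approval("alice@example.com", digest(refund))))  # True
bigger = {**refund, "amount_usd": 2500}
print(is_approved(bigger, agent, Approval("alice@example.com", digest(refund))))  # False
```

Binding the approval to a digest of the payload, rather than to a request ID, is what stops a "$250 refund" approval from being reused for a $2,500 one.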

04 Don't trust us. Run the evals.

Every skill ships a fixed prompt and a scored rubric with auto-fail conditions. The method is three steps:

$ run eval --skill off   # dangerous default   → FAIL
$ run eval --skill on    # guarded output      → PASS
$ # the gap is the skill's measured value — on your model, your setup.

The harness is automatable in CI, so when the next model version lands you re-check with a button instead of a debate. The free skill includes its complete eval — run the loop end-to-end today, no purchase required.

05 Pricing

Individual — you, across every project and client. Team — one price, every engineer and repo in your org.

Individual: one-time purchase, no subscription. Lifetime updates included, with evals refreshed for every new model version. Building solo means there's no security review behind you to catch the wildcard role or the loop with no spend ceiling before it ships. The full bundle costs less than one billable hour, and a fraction of what any one of these mistakes costs.

Team: one-time purchase, licensed org-wide. No seats, no subscription, lifetime updates including eval refreshes for new model versions. The $179 bundle covers every engineer and every repo you have, forever: on a ten-person team that's under $18 an engineer, and one prevented incident pays for all of it many times over.

Agent Guardrails

$49 individual · $99 team

security & control

  • 5 skills + fixtures + evals
  • Red-team eval cases
  • Pre-merge checklists
  • Individual: personal license, all your projects
  • Team: org-wide, unlimited engineers

Agent Discipline

$69 individual · $129 team

reliability & operations

  • 8 skills + fixtures + evals
  • Includes the free IAM skill
  • CI-ready registry checker
  • Individual: personal license, all your projects
  • Team: org-wide, unlimited engineers
$ build it yourself  # 13 skills + before/after fixtures + scored rubrics + red-team cases   → ~2–3 weeks
$ buy the bundle     # same coverage, measured; evals refreshed every model release        → $99 individual · $179 whole org, today

Not sure yet? Install the free skill — full fixtures and eval harness, no email required.

06 FAQ

Which tools do the skills work with?
Claude Code loads them contextually via the description frontmatter — the skill activates only when your task matches. On Cursor and Codex the same prose installs as always-on rules (.cursor/rules/ or AGENTS.md); the content is fully portable, the auto-triggering is Claude Code-specific. Each pack's INSTALL.md covers all three.
Do I need to be on AWS?
The Guardrails pack is stack-agnostic. The Discipline pack's reasoning is universal, with AWS-flavored examples (IAM, SQS, DynamoDB); cost ceilings, retries, observability, and bounded recursion apply to any stack.
What exactly do I get?
A download of the pack: per-skill directories with SKILL.md, references and pre-merge checklists, before/after fixtures, and the full eval harness (prompts, rubrics, LLM-as-judge template).
Can't I just write these myself?
You can write the prose — that part isn't the moat. What's hard to replicate solo: the failure-mode taxonomy (13 patterns drawn from production agent incidents, not just the two or three you've personally been burned by), the scored eval harness that proves each skill changes your agent's output (a skill without an eval is a hope), and the upkeep — when a new model version lands, the evals get re-run and refreshed so you don't rediscover regressions in production. Building all of that properly is two-plus weeks of senior engineering time.
What's the difference between the Individual and Team licenses?
Individual licenses one developer — you — across unlimited personal and client projects. Team licenses your whole organization: unlimited engineers, unlimited repos. Redistribution of the pack itself isn't permitted on either. Outgrow Individual later? Email and pay the difference to upgrade.
What about new model versions?
Models change — that's why evals ship with the packs. Updates (including eval refreshes) are included for life. Re-run the harness after any model upgrade and verify the skills still earn their keep.
Refunds?
14 days, no questions. The fixtures and free skill exist so you can evaluate before buying; if it still isn't a fit, email and you'll be refunded.