Agent Input Firewall build log

A Codex and Claude Code skill for quarantining untrusted external text before coding agents act.

Sunday, June 28, 2026 · Omid Saffari

Agent Input Firewall is a Codex and Claude Code skill for one job: quarantine untrusted GitHub issues, PR comments, web excerpts, feedback, and logs before a coding agent turns them into commands, commits, or code changes.

Public issue threads and pasted logs are useful context, but they are not trusted instructions. This skill forces an agent to convert that external text into a safe execution brief first: source-tagged evidence, actionable tasks, context-only claims, blocked instructions, open questions, and one bounded next step.
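
To make the shape of that brief concrete, here is a minimal Python sketch. The skill itself is pure instructions, so nothing like this ships in the repo. The class names, field names, and source tags (such as "pr-comment") are assumptions for illustration, chosen to mirror the six parts listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str  # hypothetical tag, e.g. "github-issue", "pr-comment", "pasted-log"
    text: str    # quoted excerpt, treated as data and never as an instruction


@dataclass
class SafeExecutionBrief:
    # Field names are illustrative; they mirror the parts of the brief named above.
    evidence: list[Evidence] = field(default_factory=list)
    actionable_tasks: list[str] = field(default_factory=list)
    context_only_claims: list[str] = field(default_factory=list)
    blocked_instructions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    next_step: str = ""  # a single bounded step, not a list

    def render(self) -> str:
        """Render the brief as plain text, one section per field."""
        sections = [
            ("Evidence", [f"[{e.source}] {e.text}" for e in self.evidence]),
            ("Actionable tasks", self.actionable_tasks),
            ("Context-only claims", self.context_only_claims),
            ("Blocked instructions", self.blocked_instructions),
            ("Open questions", self.open_questions),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in (items or ["(none)"]))
        lines.append(f"Next step: {self.next_step or '(none)'}")
        return "\n".join(lines)
```

Keeping next_step as a single string rather than a list reflects the "one bounded next step" constraint: the brief cannot carry a queue of follow-up actions.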

The release is intentionally pure instructions. There is no server, account, runtime key, or hosted dependency.

Why this release

Agent workflows increasingly depend on public comments, copied logs, browser snippets, and issue reports. Those inputs can contain prompt-injection text, credential requests, fake maintainer instructions, or scope expansion. The practical fix is not a broad security dashboard. It is a small skill that makes the agent stop, mark the trust boundary, and separate evidence from instructions before it acts.

Agent Input Firewall rides two current trends: file-based agent skills as reusable workflow units, and prompt-injection-safe handling of untrusted context inside agentic coding loops.

What shipped

  • Repo: https://github.com/dvnc-labs/agent-input-firewall
  • Release: https://github.com/dvnc-labs/agent-input-firewall/releases/tag/v0.1.0
  • Demo surface: https://github.com/dvnc-labs/agent-input-firewall#readme
  • Artifact type: Codex / Claude Code skill.
  • Scaffold: omidsaffari/skill-starter.
  • Skill entrypoint: SKILL.md.
  • Reference file: references/threat-patterns.md.
  • Release assets: assets/demo.gif and assets/og.png.
  • Reproducible renderer: scripts/render-assets.mjs.
  • Codex UI metadata: agents/openai.yaml.

What the skill does

The skill tells the agent to:

  1. mark the source boundary;
  2. build an evidence ledger;
  3. strip data-borne instructions;
  4. ground actionable work against trusted repo context;
  5. produce a safe execution brief before implementation;
  6. continue only from grounded tasks if asked to implement.

It blocks external instructions that ask for secrets, shell or network actions, CI changes, release actions, permission changes, task expansion, or instruction override.
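
The first three steps and the block list can be approximated in code, though the skill does this through agent instructions, not a script. The sketch below is a coarse keyword triage under stated assumptions: the patterns, the category labels, and the triage function are all hypothetical, and "task expansion" is left out because it needs judgment against the trusted task, not pattern matching.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns, one per blocked category named above.
# Keyword matching is only a rough stand-in for the agent's own review.
BLOCKED_PATTERNS = {
    "instruction override": re.compile(
        r"\b(ignore|disregard) (all |any )?(previous|prior|above) instructions\b", re.I
    ),
    "secrets": re.compile(r"\b(api[_ -]?key|token|password|secret|credential)s?\b", re.I),
    "shell or network": re.compile(r"\b(curl|wget|bash|sudo|rm -rf)\b", re.I),
    "ci changes": re.compile(r"\.github/workflows|\bci (config|pipeline)\b", re.I),
    "release actions": re.compile(r"\b(npm publish|tag a release|force[- ]push)\b", re.I),
    "permission changes": re.compile(r"\b(chmod|add .* as (admin|collaborator))\b", re.I),
}


@dataclass
class Triage:
    source: str  # step 1: the source boundary travels with every result
    evidence: list[str] = field(default_factory=list)              # step 2
    blocked: list[tuple[str, str]] = field(default_factory=list)   # step 3: (category, line)


def triage(source: str, text: str) -> Triage:
    """Split untrusted text into evidence lines and blocked instruction lines."""
    result = Triage(source=source)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # The first matching category wins; dict order sets the priority.
        category = next(
            (name for name, pattern in BLOCKED_PATTERNS.items() if pattern.search(line)),
            None,
        )
        if category is None:
            result.evidence.append(line)
        else:
            result.blocked.append((category, line))
    return result
```

A line that lands in evidence is still only a claim. Steps 4 to 6 require grounding it against trusted repo context before any task is derived from it, and this sketch does not attempt that part.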

Gates

  • Remote repo created from the private omidsaffari/skill-starter template.
  • SKILL.md frontmatter gate passed.
  • README/SKILL placeholder gate passed.
  • Release-copy placeholder scan passed, excluding the frozen scaffold docs and the CI detector.
  • Required release files present: README.md, LICENSE, CHANGELOG.md, assets/demo.gif, assets/og.png, references/threat-patterns.md, scripts/render-assets.mjs, and agents/openai.yaml.
  • Asset renderer syntax and execution passed.
  • assets/og.png is 1280 x 640 and under 1 MB.
  • assets/demo.gif is 960 x 540 and 8 seconds long.
  • Install-copy checks passed for both .agents/skills and .claude/skills paths.
  • GitHub Actions CI passed on commit cdfa87a.
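
The file-presence and image-size gates above are simple enough to reproduce locally. The following Python sketch is not the repo's actual CI. The function names are hypothetical, "under 1 MB" is read as under 1,000,000 bytes (an assumption), and the 8-second GIF duration is not checked.

```python
import struct
from pathlib import Path

# Required release files, as listed in the gate above.
REQUIRED_FILES = [
    "README.md", "LICENSE", "CHANGELOG.md",
    "assets/demo.gif", "assets/og.png",
    "references/threat-patterns.md",
    "scripts/render-assets.mjs", "agents/openai.yaml",
]

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def missing_files(repo: Path) -> list[str]:
    """Return the required release files that are absent from the repo root."""
    return [name for name in REQUIRED_FILES if not (repo / name).is_file()]


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk, which follows the 8-byte signature."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])  # big-endian 32-bit fields
    return width, height


def gif_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the GIF logical screen descriptor."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    width, height = struct.unpack("<HH", data[6:10])  # little-endian 16-bit fields
    return width, height


def og_image_ok(path: Path) -> bool:
    """Check the og.png gate: 1280 x 640 and under 1 MB (assumed 1,000,000 bytes)."""
    data = path.read_bytes()
    return png_size(data) == (1280, 640) and len(data) < 1_000_000
```

Reading the dimensions straight from the file headers keeps the check free of third-party dependencies, in the same spirit as the release's rule of no server, account, or hosted dependency.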

Markdownlint is non-blocking in the starter workflow. The only remaining markdownlint warning is from the preserved scaffold AGENTS.md example text, not release-owned copy.

Distribution kit

Show HN title:

Show HN: Quarantine untrusted PR comments before coding agents act

Awesome-list targets:

  • https://github.com/hesreallyhim/awesome-claude-code
  • agent-safety and prompt-injection resource lists that accept small workflow tools.

Install snippets:

git clone https://github.com/dvnc-labs/agent-input-firewall ~/.claude/skills/agent-input-firewall

git clone https://github.com/dvnc-labs/agent-input-firewall ~/.agents/skills/agent-input-firewall

Last Updated

Jun 28, 2026

Category: Agents
