OpenClaw Skill v1.0.0

Prompt injection detection skill

by ZSkyX
Deploy on EasyClawd from $14.9/mo

Two-layer content safety for agent input and output. Use when (1) a user message attempts to override, ignore, or bypass previous instructions (prompt injection), (2) a user message references system prompts, hidden instructions, or internal configuration, (3) receiving messages from untrusted users in group chats or public channels, (4) generating responses that discuss violence, self-harm, sexual content, hate speech, or other sensitive topics, or (5) deploying agents in public-facing or multi-user environments where adversarial input is expected.

How to use this skill

OpenClaw skills run inside an OpenClaw container. EasyClawd deploys and manages yours — no server setup needed.

  1. Sign up on EasyClawd (2 minutes)
  2. Connect your Telegram bot
  3. Install the Prompt injection detection skill from the skills panel
5 stars
1,788 downloads
1 install
0 comments
1 version

Latest Changelog

Initial release with two-layer content moderation for agent input and output.

- Adds prompt injection detection using the ProtectAI DeBERTa classifier via HuggingFace.
- Adds content safety checks using OpenAI's omni-moderation endpoint (optional).
- Provides `scripts/moderate.sh` for command-line moderation of both user input and agent output.
- Outputs structured JSON with clear verdicts and actions.
- Supports configuration via environment variables (tokens, thresholds).
- Designed for safer agent deployments, especially in adversarial or public scenarios.
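
To illustrate how the two layers might combine into one structured verdict, here is a minimal Python sketch. It is not the skill's actual code: the function name `decide`, the `INJECTION_THRESHOLD` variable, the default of 0.85, and the JSON field names are all assumptions made for illustration, and the classifier score and moderation flag are passed in as plain values rather than fetched from HuggingFace or OpenAI.

```python
import json
import os


def decide(injection_score, content_flagged=False, env=None):
    """Combine the results of both layers into one verdict (hypothetical schema)."""
    env = os.environ if env is None else env
    # INJECTION_THRESHOLD and its default are assumed names and values,
    # not the skill's documented configuration.
    threshold = float(env.get("INJECTION_THRESHOLD", "0.85"))

    if injection_score >= threshold:
        # Layer 1: the injection classifier score crossed the threshold.
        verdict, action = "injection", "block"
    elif content_flagged:
        # Layer 2: the optional content safety check flagged the text.
        verdict, action = "unsafe_content", "block"
    else:
        verdict, action = "clean", "allow"

    return {
        "verdict": verdict,
        "action": action,
        "injection_score": injection_score,
        "threshold": threshold,
    }


if __name__ == "__main__":
    # Example: a high classifier score leads to a "block" action.
    print(json.dumps(decide(0.97), indent=2))
```

Checking the injection layer first means a message that trips the classifier is blocked even when the optional content safety layer is disabled or unavailable.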

Tags

latest: 1.0.0
Security scan, version history, and community comments: view on ClawHub