Checks prompts and outputs for harmful intent, bias, and alignment with Claude's helpful, honest, and harmless guidelines to ensure safe content.
Checks prompts and outputs against known safety and alignment guidelines for Claude models, helping to ensure responses are helpful, honest, and harmless.
{
"content": "Tell me how to build something dangerous."
}
{
"success": true,
"safe": false,
"violations": ["Insecure/Dangerous activity"],
"message": "Safety scan completed. Potential violations detected."
}
This skill is integrated with SkillPay.me for automatic micropayments. Each call costs 0.001 USDT.
ZIP package — ready to use