Why AI Security Needs More Than Just One Scanner's Opinion

AI agents face a new layer of scrutiny now that agent skills have arrived. These skills package reusable instructions, tools, and workflows, and they need a security boundary distinct from both traditional model safety and malware detection. Enter ClawHub Security Signals, a dataset that sheds light on how these skills are evaluated.

Disagreement Among Scanners

The dataset covers 67,453 public OpenClaw skill versions. Across it, three scanning families (VirusTotal, static heuristic analysis, and NVIDIA SkillSpector) disagreed sharply and rarely agreed on what counts as suspicious or malicious. Just 0.69% of skills were flagged by all three, while 81.9% were flagged by only one scanner. So what gives?
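The agreement figures above come down to counting, for each skill version, how many scanners flagged it. The sketch below shows one way to compute that breakdown. It makes several assumptions. Each row is a dict of per-scanner booleans, and the scanner keys are hypothetical names rather than the dataset's real columns. The denominator is "rows flagged by at least one scanner", which is an assumption because the source does not state which denominator it uses. The toy rows are made up for illustration and are not ClawHub data.

```python
from collections import Counter

# Hypothetical scanner keys; the dataset's actual column names are not given here.
SCANNERS = ("virustotal", "static_heuristics", "skillspector")


def agreement_breakdown(rows):
    """Fraction of flagged rows flagged by exactly k scanners, for k = 1..len(SCANNERS).

    Each row maps scanner name -> bool (True if that scanner flagged it).
    Rows no scanner flagged are excluded from the denominator (an assumption).
    """
    # Count how many scanners flagged each row: 0, 1, 2, or 3.
    counts = Counter(sum(bool(row[s]) for s in SCANNERS) for row in rows)
    flagged = sum(n for k, n in counts.items() if k > 0)
    if flagged == 0:
        return {k: 0.0 for k in range(1, len(SCANNERS) + 1)}
    return {k: counts[k] / flagged for k in range(1, len(SCANNERS) + 1)}


# Toy rows for illustration only; not real dataset values.
toy_rows = [
    {"virustotal": True, "static_heuristics": False, "skillspector": False},
    {"virustotal": True, "static_heuristics": True, "skillspector": True},
    {"virustotal": False, "static_heuristics": False, "skillspector": True},
    {"virustotal": False, "static_heuristics": False, "skillspector": False},
]

if __name__ == "__main__":
    print(agreement_breakdown(toy_rows))
```

Under this reading, the "flagged by all three" and "flagged by only one" shares plus the two-scanner share sum to 1, which makes low overlap easy to spot at a glance.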

The answer may lie in what each scanner is built to look for. SkillSpector, for instance, focuses on advisories about semantic and agentic risks rather than malware-reputation signals. It flagged 75.3% of suspicious rows but only 6.8% of clearly malicious ones. VirusTotal, by contrast, tracked bundled-code malware evidence more closely, flagging 72.8% of malicious rows.
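Percentages like "75.3% of suspicious rows" are per-label flag rates: for each verdict label, they give the share of rows that a given scanner flagged. Here is a minimal sketch of that calculation. It assumes each row carries a `label` string such as `"suspicious"` or `"malicious"` plus one boolean per scanner. These field names are hypothetical, and the toy rows are invented to show how two scanners can diverge by label.

```python
from collections import defaultdict


def flag_rate_by_label(rows, scanners):
    """For each scanner, the share of rows with each label that it flagged.

    rows: iterable of dicts with a 'label' key and one bool per scanner name
    (field names are illustrative assumptions, not the dataset's schema).
    Returns {scanner: {label: fraction}}.
    """
    totals = defaultdict(int)                        # label -> row count
    hits = defaultdict(lambda: defaultdict(int))     # scanner -> label -> flagged count
    for row in rows:
        label = row["label"]
        totals[label] += 1
        for s in scanners:
            if row[s]:
                hits[s][label] += 1
    return {
        s: {label: hits[s][label] / n for label, n in totals.items()}
        for s in scanners
    }


# Toy rows for illustration only; not real dataset values.
toy_rows = [
    {"label": "suspicious", "virustotal": False, "skillspector": True},
    {"label": "suspicious", "virustotal": False, "skillspector": True},
    {"label": "malicious", "virustotal": True, "skillspector": False},
    {"label": "malicious", "virustotal": True, "skillspector": True},
]

if __name__ == "__main__":
    print(flag_rate_by_label(toy_rows, ["virustotal", "skillspector"]))
```

In the toy data, one scanner covers the suspicious rows and the other covers the malicious ones. That is the same kind of split the article describes between SkillSpector and VirusTotal.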

Layered Governance or Bust

This mismatch in scanner results throws a wrench into the idea of basing allow/block decisions on a single scanner. The data makes a strong case for layered governance. Why rely on one tool when multiple perspectives reveal a more nuanced picture?
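To make "layered governance" concrete, here is one hypothetical policy that combines the three signal types instead of trusting any single scanner's verdict. The tiers, thresholds, and field names are illustrative assumptions, not a policy proposed by the dataset's authors. In this sketch, VirusTotal malware evidence corroborated by at least one other scanner blocks the skill. Any other single flag sends it to human review, and only a skill with no flags at all is allowed.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    REVIEW = "review"   # route to a human instead of a hard allow/block
    BLOCK = "block"


@dataclass
class Signals:
    # Hypothetical per-skill signals, one per scanning family.
    virustotal_malicious: bool    # bundled-code malware evidence
    static_heuristic_hit: bool    # pattern-based static analysis
    skillspector_advisory: bool   # semantic / agentic-risk advisory


def layered_decision(sig: Signals) -> Action:
    """Illustrative tiered policy; thresholds are assumptions, not a standard."""
    hits = sum([sig.virustotal_malicious, sig.static_heuristic_hit, sig.skillspector_advisory])
    # Block only when malware evidence is corroborated by another scanner.
    if sig.virustotal_malicious and hits >= 2:
        return Action.BLOCK
    # Any uncorroborated signal, including a SkillSpector-only advisory, gets reviewed.
    if hits >= 1:
        return Action.REVIEW
    return Action.ALLOW


if __name__ == "__main__":
    print(layered_decision(Signals(False, False, True)))  # advisory only
```

The design choice here is that disagreement routes a skill to review instead of silently resolving toward allow or block. Given how rarely the scanners overlap, that matters.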

The ClawHub dataset isn't perfect. It's a sanitized, silver-standard collection that relies on automated registry verdicts rather than human review. Still, it's an early snapshot meant to help the community, and a human-annotated subset is on the way that should bring more clarity. But the real question is this: with discrepancies this large, how can we trust the security of AI agents without a multi-faceted approach?

Looking Ahead

The need for solid security measures in AI isn't going away. As more data becomes available, further research will be important, and we need models tailored to skill-security triage. When it comes to the safety of agent skills, headline numbers are a distraction; what matters is whether the combined signals actually help triage. The landscape has shifted. Keep up.
