Argus · verification runtime for AI agents · live benchmark

The trust layer between agents and reality.

Every answer comes with four things: the value, a confidence score, verbatim quotes from each source we consulted, and any disagreements between those sources. No other API does all four. This benchmark is updated daily against a fixed gauntlet of stable, volatile, and edge-case facts.
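The four parts of an answer described above could be modeled roughly as follows. This is a minimal sketch for illustration only: the class names (`Answer`, `SourceQuote`), the field names, and the example URLs are assumptions, not the actual Argus response schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourceQuote:
    """One consulted source and the verbatim text it contributed (hypothetical shape)."""
    source_url: str          # assumed field name
    quote: str               # verbatim excerpt from the source
    value: Optional[str]     # the value this source claims, or None if it gave none


@dataclass
class Answer:
    """The value, a confidence score, per-source quotes, and derived disagreements."""
    value: Optional[str]     # None models an honest "unknown"
    confidence: float        # composite score in the range 0 to 1
    quotes: list[SourceQuote] = field(default_factory=list)

    def disagreements(self) -> list[SourceQuote]:
        """Return the sources whose claimed value differs from the returned value."""
        return [
            q for q in self.quotes
            if q.value is not None and q.value != self.value
        ]


# Made-up example data; not real benchmark output.
example = Answer(
    value="120",
    confidence=0.86,
    quotes=[
        SourceQuote("https://a.example/about", "We are a team of 120.", "120"),
        SourceQuote("https://b.example/profile", "Employees: 95", "95"),
    ],
)
conflicting_sources = [q.source_url for q in example.disagreements()]
```

Deriving disagreements from the per-source quotes, rather than storing them separately, keeps the evidence and the conflicts from drifting apart. A real API may well return them as an explicit field.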

Why this is different

| Capability | Argus Flow Facts | Firecrawl / Jina | Clearbit / Apollo |
| --- | --- | --- | --- |
| Returns the answer (not just HTML) | yes | no (you extract) | yes |
| Confidence score per answer | yes (0–1) | no | no |
| Verbatim quote evidence per source | yes | no | no |
| Source disagreements surfaced | yes | n/a | hidden |
| Honest “unknown” on edge entities | yes | returns junk | guesses |
| Cost per verified fact | live data pending (from benchmark) | $0.0015+ (then you extract) | $0.05–$0.20+ (static enrichment) |

The benchmark hasn't run yet. The cron job will populate this page within 24 hours.

Methodology

  • Each (entity, attribute) pair is resolved with refresh=true (cache bypassed), so latency reflects cold resolution.
  • Verified = ≥2 corroborating sources with composite confidence ≥0.80.
  • Partial = single high-confidence source (≥0.60).
  • Unknown = no source returned a value; the system says “I don't know” rather than guessing. The gauntlet includes a deliberately fake entity (Glimmer Labs) to test honest-failure handling.
  • Cost includes all LLM extraction calls (Groq Llama 3.1 8B primary, Anthropic Haiku fallback) plus any source scrapes. There are no proxy or scraping infrastructure costs, since most resolutions hit free public sources.
  • Competitor pricing is taken from public 2025–2026 documentation: Apollo from $0.05/credit, Clearbit Reveal $0.20/lookup, ZoomInfo $0.15+/record, Firecrawl $0.0015/scrape (then you extract yourself).
  • Backed by the facts_benchmark_runs and facts_benchmark_summary tables. Both are public-readable; query them directly via the Supabase REST API.
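The Verified / Partial / Unknown rules in the methodology can be sketched as a small classifier. This is an illustrative sketch, not the benchmark's actual code: the function name `classify` and its parameters are hypothetical, and the handling of cases the methodology does not spell out (for example, two sources with composite confidence below 0.80) is an assumption of this sketch.

```python
VERIFIED_MIN_SOURCES = 2        # "≥2 corroborating sources"
VERIFIED_MIN_CONFIDENCE = 0.80  # "composite confidence ≥0.80"
PARTIAL_MIN_CONFIDENCE = 0.60   # "single high-confidence source (≥0.60)"


def classify(corroborating_sources: int, confidence: float) -> str:
    """Map a resolution result to 'verified', 'partial', or 'unknown'.

    corroborating_sources: number of sources that returned the chosen value.
    confidence: composite confidence in the range 0 to 1.
    """
    if corroborating_sources <= 0:
        # No source returned a value: say "I don't know" rather than guess.
        return "unknown"
    if (corroborating_sources >= VERIFIED_MIN_SOURCES
            and confidence >= VERIFIED_MIN_CONFIDENCE):
        return "verified"
    if confidence >= PARTIAL_MIN_CONFIDENCE:
        # Assumption: multi-source results below the verified bar also
        # land here; the methodology only describes the single-source case.
        return "partial"
    # Assumption: a value below the partial threshold is reported as unknown.
    return "unknown"


# A fake entity such as Glimmer Labs should yield no sources at all.
fake_entity_status = classify(corroborating_sources=0, confidence=0.0)
```

Checking the zero-source case first means a fabricated entity can never be promoted to partial or verified by a stray confidence value.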