v1.0 · Effective: 2026-04-06 · Classification: Public
VerityHelm Methodology
VerityHelm provides adversarial verification of compliance attestations using publicly available signals. We produce findings, not opinions. This document discloses the complete methodology used to generate findings reports, enabling any party to understand, reproduce, or challenge our results.
VerityHelm does not perform audits, issue attestations, or provide legal opinions. We cross-reference vendor compliance claims against publicly observable signals and report factual contradictions, gaps, and observations.
1. Data Sources
VerityHelm v1.0 queries the following public signal sources. Each source is documented with its access method, limitations, and reliability assessment.
Tier 1 Sources (API-accessible, high reliability)
| Source | What It Provides | Access Method | Freshness |
| --- | --- | --- | --- |
| SEC/EDGAR | Public company filings: 10-K risk factors, 8-K cybersecurity incidents, auditor changes | REST API (JSON), free, 10 req/sec | Same-day |
| GitHub Security Advisories | Open-source vulnerability database with CVEs and severity scores | REST API, CC BY 4.0 licensed | Near-real-time |
| Certificate Transparency (crt.sh) | All TLS/SSL certificates issued for a domain (subdomain enumeration, infrastructure mapping) | Web/JSON/PostgreSQL, free | Near-real-time |
| USPTO Trademarks | Trademark registration status, filing history, related entities | (not specified) | (not specified) |
| Vendor Trust Pages | SOC 3 reports, ISO 27001 certificates, public compliance claims | Web scrape of vendor websites | Varies (annual cycle) |
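Tier 1 sources are queried by deterministic scripts. As a minimal sketch of the EDGAR access pattern, assuming a simple client-side rate limiter (the helper names are illustrative; the submissions endpoint and the 10 req/sec ceiling are as listed in the table):

```python
import time

# Real EDGAR endpoint; the CIK is zero-padded to 10 digits.
SEC_SUBMISSIONS = "https://data.sec.gov/submissions/CIK{cik:0>10}.json"

def edgar_submissions_url(cik: str) -> str:
    """Build the EDGAR submissions URL for a company CIK."""
    return SEC_SUBMISSIONS.format(cik=str(int(cik)))  # strip leading zeros, then pad

class RateLimiter:
    """Client-side cap to stay under EDGAR's 10 req/sec guideline."""
    def __init__(self, per_second: float):
        self.min_interval = 1.0 / per_second
        self._last = 0.0

    def wait(self) -> None:
        delta = time.monotonic() - self._last
        if delta < self.min_interval:
            time.sleep(self.min_interval - delta)
        self._last = time.monotonic()
```

A collector would call `limiter.wait()` before each HTTP request, so bursts never exceed the source's published rate.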
Tier 2 Sources (requires subscription or careful access)
| Source | What It Provides | Access Method | Freshness |
| --- | --- | --- | --- |
| NASBA CPAverify | CPA license verification across 55 U.S. jurisdictions | Web interface, individual lookups | Varies by state |
| AICPA Peer Review | Audit firm quality oversight enrollment and results | Web interface, individual lookups | 1–3 year review cycles |
| HaveIBeenPwned | Domain breach exposure history | REST API, paid subscription for domain search | As breaches are verified |
| UKAS/ANAB Directories | ISO 27001 certifier accreditation status | Web search | As accreditations change |
| Court Records (PACER/RECAP) | Federal litigation history | PACER (paid, $0.10/page) and RECAP (free archive) | Same-day (PACER) |
Tier 3 Sources (supplementary, used with caution)
| Source | What It Provides | Access Method | Limitations |
| --- | --- | --- | --- |
| DNS/Subdomain History | Historical DNS records, infrastructure changes | SecurityTrails API (paid) or DNSdumpster | Limited free tier |
| State Corporation Filings | Entity registration, good standing, registered agent | Per-state web interfaces (fragmented) | No unified API; bot protection |
| Job Posting History | Security team maturity, technology stack signals | Wayback Machine CDX API (free) | Indirect signal; coverage gaps |
Sources NOT Used
Paste sites (Pastebin, etc.): Corroborative only, never primary. We do not download or store credential content. If paste content contains PII, it is skipped entirely.
Social media (Twitter/X, LinkedIn posts): Not used as primary signals due to unreliability. May be used to corroborate findings from authoritative sources.
Confidential SOC 2 Type II reports: We do not access, request, or process confidential audit reports in v1.0. Engine inputs are limited to public signals (SOC 3 summaries, trust pages, public attestation claims).
2. Collection Method
2.1 Signal Collection
Each data source is queried using deterministic scripts, not AI/LLM agents.
3. Claim Extraction
Claims are extracted using deterministic pattern matching:
Certification identification: Regex patterns match certifications (SOC 2, ISO 27001, HIPAA, GDPR, PCI DSS, FedRAMP, CSA STAR, CCPA, SOX)
Audit firm identification: Regex patterns match known audit firm names and common phrases ("audited by," "examined by," "certified by")
Security claims: Pattern matching extracts statements about encryption, monitoring, testing, and other security practices
No AI interpretation at this step: All extraction is regex-based. The patterns are versioned with this methodology document.
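A minimal sketch of this style of deterministic extraction (the patterns below are illustrative rather than the versioned production set, and the audit-firm name in the usage is hypothetical):

```python
import re

# Illustrative certification patterns (the production set is versioned separately).
CERT_PATTERNS = {
    "SOC 2": re.compile(r"\bSOC\s*2\b", re.IGNORECASE),
    "ISO 27001": re.compile(r"\bISO(?:/IEC)?\s*27001\b", re.IGNORECASE),
    "PCI DSS": re.compile(r"\bPCI[\s-]?DSS\b", re.IGNORECASE),
}

# "audited by" / "examined by" / "certified by" followed by a firm name.
AUDIT_FIRM = re.compile(r"(?:audited|examined|certified)\s+by\s+([A-Z][A-Za-z&,' ]{2,60})")

def extract_claims(text: str) -> dict:
    """Regex-only claim extraction: no AI interpretation at this step."""
    firm = AUDIT_FIRM.search(text)
    return {
        "certifications": [name for name, pat in CERT_PATTERNS.items() if pat.search(text)],
        "audit_firm": firm.group(1).strip() if firm else None,
    }
```

For example, `extract_claims("We are SOC 2 compliant and ISO/IEC 27001 certified, audited by Example CPA LLP.")` yields both certifications plus the firm name. Non-standard phrasings that the patterns miss are simply not extracted, consistent with the limitation noted in Section 6.2.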
3.3 What We Do NOT Extract
We do not extract claims from confidential SOC 2 Type II reports
We do not extract claims from NDA-gated trust portals (we only access publicly visible content)
We do not infer claims that are not explicitly stated on vendor pages
4. Cross-Reference Logic
4.1 Signal-to-Claim Matching
Cross-referencing uses deterministic rules that compare public signals against extracted claims:
| Claim Type | Signal Source | Cross-Reference Rule |
| --- | --- | --- |
| "Zero security incidents" | GitHub Advisories, HaveIBeenPwned, SEC 8-K | If high/critical advisories or breach records exist during the claimed audit period, flag as contradiction |
| "Continuous monitoring" | CT Logs (subdomain count) | If subdomain count exceeds 50, flag as gap: question whether monitoring covers all infrastructure |
| Certification claims | AICPA Peer Review, CPAverify | Verify audit firm enrollment in peer review program; verify CPA signatory license status |
| ISO 27001 claims | UKAS/ANAB Directories | Verify certifying body is accredited |
4.2 Rule Types
Contradiction: A public signal directly contradicts a vendor claim. Example: vendor claims zero incidents, but HaveIBeenPwned shows their domain in a breach database during the audit period.
Gap: A public signal raises a question that the vendor claim does not address. Example: 200 subdomains discovered but monitoring claims don't specify scope.
Observation: A public signal is notable but does not directly contradict or gap a specific claim. Example: SEC 8-K filings exist that may contain cybersecurity incident disclosures.
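These three rule types reduce to deterministic checks. A sketch, assuming simplified claim and signal field names (only the 50-subdomain threshold comes from the Section 4.1 table; the field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    kind: str    # "contradiction" | "gap" | "observation"
    claim: str
    signal: str

def cross_reference(claims: dict, signals: dict) -> list:
    """Apply the Section 4 rule types to extracted claims and public signals."""
    findings = []
    # Contradiction: a specific claim directly conflicted by an authoritative signal.
    if claims.get("zero_incidents") and signals.get("breach_records"):
        findings.append(Finding("contradiction", "zero security incidents",
                                f"{len(signals['breach_records'])} breach record(s) in period"))
    # Gap: a broad claim that signals suggest may be incomplete.
    if claims.get("continuous_monitoring") and signals.get("subdomain_count", 0) > 50:
        findings.append(Finding("gap", "continuous monitoring",
                                f"{signals['subdomain_count']} subdomains in CT logs; scope unstated"))
    # Observation: a notable signal with no specific claim to contradict.
    if signals.get("sec_8k_filings"):
        findings.append(Finding("observation", "(no specific claim)",
                                "SEC 8-K filings present in period"))
    return findings
```

Because the rules are pure functions of the claims and signals, any party holding the same inputs can reproduce the same findings.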
4.3 Temporal Matching
All cross-references are time-aware:
Signals are matched to the vendor's most recent audit period (if identifiable from SOC 3 or trust page)
If the audit period is not identifiable, signals from the most recent 12 months are used
The report notes when temporal alignment could not be verified
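The fallback logic above can be sketched as follows (function names are illustrative, and the trailing 12-month window is approximated as 365 days):

```python
from datetime import date, timedelta

def matching_window(audit_start, audit_end, today):
    """Return (start, end, verified). Uses the audit period when identifiable,
    otherwise falls back to the trailing 12 months; `verified` records which case
    applied, so the report can note unverified temporal alignment."""
    if audit_start and audit_end:
        return audit_start, audit_end, True
    return today - timedelta(days=365), today, False

def in_window(signal_date, start, end):
    """Time-aware match: only signals inside the window are cross-referenced."""
    return start <= signal_date <= end
```

A signal dated after the audit period ends is excluded, which prevents flagging a breach that postdates the period the vendor's claim covers.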
5. Contradiction Detection
5.1 What Constitutes a Contradiction
A finding is classified as a contradiction when ALL of the following are true:
The vendor makes a specific, verifiable claim (e.g., "zero security incidents in the audit period")
A public signal from an authoritative source directly conflicts with that claim (e.g., HIBP shows a breach record for the vendor's domain during the same period)
The conflict is unambiguous — there is no reasonable interpretation that reconciles both the claim and the signal
5.2 What Constitutes a Gap
A finding is classified as a gap when:
The vendor makes a broad claim (e.g., "continuous monitoring")
Public signals suggest the claim may be incomplete but do not directly contradict it (e.g., large infrastructure footprint that may exceed monitoring coverage)
5.3 What Does NOT Constitute a Finding
A vendor not having a trust page (absence of evidence is not evidence of absence)
A vendor using a compliance automation platform (this is standard practice)
A vendor's audit firm not being in our "known" list (we flag for investigation, not as a finding)
Public signals that are ambiguous or could have multiple interpretations
5.4 False Positive Expectations
VerityHelm v1.0 is calibrated for a low false-positive rate at the cost of a higher false-negative rate. We prefer to miss findings rather than report incorrect ones. Expected rates:
False positive rate: <5% of reported findings
False negative rate: ~40–60% of actual issues (many compliance issues are not detectable from public signals)
6. Known Limitations
6.1 Coverage Limitations
Private companies: Limited SEC/EDGAR data. Analysis primarily relies on trust pages, CT logs, GitHub, and court records.
Non-US companies: NASBA CPAverify, PACER, and state filings are US-only. International coverage requires different signal sources.
Small/early-stage companies: May have minimal public signal footprint. Analysis may return few or no findings.
Vendors without trust pages: If no public compliance claims are found, cross-referencing is not possible.
6.2 Methodology Limitations
No access to confidential reports: SOC 2 Type II reports are not used in v1.0. This means we cannot verify specific control descriptions or test procedures.
Deterministic pattern matching: Regex-based claim extraction may miss non-standard phrasings. Complex or nuanced claims may not be extracted.
Temporal alignment: Audit period dates are not always publicly available, limiting precision of temporal cross-referencing.
Auditor quality assessment is indirect: We can verify peer review enrollment and CPA license status, but we cannot assess the quality of the audit work itself from public signals.
6.3 Categories of Vendors Poorly Served
Private companies with minimal web presence
Companies operating primarily outside the US
Companies that do not publish any compliance information publicly
Infrastructure-level vendors (IaaS, PaaS) whose compliance posture is documented in separate compliance portals with different URL patterns
6.4 What the Methodology Cannot Detect
Fabricated evidence within confidential audit reports (requires access to the report)
Auditor capture or independence issues (requires insight into auditor-client relationship economics)
Internal compliance program effectiveness (requires internal access)
Social engineering susceptibility (requires active testing, which we do not perform)
Accuracy of specific technical controls (requires technical assessment, which we do not perform)
7. Version History
| Version | Date | Changes | Backward Compatible |
| --- | --- | --- | --- |
| v1.0 | 2026-04-06 | Initial release. 14 public signal sources. Deterministic pipeline. No scoring; findings only. | N/A (initial) |
Planned for v1.1
Additional signal sources (paste-site corroborative signals, WHOIS history)
Improved temporal matching with audit period extraction from SOC 3 PDFs
Expanded audit firm database
Planned for v2.0 (post-legal review)
Optional Defensibility Score (0–100, weighted composite of findings)
SOC 2 Type II report metadata ingestion (control descriptions, audit firm, dates — not full report)
Continuous monitoring mode (weekly signal refresh)
8. Pipeline Architecture
INPUT: Vendor Name
│
├─ Step 1: Vendor Profile Assembly
│ └─ Queries: SEC/EDGAR, CT Logs, GitHub Advisories,
│ PCAOB, USPTO, Wayback Machine, CourtListener
│ └─ Output: 01-vendor-profile.json
│
├─ Step 2: Claim Extraction
│ └─ Scans: Trust pages, third-party trust centers,
│ SOC 3 download URLs
│ └─ Extracts: Certifications, audit firm, security claims
│ └─ Output: 02-claims.json
│
├─ Step 3: Cross-Reference Analysis
│ └─ Matches: Public signals against extracted claims
│ └─ Classifies: Contradiction / Gap / Observation
│ └─ Output: 03-cross-references.json
│
├─ Step 4: Fraud Pattern Match
│ └─ Checks: Auditor legitimacy, certification speed,
│ infrastructure scope, breach history
│ └─ Output: 04-pattern-matches.json
│
└─ Step 5: Report Generation
└─ Assembles: All findings into structured report
└─ Includes: Subject, methodology disclosure, findings,
   questions, signal freshness, summary
└─ Output: findings-report.md
NOTE: Steps 1–4 are fully deterministic (scripts, regex, API queries).
Step 5 assembles structured data into report format.
LLM interpretation may be added as an optional enhancement
in future versions.
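The five steps can be sketched as a single orchestrator that writes each numbered artifact; the step functions here are illustrative stubs standing in for the real deterministic scripts:

```python
import json
from pathlib import Path

# Stubs standing in for the real per-step scripts.
def assemble_profile(vendor):            return {"vendor": vendor, "signals": {}}
def extract_claims(profile):             return {"certifications": [], "audit_firm": None}
def cross_reference(profile, claims):    return []
def match_patterns(profile, claims):     return []
def render_report(profile, claims, refs, patterns):
    return f"# Findings Report: {profile['vendor']}\n"

def run_pipeline(vendor: str, out_dir: Path) -> Path:
    """Steps 1-4 are deterministic; Step 5 assembles structured data into a report."""
    profile = assemble_profile(vendor)                      # Step 1
    claims = extract_claims(profile)                        # Step 2
    refs = cross_reference(profile, claims)                 # Step 3
    patterns = match_patterns(profile, claims)              # Step 4
    for name, artifact in [("01-vendor-profile", profile), ("02-claims", claims),
                           ("03-cross-references", refs), ("04-pattern-matches", patterns)]:
        (out_dir / f"{name}.json").write_text(json.dumps(artifact, indent=2))
    report = out_dir / "findings-report.md"                 # Step 5
    report.write_text(render_report(profile, claims, refs, patterns))
    return report
```

Persisting each intermediate JSON artifact is what makes a run reproducible: a challenger can re-execute any single step against the prior step's output.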
This methodology document is versioned and published at verityhelm.com/methodology. Any changes result in a version increment documented in the Version History section. Findings reports reference the specific methodology version under which they were produced.