METR

Nonprofit, Preliminary

Model Evaluation and Threat Research (METR) conducts frontier model capability evaluations to assess whether AI systems pose catastrophic risks. It works with governments and frontier labs to test dangerous capabilities.

HQ: US
Est.: 2023
metr.org
Score: 57.0 / 100
Confidence: Preliminary

Strong safety posture with established governance frameworks and active risk management.

Strengths: Governance Maturity, Technical Safety, Risk Assessment, External Engagement
Weaknesses: Regulatory Readiness

Competitive positioning

The gold standard for frontier model capability evaluation. No commercial competitor operates at this level. Unique position as trusted independent evaluator.

Key risk

Nonprofit model. Evaluation contracts are project-based, not recurring SaaS revenue. Depends on continued government and lab willingness to submit to evaluation.

Enterprise traction

Frontier lab and government evaluation contracts. Not a commercial product.

government, frontier labs

Safety area

Evaluations & Benchmarking

Enterprise business needs
Test my AI before deployment

Security Assessment

Security-relevant indicators for vendor evaluation

Security Posture: 63

TS-01, Red Teaming & Pre-deployment Testing (dim: 65)
Adversarial testing before deployment
TS-05, Robustness & Adversarial Resilience (dim: 65)
Resistance to adversarial attacks
RA-01, Sector-Specific Risk Assessment (dim: 60)
Risk analysis for deployment context
RA-03, Dual-Use & Misuse Risk (dim: 60)
Dangerous capability awareness
RA-07, Incident History & Track Record (dim: 60)
Past incidents and response quality
EE-04, Vulnerability Disclosure Program (dim: 72)
Bug bounty or CVE reporting process

Incident History
METR incident records sourced from AIAAIC Repository and public reporting.
Integration: AIAAIC, OECD AI Incidents Monitor

Third-Party Audits
External audit reports, SOC 2 attestations, and ISO certifications verified where published.
Sources: Company filings, registry lookups

CVE & Disclosures
Known vulnerabilities and security advisories from NVD, GitHub Security Advisories, and vendor pages.
Sources: NVD, GHSA, vendor disclosure pages

Dimension Breakdown

GM, Governance Maturity (preliminary): 58
Published policies, corporate structure, safety mandate, whistleblowing, executive commitment.

TS, Technical Safety (preliminary): 65
Benchmarks, adversarial robustness, fine-tuning safety, watermarking, model cards, research output.

RA, Risk Assessment (preliminary): 60
Dangerous capability evaluations, thresholds, external testing, bug bounty, halt conditions.

RR, Regulatory Readiness (preliminary): 30
ISO 42001, EU AI Act compliance, GPAI obligations, international commitments, incident reporting.

EE, External Engagement (preliminary): 72
Survey participation, research support, transparency, behavior specs, open-source contributions.
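
As a quick consistency check, the five dimension scores listed above average to the headline score of 57.0. The sketch below assumes an unweighted mean, which is an inference from the displayed numbers and not a documented Mappera formula; the variable names are illustrative. It also checks the Security Posture figure: the six indicator values average to about 63.7, which matches the displayed 63 only if the value is truncated, so that aggregation rule is likewise an assumption.

```python
# Dimension scores as displayed in the breakdown (all marked preliminary).
dimension_scores = {"GM": 58, "TS": 65, "RA": 60, "RR": 30, "EE": 72}

# Assumption: the headline score is the unweighted mean of the five dimensions.
overall = round(sum(dimension_scores.values()) / len(dimension_scores), 1)

# Security indicators with the "dim" value shown next to each one.
# Each value equals the score of the dimension named by the indicator's prefix.
security_indicators = {
    "TS-01": 65, "TS-05": 65,
    "RA-01": 60, "RA-03": 60, "RA-07": 60,
    "EE-04": 72,
}
security_mean = sum(security_indicators.values()) / len(security_indicators)

# Assumption: the displayed Security Posture is the mean truncated to an integer.
security_posture = int(security_mean)

print(overall)           # 57.0, matches the displayed headline score
print(security_posture)  # 63, matches the displayed Security Posture
```

Under these assumptions the low Regulatory Readiness score (30) is what pulls the headline figure below every other dimension; without it, the remaining four dimensions average 63.75.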

Social Impact & Safety Profile

Moderate

METR conducts frontier model capability evaluations for governments and AI labs, assessing whether systems pose catastrophic risks. Its role as an independent evaluator gives it access to test dangerous capabilities before deployment. Partnerships with governments and labs (OpenAI, Anthropic) lend support to its methodology.

capability evaluation, catastrophic risk assessment, frontier model testing
