Scoring Framework

How We Score Safety

40 publicly assessable indicators mapped across 6 major safety and governance frameworks. Every score is backed by verifiable evidence — no surveys, no self-reporting.

  • Indicators: 40 per company
  • Dimensions: 5, weighted
  • Frameworks: 6, cross-mapped
  • Scale: 0-5 ordinal per indicator
Core Approach

Independent, evidence-based assessment

Most AI governance assessments rely on company cooperation — questionnaires, self-reported data, voluntary disclosures. This introduces bias and limits coverage to companies willing to participate.

Mappera takes a different approach. Every indicator is scored using publicly available information: published policies, GitHub repositories, benchmark results, regulatory filings, job listings, and academic publications. Any company deploying AI in high-risk sectors — healthcare, hiring, finance, autonomous systems — can be assessed whether they cooperate or not.

The methodology synthesizes requirements from six recognized frameworks into a single, comparable score. Each of the 40 indicators maps to at least one framework, ensuring the rubric reflects established governance standards rather than arbitrary criteria. The rubric is calibrated so that a Series A startup with basic governance practices scores 2-3 on most indicators, not zero — enabling meaningful differentiation across the startup and scale-up landscape.

Five Dimensions

40 indicators organized into 5 dimensions with differentiated weights. Governance and Technical Safety carry the most weight (25% each) as direct measures of a company's safety posture.

GM: Governance Maturity (25%)
  • Published AI safety / responsible AI policy (GM-01)
  • Dedicated safety/ethics role or team (GM-02)
  • Board/leadership AI risk oversight (GM-03)
  • AI incident response process (GM-04)
  • Pre-deployment review gate (GM-05)
  • Risk appetite / acceptable use policy (GM-06)
  • Safety culture signals (GM-07)
  • AI system inventory and lifecycle management (GM-08)
TS: Technical Safety (25%)
  • Pre-deployment testing and red teaming (TS-01)
  • Safety benchmarks and evaluations (TS-02)
  • Bias and fairness testing (TS-03)
  • Output monitoring and guardrails (TS-04)
  • Robustness and adversarial resilience (TS-05)
  • Human oversight mechanisms (TS-06)
  • Model/system documentation (TS-07)
  • Validation in deployment context (TS-08)
RA: Risk Assessment (20%)
  • Sector-specific risk assessment (RA-01)
  • AI system impact assessment (RA-02)
  • Dual-use / misuse risk awareness (RA-03)
  • Societal and downstream impact (RA-04)
  • Data quality and bias risk management (RA-05)
  • Risk taxonomy / classification (RA-06)
  • Incident history and track record (RA-07)
  • Third-party and supply chain risk (RA-08)
RR: Regulatory Readiness (20%)
  • EU AI Act risk classification awareness (RR-01)
  • Regulatory-grade documentation (RR-02)
  • Conformity / third-party assessment (RR-03)
  • Data governance framework (RR-04)
  • Sector-specific regulatory compliance (RR-05)
  • Voluntary safety commitments (RR-06)
  • Standards body participation (RR-07)
  • Proactive regulatory engagement (RR-08)
EE: External Engagement (10%)
  • Safety / responsible AI publications (EE-01)
  • External safety audits and evaluations (EE-02)
  • Transparency reporting (EE-03)
  • Vulnerability / issue disclosure program (EE-04)
  • Multi-stakeholder engagement (EE-05)
  • Open-source safety contributions (EE-06)
  • Academic / research collaboration (EE-07)
  • Industry safety cooperation (EE-08)

Indicator Weighting

Within each dimension, all eight indicators are equally weighted, so a GM or TS indicator contributes 3.125% to the overall score while an EE indicator contributes 1.25%.

Dim   Ind.   Weight   Per Ind.   Formula
GM    8      25%      3.125%     25% / 8 = 3.125%
TS    8      25%      3.125%     25% / 8 = 3.125%
RA    8      20%      2.500%     20% / 8 = 2.500%
RR    8      20%      2.500%     20% / 8 = 2.500%
EE    8      10%      1.250%     10% / 8 = 1.250%

Overall = (GM avg × 0.25) + (TS avg × 0.25) + (RA avg × 0.20) + (RR avg × 0.20) + (EE avg × 0.10). Dimension averages are computed from the normalized 0-5 scores of all assessed indicators.
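The weighted aggregation above can be sketched in a few lines. This is a minimal illustration, not Mappera's actual implementation: the `overall_score` function name and the linear normalization of a 0-5 dimension average onto a 0-100 scale are assumptions; the weights are the published v0.3 values.

```python
# Dimension weights from the v0.3 methodology (sum to 1.0).
WEIGHTS = {"GM": 0.25, "TS": 0.25, "RA": 0.20, "RR": 0.20, "EE": 0.10}

def overall_score(indicator_scores: dict[str, list[int]]) -> float:
    """indicator_scores maps a dimension code to its 0-5 indicator scores.
    Each dimension average is normalized to 0-100, then weighted."""
    total = 0.0
    for dim, weight in WEIGHTS.items():
        scores = indicator_scores[dim]
        dim_avg = sum(scores) / len(scores)     # ordinal average on 0-5
        total += (dim_avg / 5) * 100 * weight   # normalize, then weight
    return round(total, 1)

# A hypothetical company scoring 3 ("Structured") on every indicator:
example = {dim: [3] * 8 for dim in WEIGHTS}
print(overall_score(example))  # 60.0
```

Note that because the weights sum to 1, a uniform indicator score of 3 lands at 60/100 regardless of how the weights are distributed; the weights only matter once dimensions diverge.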

Framework Crosswalk

Every indicator traces to at least one recognized standard. Counts reflect how many of the 40 indicators reference each framework.

Framework              Mapped   Focus
NIST AI RMF            34       Risk management lifecycle: Govern, Map, Measure, Manage functions
EU AI Act              33       Risk classification, Annex III high-risk categories, Annex IV documentation, GPAI obligations
ISO 42001              28       Certifiable AI management system: lifecycle controls, risk assessment, governance structure
FLI AI Safety Index    22       Company-level safety practices, governance commitments, public accountability
MLCommons AILuminate   12       Standardized safety benchmarks: HELM Safety, BBQ bias, jailbreak resistance
METR                   8        Capability elicitation, uplift studies, controlled evaluation experiments

Totals exceed 40 because individual indicators may map to multiple frameworks. For example, TS-03 (Bias and Fairness Testing) maps to NIST AI RMF, EU AI Act, MLCommons AILuminate, and ISO 42001.
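The many-to-many counting above can be made concrete with a tiny example. The TS-03 mapping is the one given in the text; the GM-01 and RR-01 entries are hypothetical, purely to show why per-framework counts sum past the number of indicators.

```python
from collections import Counter

# Indicator -> frameworks it references. TS-03 is from the methodology
# text; the other two rows are hypothetical illustrations.
crosswalk = {
    "TS-03": ["NIST AI RMF", "EU AI Act", "MLCommons AILuminate", "ISO 42001"],
    "GM-01": ["NIST AI RMF", "ISO 42001"],   # hypothetical
    "RR-01": ["EU AI Act"],                  # hypothetical
}

# Count each framework once per indicator that cites it.
counts = Counter(fw for fws in crosswalk.values() for fw in fws)
print(counts["NIST AI RMF"])  # 2
print(sum(counts.values()))   # 7 mappings from only 3 indicators
```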

Indicator Scoring Scale

All 40 indicators use a 0-5 ordinal scale with defined thresholds. Each level has specific, observable criteria documented in the rubric.

0: Not visible

No public evidence of the practice. Absent from website, docs, filings.

1: Minimal

Vague mention in passing. Acknowledged but no substance — e.g. a passing reference to safety in a job posting, or a one-line ToS clause.

2: Basic

Some effort visible but informal or incomplete. A brief responsible AI statement, basic bias checks, or informal incident handling.

3: Structured

Documented and deliberate. A dedicated policy page, structured testing program, formal review gate, or documented risk assessment process. This is where a well-run Series A startup should aim.

4: Comprehensive

Mature and thorough. Multiple safeguards, external validation, published results, dedicated teams with clear mandate. Typical of well-resourced companies taking safety seriously.

5: Industry-leading

Best in class with external verification. Published frameworks with version control and enforcement, third-party audits, transparent reporting with quantified metrics, and demonstrated follow-through.

Grading Scale

Dimension and overall scores (0-100) map to letter grades. Thresholds are field-adjusted to reflect the current state of AI safety maturity, where even industry leaders score 70-75 on absolute scales.

Grade   Range    GPA
A+      85-100   4.3
A       75-84    4.0
A-      68-74    3.7
B+      62-67    3.3
B       55-61    3.0
B-      48-54    2.7
C+      44-47    2.3
C       40-43    2.0
C-      35-39    1.7
D+      30-34    1.3
D       25-29    1.0
D-      20-24    0.7
F       0-19     0.0
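Because the bands are contiguous, the table reduces to a sorted list of floors checked top-down. A minimal sketch (the `letter_grade` helper is an assumption for illustration; the thresholds and GPA values are the published ones):

```python
# (floor, grade, GPA) triples from the grading table, highest first.
GRADES = [
    (85, "A+", 4.3), (75, "A", 4.0), (68, "A-", 3.7),
    (62, "B+", 3.3), (55, "B", 3.0), (48, "B-", 2.7),
    (44, "C+", 2.3), (40, "C", 2.0), (35, "C-", 1.7),
    (30, "D+", 1.3), (25, "D", 1.0), (20, "D-", 0.7),
    (0,  "F",  0.0),
]

def letter_grade(score: float) -> tuple[str, float]:
    """Map a 0-100 dimension or overall score to (grade, GPA)."""
    for floor, grade, gpa in GRADES:
        if score >= floor:
            return grade, gpa
    return "F", 0.0  # unreachable for scores in [0, 100]

print(letter_grade(72))  # ('A-', 3.7)
```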

Sector Coverage

Mappera assesses AI companies across six sector categories, each carrying distinct regulatory and safety requirements.

Healthcare AI

Clinical note generation, diagnostic imaging, decision support. Key regulations: FDA, HIPAA, MDR. Key indicators: TS-06 (human oversight), TS-08 (clinical validation), RR-05 (sector compliance).

Hiring AI

Candidate screening, video assessment, talent matching. Key regulations: EEOC, NYC Local Law 144, EU AI Act Annex III. Key indicators: TS-03 (bias testing), TS-06 (human oversight), RA-07 (incident history).

Financial AI

Credit decisioning, fraud detection, algorithmic trading. Key regulations: ECOA, FCRA, SOC 2, PCI DSS. Key indicators: TS-03 (fairness), RA-05 (data quality), RR-05 (sector compliance).

Autonomous Systems

Self-driving, delivery robots, drones, humanoid robotics. Key regulations: automotive safety standards, aviation regs. Key indicators: RA-01 (sector risk), TS-08 (field validation), TS-05 (robustness).

Frontier Labs

Foundation model developers. Key regulations: EU AI Act GPAI obligations. Assessed across the full rubric with particular focus on GM, TS, and EE dimensions.

AI Safety Tooling

Evaluation platforms, governance tools, monitoring infrastructure. Of particular interest to safety-focused VCs and grantmakers. Key indicators: TS-01 (testing), TS-02 (benchmarks), EE-06 (open-source).

How to Improve Your Score

If your company appears in the Mappera directory and you want to improve your score, the highest-impact actions by dimension:

GM: Governance Maturity (25%)

Publish a versioned, board-approved AI safety policy. Appoint a named safety/ethics lead or team. Document a pre-deployment review gate with go/no-go criteria. Establish an AI-specific incident response plan.

TS: Technical Safety (25%)

Run and publish bias and fairness testing results. Implement human-in-the-loop with documented appeal processes. Publish model/system cards. Validate in deployment context with published results (clinical studies, field trials, pilot data).

RA: Risk Assessment (20%)

Conduct and publish sector-specific risk assessments. Document third-party AI/model supply chain risks. Build a structured risk taxonomy. Assess societal and downstream impacts on vulnerable populations.

RR: Regulatory Readiness (20%)

Achieve sector-specific compliance (FDA clearance, SOC 2, HIPAA, EEOC). Self-classify under EU AI Act risk tiers. Pursue ISO 42001 or third-party assessment. Publish regulatory-grade technical documentation.

EE: External Engagement (10%)

Publish safety research or case studies. Accept external safety audits. Establish a vulnerability disclosure program. Contribute safety tools or datasets as open source. Engage with affected communities.

Limitations

Confidence Levels and Constraints

Each dimension carries a confidence level (high, medium, low) based on the percentage of indicators with verified evidence. High confidence requires evidence for 80% or more indicators. Medium covers 50-79%. Low is below 50%.
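The confidence rule above is a simple share-of-evidence threshold. A minimal sketch (the `confidence` function name is an assumption; the 80% and 50% cutoffs are from the text, and each dimension has 8 indicators):

```python
def confidence(evidenced: int, total: int = 8) -> str:
    """Per-dimension confidence from the count of indicators with
    verified public evidence: >=80% high, 50-79% medium, else low."""
    share = evidenced / total
    if share >= 0.80:
        return "high"
    if share >= 0.50:
        return "medium"
    return "low"

print(confidence(7))  # 'high'   (7/8 = 87.5%)
print(confidence(4))  # 'medium' (4/8 = 50%)
print(confidence(3))  # 'low'    (3/8 = 37.5%)
```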

Public assessability constrains depth. Internal practices, board minutes, incident response logs, and unreleased evaluation results are invisible to this methodology. A low score may reflect poor transparency rather than poor safety practices. This is a feature, not a bug — transparency itself is a safety practice.

v0.3 introduced differentiated dimension weights (25/25/20/20/10) replacing equal weighting, and field-adjusted grading thresholds calibrated to the actual state of AI safety maturity. Future versions may further adjust weights based on empirical validation against expert assessments and observed incidents.


Mappera Scoring Methodology v0.3 — 40 indicators across 5 dimensions mapped to 6 frameworks.
Calibrated for startups, scale-ups, and deployers in high-risk sectors.