How We Score Safety
40 publicly assessable indicators mapped across 6 major safety and governance frameworks. Every score is backed by verifiable evidence — no surveys, no self-reporting.
Independent, evidence-based assessment
Most AI governance assessments rely on company cooperation — questionnaires, self-reported data, voluntary disclosures. This introduces bias and limits coverage to companies willing to participate.
Mappera takes a different approach. Every indicator is scored using publicly available information: published policies, GitHub repositories, benchmark results, regulatory filings, job listings, and academic publications. Any company deploying AI in high-risk sectors — healthcare, hiring, finance, autonomous systems — can be assessed whether they cooperate or not.
The methodology synthesizes requirements from six recognized frameworks into a single, comparable score. Each of the 40 indicators maps to at least one framework, ensuring the rubric reflects established governance standards rather than arbitrary criteria. The rubric is calibrated so that a Series A startup with basic governance practices scores 2-3 on most indicators, not zero — enabling meaningful differentiation across the startup and scale-up landscape.
Five Dimensions
40 indicators organized into 5 dimensions with differentiated weights. Governance and Technical Safety carry the most weight (25% each) as direct measures of a company's safety posture.
- Published AI safety / responsible AI policy (GM-01)
- Dedicated safety/ethics role or team (GM-02)
- Board/leadership AI risk oversight (GM-03)
- AI incident response process (GM-04)
- Pre-deployment review gate (GM-05)
- Risk appetite / acceptable use policy (GM-06)
- Safety culture signals (GM-07)
- AI system inventory and lifecycle management (GM-08)
- Pre-deployment testing and red teaming (TS-01)
- Safety benchmarks and evaluations (TS-02)
- Bias and fairness testing (TS-03)
- Output monitoring and guardrails (TS-04)
- Robustness and adversarial resilience (TS-05)
- Human oversight mechanisms (TS-06)
- Model/system documentation (TS-07)
- Validation in deployment context (TS-08)
- Sector-specific risk assessment (RA-01)
- AI system impact assessment (RA-02)
- Dual-use / misuse risk awareness (RA-03)
- Societal and downstream impact (RA-04)
- Data quality and bias risk management (RA-05)
- Risk taxonomy / classification (RA-06)
- Incident history and track record (RA-07)
- Third-party and supply chain risk (RA-08)
- EU AI Act risk classification awareness (RR-01)
- Regulatory-grade documentation (RR-02)
- Conformity / third-party assessment (RR-03)
- Data governance framework (RR-04)
- Sector-specific regulatory compliance (RR-05)
- Voluntary safety commitments (RR-06)
- Standards body participation (RR-07)
- Proactive regulatory engagement (RR-08)
- Safety / responsible AI publications (EE-01)
- External safety audits and evaluations (EE-02)
- Transparency reporting (EE-03)
- Vulnerability / issue disclosure program (EE-04)
- Multi-stakeholder engagement (EE-05)
- Open-source safety contributions (EE-06)
- Academic / research collaboration (EE-07)
- Industry safety cooperation (EE-08)
Indicator Weighting
Within each dimension, all 8 indicators are equally weighted. A GM indicator therefore contributes 3.125% of the overall score (25% ÷ 8), while an EE indicator contributes 1.25% (10% ÷ 8).
Overall = (GM avg × 0.25) + (TS avg × 0.25) + (RA avg × 0.20) + (RR avg × 0.20) + (EE avg × 0.10). Dimension averages are computed from the normalized 0-5 scores of all assessed indicators.
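As a minimal sketch, the aggregation can be implemented as below. This assumes the 0-100 overall score is simply the weighted 0-5 average multiplied by 20; all names are illustrative, not Mappera's actual code.

```python
# Sketch of the Mappera score aggregation (assumed implementation).
# Assumption: the 0-100 overall score is the weighted 0-5 average x 20.
DIMENSION_WEIGHTS = {"GM": 0.25, "TS": 0.25, "RA": 0.20, "RR": 0.20, "EE": 0.10}

def dimension_average(indicator_scores):
    """Mean of the 0-5 ordinal scores of all assessed indicators."""
    return sum(indicator_scores) / len(indicator_scores)

def overall_score(scores_by_dimension):
    """Weighted sum of dimension averages, scaled from 0-5 to 0-100."""
    weighted = sum(
        DIMENSION_WEIGHTS[dim] * dimension_average(scores)
        for dim, scores in scores_by_dimension.items()
    )
    return round(weighted * 20, 1)

# A company scoring 3 ("documented and deliberate") on every
# indicator lands at 60/100.
example = {dim: [3] * 8 for dim in DIMENSION_WEIGHTS}
```

Because the weights sum to 1.0, a uniform indicator score of 3 yields a dimension average of 3 everywhere and an overall score of 60.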
Framework Crosswalk
Every indicator traces to at least one recognized standard. Counts reflect how many of the 40 indicators reference each framework.
Totals exceed 40 because individual indicators may map to multiple frameworks. For example, TS-03 (Bias and Fairness Testing) maps to NIST AI RMF, EU AI Act, MLCommons AILuminate, and ISO 42001.
Indicator Scoring Scale
All 40 indicators use a 0-5 ordinal scale with defined thresholds. Each level has specific, observable criteria documented in the rubric.
0: No public evidence of the practice. Absent from website, docs, filings.
1: Vague mention in passing. Acknowledged but no substance — e.g. a mention of safety in a job posting, or a one-line ToS clause.
2: Some effort visible but informal or incomplete. A brief responsible AI statement, basic bias checks, or informal incident handling.
3: Documented and deliberate. A dedicated policy page, structured testing program, formal review gate, or documented risk assessment process. This is where a well-run Series A startup should aim.
4: Mature and thorough. Multiple safeguards, external validation, published results, dedicated teams with a clear mandate. Typical of well-resourced companies taking safety seriously.
5: Best in class with external verification. Published frameworks with version control and enforcement, third-party audits, transparent reporting with quantified metrics, and demonstrated follow-through.
Grading Scale
Dimension and overall scores (0-100) map to letter grades. Thresholds are field-adjusted to reflect the current state of AI safety maturity, where even industry leaders score 70-75 on absolute scales.
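The mapping can be sketched as a simple threshold lookup. Note that the cutoff values below are hypothetical placeholders: Mappera's actual field-adjusted thresholds are not listed in this section.

```python
# Illustrative score-to-grade mapper. The thresholds here are
# HYPOTHETICAL placeholders, not Mappera's published cutoffs.
HYPOTHETICAL_THRESHOLDS = [(70, "A"), (55, "B"), (40, "C"), (25, "D")]

def letter_grade(score: float) -> str:
    """Return the letter grade for a 0-100 dimension or overall score."""
    for cutoff, grade in HYPOTHETICAL_THRESHOLDS:
        if score >= cutoff:
            return grade
    return "F"
```

Field-adjusting the thresholds means lowering the cutoffs in a table like this so that, for example, a 70-75 absolute score (the current ceiling even for industry leaders) still earns a top grade.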
Sector Coverage
Mappera assesses AI companies across six sector categories, each carrying distinct regulatory and safety requirements.
Healthcare: Clinical note generation, diagnostic imaging, decision support. Key regulations: FDA, HIPAA, MDR. Key indicators: TS-06 (human oversight), TS-08 (clinical validation), RR-05 (sector compliance).
Hiring: Candidate screening, video assessment, talent matching. Key regulations: EEOC, NYC Local Law 144, EU AI Act Annex III. Key indicators: TS-03 (bias testing), TS-06 (human oversight), RA-07 (incident history).
Finance: Credit decisioning, fraud detection, algorithmic trading. Key regulations: ECOA, FCRA, SOC 2, PCI DSS. Key indicators: TS-03 (fairness), RA-05 (data quality), RR-05 (sector compliance).
Autonomous systems: Self-driving, delivery robots, drones, humanoid robotics. Key regulations: automotive safety standards, aviation regs. Key indicators: RA-01 (sector risk), TS-08 (field validation), TS-05 (robustness).
Foundation models: Developers of general-purpose foundation models. Key regulations: EU AI Act GPAI obligations. Assessed across the full rubric, with particular focus on the GM, TS, and EE dimensions.
Safety tooling: Evaluation platforms, governance tools, monitoring infrastructure. Of particular interest to safety-focused VCs and grantmakers. Key indicators: TS-01 (testing), TS-02 (benchmarks), EE-06 (open-source).
How to Improve Your Score
If your company appears in the Mappera directory and you want to improve your score, the highest-impact actions by dimension:
GM: Publish a versioned, board-approved AI safety policy. Appoint a named safety/ethics lead or team. Document a pre-deployment review gate with go/no-go criteria. Establish an AI-specific incident response plan.
TS: Run and publish bias and fairness testing results. Implement human-in-the-loop review with documented appeal processes. Publish model/system cards. Validate in the deployment context with published results (clinical studies, field trials, pilot data).
RA: Conduct and publish sector-specific risk assessments. Document third-party AI/model supply chain risks. Build a structured risk taxonomy. Assess societal and downstream impacts on vulnerable populations.
RR: Achieve sector-specific compliance (FDA clearance, SOC 2, HIPAA, EEOC). Self-classify under EU AI Act risk tiers. Pursue ISO 42001 certification or third-party assessment. Publish regulatory-grade technical documentation.
EE: Publish safety research or case studies. Accept external safety audits. Establish a vulnerability disclosure program. Contribute safety tools or datasets as open source. Engage with affected communities.
Confidence Levels and Constraints
Each dimension carries a confidence level (high, medium, or low) based on the percentage of its indicators with verified evidence. High confidence requires evidence for 80% or more of indicators; medium covers 50-79%; low is below 50%.
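The rule above amounts to a two-cutoff classification. A minimal sketch (an assumed implementation; the function name is illustrative):

```python
# Sketch of the per-dimension confidence rule (assumed implementation).
def confidence_level(verified: int, total: int) -> str:
    """Map the share of indicators with verified evidence to a level."""
    pct = 100 * verified / total
    if pct >= 80:
        return "high"
    if pct >= 50:
        return "medium"
    return "low"
```

For an 8-indicator dimension, high confidence therefore requires verified evidence for at least 7 of the 8 indicators.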
Public assessability constrains depth. Internal practices, board minutes, incident response logs, and unreleased evaluation results are invisible to this methodology. A low score may reflect poor transparency rather than poor safety practices. This is a feature, not a bug — transparency itself is a safety practice.
v0.3 introduced differentiated dimension weights (25/25/20/20/10) replacing equal weighting, and field-adjusted grading thresholds calibrated to the actual state of AI safety maturity. Future versions may further adjust weights based on empirical validation against expert assessments and observed incidents.
See the scores in action
Explore company scores, compare side-by-side, or review the field-wide funding landscape.
Mappera Scoring Methodology v0.3 — 40 indicators across 5 dimensions mapped to 6 frameworks.
Calibrated for startups, scale-ups, and deployers in high-risk sectors.