In brief
On 28 May 2026, at a G7 ministerial meeting in Paris, the OECD released version 2.0 of the Hiroshima AI Process (HAIP) reporting framework. HAIP is the G7's transparency tool for advanced AI: the only international framework in which AI developers set out, in a common format, how they manage the risks tied to their products. Crucially, its reports are public, unlike the disclosures required under the EU AI Act, which stay confidential.
V2 isn't a touch-up; it's an overhaul of the reporting process. The framework now covers the whole AI value chain, not just frontier model providers. It replaces most open-ended questions with structured, closed-ended ones whose answers can be compared across respondents. And it asks explicitly about the risks specific to frontier models: agentic AI, capability and propensity thresholds, systemic risks, and the role of AI Safety Institutes.
What remains open is how the answers will be verified and, where warranted, challenged.
What HAIP is, and why this update matters
Launched under Japan's G7 presidency in 2023 and built out under Italy's in 2024, HAIP released the first version of its reporting framework in February 2025. It drew 25 reports from 19 organisations. They included the big American providers (Microsoft, Google, Anthropic, OpenAI) and several Japanese firms (NTT, NEC, Rakuten, KDDI, Fujitsu, SoftBank). Missing from the list were Meta and the major Chinese players, from DeepSeek to Tencent and Alibaba.
That first round got the hard part right: it led major companies to publish their risk-management practices in one place, and in public. But one weakness remained, flagged by the Center for Democracy & Technology (CDT) and Brookings: the answers were too uneven to compare. V2 is designed to fix this.
What V2 actually asks for
The OECD's headline is accessibility for smaller firms: the framework has been streamlined so that more of the value chain can take part, and more than fifty organisations have already committed to reporting under it.
V2 sorts respondents by role — model developer, application provider, deployer, and so on — and tailors the questions to each, whereas V1 implicitly assumed a frontier model developer was answering. It also swaps open questions for checkboxes and closed (yes/no) questions with conditional follow-ups; free text now survives only in the sub-questions.
But the real shift is elsewhere. In line with CeSIA's recommendations, V2 asks explicit questions about risks that V1 either buried in free text or passed over in silence:
- agentic AI, including control mechanisms and the monitoring of interactions between agents;
- how mitigations scale with a model's capabilities and propensities;
- the capabilities, behaviours and limitations linked to systemic risks, indicative thresholds included;
- whether third-party testing draws on the network of AI Safety Institutes (AISIs);
- categories absent from V1: fundamental rights, vulnerable groups, confidential reporting channels, and AI-specific threats (data poisoning, prompt injection).
V2 also borrows the vocabulary of the other major frameworks rather than inventing a parallel one: the NIST AI RMF, ISO/IEC 42001, the OECD AI Principles, the EU GPAI Code of Practice, and the International AI Safety Report. This lets frontier developers show, in a comparable format, how they are living up to commitments they have already made — starting with the Seoul Commitments.
As co-conveners of the Global Call for AI Red Lines, we are struck most by the logic of thresholds, present both in the new language of propensity and in the transparency around systemic risks. It puts in black and white the idea that a risk becomes unacceptable beyond a certain level if it isn't adequately mitigated, which is exactly what any red-lines regime will one day have to formalise.
Silence becomes a signal
V1's nineteen reporters were a self-selected club of frontier companies: not being among them said little. V2 brings together deployers, application and compute providers, and SMEs, all answering the same questions. A major application provider that doesn't report no longer goes unnoticed: its absence stands out once its direct competitors have published.
A useful role in a crowded landscape
Voluntary disclosures are proliferating: every major company publishes its own safety framework, the EU General-Purpose AI Code of Practice drew twenty-six signatories last July, and the Seoul Commitments bind sixteen developers. Amid all this, HAIP plays a distinct role.
The AI Act's Code of Practice is the most demanding: signing it earns a presumption of conformity with the regulation, while not signing raises the burden of proof. But its reach stops at the EU's borders, and it is confidential by default. Companies' own voluntary safety frameworks (Anthropic's Responsible Scaling Policy, OpenAI's Preparedness Framework, Google DeepMind's Frontier Safety Framework, and others) are unilateral: each sets its own thresholds and develops its own methods.
HAIP is the only voluntary instrument that works across jurisdictions and offers a common public reporting structure. It doesn't replace the existing processes; it adds the layer of comparability and public accountability that none of them can provide on its own. And because it is international, it covers players beyond the EU Code's reach: compute providers, deployers that operate outside the EU, and companies that haven't signed the AI Act's Code of Practice.
What needs to happen next
Every organisation across the value chain is invited to file a report: submissions are accepted on a rolling basis, and those received before 1 September 2026 will feed into the next round of analysis. Governments, for their part, would do well to treat HAIP as the cross-border layer of comparability it has become, and to anchor their national strategies to it rather than each reinventing its own template.
For the AI safety and policy community, the open question is verification. We've argued elsewhere that the issue is no longer whether it's needed, but how fast we can put it in place. A further tension remains: V1's free text captured nuance that checkboxes won't. OpenAI's reports, for instance, spelled out its Preparedness Framework and named research partners that no checkbox in V2 would surface. Is the comparability gained across more reporters worth the depth lost from a smaller group? The next reporting cycle will tell.
