Anthropic Publishes Their (Partial) Whistleblowing Policy - First Thoughts And Context

NEWS

Published 18th December 2025

As first reported by James Ball in Transformer, Anthropic has now published their “Responsible Scaling Policy (RSP) Noncompliance and Anti-Retaliation Policy” outlining how individuals working for Anthropic can report instances of suspected RSP noncompliance within the company. Please find below our first thoughts and context on why this matters.

Key Take-aways

Anthropic has published their “RSP Noncompliance and Anti-Retaliation Policy,” outlining how individuals working for Anthropic can report instances of suspected RSP noncompliance within the company.

In short:

What it covers: Suspected instances of RSP (Responsible Scaling Policy) noncompliance. Important: This is explicitly not a ‘complete’ whistleblowing policy—it doesn’t cover all violations of law or non-RSP-related misconduct, which Anthropic states are handled by other (not-yet-public) policies.
Who can use it: While the policy is “specifically written for Anthropic employees and Board members”, covered persons, i.e., those who are allowed to use the channel, seem to include research partners/eval organizations.
How it works: Anthropic uses NAVEX as a tool for anonymous reporting. Reports are directed to the Responsible Scaling Officer (“RSO”, currently Jared Kaplan). If reports relate to the behaviour of said RSO, reports are to be sent directly to the President of Anthropic, according to the policy.

Why This Matters:

Second major AI company to publish: Anthropic follows OpenAI (October 2024), but is the first to do so without regulatory or scandal-driven pressure.
First to commit to ongoing monitoring: Anthropic is the first frontier AI company to publicly commit to monitoring and reviews of their internal whistleblowing system. Publishing usage and outcome reports would make them the first AI company globally to achieve “Level 2 Whistleblowing Transparency.”
Moving from statements to action: Anthropic commits to “measure and verify compliance to this policy through various methods, including but not limited to ongoing monitoring, and both internal and external reviews.” Note: We do not have additional details on what is measured, how it is reviewed, etc. (See Level 2: Effectiveness Transparency), but this still sounds promising.

Items AIWI Will Aim to Clarify (before in-depth evaluation):

How this partial RSP policy relates to other internal policies covering non-RSP violations. “This is a bit confusing, and we aim to clarify this before we publish any in-depth commentary on this policy.”
The legal basis for the “protected activity” section, which states that Anthropic considers internal raising of concerns relating to RSP non-compliance legally protected from retaliation. “This statement can be rooted in various legal bases, and we aim to clarify this before we publish any in-depth commentary on this policy.”
The exact scope of coverage for research partners and the extended workforce. “Again, we will aim to clarify this before we publish any in-depth commentary on this policy.”

Context:

AIWI launched the Publish Your Policies campaign in July 2025 and was joined by 35+ signatories, including former AI company employees, legal experts, and academics. There is still work to be done: four out of six major AI companies currently do not publish their whistleblowing policies. Find out more at publishyourpolicies.org.

First Takes

The diagram below highlights the change to Anthropic’s Whistleblowing System Transparency level, as well as the current status of the other major frontier companies. Please note that the below only evaluates the transparency of the policy and outcome reporting—not the content or quality of the underlying system, protections, culture, or past patterns of retaliation.

We will be publishing an in-depth evaluation relating to the quality of Anthropic’s and OpenAI’s policies at a later date (this is your chance to subscribe if you don’t want to miss the in-depth evaluation).

We would like to highlight some interesting items that we aim to clarify before publishing a more in-depth review:

The policy is explicitly not a ‘complete’ whistleblowing policy, i.e., it doesn’t cover processes for how violations of the law or non-RSP-related misconduct are handled. Anthropic states that other policies and processes within the company cover such cases. These policies are not yet public. However, Anthropic’s tooling for submitting reports (NAVEX) also covers reports relating to non-RSP-related violations of the law, and the policy does talk about “protected activities”, i.e., reporting that is legally protected (which usually means reporting violations of a law). This is a bit confusing, and we aim to clarify this before we publish any in-depth commentary on this policy.
Anthropic uses NAVEX as a tool for anonymous reporting. Reports are directed to the Responsible Scaling Officer (“RSO”, currently Jared Kaplan). According to the policy, reports that relate to the behaviour of the RSO are to be sent directly to the President of Anthropic. We’d like to clarify who is responsible for forwarding such reports.
Anthropic commits to “measure and verify compliance to this policy through various methods, including but not limited to ongoing monitoring, and both internal and external reviews”. We do not have more details on what is measured, how it is reviewed, etc. (See Level 2: Effectiveness Transparency) — but this still sounds promising.
The “protected activity” section states that Anthropic considers internal raising of concerns relating to RSP non-compliance legally protected from retaliation. This is very interesting: in most whistleblowing policies, commitments that go beyond the letter of the law are non-binding. This statement can be rooted in various legal bases, and we aim to clarify this before we publish any in-depth commentary on this policy.
Covered persons, i.e., those who are allowed to use the channel, seem to include research partners/eval organizations. We say “seem” here because the “Scope” section states that the policy is written “specifically” for employees and Board members, but that Anthropic “expect[s]” members of the extended workforce to also report concerns. At the same time, Anthropic states in its answers to the FLI AI Safety Index that “AI research collaborators and academic partners” as well as “individuals assisting whistleblowers” are protected from retaliation under its policy. Again, we will aim to clarify this before we publish any in-depth commentary on this policy.

FLI published their AI Safety Index last week, which includes a section on whistleblowing policy transparency and quality. Find Anthropic’s answers to the questionnaire starting on page 101.

Why This Matters

AI companies’ whistleblowing policies provide evidence of how their internal reporting channels operate. This is important for employees who may need to use them as well as for the public: insiders may be the first to spot risks that concern us all. However, if details on companies’ whistleblowing systems aren’t published, we can’t verify whether these channels are safe.

Currently, the majority of insiders reporting concerns do so internally first (75% of successful cases at the SEC, 2021). We expect this number to be equally high or higher in AI, where information asymmetries between regulators and companies can be stark.

At the same time, 95% of the retaliation cases documented by the SEC involve individuals who first reported internally. This has led the National Whistleblower Center, a non-profit led by Stephen Kohn, an SEC whistleblowing lawyer, to urge caution for insiders who are considering their company’s internal whistleblowing channels.

We should hence all be highly interested in the extent to which these channels are safe for insiders and effective at detecting and rectifying issues. Evaluating whistleblowing policies is a first step in this direction.

Public feedback can help employees, many of whom are not aware that their whistleblowing system even exists until it is too late, understand these systems, and it can lead to improvements in them. This benefits not only employees and the public but companies too: evidence shows that well-structured internal reporting and speak-up systems reduce misconduct, enable early detection of risks, and can prevent small issues from escalating into major crises.

Better whistleblowing systems also lead to higher employee satisfaction, loyalty, and second-order effects like improved research results and innovation.

Step One: Publish, Step Two: Evaluate, Step Three: Demonstrate

While publishing such policies is common in industries outside AI, few AI companies have been transparent about their whistleblowing systems: only OpenAI had previously published their policy, following its scandals in 2024. Google DeepMind (GDM), xAI, Meta, and Mistral have not published their policies or have published them only in fragments.

Anthropic is therefore taking a meaningful step as the policy’s publication allows us, the public, to provide feedback on the strength of their system’s protection.

Publishing a policy for public feedback is Level 1 Transparency, and AIWI will be publishing an in-depth evaluation of both Anthropic’s and OpenAI’s policies based on Transparency International’s Internal Whistleblowing Systems Assessment Framework.

At the same time, policies are statements, not action, and therefore only one indicator of how good an internal system may be. A much stronger commitment to a ‘speak up’ culture is shown through consistent measurement and improvement, which we call Level 2 Transparency.

Companies should track their systems’ usage and outcomes over time using qualitative and quantitative indicators to ensure the system actually protects reporters and uncovers and corrects issues. Such indicators include the number of reports received, anonymity rates indicating trust levels, retaliation rates, and whistleblower satisfaction with the process through surveys or interviews. Companies should analyze the results and take action to improve their systems – just as they would with any other business process they care about.

We’re excited to review Anthropic’s policy and see if they plan to also publish, in a redacted form, their whistleblowing system’s usage and outcomes.

Anthropic has previously informally provided information on their whistleblowing system’s usage in their RSP review with METR. Committing to such reporting on a regular basis would be a strong signal that Anthropic is serious about their internal whistleblowing process (i.e., measuring to improve).

Context: AIWI’s Publish Your Policies Campaign

For the reasons outlined above, AIWI launched the Publish Your Policies campaign in July 2025 and was joined by 35+ signatories, including former AI company employees, legal experts, and academics. There is still work to be done: four out of six major AI companies currently do not publish their whistleblowing policies.

If you work at any of the four companies above, ask your leadership: Why not just publish your policies?

Find out more about the campaign at https://aiwi.org/publishyourpolicies/.
