
The Center for AI Standards and Innovation (CAISI) announced voluntary agreements this week with Google DeepMind, Microsoft and xAI to permit U.S. government safety evaluations of their frontier AI models both before and after public release. The deals extend federal scrutiny to major commercial labs and follow Anthropic’s decision to delay the release of its Claude Mythos model out of concern that the model’s advanced cybersecurity capabilities could be misused. The commitments matter because they could change how builders handle pre-release access and post-release testing, affecting developers, national security reviewers and the public alike.
CAISI, which recently rebranded from the U.S. AI Safety Institute and dropped “safety” from its name, said the new pacts “build on” earlier policy and will help the center scale collaborative work “in the public interest,” a point emphasized by CAISI Director Chris Fall. The agency reported it has completed roughly 40 evaluations to date, including assessments of frontier models that have not been publicly released. CAISI said those tests often involve models with safeguards reduced or removed so evaluators can probe capabilities and risks relevant to national security more thoroughly.
The announcements arrived amid signals of possible executive action. White House National Economic Council Director Kevin Hassett told Fortune that President Trump may soon issue an executive order requiring government testing of advanced AI systems prior to release. CAISI also said an interagency task force of experts has been formed to concentrate on AI national security concerns, though the center has not published comprehensive testing standards or formal guidance on how evaluations will be conducted.
Industry responses were mixed. Tom Lue, Google DeepMind’s vice president of frontier AI global affairs, said on LinkedIn he was “pleased” with CAISI’s plans, and Microsoft’s corporate blog framed national-security and large-scale public-safety testing as necessarily collaborative with government. xAI did not immediately respond to requests for comment. At the same time, critics noted CAISI has not specified testing standards and questioned whether the center has adequate funding and expertise to carry out a sustained evaluation program.
Observers say the combination of voluntary agreements, the prospect of an executive order, and the absence of published standards creates practical uncertainty for AI builders and labs. Potential implications cited include obligations to give government reviewers pre-release access to models and to run mandated post-release evaluations. Critics warn that unclear procedures or politicized assessments could erode trust and discourage companies from cooperating, while proponents argue that deeper collaboration is needed to identify and mitigate serious risks.
Stakeholders are likely to watch CAISI’s forthcoming disclosures and any formal guidance closely, since those documents will determine how broadly the agency’s evaluations reach, what criteria are applied, and how industry and government balance transparency, safety and commercial concerns.