Pentagon seeks system to ensure AI models work as planned

As DOD increasingly relies on artificial intelligence, a question has arisen: How can one be sure that the AI models are working the way they should?

Defense News

MilTech

By Michael Peck

Mar 12, 2026, 01:05 AM

U.S. Marines train on Marine Corps Base Camp Pendleton, California, Aug. 20, 2025. (Sgt. Trent Henry/U.S. Marine Corps)

As the Pentagon increasingly relies on artificial intelligence, a question has arisen: How can one be sure that the AI models are working the way they should?

The best way is to test new AI before users get their hands on it. So, the Defense Department — along with the Office of the Director of National Intelligence — is seeking a system that can test whether AI models meet specified criteria.

“As artificial intelligence (AI) capabilities evolve at an extraordinary pace, the government requires evaluation infrastructure that can keep pace by continuously assessing new models against mission-specific benchmarks as they are released,” according to an Area of Interest announcement from the Defense Innovation Unit.

DOD also wants to ensure that AI and humans work well together. “Evaluation must assess not only whether AI systems can perform tasks in isolation, but whether human-AI teams achieve better mission outcomes than either humans or AI alone,” the announcement said.

DIU envisions a “harness” with a standard, pluggable architecture that can test any AI — developed by any contractor — and provide a consistent, structured evaluation. This includes studying workflows across different environments, safely auditing AI agents and allowing human experts to assess “human workload, usability, and mission performance across human-only, AI-only, and human-AI team scenarios.”

The harness should also test whether the AI can function amid chaotic, low-information conditions. The system must simulate “operational stress and network degradation in a controlled, reproducible environment,” DIU said.

The harness will also evaluate whether adversaries can hijack or confuse friendly AI models. The system must support "automated red-teaming, including the execution of adversarial prompts and attack patterns."

AI will be assessed against a variety of benchmarks. Building them includes "identifying what capabilities matter for a given mission context" and breaking complex AI capabilities down into smaller, measurable tasks. Results must be clear, with a defined threshold for what counts as a good score, and delivered in a format that is "easily understood and can be acted upon by decision makers."

DIU was also careful to note that the evaluation system must be fair, with “no systemic advantage to particular architectures or vendors.”

The deadline for responses is March 24.

About Michael Peck

Michael Peck is a correspondent for Defense News and a columnist for the Center for European Policy Analysis. He holds an M.A. in political science from Rutgers University. Find him on X at @Mipeck1. His email is mikedefense1@gmail.com.