
When an AI script written by a Department of Government Efficiency employee came across a contract for internet service, it flagged it as cancelable. Not because it was waste, fraud or abuse — the Department of Veterans Affairs needs internet connectivity after all — but because the model was given unclear and conflicting instructions.

Sahil Lavingia, who wrote the code, told it to cancel, or in his words “munch,” anything that wasn’t “directly supporting patient care.” Unfortunately, neither Lavingia nor the model had the knowledge required to make such determinations.

“I think that mistakes were made,” said Lavingia, who worked at DOGE for nearly two months, in an interview with ProPublica. “I’m sure mistakes were made. Mistakes are always made.”

It turns out, a lot of mistakes were made as DOGE and the VA rushed to implement President Donald Trump’s February executive order mandating all of the VA’s contracts be reviewed within 30 days.

ProPublica obtained the code and prompts — the instructions given to the AI model — used to review the contracts and interviewed Lavingia and experts in both AI and government procurement. We are publishing an analysis of those prompts to help the public understand how this technology is being deployed in the federal government.

The experts found numerous and troubling flaws: the code relied on older, general-purpose models not suited for the task; the model hallucinated contract amounts, deciding around 1,100 of the agreements were each worth $34 million when they were sometimes worth thousands; and the AI did not analyze the entire text of contracts. Most experts said that, in addition to the technical issues, using off-the-shelf AI models for the task — with little context on how the VA works — should have been a nonstarter.

Lavingia, a software engineer enlisted by DOGE, acknowledged there were flaws in what he created and blamed, in part, a lack of time and proper tools. He also stressed that he knew his list of what he called “MUNCHABLE” contracts would be vetted by others before a final decision was made.

Portions of the prompt are pasted below along with commentary from experts we interviewed. Lavingia published a complete version of it on his personal GitHub account.

Problems with how the tool was constructed can be detected from the very opening lines of code, where the DOGE employee instructs the model how to behave:

This part of the prompt, known as a system prompt, is intended to shape the overall behavior of the large language model, or LLM, the technology behind AI bots like ChatGPT. In this case, the same system prompt preceded both steps of the process: first, when the script extracted information like contract amounts; then, when it determined whether a contract should be canceled.
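
To make that structure concrete, here is a minimal sketch of how a script might send one shared system prompt with every request. The prompt text, model name and function names are our own illustrative assumptions, not Lavingia’s published code.

```python
# Minimal sketch, not the actual DOGE script: one system prompt accompanies
# every request, so its instructions shape both the extraction pass and the
# later cancellation pass. The prompt text and model name are assumptions.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are reviewing Department of Veterans Affairs contracts. "
    "Identify contracts that are MUNCHABLE ..."  # placeholder for the full published prompt
)

def ask_model(task_prompt: str, document_excerpt: str) -> str:
    """Send the shared system prompt plus a task-specific user prompt."""
    response = client.chat.completions.create(
        model="gpt-4",  # stand-in; the article notes an older, general-purpose model was used
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # identical for both passes
            {"role": "user", "content": f"{task_prompt}\n\n{document_excerpt}"},
        ],
    )
    return response.choices[0].message.content
```

Because the system prompt rides along with every call, instructions that only matter for the cancellation decision are also present during the information-gathering step, the problem experts describe next.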

Including information not related to the task at hand can confuse AI. At this point, it’s only being asked to gather information from the text of the contract. Everything related to “munchable status,” “soft-services” or “DEI” is irrelevant. Experts told ProPublica that trying to fix issues by adding more instructions can actually have the opposite effect — especially if they’re irrelevant.

The models were only shown the first 10,000 characters from each document, or approximately 2,500 words. Experts were confused by this, noting that OpenAI models support inputs over 50 times that size. Lavingia said that he had to use an older AI model that the VA had already signed a contract for.
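
The cutoff itself amounts to a single slicing operation. Below is a sketch, with assumed names, of what limiting each document to 10,000 characters looks like and why that covers only a few pages.

```python
# Illustrative only: cap each contract document at its first 10,000 characters
# before it is sent to the model, as the article describes.
MAX_CHARS = 10_000

def truncate(document_text: str) -> str:
    # Everything past the cutoff, including later contract sections,
    # never reaches the model at all.
    return document_text[:MAX_CHARS]

# At a rough average of 4 characters per English word, 10,000 characters
# works out to about 2,500 words, typically just the first few pages
# of a lengthy federal contract.
```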

This portion of the prompt instructs the AI to extract the contract number and other key details of a contract, such as the “total contract value.”

This was error-prone and unnecessary, as accurate contract information is already available in public databases like USASpending. In some cases, the AI system was given an outdated version of a contract and reported a misleadingly large contract amount. In others, the model mistakenly pulled an irrelevant number from the page instead of the contract value.

“They are looking for information where it’s easy to get, rather than where it’s correct,” said Waldo Jaquith, a former Obama appointee who oversaw IT contracting at the Treasury Department. “This is the lazy approach to gathering the information that they want. It’s faster, but it’s less accurate.”

Lavingia acknowledged that this approach led to errors but said that those errors were later corrected by VA staff.

Once the program extracted this information, it ran a second pass to determine if the contract was “munchable.”

Again, only the first 10,000 characters were shown to the model. As a result, the munchable determination was based purely on the first few pages of the contract document.
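
Continuing the earlier sketch, the second pass might look roughly like this; the task wording and field handling are assumptions, meant only to show that the cancellation verdict rests on the same truncated excerpt.

```python
# Illustrative second pass: the "munchable" decision sees only the same
# 10,000-character excerpt, plus whatever fields the first pass extracted.
def judge_munchable(document_text: str, extracted_fields: dict) -> str:
    excerpt = truncate(document_text)  # first few pages only
    task = (
        "Decide whether this contract is MUNCHABLE (cancelable) and explain why. "
        f"Previously extracted fields: {extracted_fields}"
    )
    return ask_model(task, excerpt)  # verdict plus justification, as free text
```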

The above prompt section is the first set of instructions telling the AI how to flag contracts. The prompt provides little explanation of what it’s looking for, failing to define what qualifies as “core medical/benefits” and lacking information about what a “necessary consultant” is.

For the types of general-purpose models the DOGE analysis used, spelling out all of the information needed to make an accurate determination in the prompt itself is critical.

Cary Coglianese, a University of Pennsylvania professor who studies the governmental use of artificial intelligence, said that knowing which jobs could be done in-house “calls for a very sophisticated understanding of medical care, of institutional management, of availability of human resources” that the model does not have.

The prompt above tries to implement a fundamental policy of the Trump administration: killing all DEI programs. But the prompt fails to include a definition of what DEI is, leaving the model to decide.

Despite the instruction to cancel DEI-related contracts, very few were flagged for this reason. Procurement experts noted that it’s very unlikely for information like this to be found in the first few pages of a contract.

These two lines — which experts say were poorly defined — carried the most weight in the DOGE analysis. The response from the AI frequently cited these reasons as the justification for munchability. Nearly every justification included a form of the phrase “direct patient care,” and in a third of cases the model flagged contracts because it stated the services could be handled in-house.

The poorly defined requirements led to several contracts for VA office internet services being flagged for cancellation. In one justification, the model had this to say:

The contract provides data services for internet connectivity, which is an IT infrastructure service that is multiple layers removed from direct clinical patient care and could likely be performed in-house, making it classified as munchable.

Despite these instructions, AI flagged many audit- and compliance-related contracts as “munchable,” labeling them as “soft services.”

In one case, the model even acknowledged the importance of compliance while flagging a contract for cancellation, stating: “Although essential to ensuring accurate medical records and billing, these services are an administrative support function (a ‘soft service’) rather than direct patient care.”

Shobita Parthasarathy, professor of public policy and director of the Science, Technology, and Public Policy Program at the University of Michigan, told ProPublica that this piece of the prompt was notable in that it instructs the model to “distinguish” between the two types of services without saying which to save and which to kill.

The emphasis on “direct patient care” is reflected in how often the AI cited it in its recommendations, even when the model did not have any information about a contract. In one instance where it labeled every field “not found,” it still decided the contract was munchable. It gave this reason:

Without evidence that it involves essential medical procedures or direct clinical support, and assuming the contract is for administrative or related support services, it meets the criteria for being classified as munchable.

In reality, this contract was for the preventative maintenance of important safety devices known as ceiling lifts at VA medical centers, including three sites in Maryland. The contract itself stated:

Ceiling Lifts are used by employees to reposition patients during their care. They are critical safety devices for employees and patients, and must be maintained and inspected appropriately.

This portion of the prompt attempts to define “soft services.” It uses many highly specific examples but also throws in vague, undefined categories like “non-performing/non-essential contracts.”

Experts said that in order for a model to properly determine this, it would need to be given information about the essential activities and what’s required to support them.

This section of the prompt was the result of analysis by Lavingia and other DOGE staff, he explained. “This is probably from a session where I ran a prior version of the script that most likely a DOGE person was like, ‘It’s not being aggressive enough.’ I don’t know why it starts with a 2. I guess I disagreed with one of them, and so we only put 2, 3 and 4 here.”

Notably, our review found that the only clarifications addressing past errors concerned scenarios where the model wasn’t flagging enough contracts for cancellation.

This section of the prompt provides the most detail about what constitutes “direct patient care.” While it does cover many aspects of care, it still leaves a lot of ambiguity and forces the model to make its own judgments about what constitutes “proven efficacy” and “critical” medical equipment.

In addition to the limited information given on what constitutes direct patient care, there is no information about how to determine if a price is “reasonable,” especially since the LLM only sees the first few pages of the document. The models lack knowledge about what’s normal for government contracts.

“I just do not understand how it would be possible. This is hard for a human to figure out,” Jaquith said about whether AI could accurately determine if a contract was reasonably priced. “I don’t see any way that an LLM could know this without a lot of really specialized training.”

This section explicitly lists which tasks could be “easily insourced” by VA staff, and more than 500 different contracts were flagged as “munchable” for this reason.

“A larger issue with all of this is there seems to be an assumption here that contracts are almost inherently wasteful,” Coglianese said when shown this section of the prompt. “Other services, like the kinds that are here, are cheaper to contract for. In fact, these are exactly the sorts of things that we would not want to treat as ‘munchable.’” He went on to explain that insourcing some of these tasks could also “siphon human resources away from direct primary patient care.”

In an interview, Lavingia acknowledged some of these jobs might be better handled externally. “We don’t want to cut the ones that would make the VA less efficient or cause us to hire a bunch of people in-house,” Lavingia explained. “Which currently they can’t do because there’s a hiring freeze.”

The VA is standing behind its use of AI to examine contracts, calling it “a commonsense precedent.” And documents obtained by ProPublica suggest the VA is looking at additional ways AI can be deployed. A March email from a top VA official to DOGE stated:

Today, VA receives over 2 million disability claims per year, and the average time for a decision is 130 days. We believe that key technical improvements (including AI and other automation), combined with Veteran-first process/culture changes pushed from our Secretary’s office could dramatically improve this. A small existing pilot in this space has resulted in 3% of recent claims being processed in less than 30 days. Our mission is to figure out how to grow from 3% to 30% and then upwards such that only the most complex claims take more than a few days.

If you have any information about the misuse or abuse of AI within government agencies, reach out to us via our Signal or SecureDrop channels.

If you’d like to talk to someone specific, Brandon Roberts is an investigative journalist on the news applications team and has a wealth of experience using and dissecting artificial intelligence. He can be reached on Signal @brandonrobertz.01 or by email brandon.roberts@propublica.org.