May 23, 2026

17 Views

Cisco AI Report Test Shows Faster Drafts Need Humans

Cisco AI incident reports are moving from theory to the security desk. In a May 21, 2026 post, Cisco Talos said an AI assisted tabletop exercise report could cut drafting time by 50 percent, but the same test exposed context drift, inconsistent recommendations, source control problems and a grammar checker that missed too much for production use.

That finding matters because incident response writing is not ordinary office work. A bad phrase in a breach report can send executives toward the wrong reset plan, misstate what happened, or bury the lesson a team needs before the next attack.

Cisco Found Speed, Then Found Drift

Nate Pors, senior incident commander at Cisco Talos Incident Response, described the test as research by the Talos IR AI Tiger Team, not a finished product roll out. The team used a tabletop exercise (TTX, a discussion based security drill built around a fictional incident) because the source material was easier to review than a live forensics file full of timestamps, file paths and raw logs.

The upside was plain. In the Cisco Talos AI reporting case study, the team said test results predicted a 50 percent reduction in total report drafting time, even after counting manual writing for the 10 percent of content that did not fit AI generation and manual editing of the generated draft.

50 percent predicted reduction in total drafting time for the tabletop report.
10 percent of report content still had to be written manually in the test.
Below 50 percent success rate for a separate spelling and grammar checking prompt.
Four inconsistency types identified by the Talos team.

The split result is the useful part. AI can be fast when the job is reorganizing known notes into a known format. It gets risky when the job drifts into judgment, source selection, recommendations or final quality control.

Cisco AI incident report testing shows why human review matters.

The Case Study Was the Easy Version

The test did not ask a large language model (LLM, software that predicts and generates text from a prompt) to investigate a breach from scratch. It asked the model to help turn structured exercise notes into a report. That distinction decides how far the lesson travels.

Talos said the exercise report was a good candidate because the model was mainly rearranging existing notes. A forensic incident report has a harder evidentiary burden. It can include event logs, malware hashes, account histories, command lines, file paths and timing sequences that need precise checking.

Reporting Task	AI Fit	Main Risk	Human Review Burden
Tabletop exercise report	Higher, when notes and sections are standardized	Weak or duplicated recommendations	Confirm that the narrative matches the exercise record
Live forensic incident report	Lower, unless tightly scoped	Wrong timestamp, wrong host, missed artifact or false causal link	Trace every technical claim back to evidence
Executive summary	Useful when source material is complete	Over compression that hides uncertainty	Check that business impact and confidence level are accurate
Grammar and spelling pass	Weak in Cisco’s test	False positives and missed errors	Use a human editor or conventional checker before release

That table is the operating rule for security teams. The safer use case is narrow and repetitive. The dangerous one asks the model to infer facts a responder has not verified.

Four Failure Modes Security Leaders Should Name

Cisco’s report is useful because it gives security teams language for problems many have already seen in AI pilots. The failure is not only hallucination. It can be inconsistency, and inconsistency is harder to catch because one draft can look clean while the next one quietly changes the advice.

The Talos team named four main failure modes:

Research and sourcing drift – the model may pull from different material across runs, which can change the foundation of a report.
Conclusion drift – the same facts can lead to different recommendations, such as broad password resets in one draft and targeted resets in another.
Format drift – reports can change structure, tone and section order unless the output format is tightly specified.
Context pollution – earlier tasks in the same session can bleed into later work, even after source notes are removed.

Context pollution is the one that should make a chief information security officer pause. A report writer can close one document and start another. A chat session may keep enough residue from the prior task to blend unrelated evidence unless the workflow forces a fresh session.

Prompt Controls Shift the Work to Governance

Cisco did not present prompt writing as magic. The better read is that prompt controls can turn a loose AI experiment into a workflow with checkpoints. That makes the process slower than a casual chatbot session, but still faster than hand drafting when the source material is clean.

The Talos team used four controls that deserve a place in any security writing policy:

Break one large prompt into smaller single task prompts, each tied to a report section.
Specify the exact source material the model may use, rather than letting it choose.
Define tone, length, audience and required sections before generation starts.
Embed a report template so static text stays static and placeholders get filled deliberately.

Those controls echo the National Institute of Standards and Technology (NIST, the U.S. standards agency) approach to AI risk. NIST’s AI Risk Management Framework treats trustworthy AI as a mix of valid, reliable, safe, secure, accountable, transparent and privacy aware qualities, not just a model performance score.

take ownership of every word of the final report

Pors used that phrase in the Cisco post, and it is the line security teams should print above the drafting station. AI can prepare a draft. Accountability stays with the author.

The Failed Grammar Prompt Is the Warning Label

The most telling result was not the 50 percent drafting gain. It was the grammar checker. Cisco said a fourth prompt designed to edit a full report for grammar, spelling and similar issues hallucinated grammar problems, missed actual issues and produced a success rate below 50 percent.

That finding cuts against a common office habit: using AI as the final polish after the hard work is done. In incident response, the final polish can change meaning. A suggested rewrite can weaken certainty, add certainty where none exists, or smooth over a technical exception that should remain visible.

The last mile cannot be outsourced if the document may guide legal notices, board briefings, insurance claims or customer communications. A conventional spell checker can still miss things, but it usually does not invent a grammatical fault and then rewrite the sentence around it. A human editor can ask the one question the model cannot own: does this sentence match the evidence?

AI Adoption Is Running Ahead of Security Controls

Cisco’s small reporting experiment lands inside a much larger rush. In the Cisco AI Readiness Index results, Cisco says Pacesetter organizations make up about 13 percent of companies worldwide, are four times more likely to move AI pilots into production and are 50 percent more likely to report measurable value from AI.

Security maturity is not keeping pace everywhere. Cisco’s Cybersecurity Readiness Index page says 86 percent of companies reported AI related security incidents in the prior 12 months, while only 45 percent said they had the internal resources and expertise to conduct comprehensive AI security assessments.

Government guidance is also moving toward process, not hype. The Cybersecurity and Infrastructure Security Agency (CISA, the U.S. cyber defense agency) published the JCDC AI Cybersecurity Collaboration Playbook on January 14, 2025, after tabletop exercises with public and private sector partners. Its focus on information sharing, incident categories and response roles points to the same lesson Cisco found in miniature: the tool only helps when the workflow around it is defined.

Cisco’s own Responsible AI principles list transparency, fairness, accountability, privacy, security and reliability. The reporting test is a practical stress test of those words. If a team cannot name the source, isolate the session, preserve the template and sign the final draft, it is not ready to use AI on a security incident report. If it can, the model becomes a drafting assistant with a short leash.

Frequently Asked Questions

What Did Cisco Test With AI Incident Reports?

Cisco Talos tested whether AI could help draft a tabletop exercise security report from existing notes, using prompts for organizing discussion, polishing recommendations and summarizing the report.

Did AI Cut Cisco’s Incident Report Drafting Time?

Yes. Cisco Talos said the case study predicted a 50 percent reduction in total report drafting time, after including manual writing and editing work.

Why Was the Cisco AI Report Test Risky?

The risk came from inconsistent sourcing, inconsistent conclusions, changing output formats and context pollution between report sessions, any of which could change advice in a security report.

Can Security Teams Use AI for Live Breach Reports?

Security teams can use AI only in tightly scoped ways for live breach reports, because forensic reports depend on precise evidence such as timestamps, logs, file paths and host data.

What Should Companies Do Before Using AI for Incident Reports?

Companies should define approved tools, source rules, prompt templates, session separation, review steps and final human ownership before AI touches incident response reporting.

News, Technology