
A report on May 27, 2026, found that Google’s AI-driven Search produced a series of basic spelling and character-count errors in its AI Overviews. Examples include an Overview that said there are two "P"s in the word "Google," reported "exactly 1 'r' in the word 'poop'," spelled "journalism" as "j-o-u-r-n-a-d-i-s-m," and rendered the U.S. president's last name as "t-r-p-u-m." These visible mistakes matter because they can erode user trust in a flagship product now centered on generative AI.
The errors come as Google refocuses Search to put generative AI front and center, adding AI Overviews that summarize results. The feature has a recent history of mistakes: earlier Overviews cited satire and Reddit content as if factual and returned unsafe-sounding advice, and a since-patched bug produced a dictionary-like entry for a search on the word "disregard" that read, "Understood. Let me know whenever you have a new prompt or question!"
Google acknowledged the problem in a statement: "Counting within words has been a known challenge for LLMs, and we’re working to fix this particular issue." The company’s admission points to the technical root cause: the Overviews are powered by large language models, which operate on tokenized representations of text rather than on individual letters the way humans do.
Researchers say that tokenization explains why character-level tasks fail. Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, explained that "LLMs are based on this transformer architecture, which notably is not actually reading text. What happens when you input a prompt is that it’s translated into an encoding," and that the model may use a single encoding for a common word rather than encoding its individual letters. Tokens can represent whole words, syllables, or letters, so tasks that require counting characters can be unreliable.
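To make the tokenization point concrete, here is a minimal sketch of a toy tokenizer. It is an illustration only: the vocabulary `TOY_VOCAB`, the helper names `toy_tokenize` and `toy_encode`, and the greedy longest-match rule are all assumptions made for this example, not the tokenizer used by Google's models or by any real LLM.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of pieces.
TOY_VOCAB = {"google": 0, "journal": 1, "ism": 2, "po": 3, "op": 4}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split `word` into vocabulary pieces using greedy longest-match."""
    word = word.lower()
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until one matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Nothing matched: fall back to a single-character piece.
            pieces.append(word[i])
            i += 1
    return pieces


def toy_encode(word, vocab=TOY_VOCAB):
    """Map pieces to integer ids; unknown single characters get -1 in this sketch."""
    return [vocab.get(piece, -1) for piece in toy_tokenize(word, vocab)]


if __name__ == "__main__":
    for w in ["Google", "journalism", "poop"]:
        print(w, toy_tokenize(w), toy_encode(w))
```

In this sketch the common word "Google" becomes a single id, so nothing in the encoded input says how many of each letter it contains. A model working from such ids would have to have learned the spelling separately, which is consistent with the explanation quoted above.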
Interpretability researchers are skeptical that there is a neat fix. Sheridan Feucht, a PhD student studying LLM interpretability at Northeastern University, said: "It’s kind of hard to get around the question of what exactly a 'word' should be for a language model, and even if we got human experts to agree on a perfect token vocabulary, models would probably still find it useful to 'chunk' things even further." That tokenizer fuzziness makes spelling and character-count errors an inherent, hard-to-eliminate failure mode.
For builders and product teams, the practical takeaway is clear: these failures underscore the need for validation and guardrails around character- or token-sensitive outputs. While researchers note that LLM utility rarely hinges on perfect spelling, repeated, visible errors in a high-profile product like Search risk undermining user confidence and reinforce the advice to double-check AI-generated content before relying on it.
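One simple form such a guardrail could take is to recompute character-level facts with ordinary string operations instead of trusting generated text. The sketch below assumes an application has already extracted a word, a letter, and a claimed count from a model's answer; the function names `count_letter`, `spell_out`, and `check_count_claim` are hypothetical and not part of any product's API.

```python
def count_letter(word, letter):
    """Case-insensitive count of `letter` in `word`, computed deterministically."""
    return word.lower().count(letter.lower())


def spell_out(word):
    """Return the letters of `word` joined by hyphens, e.g. 'j-o-u-r-n-a-l-i-s-m'."""
    return "-".join(word.lower())


def check_count_claim(word, letter, claimed):
    """Compare a model's claimed count with the real one.

    Returns (is_correct, actual_count) so a caller can correct or suppress
    the generated answer when the two disagree.
    """
    actual = count_letter(word, letter)
    return actual == claimed, actual


if __name__ == "__main__":
    # The claims reported in the article, checked against plain string counting.
    print(check_count_claim("Google", "P", 2))
    print(check_count_claim("poop", "r", 1))
    print(spell_out("journalism"))
```

A check like this only covers claims that can be parsed into a word, a letter, and a number; it is a narrow validation step, not a general fix for the underlying model behavior.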