A language scientist at the University of Kansas has turned to gibberish as a tool for probing how artificial intelligence handles words. Rather than evaluating ChatGPT through traditional benchmarks, the research relied on deliberately meaningless inputs, so-called “nonwords”, to see where the chatbot aligns with human thinking and where it diverges.
This work, published in the journal PLOS One, is part of a growing area of research in which cognitive techniques developed for studying humans are applied to machines. The study comprised four experiments, each targeting a different aspect of language processing. In the first, ChatGPT was given obsolete English words, terms that have long since faded from everyday use. The system correctly defined most of them, likely outperforming a typical person. It struggled with some, though, and in a few cases it fabricated definitions outright, a behavior commonly labeled "hallucination" in AI.
In another experiment, the researchers shifted from English to Spanish. ChatGPT was shown Spanish words and asked to supply English words that sounded similar, yet its answers often stayed within Spanish, even though the prompt called for English equivalents. Only when the instructions were made very specific did it adjust its behavior. The pattern highlights a limitation in how the system handles language context, something humans generally manage with ease.
The third part of the research examined how the AI judges the "wordlikeness" of invented terms. ChatGPT was presented with nonwords that varied in how closely they resembled real English words and asked to rate how plausible each sounded as English and, separately, how likely someone would be to buy a product bearing that name. Its ratings closely tracked those given by human participants in earlier studies, suggesting the chatbot has internalized patterns in the sound structure of English even though it was never explicitly trained for phonological reasoning.
The final segment of the study pushed the chatbot into creative territory: it was asked to coin brand-new words for concepts that lack a clear label in English. Many of the results blended existing words or parts of them into intuitive-sounding neologisms. Some, like a term for frustration upon waking or for craving alternating sweet and salty snacks, felt like natural additions to the language, while others exposed the limits of the tool's creative range.
Taken together, the experiments were not designed to prove that AI processes language the way humans do, but to map where the two diverge. Some findings point to areas where AI could fill human gaps, such as recalling obscure vocabulary, while others make clear that it often lacks the contextual instincts people deploy effortlessly in conversation.
The research underscores a broader point: rather than building AI to mimic human language behavior exactly, it might be more productive to design these systems to support human needs in complementary ways.
Image: DIW-AIgen