AI in the Wild: I Guess We’re Just Making Up Words Now?

Jun 05, 2024

Ask a librarian: How reliable are AI tools like ChatGPT, AI Overviews, and Copilot?

Not reliable at all. We know that generative AI makes up fake facts, creates perfectly realistic citations to fake articles, and hallucinates everything from recipes to obituaries.

And more. Now, apparently, ChatGPT and other text-generating tools are pulling words out of their (virtual) a$$.

A tech-savvy journalist discovered that AI now makes up words. Try searching for the word "adapitates" in any major search engine. Dozens of results come up, but here’s the problem: "adapitates" is not a real word.

It appears in no dictionary, online or in print. A quick search of Google Ngram Viewer, an online tool that charts how often words or phrases (n-grams) occur in Google’s corpus of printed books, shows no instances of this word.

Why is this happening? Large language models (LLMs) are trained to predict patterns in text. But first they break the text into small units called tokens. Tokens are basically words and pieces of words.

According to Emmanuel Maggiori – author of the book Smart Until It's Dumb:

AI language models generate text by selecting words from an internal dictionary, one by one. This dictionary contains common English words and common pieces of words, like "ish", among other things.

"Adapitates" for example, does sound like it could be a real word. ChatGPT would generate it by selecting two elements from its internal dictionary--adap and itates.

Plug the fake word Adapitates into a “tokenizer” and you will see the tokens that make up the word.
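To see roughly what a tokenizer does, here is a minimal sketch in Python. It is an illustration only: the vocabulary below is a made-up toy list, not the dictionary of ChatGPT or any real model, and the greedy longest-match rule is a simplification of the methods real tokenizers use.

```python
# Toy vocabulary: hypothetical word pieces, NOT taken from any real model.
TOY_VOCAB = {"adap", "itates", "adapt", "ation", "ish"}


def tokenize(word, vocab=TOY_VOCAB):
    """Split a word into pieces by greedy longest match against vocab."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining candidate first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens


print(tokenize("adapitates"))  # ['adap', 'itates']
print(tokenize("adaptation"))  # ['adapt', 'ation']
```

With this toy list, the fake word and a real word both break into two familiar-looking pieces, which is why the fake one does not look out of place.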

According to Maggiori, these models were optimized to create plausible-sounding text, so it’s more than possible that they may sometimes combine elements from their dictionaries to generate sensible-sounding words even if they don't really exist.
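The combining step can be sketched the same way. The Python below is a toy under stated assumptions: the word pieces and the three-word "dictionary" are invented for illustration, and a real model picks its next piece by probability rather than trying every combination as this sketch does.

```python
from itertools import product

# Hypothetical word pieces and a tiny stand-in dictionary, for illustration only.
STEMS = ["adap", "adapt"]
ENDINGS = ["s", "ation", "itates"]
KNOWN_WORDS = {"adapts", "adaptation", "adapt"}


def invented_words(stems=STEMS, endings=ENDINGS, known=KNOWN_WORDS):
    """Return stem + ending combinations that are absent from the word list."""
    candidates = (stem + ending for stem, ending in product(stems, endings))
    return [word for word in candidates if word not in known]


print(invented_words())
```

Some of the combinations are real words and get filtered out. The rest, "adapitates" among them, are built from the same pieces and sound just as plausible.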

In other words, fake words, like fake facts, are baked into AI text generation. And this language is crossing over into the wider information ecosystem. Pseudo-language is being fed back into the LLMs themselves, of course. But how soon before we forget where these strange, but familiar, terms originated? It almost makes sense. I think I know what it means. Am I in an episode of Black Mirror?

We’re just going to have to adapitate.

Ask A Librarian
