So what if I AI myself once in a while? What I do in the privacy of my own home is my business.
Well, I guess it became your business, too, starting with this post from last week.
If you haven’t read that one, it explores the outcome of asking a generative AI tool to expound on a topic about which I am an expert: namely, my own work as an author.
What it got right and what it got wrong are noteworthy, and I catalog both in that post, which I encourage you to check out before we get into analyzing:
What the results of our experiment tell us about how large language models (LLMs) work
What this means for the trustworthiness of generative AI and LLMs
How we might still make good use of them as authors and technologists
Mistakes Were Made
For the sake of re-familiarization, let’s list some of the mistakes that came out of my conversation with ChatGPT, which started when I asked it whether it was familiar with the works of author Ryan R. Campbell. I’ve grouped these mistakes into two categories: one of which I’ll call data misinterpretation and one I’ll refer to as mistaken token generation.
Errors from Data Misinterpretation
describing me as a fantasy and horror author
calling my sci-fi series Imminent Dawn
referring to that series as a trilogy
claiming Event Horizon was published in 2020
Mistaken Token Generation
calling that same sci-fi series referenced above EMPRESS
Dr. Wyatt “Halberd”
Okay, but what the heck do those categories of mistake mean, and why did I divide them up that way? We’ll rely on what we know about how LLMs work in the next section to offer up some hypotheses.
How Those Mistakes Were Made
Data Misinterpretation
The errors I chalk up to data misinterpretation all have their origins in the relatively—emphasis on relatively—sparse or ambiguous data available to the LLM about my work.
Even though I went on a real blogging streak from 2016-2020 (especially in 2019 and 2020) and my author website was linked to via countless other sites and blogs, the corpus of data available to the LLM specifically about me and my books is infinitesimally small relative to the full sum of online conversation about related topics during that window.
That isn’t to say no one was talking about my work, but rather that compared to the most popular or most blogged about authors in my genres, the LLM simply didn’t have as much data to pore over to increase the probability of correct assertions on a consistent basis.
Generic Genre Errors
Looking at the errors in this bucket more specifically, incorrectly including fantasy and horror as genres in which I write is likely due to educated guesses the model is trying to make based on 1) other authors it recognizes and 2) larger linguistic forces.
With respect to the former, science fiction and fantasy are genres generally viewed as complementary; there are a number of writers and readers who produce or consume both to some extent.
Then there’s the linguistic angle: there’s something satisfying about the phrase “science fiction and fantasy” compared with “fantasy and science fiction.” This has its origins in cultural and linguistic phenomena like order of precedence and binomial ordering. We won’t go into what each of those is here (in this post, anyway), but you can learn (a lot) more at the links provided.
The reason I bring this up is because, as we’ll discuss more in the section below about mistaken token generation, LLMs are probabilistic; they attempt to guess or predict what “should” come next in a string of text based on observations they make about their training data.
As a result, it’s not terribly surprising that after writing “science fiction” in a list, it would simply follow up with the word “fantasy,” which it did in the list of genres it spat out about me. Probabilistically, “fantasy” isn’t a bad guess.
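To make that concrete, here’s a toy Python sketch of the idea (with completely made-up counts, not anything pulled from a real model or corpus) showing how “fantasy” wins out as the most probable continuation of “science fiction and”:

```python
# Toy illustration (not a real LLM): estimate next-word probabilities
# from counts in a pretend training corpus, then pick the likeliest one.
# All counts below are invented purely to show the mechanics.
from collections import Counter

# Hypothetical counts of words observed after "science fiction and"
next_word_counts = Counter({
    "fantasy": 9120,   # made-up figure
    "horror": 810,     # made-up figure
    "romance": 70,     # made-up figure
})

total = sum(next_word_counts.values())
probabilities = {word: count / total for word, count in next_word_counts.items()}

# The model "guesses" the most probable continuation
best_guess = max(probabilities, key=probabilities.get)
print(best_guess)  # fantasy
```

A real LLM computes these probabilities with a neural network over tokens rather than raw word counts, but the “pick a likely continuation” logic is the same in spirit.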
Where including horror as a genre in which I write is concerned, though, that’s another matter. I imagine it included this as there may be some mentions out there about elements of body horror in the EMPATHY series or because I did host some author spotlights back in the day that welcomed in authors who write firmly in the horror camp.
The LLM may therefore have had just enough of an association of my work with horror that it felt confident to erroneously include it as one of the genres in which I write.
Serial Series Errors
Referring to the EMPATHY series as the Imminent Dawn series is an easy enough mistake to make when we consider, again, the relative lack of data available to the LLM about my work or this series.
Out there on the interwebs, the word “series” often appears right next to the title of the first book in the series, Imminent Dawn, because both when it was released and in the months after, I really wanted to drive home the fact that this was a book meant to provide a foundation for much more to come.
In fact, take a look at that last sentence and how adjacent “series” is to Imminent Dawn. The sentence as written is how it came to me naturally, and I have little doubt that I did this regularly when blogging about the book in the past.
In this case, then, the LLM just seems to have made a bad—but not unreasonable, all things considered—assumption.
Where referring to EMPATHY as a trilogy and claiming Event Horizon was released in 2020 are concerned, these are actually pretty fair mistakes to have made. I wouldn’t blame a human for arriving at these conclusions because, well, I know some humans who have.
As far as the internet—and therefore, the LLM—knows, only three books were ever formally announced as under contract and due for publication. By way of contrast, only a relatively small handful of blog posts and pages laid out the full scope of the series (5-7 books).
And it is true that Event Horizon was meant to be published in 2020. When I pulled that book before publication, however (more details on this in the previous post in this series), I don’t think I made much in the way of a formal announcement outside of maybe a social media post or two.
Given this, I can’t blame the LLM—or any human—for, at a quick glance, arriving at these conclusions. It’s what the data would suggest if you only took a cursory glance and had to make snap assumptions probabilistically.
Mistaken Token Generation
The mistakes that fall into this category are, for me, far more fascinating than those described above. Though these errors were far fewer than those in the other bucket, referring to the EMPATHY series as the EMPRESS series and misnaming one of its characters as “Halberd” instead of “Halman” actually give us great insight into how LLMs make their probabilistic assertions.
Chokin’ on a Broken Token
Despite what one might guess, LLMs do not generate their assertions word-by-word.
Instead, they generate tokens, which are smaller units of language not dissimilar from the linguistic concept of a morpheme. That is to say, a token could be a whole word like “over” (a word in its own right), or it could be a piece of a longer word like “overconfident” (in which “over” appears again, but this time as a prefix that changes the meaning of the base word through affixation).
As an LLM builds a response, it evaluates the likelihood of what the next token in a string might be, and it’s clear—to me, anyway—that the EMPRESS and “Halberd” mistakes are examples of misattributed assertions that stem from this tokenized response generation.
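If you want to see the flavor of that in code, here’s a minimal sketch of greedy, longest-match subword tokenization. The vocabulary here is invented for illustration; real tokenizers (like the byte-pair encodings many LLMs use) learn their vocabularies from data rather than having them handwritten:

```python
# Sketch of greedy longest-match subword tokenization with a tiny,
# invented vocabulary. Real LLM tokenizers (e.g. BPE) learn their
# vocabularies from training data; this only shows the splitting idea.
VOCAB = {"over", "confident", "con", "fident"}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary entry that matches at position i
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: fall back to emitting it on its own
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("over"))           # ['over']
print(tokenize("overconfident"))  # ['over', 'confident']
```

Notice that “over” comes out as one token on its own but also as the first token of “overconfident”: the model sees the same token in both places, which is exactly the kind of shared prefix we’re about to exploit in explaining the EMPRESS and “Halberd” mix-ups.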
The EMPRESS Makes One “Hal” of an Error
With EMP and “Hal” as the initial tokens in their respective words, the LLM had to calculate what the likelihood of the next token in the string would be based on the data available to it.
From there, it would not be unreasonable for the model to conclude that EMPRESS is a viable title for a book series, especially if it already mistakenly believes the author to be one who writes in the fantasy genre!
For “Halberd” versus “Halman,” the explanation is even simpler. “Halberd” almost certainly appears vastly more often than “Halman” does in the LLM’s training data by virtue of the fact that a halberd is a real thing.
In fact, “Halman” is probably only significant in the context of the EMPATHY series and the Arthur C. Clarke reference on which I based Wyatt and his family’s last name (how’s that for an Easter Egg?).
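A toy sketch of that frequency effect (the counts below are invented by me, not drawn from any real corpus) shows how a common word like “halberd” can crowd out a rare name when the model has to complete the token “Hal”:

```python
# Toy continuation-picking sketch: given the token "Hal", choose the
# completion seen most often in a pretend corpus. Counts are invented;
# they only illustrate why a common word can beat a rare proper name.
corpus_counts = {
    ("Hal", "berd"): 50_000,  # made-up: halberds appear all over fantasy text
    ("Hal", "man"): 12,       # made-up: "Halman" is rare outside EMPATHY
}

def complete(prefix: str) -> str:
    # Keep only continuations that start from the given prefix token
    candidates = {pair: n for pair, n in corpus_counts.items() if pair[0] == prefix}
    best = max(candidates, key=candidates.get)
    return "".join(best)

print(complete("Hal"))  # Halberd
```

Stack the deck like that and the model picks “Halberd” nearly every time; only strong surrounding context about the EMPATHY series could tip the scales back toward “Halman.”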
What These Mistakes Mean Going Forward
Look, you don’t need me to tell you that you can’t trust an LLM to get everything exactly correct. Heck, OpenAI even tells you to double check ChatGPT’s outputs, especially for important information.
Now, whether my work is important… that’s another matter, and it’s one I don’t think it fair for me to opine upon. Ha!
With all of this in mind, though, it’s another reminder to double check ourselves—lest we wreck ourselves—when it comes to relying on LLMs to inform our opinions or guide our research.
As for what all of this means in the context of how authors and technologists can use LLMs going forward, well, the experimentation will continue. I have a number of ideas I’m tinkering with for how we might probe these technologies further, and we’ll get into that in subsequent posts in the R: On Everything series.
For now, though, how will you use what you’ve learned about how LLMs work to make better use of them going forward? Tell me in the comments below!