In Octavia Butler’s Xenogenesis trilogy, the last human survivors of a global nuclear war awaken from years of sleep to find themselves on board an alien spaceship. Their rescuers, the Oankali, are driven by a primal urge to hybridize with every advanced species they meet, and their various lineages have used the advantages that they’ve picked up from these species to spread throughout the universe. They even have a third gender specially equipped to merge the genetic information of highly different species. Virtually every human in the book is violently repulsed by the possibility of donating their gametes to the Oankali’s cause, even though they’ve offered to rehabilitate Earth in exchange.
The subtext for me throughout the trilogy was that the desire to be malleable and the drive towards some imagined purity are both deeply human traits, with the latter responsible for some of our worst missteps. Our ancestors used every means available to change their bodies in ways that granted new abilities. There’s the theory that language arose in conjunction with the use of fire. Cooked food, the theory goes, let our throats reshape in a way that allowed us to produce complex vocalizations and eventually language, at the cost of being less able to tolerate raw food. We co-evolved with the dogs we domesticated thousands of years before the advent of agriculture, both species becoming more docile and sociable. We still change our bodies today: modern medicine makes human-machine hybrids of us all.
So it’s ironic when people think that we’ve achieved some kind of static form that won’t continue to adapt to and co-evolve with the things and beings in our environment. History bears out that claims about the purity of people or institutions deserve, at a minimum, our suspicion. Purity has been a tool for oppressing out-groups, as well as internal skeptics and naysayers, all for the sake of in-group cohesion.
[Image: Jenny Holzer’s “Truisms”]
It’s with this perspective that I’ve read scientific journals’ and professional societies’ statements about the role of AI and, in particular, large language models in scientific research. I’ve read several of these, and they’re largely the same in content, even if some are more or less strident.
The general pattern of these statements is to start by acknowledging the wonder and fun that people might feel playing with a tool like ChatGPT for the first time. Then, they mention the concerns around using these tools to cheat and, specifically, how the easy-to-use interface and widespread access enable this cheating. From there they frame an analogous threat to science: that someone might try to “deceitfully pass off LLM-written text as their own.” They end by listing their requirements around explaining your use of AI and not listing it as an author.
The arguments these statements use deserve a closer look. First, we see several making the argument that AI can’t be original. “For the Science journals, the word ‘original’ is enough to signify that text written by ChatGPT is not acceptable: It is, after all, plagiarized from ChatGPT,” writes Science editor-in-chief H. Holden Thorp. The board of the World Association of Medical Editors (WAME) goes further. “Since chatbots are not conscious,” they write, “they can only repeat and rearrange existing material. No new thought goes into their statements: they can only be original by accident.”
There are several possible definitions of originality at play here. The first seems to focus on the idea of originality as conjuring a sequence of words ex nihilo. But it’s not clear to me why there’s scientific value in every segment of a text’s being completely original. Stock phrases can help with understanding and, really, there are only so many completely novel ways to describe certain procedures. It’s also perfectly acceptable to quote other authors in our publications, even if those publications are under copyright. If the issue is that using text from ChatGPT is automatically plagiarism since ChatGPT can’t be an author, then they’re using a weird sort of circular logic. And if the issue is just that the text isn’t generated by a human, then the same standard would rule out automated grammar checkers and autocomplete.
The definition of originality used by WAME focuses, bizarrely, not on the value of an interpretation, but on its contemplation by a sentient author. The reader is strangely absent from their statement and any utility for further research or understanding that might come from an automated interpretation is seemingly nullified by its “accidental” nature, as if accidents haven’t played an important role in science.
Next, all the statements I’ve read have indicated that AI can’t be an author because it can’t be accountable for its work. Most seem to leave the notion of accountability as a rather vague moral ideal that everyone should understand, but WAME specifies that AI’s inability to be accountable stems from its inability to understand and sign legal documents such as conflict of interest statements.
Finally, and even more vaguely, several write that ChatGPT “threaten[s] transparent science.” It is unclear what is meant by this, although the threat seems to come from the idea that researchers might not disclose their use of ChatGPT. Many things threaten the reproducibility and transparency of scientific research if misused or not reported but aren’t banned. In wet labs, for example, reagent types or organismal strains could be materially important for anyone seeking to reproduce or even interpret a given study, but of course reagents and organisms aren’t prohibited. More importantly, these can affect the reliability of the knowledge itself, while the objection to AI seems to be much more focused on its role in creating the manuscript.
Journals are treating AI’s advancement as an emergency because, they say, it lacks originality, can’t be accountable for its work, and might not be transparent. But these ideals are in shambles in scholarship today, due in large part to many journals’ own actions. Originality is overemphasized, and only the most striking findings are published in prestigious journals. This leads to p-hacking, disincentivizes replication, and contributes to duplicated effort by not allowing null findings to be discovered. Accountability for research is strained from every side. Gift authorships are more common than ever, and journals have done little to speed up the process of retracting fraudulent research. Nor have they worked to improve transparency by enforcing data and code availability standards.
All of these issues are getting more attention than ever, and experimentation with alternative models of knowledge dissemination has begun to flourish. And so it’s convenient that ChatGPT has come along at this time, just as Martin Shkreli was the greatest gift that the pharmaceutical industry could have hoped for. Both offer the establishment players a chance to distinguish their own actions from those of a suspicious newcomer. This isn’t to say that large language models can’t be abused or that Martin Shkreli has always behaved admirably. I mean to say that the line between scholarly journals and AI, or between the pharmaceutical industry and Martin Shkreli, is blurry.
We can see how these policies try to emphasize sharp distinctions, though. The etymology of the word “integrity,” which was used in several, is instructive. The prefix in- means “not” and is attached to the Proto-Indo-European root *tag-, which means “touch” and is shared with words like “attain,” “contact,” and “tangible.” So for science to have integrity means that it’s untouched, pure, self-contained.
And what would be touching science? Yes, there’s the inhuman or the automated. Science is a “human endeavor,” they say, and machines must remain tools. This forestalls the possibility of allowing ourselves and of allowing science to be reshaped by the new possibilities that our machines offer us. But I’m not convinced that humans themselves don’t make a lot of the same errors that AI can. ChatGPT, its creators say, “sometimes writes plausible-sounding but incorrect or nonsensical answers.” In moments of confusion or inattention, I’ve done the same. I’ve also trusted my collaborators’ work when they show me the results of their analyses, even when they’re coded in a language I’m not familiar with — am I not able to be accountable for that work? To paraphrase B.F. Skinner, “the question is not whether a computer can be original, accountable, and transparent, but whether a human can.”
But it’s not just AI that can pollute science in this conception, it’s also the uninitiated masses. AI makes code, analysis plans, and scholarly language available to non-specialists at the push of a button. To many, though, science is a kind of priesthood as much as it is a profession, and they take its initiation rites very seriously. We see this almost any time an amateur researcher starts becoming well-known.
Having policies about how AI can be used in knowledge creation is sensible, but they need to be created in the service of scientific progress and not the journals’ interest in maintaining their own prestige. Including information on the models, versions, and prompts used to generate ideas, code, or text is not an unreasonable expectation. But we shouldn’t stop there with our calls for transparency. What good does it do us if we know that ChatGPT trimmed 20 words off an abstract if we can’t get access to the data underlying a paper’s analyses? Nor should we be using restrictions on AI to limit how curious amateurs can share their work. With the automation of coding, writing, and interpretation, we need to be experimenting with ever more radical ways of generating reliable scientific information, not reinforcing the borders of our existing horizons.
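To make the disclosure point concrete, here is a minimal sketch of what a machine-readable record of AI use could look like. The field names are hypothetical, not any journal’s actual schema, and a journal could just as reasonably ask for the same information in prose.

```python
# A hypothetical, machine-readable AI-use disclosure that could accompany a
# submission's data and code. Field names are illustrative only.
import json

ai_use_disclosure = {
    "model": "gpt-4",                        # model family used
    "version": "2023-06-13",                 # snapshot or API version, if known
    "tasks": ["abstract shortening", "code review suggestions"],
    "prompts": ["Trim this abstract to 150 words without changing its claims."],
    "human_verification": "All outputs were checked against the underlying data and analysis code.",
}

with open("ai_disclosure.json", "w") as f:
    json.dump(ai_use_disclosure, f, indent=2)
```

A record like this is cheap to produce, and far less consequential for reproducibility than the data and code it would sit alongside.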
As someone who works as a researcher on LLMs, I can say that an LLM is definitely capable of originality. There are an enormous number of points on an LLM’s manifold of possible outputs that have never been articulated before. Some of them won’t be useful, but some of them will. It’ll take a human to interpret which are useful and which aren’t. Working with this technology in a symbiotic way is going to become more and more important in knowledge sectors.
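As a toy illustration of that point (a quick sketch, not a rigorous originality test), you can sample a handful of continuations from an open model and count how many of their 8-grams never appear verbatim in a reference corpus. This assumes the Hugging Face transformers library and a hypothetical local corpus file:

```python
# Toy novelty check: how many 8-grams in sampled text are absent from a reference corpus?
# Assumes `pip install transformers torch` and a hypothetical reference_corpus.txt.
from transformers import pipeline

def ngrams(text, n=8):
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

corpus_ngrams = ngrams(open("reference_corpus.txt").read())

generator = pipeline("text-generation", model="gpt2")
samples = generator("The enzyme assay showed", max_new_tokens=40,
                    num_return_sequences=5, do_sample=True)

for s in samples:
    novel = ngrams(s["generated_text"]) - corpus_ngrams
    print(f"{len(novel)} sampled 8-grams not found in the reference corpus")
```

Verbatim novelty isn’t the same thing as a useful idea, of course, which is exactly why the human in the loop matters.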
The other thing that people aren’t bringing up is the way that the parallelization of transformers (as opposed to LSTMs) has allowed massive compute to be thrown at the problem, increasing the cost and raising the barrier to entry. If LLMs become useful for scientific research, this could have the side effect of limiting the number of players able to compete.
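To make the parallelization point concrete, here’s a minimal PyTorch sketch (my own toy example): the LSTM’s recurrence forces a hidden token-by-token loop over the sequence, while self-attention covers every position in a few large matrix multiplications that map straight onto accelerator hardware.

```python
# Minimal comparison of sequential recurrence vs. parallel self-attention.
# Assumes PyTorch; shapes are arbitrary toy values.
import torch
import torch.nn as nn

batch, seq_len, d_model = 8, 512, 256
x = torch.randn(batch, seq_len, d_model)

# LSTM: h_t depends on h_{t-1}, so positions are processed serially,
# even though nn.LSTM hides that loop internally.
lstm = nn.LSTM(input_size=d_model, hidden_size=d_model, batch_first=True)
lstm_out, _ = lstm(x)

# Self-attention: every position attends to every other position at once
# via a few large matmuls, which is what lets training soak up parallel compute.
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)
attn_out, _ = attn(x, x, x)
```

The flip side of that hardware efficiency is that exploiting it at scale takes budgets few research groups have.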
As for the consciousness/sentience bit, I wrote up an explanation, “Geometric Intuition for why ChatGPT is not Sentient”: https://taboo.substack.com/p/geometric-intuition-for-why-chatgpt if anyone is interested.
Do you have a concrete example of using an LLM to help with scientific work? People keep talking about how this will change people’s work, but I haven’t seen many concrete cases yet.
I appreciate you moving this conversation along. I am curious to see how this plays out and who the winners and losers will be. At the end of the day, I want to see how to advance our expertise and the quality of our research.