First, let me state that I have not read enough on this issue to be confident in weighing in. So consider these thoughts provisional, conversational, oral even, in the way that social media (aka Twitter) is more an oral than a written form.
I am not finding myself cheering on the authors who are suing over copyright infringement due to AI and LLM training. My books are included in the Books3 dataset, and it doesn’t bother me.
Why? Well, I’m not sure. But I have been thinking a lot about the beginnings of the publishing industry in this country. It’s entirely rooted in piracy. (A few of the paragraphs below are from a newsletter I wrote a year ago, if they seem familiar. I am pirating myself!)
The first American publishers did not publish original works for which they had acquired the copyright. They based their business on piracy.
There was no international copyright law, so any book could be reprinted by any American and published under their business name. Publishers would literally race to boats arriving from England to get copies of British books (say, Sir Walter Scott’s Ivanhoe), take them to their print shops, meticulously recreate the text by placing each letter, one by one, backwards, into a composing stick, then set plates for each page, page after page, before sending the completed text block to a bindery to be stitched inside a leather cover and sold.
The most successful early American publishers were not those who signed up the most successful authors or books: they were those who could steal books the fastest. This is how Harper (then J. and J. Harper, and later Harper & Brothers, and now HarperCollins) became successful.
One of these pirated novels became the bestselling American novel for a century. In 1794, the Philadelphia publisher Mathew Carey stole a novel by Susanna Rowson, Charlotte: A Tale of Truth, that had been published in 1791 by a British press.
Charlotte: A Tale of Truth had not been particularly successful, but Carey chose to pirate it, perhaps because it was relatively short and thus faster and cheaper to copy and sell. He gave it a new title, Charlotte Temple, and printed 1000 copies. Rowson was not paid anything, for copyright or royalties. In 1801, seven years after his edition was published, Carey sent Rowson a check for twenty dollars along with twenty copies of the book “as a small acknowledgment for the copy right of Charlotte.” Perhaps he did it because it was the right thing to do, or perhaps he did it to discourage other American publishers from putting out their own pirated editions. That practice was called “courtesy of the trade,” an informal system among publishers to keep lots of them from pirating the same books.
Most if not all of the novels Americans read in the 18th century, and many of those they read in the 19th, were stolen, as it were, including Jane Austen’s books, with the authors receiving nothing for their labor. This would continue long into the 19th century, until the United States finally passed an international copyright law in 1891.
This is what goes through my head when I think about what is happening with AI, and books, and copyrights, and legal cases being filed. It’s not new, really. Piracy is a given in books, and always has been. For the past few decades, just about every single ebook has been easy to find for free somewhere, having been stolen by someone and put on the web. All my books can be found this way. I even see who is doing it in my Edelweiss account: a few fishy names show up over and over, downloading Belt titles, clearly to pirate. And as I wrote about last week, professors often teach students how to find pirated versions of course materials.
Look, I’m not saying “training AI/LLMs is fine, let’s move on.” I am not actually arguing anything. I’m, well, pirating a bunch of other information and putting it together into this newsletter.
To wit: I find Cory Doctorow convincing in this piece:
“Workers get a better deal with labor law, not copyright law. Copyright law can augment certain labor disputes, but just as often, it benefits corporations, not workers…. AI companies say, ‘You can’t use copyright to fix the problems with AI without creating a lot of collateral damage.’ They’re right. But what they fail to mention is, ‘You can use labor law to ban certain uses of AI without creating that collateral damage.’”
I also think about the authorial parallel to the demands of the WGA regarding AI. Sure, authors could, as the WGA has, ask publishers to put in their contracts that the works won’t be used for AI, but publishers are not the entire chain. Distributors have access to ebook files, as do vendors like Amazon. Authors would need distributors and vendors to include such provisions in their own contracts with publishers for this to have any teeth. (I think: I may be missing something here, but I know that Belt ebook files at least are sent to many other companies, and those companies can do anything they want with them. Plus, see above, rampant ebook piracy has long been normalized.) Now, of course, the elephant in the room in this hypothetical, and thinking about Doctorow, is that book authors are not unionized as WGA writers are (though of course there is the Authors Guild). When we focus on labor, and not on copyright, the puzzle pieces shift.
And finally, I think a lot about what we all decided during the theory years in literary study, about originality, and language, and social constructions, and the self, and individuality. I think about Dada, and Warhol’s soup cans, and zines, and how often we have been embroiled in similar debates. I think about people freaking out over the invention of the printing press, and Socrates’ thoughts about writing per se, and how many of the predictions people made about the digital age were wrong. Gutenberg Elegies, anyone?
I’m not saying anything new (because no one says anything new?), but I am hoping to find and read others who are bringing in (fair use?) other ideas to help us think through this iteration of what could be said to be already-fought battles. And I hope this pastiche leads me to more (un)original things to read and think about as I work towards my own point of view (impossible, anyway) (except for the typos; all those are mine).