When the writer Rebecca Forster first heard how Google was using her work, it felt like she was trapped in a science fiction novel.
“Is this any different than someone using one of my books to start a fire? I have no idea,” she says. “I have no idea what their objective is. Certainly it is not to bring me readers.”
After a 25-year writing career, during which she has published 29 novels ranging from contemporary romance to police procedurals, the first instalment of her Josie Bates series, Hostile Witness, has found a new reader: Google’s artificial intelligence.
“My imagination just didn’t go as far as it being used for something like this,” Forster says. “Perhaps that’s my failure.”
Forster’s thriller is just one of 11,000 novels that researchers including Oriol Vinyals and Andrew M Dai at Google Brain have been using to improve the technology giant’s conversational style. After feeding these books into a neural network, the system was able to generate fluent, natural-sounding sentences. According to a Google spokesman – who didn’t want to be named – products such as the Google app will be “much more useful if they can capture the nuance of language better”.
For the moment, the research is just a “proof of concept”, the spokesman continues via email, but these methods “could help Google understand and produce a broader, more nuanced range of text for any given task”.
“We could have used many different sets of data for this kind of training, and we have used many different ones for different research projects,” he adds. “But in this case, it was particularly useful to have language that frequently repeated the same ideas, so the model could learn many ways to say the same thing – the language, phrasing and grammar in fiction books tends to be much more varied and rich than in most nonfiction books.”
The only problem is that they didn’t ask. The Google paper [PDF] says that the novels used in this research were taken from “the Books Corpus”, citing a 2015 paper by Ryan Kiros and others [PDF] which describes how the authors “collected a corpus of 11,038 books from the web”, describing them as “free books written by [as] yet unpublished authors”. It’s a collection that has been used by other researchers working in artificial intelligence and which is currently available for download in its entirety from the University of Toronto.
Forster says that she “always appreciates an interesting use of words”, but while Hostile Witness is available to download for free, no one asked her permission to use her novel as raw material to train a computer.
“Perhaps I’m still thinking in the old way, that a reader will read my book – it didn’t even occur to me that a machine could read my book. What I found curious was that these were referred to as ‘free books written by as yet unpublished authors’ because my state is very different,” she says.
Like many of the novels in the Book Corpus collection, the edition of Hostile Witness used in the research was published on Smashwords and includes a copyright declaration that reserves “all rights”, specifies that the ebook is “licensed for your personal enjoyment only”, and offers the reader thanks for “respecting the hard work of this author”. While Forster says she’s no lawyer, the “spirit of this declaration is clear – you hope that your work would be respected by readers”.
“I take great pride in my craft, and perhaps it was chosen because of that. Which would be great. Or perhaps it was chosen because it was there, because it was free?”
Another writer whose work has been used in the Google Brain research is Erin McCarthy, the author of more than 28 novels. The first volume of her Fast Track series, published by Penguin Random House’s Berkley Books imprint, is also available for free online, but McCarthy says that Google didn’t get in touch with her or ask for permission to use Jacked Up in their research into AI. She’s fascinated to hear that romance novels are being used to improve the search conglomerate’s ability to speak.
“There is a reason they are the bestselling genre in the US and I believe it’s because they feel conversational themselves,” McCarthy says. “It’s real life turned up a notch. Realism overlying a fantasy.”
The flow of the dialogue is very important, she continues. “I am very cognizant of using modern diction and age-appropriate word choices. If my female character is 24 she’s not going to speak in a formal manner. Conversations between the hero and heroine have realistic word choices, but there is additionally an element of fantasy there. What they want a hero to say, but what might not actually occur in real life. That’s what readers want and expect from a romance novel.”
McCarthy isn’t sure how to respond to the idea that her work has been used for an entirely different purpose to the one she intended, a purpose that may result in services to make the tech giant a lot of money.
“It’s hard to gauge the use of my work and the exact purpose for its use without having seen it in action,” she says. “My assumption would be they purchased a copy of the book originally. If they haven’t, then I would imagine the source of the content, as intellectual property, should be properly attributed and compensated for the general health of the creative community.”
Far from offering proper attribution or any compensation, the Google paper avoids any suggestion that the novels used in the research were written by real people, describing the books only as “a collection of text from 12k ebooks, mostly fiction”.
Forster is equally adamant that writers whose work has been used to gain a commercial advantage should reap a portion of the rewards, but isn’t holding her breath for any payment.
“If there’s one thing that’s niggling at me it’s that I would have liked to have known,” she says. “With all the technology at their fingertips, then it wouldn’t have been too hard to let everyone know.”
According to Mary Rasenberger, executive director of the Authors Guild, this “blatantly commercial use of expressive authorship” comes as no surprise. “We’ve seen this movie before.”
The Guild has been in dispute with Google since 2005, arguing that the company’s project to digitise library books was a “plain and brazen violation of copyright law”. Google Books won in 2013, with the district court ruling that “all society benefits” from the project, a decision that the supreme court declined to review earlier this year.
“Why shouldn’t authors be asked permission, or even informed – not to mention compensated – before their work is used in this manner?” Rasenberger asks. “There’s no doubt the company has the means to do so.”
Google wouldn’t say whether getting hold of 11,000 authors was beyond their capacities, or if they have any plans to reward the writers, or if the people whose expertise was harvested to train their network were ever considered as individuals. While attribution “isn’t required”, the spokesman says via email, “the researchers clearly identify where they got the data”.
“The machine learning community has long published open research with these kinds of datasets, including many academic researchers with this set of free ebooks – it doesn’t harm the authors and is done for a very different purpose from the authors’, so it’s fair use under US law.”
But Rasenberger isn’t convinced.
“The research in question uses these novels for the exact purpose intended by their authors – to be read,” she argues. “It shouldn’t matter whether it’s a machine or a human doing the copying and reading, especially when behind the machine stands a multi-billion dollar corporation which has time and again bent over backwards devising ways to monetise creative content without compensating the creators of that content.”
Rasenberger adds that nobody knows how books will be read or used in the future, which is why the Authors Guild is proposing that digital uses should be allowed under a licensing system. But for the moment, “Google is extracting immense value from the creative efforts of thousands of authors and looking the other way”.
For Forster, the lack of any proper attribution speaks volumes. “If they’re not mentioning the authors,” she says, “then maybe they’re not thinking of it in terms of it being someone’s work.”
She never imagined her work would wind up as part of someone else’s dataset, as raw ingredients to satisfy a machine’s hunger for information, but she’s “been around long enough to know that what you hope for isn’t always what you get”.
“I would have loved to have been part of the discussion of this project, and to have known how it was going to be used,” she says. “But I’d also like to be thought of as intelligent enough to be able to make a decision about the end product.”