You may be aware of the firestorm of protest from authors and publishers, including two lawsuits, over Google's new Print Library Project. Here are some allegations in "Reining in Google" by Pat Schroeder and Bob Barr:
Internet behemoth Google, plans to launch their Library project in November. It plans to scan the entire contents of the Stanford, Harvard and University of Michigan libraries and make what it calls "snippets" of the works available online, for free.
The creators and owners of these copyrighted works will not be compensated, nor has Google defined what a "snippet" is: a paragraph? A page? A chapter? A whole book? Meanwhile Google will gain a huge new revenue stream by selling ad space on library search results. Selling ads on its search engine is how Google makes 99 percent of its billions.
Not only is Google trying to rewrite copyright law, it is also crushing creativity. If publishers and authors have to spend all their time policing Google for works they have already written, it is hard to create more. Our laws say if you wish to copy someone's work, you must get their permission. Google wants to trash that.
Because I wrote an article for LWN in September about this project and the Authors Guild lawsuit against Google, I know that those accusations are factually wrong. So I decided to republish that information here, because it explains how the project really works and what the legal arguments are on both sides. As you will see, Google Print Library does not work at all the way Ms. Schroeder and Mr. Barr describe it. There are arguments to be made on both sides, but in any discussion it is vital to get the facts right. With that goal in mind, here is the result of my research on how Google Print Library really works.
The Authors Guild Sues Google
Lawyers, like the rest of us, are reacting with great interest and some passion to the Authors Guild's copyright infringement lawsuit against Google over its new Google Print Library Project, under which Google plans to scan books from the libraries of Harvard, Stanford, Oxford, the University of Michigan, and the New York Public Library and make them searchable by keyword. Google describes the project's goals like this:
The Library Project's aim is simple: make it easier to find relevant books. We hope to guide users to books — specifically books they might not be able to find any other way — all while carefully respecting authors' and publishers' copyrights. Our ultimate goal is to work with publishers and libraries to create a comprehensive, searchable, next-generation card catalog of all books in all languages that helps users discover new books and publishers find new readers.
The Authors Guild describes it differently. To the Guild, it's massive copyright infringement, pure and simple. The lawyers are trying to figure out who is right and which side is more likely to prevail, to the extent anyone can predict the outcome of a fair use case, but this litigation raises bigger issues too. Here's the complaint [PDF] and Google's public statement in response. If you'd like to follow the lawyers' discussions, try Susan Crawford's blog, William Patry's The Patry Copyright Blog, and Eric Goldman's Technology and Marketing Law Blog; Andrew Raff has also gathered an excellent collection of attorney reactions on IPTAblog. You might also enjoy Tim Bray's thoughtful take on the lawsuit, which looks at it from a publisher's point of view.
How Google Print Library Works
What exactly is Google doing with Google Print?
First, what *isn't* it doing? It isn't making copyrighted books available cover to cover against anyone's will. Google Print has three parts. One: Google makes books available in their entirety only when they are in the public domain, as Project Gutenberg has done for years. Two: when publishers or authors agree, it shows sections of a book, namely the page the keyword appears on and a few pages on either side; that is a separate facet of the project, the Google Print Publisher Program. Three: the part the Authors Guild is fighting over, Google's Print Library Program. There Google shows only a few sentences on either side of the searched-for keyword, and not necessarily complete sentences. You never see a full page, let alone an entire book. You also get bibliographic information and pointers to related information on the web. In all cases, you are also directed to nearby libraries and bookstores where the book is available for loan or purchase, including second-hand bookstores for out-of-print books.
Screenshots of the three different offerings can be viewed here. And Google's Common Questions about the Google Print Library Project says that Google Print is "designed to help you discover books, not read them from start to finish. It's like going to a bookstore and browsing – only with a Google twist."
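To make the three tiers concrete, here is a minimal Python sketch of the decision as Google describes it. The `Book` fields, the `Display` values, and the `display_mode` function are hypothetical, invented for illustration; this is not Google's code or API.

```python
from dataclasses import dataclass
from enum import Enum


class Display(Enum):
    # What a search result may show, one value per tier.
    FULL_TEXT = "entire book"                                       # public domain
    SAMPLE_PAGES = "keyword page plus a few pages on either side"   # Publisher Program
    SNIPPETS = "a few sentences around the keyword"                 # Library Program


@dataclass
class Book:
    # Hypothetical fields describing a scanned book's status.
    title: str
    public_domain: bool
    rights_holder_agreed: bool


def display_mode(book: Book) -> Display:
    """Choose how much text a result may show, following the three tiers."""
    if book.public_domain:
        return Display.FULL_TEXT
    if book.rights_holder_agreed:
        return Display.SAMPLE_PAGES
    # Still in copyright and no agreement: Library Program, snippets only.
    return Display.SNIPPETS


if __name__ == "__main__":
    print(display_mode(Book("Hamlet", public_domain=True, rights_holder_agreed=False)))
```

The sketch leaves out what is common to every tier (bibliographic information and links to libraries and bookstores), and it doesn't model opt-outs: a work whose rights holder opts out would simply not be in the index at all.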
The legal arguments basically go like this. On Google's side, the clearest arguments are presented by EFF's Jason Schultz, who walks through the four fair use factors; by Jonathan Band in his paper "The Google Print Library Project: A Copyright Analysis" [PDF]; and by Susan Crawford on her blog. All of them essentially say that copying entire books in order to make a digital, keyword-based catalogue is transformative and is fair use. Google isn't copying more than is necessary, they argue, because you can't search a book for keywords unless you have the whole book available. And anyway, where's the harm to the market? They cite the Kelly v. Arriba Soft case [PDF], in which the defendant made thumbnails of other people's photos available online in response to search requests, with links to the original works for anyone who wanted to purchase them. Arriba's use was ruled fair use, even though Arriba not only made an entire copy of each original but also made a smaller version of it, in its entirety, available to the public. Google shows only a sentence or two, not the entire book, for works whose authors haven't given approval to show more. If Arriba's use is fair use, why isn't Google Print's Library Project?
If you wrote an article for a magazine and quoted a sentence or two from a book, probably no one would complain, because that is so obviously fair use. So why is it a problem for Google to do the same thing with books? And what is the difference between Google collecting the world's web content so as to make it searchable and collecting keywords from the world's books? In both cases, copyright holders can opt out. If Google Print violates copyright law, why doesn't Google's web search engine, period?
A common theme on both sides of the argument goes like this: Google has had a fantastic idea, one that can benefit the human race, and almost everyone hopes there is a way for them to do this. It's just a question of how to do it right. Google is shouldering the expense and effort of making a library card catalogue, so to speak, of the world's knowledge and offering it free to the world. Can anyone *not* want that to happen?
Authors should want to be included so they can be found. The world does its research now predominantly online, and authors, particularly authors whose works aren't selling like hot cakes, have everything to gain from being included in Google Print.
The Authors Guild's Side
On the Authors Guild side is the argument that authors have the right to decide when others may or may not copy their works. Indexing the web is different, the argument goes, because a license can be inferred when someone puts content on the web and doesn't take steps to exclude Google and other search engines with a robots.txt file. There is no equivalent implied permission from the authors of these books.
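For readers who have never looked at one, here is roughly how the robots.txt opt-out works in practice, sketched with Python's standard `urllib.robotparser` module. The robots.txt contents and the example.com URLs are hypothetical. The point is only that a well-behaved crawler checks this file before fetching a page, which is why a site owner's silence can be read as permission.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site owner keeps Googlebot out of /private/
# and says nothing else, so every other path is open to every crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://example.com"
    print(may_crawl(ROBOTS_TXT, "Googlebot", base + "/private/draft.html"))   # False: opted out
    print(may_crawl(ROBOTS_TXT, "Googlebot", base + "/articles/essay.html"))  # True: no objection
```

There is no such file for a printed book sitting on a library shelf, which is exactly the Guild's point about implied permission.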
Copyright law gives copyright holders the exclusive right to make copies, period, and no one else may do so without permission. Libraries don't own the copyrights to these works, so they can't give permission, it is argued. Google will violate copyright law no matter how little it shows the world, because it will make copies and store them on its servers. The onus is on Google to contact all the authors and publishers and get permission, one by one, they say. If that is so onerous and costly that Google Print Library can't happen, so be it. The law is the law. This side cites the MP3.com decision (UMG Recordings v. MP3.com) [PDF].
We might wish it could happen, some on that side say, but copyright law is what it is, so it can't. Some even predict that this litigation will shut down search engines like Google's, and a few hope it does. Some of the complaints about Google Print seem driven more by emotion than by fact. One comment on Boing Boing by a publisher is particularly interesting:
Google Print for Libraries has two pretty major flaws. One being giving a digital copy of all of our works to the participating libraries where they will then most likely be used in e-course reserves without any compensation to either author or publisher. University Libraries have an awful track record at compensating for e-course reserves and post our content frequently without any restrictions or security.
The second being Google will be profiting (through GoogleAds) on this content again without compensating the authors or publishers. Fair use should exclude commercial use. Even Creative Commons licenses (which I grant to my flikr account) gives you that option.
If we expect the production of good scholarship to be a viable, it has to be paid for somehow.
A little more accurate information may help calm these fears. First, fair use doesn't exclude commercial use. I can write a parody, for example, of your book, even if you don't want me to, and I can sell my parody. Second, take a look at the terms of the Google-University of Michigan agreement [PDF], which is available on the university's website, and you will see that Google has bound the University, and any of its partners, to limitations on access and use. Further, should there ever be a dispute between an author and Google about including a work, the work can be removed by Google, and the University must then follow suit. Authors can always opt out.
What about the allegation that Google will make money from this project through ads? Google says there won't be any ads on books scanned from a library. This matters, because the complaint specifically alleges that Google will profit from ads: "4. Google has announced plans to reproduce the Works for use on its website in order to attract visitors to its web site and generate advertising revenue thereby." As for the links to bookstores, Google says the links it provides will not be "paid for by those sites, nor does Google or any library benefit if you buy something from one of these retailers." And clause 4.3 of the University of Michigan agreement says the service will be provided "at no direct cost to end users".
While the Authors Guild makes much of Google allegedly profiting from its members' work, a strong argument can be made that it's the other way around: Google is providing a new way for readers to discover those members' books, even ones deep on the backlist, as you can see in this example.
Are There Problems with the Complaint?
Some attorneys are already pointing out what they believe are procedural defects in the Authors Guild complaint. It is brought as a class action, but some see a problem with class certification. The complaint defines the class as all persons or entities that hold the copyright to a literary work contained in the library of the University of Michigan.
In a class action, the few named plaintiffs are supposed to adequately represent everyone in the class. But Lawrence Solum, an author who is a member of the plaintiff class in the sense that he has several works in the University of Michigan's library, opposes the lawsuit and says he will be harmed if the Authors Guild prevails:
I have a very strong objective interest in Google Print succeeding -- because as a scholar, I benefit from the dissemination of my works and because reaching agreement with Google will be costly to me and Google, essentially killing the project. A substantial intraclass conflict of interest destroys "adequacy of representation," making class certification inappropriate, both under the federal rules of civil procedure and under the due process clause of the fifth amendment of the U.S. Constitution. . . . Pro-bono representation for intervenors opposing certification, anyone?"
Is it Copying That Causes Harm, or Distribution?
Think about brick-and-mortar libraries. Suppose I were a librarian. I want to catalogue every book in my library by keyword, so readers can come to the library and look up information by keyword on index cards that I laboriously file alphabetically in file cabinets. Each card shows you which book in the library uses that keyword, and on what page, and also tells you where in nearby bookstores you can buy the book.
Would my painstaking work be a copyright offense? It's laughable to even think of it.
Now, suppose I take all my index cards and laboriously hand-type them into a computer. I now have a computer database listing every keyword. Have I violated copyright now? Again, it doesn't pass the laugh test, does it?
But what if I realize that instead of doing it by hand, all I have to do is scan in the whole book and then pick out the keywords by algorithm? Now am I a copyright infringer? If so, why? On the technicality that I had to scan in the whole book, thus making a copy, in order to break it down into keywords for my card catalogue of my library's contents? Legal purists will say yes: you are an infringer, because you made a copy.
And they are right that I made a copy. But exactly who is harmed by this scenario? The end result is exactly the same whether I do the work by hand or by computer, except that Google deliberately limits how much I can see, whereas in the library the keyword would lead me to the entire book, which presumably I could borrow, take home, and scan or photocopy myself, if I didn't care about copyright.
If the copy merely stays on Google's servers, used only for making a digital card catalogue, in what way is the author or the publisher harmed? Have they lost any sales?
Google isn't displaying the works in their entirety on its website, as the Authors Guild seems to imagine. It isn't selling the books or offering them for download. It is offering a tool to search books. Where is the harm to the market? Libraries have special rights under copyright law. Why shouldn't this project?
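Here is roughly what "scan in the whole book and then pick out keywords by algorithm" amounts to, as a minimal Python sketch of an inverted index, the data structure behind most keyword search. It assumes the scanned text is already available as a string, and it shows a fixed window of words (an assumed five on each side) rather than Google's actual snippet rules, which aren't public. The function names are hypothetical. Notice that the full text is needed to build the index, but a search hands back only a book identifier and a few words of context.

```python
import re
from collections import defaultdict

# Assumed context size: words shown on each side of a hit (not Google's real rule).
WINDOW = 5


def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(books):
    """Build an inverted index from a dict of {book_id: full scanned text}.

    Returns (index, tokens): index maps each word to the (book_id, position)
    pairs where it occurs; tokens keeps each book's word list for snippets.
    """
    index = defaultdict(list)
    tokens = {}
    for book_id, text in books.items():
        words = tokenize(text)
        tokens[book_id] = words
        for pos, word in enumerate(words):
            index[word].append((book_id, pos))
    return index, tokens


def search(index, tokens, keyword, window=WINDOW):
    """Return (book_id, snippet) pairs; each snippet is at most 2*window+1 words."""
    results = []
    for book_id, pos in index.get(keyword.lower(), []):
        words = tokens[book_id]
        start = max(0, pos - window)
        snippet = " ".join(words[start:pos + window + 1])
        results.append((book_id, snippet))
    return results


if __name__ == "__main__":
    books = {
        "moby-dick": (
            "Call me Ishmael. Some years ago, never mind how long precisely, "
            "having little or no money in my purse, and nothing particular to "
            "interest me on shore, I thought I would sail about a little and "
            "see the watery part of the world."
        )
    }
    index, tokens = build_index(books)
    for book_id, snippet in search(index, tokens, "money"):
        print(book_id, "->", snippet)
```

Searching that public domain opening for "money" prints `moby-dick -> precisely having little or no money in my purse and nothing`: the book is identified and a reader can go find it, but nothing more of the text is shown.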
The Big Picture Questions
For those of us who are not lawyers, our dominant reaction to this lawsuit is probably that if Google Print Library violates copyright law, somebody needs to change the law.
This litigation raises some important questions: What is a library in the digital age? What is a book? Is Google Print going to do away with books as containers of knowledge, replaced by searchable databases? What about this litigation's effect on copyright law in the US? Is it possible, as one comment on the Conglomerate blog suggests, that if it wins, "Google may be planting the seeds of the destruction of copyright as we know it"?
Computers are, under current law, the ultimate infringers, in the sense that you can't read anything on a computer without making a copy in RAM. In short, if you access a work on a computer at all, there is no way to avoid making a copy. It's the gotcha of copyright law in the digital age, and at some point, some say, we need to think about that issue and decide what to do about it. If you want the hairs on your head to stand straight up, read MAI Systems Corp. v. Peak Computer, Inc., 991 F.2d 511 (9th Cir. 1993), and note how little the court seems to grasp about how a computer actually works: "After reviewing the record, we find no specific facts . . . which indicate that the copy created in the RAM is not fixed."
Susan Crawford explains:
All computers do is copy. Copyright law has this idea of strict liability -- no matter what your intent is, if you make a copy without authorization, you're an infringer. So computers are natural-born automatic infringers. Copyright law and computers are always running into conflict -- we really need to rewrite copyright law.
Ernest Miller and Joan Feigenbaum, in their very interesting paper "Taking the Copy out of Copyright" [PDF], suggest that we drop the copy from copyright law and focus on distribution instead. After all, it's distribution that harms authors and publishers, not copies on a Google server no one can see or access but Google.
We watched Napster get hogtied, killed, cremated and scattered to the winds, and most of us were sad that the law was trying to snuff out a great new idea because the courts seemed not to grasp the tech and the real potential for businesses founded on this new technology.
But the world's books? Should the law block a new way to research and find books on any topic any human has ever written about, searchable down to the individual keyword, a way to find specific books in the finest libraries in the world without having to travel there physically?
Larry Lessig puts it like this:
Google Print could be the most important contribution to the spread of knowledge since Jefferson dreamed of national libraries. It is an astonishing opportunity to revive our cultural past, and make it accessible. . . . Google wants to do nothing more to 20,000,000 books than it does to the Internet: it wants to index them, and it offers anyone in the index the right to opt out. If it is illegal to do that with 20,000,000 books, then why is it legal to do it with the Internet? The "authors'" claims, if true, mean Google itself is illegal. Common sense, or better, commons sense, revolts at the idea. And so too should you.
The Authors Guild has only 8,000 members. I say "only" because Groklaw has more members than that. The value to the public of Google's Print Library collection so far outweighs the value of one book to one author, or even of 8,000 books to 8,000 authors, that it is hard to comprehend how the law could allow a result such as shutting down Google at the demand of those 8,000 authors. Have we gone stark raving mad?
Copyright law is designed to protect authors, yes, but it is supposed to do so in a balance with the public good. Copyright law's purpose is to further the public good by promoting more works of authorship, so as to make knowledge available. When did that part of the law's purpose get forgotten? Protecting authors' rights is a means to the end of making knowledge more freely available, which is exactly what Google is trying to do.