How Not to Get Snookered by Claims of "Proof" of Copyright Infringement - Updated
Sunday, January 23 2011 @ 10:10 AM EST

I guess you heard that Florian Mueller is at it again. He made strong claims of a smoking gun regarding alleged copyright infringement of Oracle files by Google. Well, in the cold light of day, some of the media who printed it without fact checking are now awakening to the news that the news wasn't as reliable or unchallengeable as they assumed.

You'll find corrections now, notably from Ed Burnette at ZDNet, who is a programmer-journalist, and from Ryan Paul at Ars Technica. That is what journalists are supposed to do when they see wrong information. It's part of the ethics of being a good journalist, and the other part is to issue corrections when the mistake is your own. To their credit, many journalists have now corrected what they wrote initially.

However, some continue to print that it still means something horrible about Google, and predictions of Google's doom are still in the air. And of course, Florian never quits, so he promises more.

When Florian sent his article around, he sent it to me also. Would you like to know why I didn't write anything about it then? Because I believed that even if every fact in it was true, it didn't prove anything legally about copyright infringement in connection with this lawsuit, and editorial judgment is very much part of being a journalist. Does that surprise you? Let me explain a little about US Copyright Law, then, and you'll see how to know when a claim of copyright infringement is just an initial working theory, or even FUD, and when it's actually a fact. That way you can avoid being snookered by premature claims if they ever happen again.

Because, my friends, I gather this is SCO II, the attacks on Android. Android is Linux, after all, at its core, and the same M.O. that we endured in SCO I -- the bold claims to the media of copyright infringement, analysts popping up to "confirm" the claims, etc. -- is starting up again. Remember all the wild claims by SCO about how IBM was doomed, how they had a mountain of evidence already? Remember all the headlines? All the analysts supporting SCO's claims? Notice anything familiar now that everyone is going after Google? All right. SCO II. We'll have to go through it again. So let's get started.

Phandroid honorably explains what happened:

It seems the internet (we included) got a bit worked up over the sensationalist claims that Google is clearly stealing from Oracle and Java and was quick to spread the word without further checking into the code in question.

Ah, yes. Fact checking. An excellent plan.

But what if every word Mueller wrote was accurate? Let's analyze the situation that way. What if the files he wrote about were not supposed to be used, did ship, and were copied by Google? What would it prove? -- Nothing at all.

It's too early to know. It's fine to offer up what you think is evidence and let the group look at it and test it out. Nothing at all wrong with that. And if the tech turns out to be wrong, so be it. That's not a problem either. As you see, it'll happen anyhow, so we don't want to discourage people from trying to find useful information.

But overstating what things mean, extrapolating from what you find and implying things ... well, then you can expect to be called on it, and the best thing to do then is admit you were wrong and adjust. That's how it works. Have you ever read the Linux kernel mailing list? It's not for those with bruisable egos, for sure, because folks will tell you if your code stinks. It's no different with research. Why should it be? The point is to get to the truth.

The bottom line is this: none of us in the public would know at this point if any code was infringed or not, not you, not me, not Florian. Even if Oracle were feeding him inside information, he still wouldn't know, because Oracle doesn't know yet. Neither does Google. We are all in suspense until discovery is done and the experts in the court case have analyzed the code.

You see, proving infringement of computer code isn't the same as proving infringement of a literary work. Just because you registered the copyright on code and you see someone has copied your code, that doesn't establish that there is copyright infringement. It's a lot more complicated than that, as I will explain now.

It's possible, I suppose, that the parties to the Oracle v. Google litigation have privately had experts do the necessary analysis, but neither is talking publicly about it if they have. But even if they have experts who have analyzed the code already, it doesn't mean the other party's opposing experts will agree on a conclusion. They look at the same code, the same evidence, and they arrive at different conclusions. So the parties still don't know how this will all come out. That will have to be determined at trial. They call that dueling experts. Other than the filings in court, then, we have no idea at this early point what is what, and anyone claiming otherwise is simply wrong. Of course, finding truth is a lot easier if you aren't being paid by one of the parties or a best friend company, shall we say.

Engadget's correction article, Android source code, Java, and copyright infringement: what's going on?, says some things about copyright law I'd like to talk about a little bit:

So it's been a fun day of armchair code forensics and legal analysis on the web after Florian Mueller published a piece this morning alleging Google directly copied somewhere between 37 and 44 Java source files in Android. That's of course a major accusation, seeing as Oracle is currently suing Google for patent and copyright infringement related to Java, and it prompted some extremely harsh technical rebuttals, like this one from ZDNet and this one from Ars Technica. The objections in short: the files in question are test files, aren't important, probably don't ship with Android, and everyone is making a hullabaloo over nothing.

We'll just say this straight out: from a technical perspective, these objections are completely valid. The files in question do appear to be test files, some of them were removed, and there's simply no way of knowing if any of them ended up in a shipping Android handset. But -- and this is a big but -- that's just the technical story. From a legal perspective, it seems very likely that these files create increased copyright liability for Google, because the state of our current copyright law doesn't make exceptions for how source code trees work, or whether or not a script pasted in a different license, or whether these files made it into handsets. The single most relevant legal question is whether or not copying and distributing these files was authorized by Oracle, and the answer clearly appears to be "nope" -- even if Oracle licensed the code under the GPL. Why? Because somewhere along the line, Google took Oracle's code, replaced the GPL language with the incompatible Apache Open Source License, and distributed the code under that license publicly. That's all it takes -- if Google violated the GPL by changing the license, it also infringed Oracle's underlying copyright. It doesn't matter if a Google employee, a script, a robot, or Eric Schmidt's cat made the change -- once you've created or distributed an unauthorized copy, you're liable for infringement.

Heh heh. Not exactly. And as to it being fun, I doubt it's fun for Google to be called a copyright infringer in the media on such flimsy materials as these. Imagine if you are Google, you've done nothing legally wrong, and you wake up to headlines like these. I don't know yet myself if they've done nothing legally wrong, by the way, in that the analysis hasn't been done, so we don't know one way or the other. But the headlines said it was a fact that they were infringing and predicted doom for Google.

First, even in the US, it has not yet reached the point where if a third party violates someone's copyright without your awareness, you get blamed and get sued as a copyright infringer with all that implies. Unless SCO is going after you, of course. Remember SCO v. AutoZone? There are those who use the law for competitive purposes, and it's a crying shame, because when it's really not about IP but all about crushing a competitor, all balance is lost. Not that SCO benefitted in any way from AutoZone that I know of. A normal procedure in a case of inadvertent copyright infringement as a result of a third party's action is to ask that the infringing materials be removed. End of story. A SCO type might take it further, but most do not. You could sue the third party, if you want, but usually you get the removal and go about your business.

One of the things Oracle will have to prove, then, is that it was Google that did something wrong. Yes, distribution or copying would count also. But were these files distributed? The tech analysis says no, so if Oracle wants to argue they were, they'll have to prove it. That hasn't happened yet. And by the way, Oracle would have to include Florian's files that aren't in the case currently, and they've already amended their complaint the one free time a party gets. To amend again, they'd have to get the court's permission or Google's assent. Remember when SCO wanted to amend a third time in SCO v. IBM, and they were refused? So, there is that unknown too. Here's Oracle's Amended Complaint. Here's the rule that says Oracle can't revise its complaint again unless the court or Google gives permission. It could still happen, but it hasn't yet. I'm just saying this is a lot of froth over something not even in the litigation to date.

Let's think a minute. What are test files for? Test files are for testing. Engadget says that we have no way of knowing if they were in the finished product. Well, there is a way to find out, actually, but if we are going to go by guesses, why would you distribute test files with a finished product? Think. Test files are for developers to ... you know... test stuff during the creation of the product, not for end users. So if test files were distributed, it could only be by mistake, I'd think. And even if they were distributed in a product, would end users know they were even there? Would they use that code ever for anything? If not, is there contributory infringement?

See what I mean? A SCO would so argue. David Boies is representing Oracle, so anything is possible, remembering the shameful AutoZone case, but courts don't view mistakes with the same serious and heavy hand as deliberate and willful acts. Why would they? You don't. Why would you imagine judges and juries have no capacity to think and act in a fine-tuned way?

Now, about the GPL. For sure, it doesn't work in the extreme fashion I'm reading people claim it does. If you violate the GPL, assuming it's v2, what happens is you can't distribute that code any *more*, and if *after* that violation you continue to distribute, only then is it a copyright violation. Until the violation, you had the GPL license, remember? If you quit the violation, you can then ask to get your GPL rights back and if the copyright holder says yes, that can be the end of it. That's usually how issues with the GPL are privately handled.

Keep in mind too that Sun released some Java code under the GPLv2 with a classpath exception.

I say assuming it's v2 because GPLv3 is compatible with the Apache License, and the procedure for dealing with violations is different. With GPLv3, you can cure a violation very simply. From the FAQ:

What does it mean to “cure” a violation of GPLv3?

To cure a violation means to adjust your practices to comply with the requirements of the license.

See how pleasant, compared to the heavy breathing proprietary methods? So the picture with GPL violations is nowhere near as bleak as some are assuming, even if they happened, so long as they are not deliberate and adjustments are made.

Oh, and if the license was changed, the issue is, again, who changed it? You can't assume it was Google. Ryan Paul says flat out it wasn't Google that put the files in the repository:

The infringing files are found in a compressed archive in a third-party component supplied by SONiVOX, a member of Google's Open Handset Alliance (OHA).

It's not established yet, I remind everyone, that the files are infringing. But Google, don't forget, lets folks take their Android software and use it to make products. So Oracle needs to find out who infringed, if there is alleged infringement. Otherwise, an operative could just upload some infringing materials on purpose to somebody's repository, then the company that hired him to do it could sue and win. Not even in the US is IP law that naive, unbalanced, and unfair.

Oh, and it's software Google gives away for free, so there are no profits from Android directly, which can be legally significant, even if Google did include infringing code, which is by no means proven yet.

Then there's fair use. Here's what the GPL FAQ says about that:

Do I have “fair use” rights in using the source code of a GPL-covered program?

Yes, you do. “Fair use” is use that is allowed without any special permission. Since you don't need the developers' permission for such use, you can do it regardless of what the developers said about it—in the license or elsewhere, whether that license be the GNU GPL or any other free software license.

Note, however, that there is no world-wide principle of fair use; what kinds of use are considered “fair” varies from country to country.

The US is a fair use country. And this is a US case. You have fair use here, and not only with GPL code.

Here's a second example of backtracking after printing Mueller's claims, from Business Insider:

Earlier today, we published a story based on a blog post from intellectual property lawyer Florian Mueller, who said he'd found evidence that somebody had taken a bunch of files from Java, changed the license on them, and put them into an Android source code repository using the new license....

Who's right? Engadget has a horse in the race because Burnette called them out by name for publishing the original story, and they point out that the geeks are technically correct. But from a legal perspective it may not matter. Oracle owns the code. Somebody put it into a Google code base and changed it in a way that Oracle objected to, without Oracle's permission. That may be all a judge needs to issue summary judgment FOR Oracle in the case.

This is why courts allow subject matter experts to testify in complicated cases. And why journalists use words like "apparently" and "alleged" when covering undecided legal disputes.

Of course it matters. The part about waiting for a court to rule is accurate. The part about waiting for the experts is fabulous advice. It would matter, though, if, for example, the test code was never put into products. Then Oracle couldn't claim contributory infringement, as I pointed out, the way it could try to if the code had shipped, and that could affect the damages analysis, etc. A judge is certainly going to want to analyze if any of it matters. Is the infringement what they call de minimis? It all matters.

And there was a very interesting comment by Jahava on the Slashdot article about Mueller's claims:

Licenses seem incorrect...

In Florian's paper, he points these out as Sun PROPRIETARY / CONFIDENTIAL. However, it looks like several of the sources come from Sun's mmademo, linked here. In this rendition of the document, each source file's license is a permissive one by Sun (i.e., not proprietary / confidential).

The ones from microedition seem to be mentioned elsewhere under GPL.

Some sources seem to come from here, where some of the files (e.g., have the proprietary markings, but these are interfaces. Control, for example, is an empty interface. Not sure if that affects anything.

I'm not qualified to make any sense out of this, but it seems like several of the sources Florian mentions are actually GPL'd sources with incorrect headers. There are a few trivial ones that (in the source I found) seem to be correctly marked proprietary. As much as I admire Florian's ability to grep, I think he's just found an error in some headers, not actual violations.

Would all that matter? You bet. If, for example, somebody wrote a script that ended up making mistakes, would they throw Google in jail, so to speak, and throw away the key just because the mistake showed up on its repository? It's silly. Do you want the law to be like that? How "Les Mis" of you.

Jahava's acknowledgment that he isn't qualified to make sense of everything he found is the kind of modesty that all of us need to show when researching. I don't personally know who is right technically, but even if Florian's findings were important, they wouldn't prove what will ultimately happen in this case. Not yet. Unless we are experts in analyzing computer code for copyrightability, which I'm not, and you're not, and Florian's not and Engadget's not, we can't possibly have an opinion that means anything much currently.

You mean just registering for copyright doesn't prove your code is copyrightable? That's correct. Remember both SCO and Novell registered copyrights on the same materials? So registering doesn't even prove you are the owner of the copyrights. A court had to make that determination.

If you read what Google has filed with the court in its Answer, you would know that Google "specifically denies that Google has infringed or is liable for any infringement of any valid and enforceable copyrights or copyright rights of Oracle." Why does it put it like that, as I've emphasized? Partly it's talking about Google not being the one who infringed, if there is infringement. Android is developed by lots of companies and individuals, and one defense is third-party liability. That's saying if there is a problem, it's not Google who infringed. Oracle would have to go after the infringer. "Other than the Harmony libraries, the Android platform – including, without limitation, the Dalvik VM – was independently developed by the OHA," Google points out. The OHA is the Open Handset Alliance, which is 78 companies and entities, not just Google. "The Android Open Source Project (“AOSP”) is tasked with the maintenance and further development of Android, including incorporating code and submissions from the community of developers who contribute to Android and the tens of thousands of developers who create applications for Android," Google wrote. That means it could be a lot of people and almost certainly isn't Google.

See now why just finding a copyrighted file in Android doesn't establish anything at all about Google yet and maybe never?

I heartily recommend reading all the filings, because it will keep you from believing claims that don't match real life.

Also -- and this is the biggest piece -- even if it was Google and even if there was copyright on the code allegedly misused, if you imagine that copyright infringement analysis is just about finding a file with a copyright notice on it and then seeing someone used or copied or distributed the code, please read this court ruling, Whelan v. Jaslow, a case about structure and similarity of code. It will give you an idea of why absolutely no one, in my opinion, can currently predict who is right or wrong regarding the Oracle copyright claims. If it were possible, I surely would have done it right here on Groklaw, wouldn't I? Note one footnote states that "independent creation of even identical works is therefore not a copyright infringement." I challenge you to read the entire ruling, if you can stay awake, or at least until you realize that copyright infringement is a complex analysis. And that's just one kind of copyright infringement being analyzed there.

Groklaw has a Legal Research page, and on it we have a collection of other copyright cases, and if you read them all, you'll have a pretty good picture, although not a complete one, of what a tall mountain Oracle must climb to prove infringement. Proving infringement of computer code isn't the same as proving infringement of, say, a book, where there isn't the same kind of filtration process. That filtration process is what Mueller's list lacks, and without it, done by an expert, the list doesn't mean anything beyond a place for the experts to begin digging, if the code were ever to become part of the litigation. That's why it is impossible to know if there is infringement and, if there is, who infringed. That part Business Insider got right.

Later in discovery, Oracle will make its claims more specific, but even then the analysis is not finished. To give you an idea of how complex it all is, please read the Declaration of IBM's expert Dr. Brian Kernighan, which you can find here, from the SCO v. IBM litigation. It will show you that just because you put a copyright notice on a computer file, it doesn't follow that it's actually all copyrightable. He was responding to claims by SCO's then-employee Sandeep Gupta of alleged "substantial similarity" between certain "routines" and "groupings of code" in Linux and "copyrighted works allegedly owned by SCO", and in paragraphs 3 and 4 Kernighan wrote this:

3. In summary, I find fundamental errors in Mr. Gupta's conclusions. His conclusions of substantial similarity are flawed because he fails to exclude from comparison unprotectable elements of the allegedly copyrighted code, and he uses an indefensible standard for what qualifies as "substantially similar" code.

4. If unprotectable elements are excluded from the comparison and an appropriate standard of similarity is applied, there is no similarity between the parts of Linux identified by Mr. Gupta and the allegedly copyrighted works.

IBM, in responding to Gupta's claims, mentioned in one court filing yet another issue in copyright infringement analysis when it's a computer code case:

In addition, as is also described by Dr. Kernighan, Mr. Gupta also fails to perform any analysis of whether the alleged similarities he identifies are "substantial". (See Kernighan Decl. ¶¶19, 26-27.) "Substantial similarity" may be found, according to the Tenth Circuit, only where "those protectable portions of the original work that have been copied constitute a substantial part of the original work--i.e., matter that is significant in the plaintiff's program". Id. at 839. Mr. Gupta does not, in his declaration, make any attempt to demonstrate that the code he identified (which in total consists of no more than a couple hundred lines of code (out of programs that are each millions of lines long) is significant.

So just finding one file or a few files doesn't necessarily mean much. This case isn't in the Tenth Circuit, by the way, but this is just to show you how complex it is to analyze copyright infringement of code.
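
To put the proportion in that filing into rough numbers, here is a minimal Python sketch. The figures are assumptions chosen only for illustration: the filing says "no more than a couple hundred" lines out of programs "each millions of lines long", so 200 and 1,000,000 are stand-ins, not numbers from the record, and the helper name share_of_program is made up for this example. Keep in mind that the Tenth Circuit language quoted above asks whether the copied matter is significant in the plaintiff's program, which is not a purely numerical question, so a ratio like this is at most a starting point.

```python
def share_of_program(identified_lines: int, program_lines: int) -> float:
    """Return the fraction of a program that the identified lines represent."""
    if program_lines <= 0:
        raise ValueError("program_lines must be positive")
    return identified_lines / program_lines


# Assumed, illustrative figures (not taken from the court record).
identified = 200
total = 1_000_000

print(f"{share_of_program(identified, total):.4%} of the program")
```

With those assumed figures, the identified code comes to two hundredths of one percent of a single program.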

Here's a paper that Mark Webbink wrote and allowed me to republish in 2003, where he gives an overview of US copyright law in relation to Open Source, and there is a section on how the Ninth Circuit analyzes what is or isn't copyrightable in the context of derivative works:

The Ninth Circuit's test is based on analytical dissection, which first considers whether there are substantial similarities in both the ideas and expressions of the two works and then utilizes analytic dissection to determine whether any similar features are protected by copyright. The similar elements are categorized by the degree of protection they are to be afforded. "Thin" protection is afforded to non-copyrightable facts or ideas that derive copyright protection only from the manner in which those facts or ideas are aligned and presented. "Broad" protection is afforded to copyrightable expression. The court uses these standards to make a subjective comparison of the works to determine whether, as a whole, they are sufficiently similar to justify a finding that one is a derivative work of the other.

Notice there can be degrees of protection, including whether a copyrighted work is actually copyrightable at all. Back in 2003, Dan Ravicher also wrote a paper [PDF] on how various circuits analyze derivative code. If we were working on the case, we'd have to check to see if the law has changed, because law doesn't stand still. But my purpose now is just to show you an overview of how much is involved in deciding if there is infringement -- the analysis courts do *after* you find a copyrighted file copied or used for a derivative work. He explained in detail how the Abstraction, Filtration, Comparison Test that the Tenth Circuit and others use works, and here's what he wrote about the Ninth Circuit, where Oracle's case is located:

Analytic Dissection Test

The Ninth Circuit has adopted the analytic dissection test to determine whether one program is a derivative work of another.6 The analytic dissection test first considers whether there are substantial similarities in both the ideas and expressions of the two works at issue. Once the similar features are identified, analytic dissection is used to determine whether any of those similar features are protected by copyright. This step is the same as the filtration step in the AFC test. After identifying the copyrightable similar features of the works, the court then decides whether those features are entitled to “broad” or “thin” protection. “Thin” protection is given to non-copyrightable facts or ideas that are combined in a way that affords copyright protection only from their alignment and presentation, while “broad” protection is given to copyrightable expression itself. Depending on the degree of protection afforded, the court then sets the appropriate standard for a subjective comparison of the works to determine whether, as a whole, they are sufficiently similar to support a finding that one is a derivative work of the other. “Thin” protection requires the second work be virtually identical in order to be held a derivative work of an original, while “broad” protection requires only a “substantial similarity.”

6 Apple Computer, Inc. v. Microsoft Corp., 35 F.3d 1435 (9th Cir. 1994).

If you read the Apple v. Microsoft ruling, it's a case where de minimis copying was considered, among other things:

In this, as in other cases, the steps we find helpful to follow are these:
(1) The plaintiff must identify the source(s) of the alleged similarity between his work and the defendant's work.

(2) Using analytic dissection, and, if necessary, expert testimony, the court must determine whether any of the allegedly similar features are protected by copyright. Where, as in this case, a license agreement is involved, the court must also determine which features the defendant was authorized to copy. Once the scope of the license is determined, unprotectable ideas must be separated from potentially protectable expression; to that expression, the court must then apply the relevant limiting doctrines in the context of the particular medium involved, through the eyes of the ordinary consumer of that product.

(3) Having dissected the alleged similarities and considered the range of possible expression, the court must define the scope of the plaintiff's copyright — that is, decide whether the work is entitled to "broad" or "thin" protection. Depending on the degree of protection, the court must set the appropriate standard for a subjective comparison of the works to determine whether, as a whole, they are sufficiently similar to support a finding of illicit copying....

It is not easy to distinguish expression from ideas, particularly in a new medium. However, it must be done, as the district court did in this case. Baker v. Selden, 101 U.S. 99, 25 L.Ed. 841 (1879). As we recognized long ago in the case of competing jeweled bee pins, similarities derived from the use of common ideas cannot be protected; otherwise, the first to come up with an idea will corner the market. Herbert Rosenthal Jewelry Corp. v. Kalpakian, 446 F.2d 738, 742 (9th Cir.1971). Apple cannot get patent-like protection for the idea of a graphical user interface....Well-recognized precepts guide the process of analytic dissection. First, when an idea and its expression are indistinguishable, or "merged," the expression will only be protected against nearly identical copying. Krofft, 562 F.2d at 1167-68; Kalpakian, 446 F.2d at 742. For example, in this case, the idea of an icon in a desktop metaphor representing a document stored in a computer program can only be expressed in so many ways. An iconic image shaped like a page is an obvious choice.

The doctrine of scenes a faire is closely related. As we explained in Frybarger v. International Business Machines Corp., 812 F.2d 525 (9th Cir.1987), when similar features in a videogame are "'as a practical matter indispensable, or at least standard, in the treatment of a given [idea],'" they are treated like ideas and are therefore not protected by copyright....

Apple suggests that scenes a faire should not limit the scope of its audiovisual copyright, or at least that the interactive character of GUIs and their functional purpose should not outweigh their artistry. While user participation may not negate copyrightability of an audiovisual work.... the district court did not deny protection to any aspect of Apple's works on this basis. In any event, unlike purely artistic works such as novels and plays, graphical user interfaces generated by computer programs are partly artistic and partly functional....

To the extent that GUIs are artistic, there is no dispute that creativity in user interfaces is constrained by the power and speed of the ... computer. See Manufacturers Technologies, Inc. v. Cams, Inc., 706 F.Supp. 984, 994-95 (D.Conn.1989) (denying protection to formatting style of plaintiff's screen displays because of constraints on viable options available to programmers). For example, hardware constraints limit the number of ways to depict visually the movement of a window on the screen; because many computers do not have enough power to show the entire contents of the window as it is being moved, the illusion of movement must be shown by using the outline of a window or some similar feature. Design alternatives are further limited by the GUI's purpose of making interaction between the user and the computer more "user-friendly." These, and similar environmental and ergonomic factors which limit the range of possible expression in GUIs, properly inform the scope of copyright protection.

Originality is another doctrine which limits the scope of protection. As the Supreme Court recently made clear, protection extends only to those components of a work that are original to the author, although original selection and arrangement of otherwise uncopyrightable components may be protectable. Feist Publications, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340, 348-51, 111 S.Ct. 1282, 1289-91, 113 L.Ed.2d 358 (1991).

There's lots more in there, if you want to read it all, but this should be enough to demonstrate that analyzing copyright infringement of computer code isn't identical to doing so regarding literary works.
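
As a rough illustration only, the three steps quoted from Apple v. Microsoft can be pictured as a filter followed by a choice of comparison standard. The Python sketch below is a toy model, not a statement of the law: the Feature class, its boolean flags, and the rule that protection is "thin" when every surviving feature is a merged idea and expression are all simplifying assumptions invented for this example. In a real case each of those flags is a contested question that courts and expert witnesses spend months on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One allegedly similar feature identified by the plaintiff (step 1)."""
    name: str
    licensed: bool = False        # defendant was authorized to copy it
    is_idea: bool = False         # an idea, not expression
    scenes_a_faire: bool = False  # standard or indispensable treatment
    original: bool = True         # original to the author
    merged: bool = False          # idea and expression are indistinguishable


def dissect(features):
    """Step 2: drop features that are licensed or unprotectable."""
    return [
        f for f in features
        if not (f.licensed or f.is_idea or f.scenes_a_faire or not f.original)
    ]


def comparison_standard(features):
    """Step 3: pick the standard for the final, subjective comparison."""
    protectable = dissect(features)
    if not protectable:
        return "nothing protectable left to compare"
    # Assumption: merged expression is protected only against nearly
    # identical copying, so treat an all-merged remainder as "thin".
    if all(f.merged for f in protectable):
        return "virtually identical"   # thin protection
    return "substantially similar"     # broad protection


# Hypothetical inputs loosely modeled on examples in the quoted opinion.
example_features = [
    Feature("page-shaped document icon", merged=True),
    Feature("idea of a desktop GUI", is_idea=True),
    Feature("window outline shown while dragging", scenes_a_faire=True),
    Feature("overlapping windows", licensed=True),
]

print(comparison_standard(example_features))
```

Only the first hypothetical feature survives the filter, and because it is flagged as merged, the sketch reports the stricter "virtually identical" standard. A raw list of matching files, by contrast, corresponds to the input of step 1 and nothing more.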

Since Ravicher said the filtration step is the same in the Ninth as in the Tenth Circuit, here is how he explained that step is done:

Abstraction, Filtration Comparison Test

As mentioned above, the AFC test for determining whether a computer program is a derivative work of an earlier program was created by the Second Circuit and has since been adopted in the Fifth, Tenth and Eleventh Circuits. Under the AFC test, a court first abstracts from the original program its constituent structural parts. Then, the court filters from those structural parts all unprotectable portions, including incorporated ideas, expression that is necessarily incidental to those ideas, and elements that are taken from the public domain. Finally, the court compares any and all remaining kernels of creative expression to the structure of the second program to determine whether the software programs at issue are substantially similar so as to warrant a finding that one is the derivative work of the other....


The most difficult and controversial part of the AFC test is the second step, which entails the filtration of protectable expression contained in the original program from any unprotectable elements nestled therein. In determining which elements of a program are unprotectable, courts employ a myriad of rules and procedures to sift from a program all the portions that are not eligible for copyright protection.

First, as set forth in § 102(b) of the Copyright Act, any and all ideas embodied in program are to be denied copyright protection. However, implementing this rule is not as easy as it first appears. The courts readily recognize the intrinsic difficulty in distinguishing between ideas and expression and that, given the varying nature of computer programs, doing so will be done on an ad hoc basis. The first step of the AFC test, the abstraction, exists precisely to assist in this endeavor by helping the court separate out all the individual elements of the program so that they can be independently analyzed for their expressive nature.

A second rule applied by the courts in performing the filtration step of the AFC test is the doctrine of merger, which denies copyright protection to expression necessarily incidental to the idea being expressed. The reasoning behind this doctrine is that when there is only one way to express an idea, the idea and the expression merge, meaning that the expression cannot receive copyright protection due to the bar on copyright protection extending to ideas. In applying this doctrine, a court will ask whether the program's use of particular code or structure is necessary for the efficient implementation of a certain function or process. If so, then that particular code or structure is not protected by copyright and, as a result, it is filtered away from the remaining protectable expression.

A third rule applied by the courts in performing the filtration step of the AFC test is the doctrine of scenes a faire, which denies copyright protection to elements of a computer program that are dictated by external factors. Such external factors can include: (a) the mechanical specifications of the computer on which a particular program is intended to operate; (b) compatibility requirements of other programs with which a program is designed to operate in conjunction; (c) computer manufacturers' design standards; (d) demands of the industry being serviced; and (e) widely accepted programming practices within the computer industry. Any code or structure of a program that was shaped predominantly in response to these factors is filtered out and not protected by copyright.

Lastly, elements of a computer program are also to be filtered out if they were taken from the public domain or fail to have sufficient originality to merit copyright protection.

Portions of the source or object code of a computer program are rarely filtered out as unprotectable elements. However, some distinct parts of source and object code have been found unprotectable. For example, constants, the invariable integers comprising part of formulas used to perform calculations in a program, are unprotectable. Further, although common errors found in two programs can provide strong evidence of copying, they are not afforded any copyright protection over and above the protection given to the expression containing them.

You see how complicated it is? Did what Florian wrote include any analysis of this kind? Well, he's not qualified to do it, even if he'd known it needed to be done, but we can at least conclude that it's way too early to know if there has been any copyright infringement.
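To make the order of the steps concrete, here is a toy sketch in Python. It is only an analogy, not how any court or expert actually works: the element names, the classification labels, and the functions `filtrate` and `compare` are all hypothetical, and the classification of each element is a human expert's judgment that no program can make. What it shows is simply that elements which look like "matches" before filtration may not count at all afterward.

```python
from dataclasses import dataclass

# Hypothetical labels for reasons an element gets filtered out, loosely
# following the categories named in the quoted passage.
UNPROTECTABLE = {"idea", "merger", "scenes_a_faire", "public_domain", "unoriginal"}


@dataclass(frozen=True)
class Element:
    name: str            # a structural part identified during abstraction
    classification: str  # an expert's judgment; code cannot decide this


def filtrate(elements):
    """Filtration: keep only elements classified as protectable expression."""
    return [e for e in elements if e.classification not in UNPROTECTABLE]


def compare(protectable, accused_names):
    """Comparison: protectable elements that also appear in the accused program."""
    return [e.name for e in protectable if e.name in accused_names]


# Abstraction (done by hand here): an invented list of parts of the original.
original = [
    Element("numeric_constant", "unoriginal"),
    Element("signature_required_for_compatibility", "scenes_a_faire"),
    Element("only_practical_way_to_do_it", "merger"),
    Element("distinctive_internal_layout", "expression"),
]

# Invented set of elements that a naive file comparison would flag as matches.
accused = {
    "numeric_constant",
    "signature_required_for_compatibility",
    "only_practical_way_to_do_it",
}

remaining = filtrate(original)
overlap = compare(remaining, accused)
print(len(accused), "raw matches;", len(overlap), "after filtration")
```

In this made-up example, three raw matches shrink to none once the unprotectable elements are filtered out, which is the reason a list of similar-looking files does not, by itself, tell you anything about infringement.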

Anyway, my point is that you can register a copyright and allege infringement, but that isn't all there is to it. SCO did exactly that, but it was pointed out by IBM's experts that the analysis must include things like scenes a faire, fair use, originality, whether the code is dictated by programming practices or governed by standards, or is in the public domain, etc. Being in the public domain in this sense isn't a question of whether the author put it there. The question is whether there is sufficient originality to justify copyright protection. If the answer is no, then the court won't find infringement, even if the author has registered a copyright and someone used the code.

Here's another IBM expert who explains that, Dr. Randall Davis, who details the process he followed in examining SCO's claims of infringement:

13. I understand the accepted process for determining substantial similarity to call for abstraction, filtration, and comparison, although when modest amounts of code are involved, the abstraction step may not be required. I understand filtration to involve the removal of at least the following elements: ideas, purposes, functions, procedures, processes, systems, methods of operation, facts, unoriginal elements (e.g., those in the public domain), expression that is inseparable from or merged with ideas or processes, and expressions that are standard, stock, or common to a particular topic, or that necessarily follow from a common theme or setting.

14. I understand further that with respect to computer programs in particular, the scenes a faire doctrine:

excludes from protection those elements of a program that have been dictated by external factors. In the area of computer programs these external factors may include: hardware standards and mechanical specifications, software standards and compatibility requirements, computer manufacturer design standards, target industry practices and demands, and computer industry programming practices.
Gates Rubber v. Bando, all citations omitted...

16. Despite an extensive review, I could find no source code in any of the IBM Code that incorporates any portion of the source code contained in the Unix System V Code or is in any other manner similar to such source code. Accordingly, the IBM Code cannot be said, in my opinion, to be a modification or a derivative work based on Unix System V Code.

And so the mountain went poof. In the end, it turned out SCO didn't own the copyrights anyway, but even if it did, there was no there there, according to this world-renowned expert. Here's the Gates Rubber ruling, if you would like to read it in full.

It's complex analyzing copyright infringement claims, in other words, and no one with the necessary expertise at this point has done that analysis. No one without that expertise has done it either, but that's because they don't know it needs to be done. They find a file, see a copyright notice, and consider it "proof". It's not. I don't believe Florian is a lawyer [see his bio], but it wouldn't matter even if he was. What's missing is the analysis.

If it turns out that Microsoft is behind SCO II, then no doubt such analysts will pop up and make claims and provide some analysis. But that won't prove anything either. What matters is what happens in the court.

As discovery proceeds, at some point Oracle will give us version, file and line information on all its claims, and what precisely it thinks was done wrong, and then the analysis can begin by the hired experts. Until that happens, no one at all can know whether anyone violated anybody's copyright as the law views it. That can be quite different from how a party views it, as SCO found out. After all their bold claims to the media, when it was time to get specific in court, the mountain of evidence simply evaporated. That can happen in the Oracle v. Google case too, and we'll just have to wait to find out the outcome down the road. Meanwhile, if I'm right that this is SCO II, there will be a lot more provided to journalists. I think then it's wise to point out that SCO lost, despite all the headlines they generated in the beginning. Lawsuits don't really lend themselves to headlines. They're not that simple. So, my advice is to be aware that there are agendas in this picture.

I guess an enterprising journalist could usefully dig for or at least factor in information and evidence about that.

Update: A reader noticed something interesting:

I was looking at the 7 files in the Android tree that Dan Bornstein deleted and noticed they all had MS-DOS line endings, i.e. Carriage Returns ('\r', 0x0d) before the Newline, whereas most other files in the Android tree (to be honest, I didn't look at all of them) seem to just have UNIX line endings (just the Newline).
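If you'd like to check line endings yourself, something like the following Python sketch would do it. The function name `classify_line_endings` and the sample byte strings are my own invention for illustration. It only counts CR+LF pairs versus bare LF, so a lone carriage return (the old Mac convention) is ignored, and it says nothing about where a file came from.

```python
def classify_line_endings(data: bytes) -> str:
    """Classify a file's raw bytes as 'dos', 'unix', 'mixed', or 'none'.

    'dos'  : every newline is preceded by a carriage return (0x0d 0x0a)
    'unix' : every newline stands alone (0x0a)
    """
    crlf = data.count(b"\r\n")          # MS-DOS style endings
    lf_only = data.count(b"\n") - crlf  # newlines with no CR in front
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "dos"
    if lf_only:
        return "unix"
    return "none"


# Invented sample contents, not taken from any real file.
print(classify_line_endings(b"int x;\r\nint y;\r\n"))
print(classify_line_endings(b"int x;\nint y;\n"))
```

To run it on a real file, read the file in binary mode, e.g. `open(path, "rb").read()`, so that Python doesn't translate the line endings before you get to count them.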

Groklaw © Copyright 2003-2013 Pamela Jones.
All trademarks and copyrights on this page are owned by their respective owners.
Comments are owned by the individual posters.

PJ's articles are licensed under a Creative Commons License.