Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
Friday, September 17 2004 @ 08:33 AM EDT

Here is Dr. Randall Davis of MIT's second Declaration, PDF and text. If you had any lingering doubt as to whether SCO might have found some infringing code, I think this will dispel it. Dr. Davis looked at all the code Sandeep Gupta listed as allegedly infringing, and this world-famous expert concludes thus:

"Despite an extensive review, I could find no source code in any of the IBM Code that incorporates any portion of the source code contained in the Unix System V Code or is in any other manner similar to such source code. Accordingly, the IBM Code cannot be said, in my opinion, to be a modification or a derivative work based on Unix System V Code."

This is a paper document, scanned in. It was attached in a binder, so the scanner had quite a time of it. I hope you can make it out. I left out the table, because I am still coding it, and I will add it in when I am done, but when I read this declaration, I found it so thrilling, I wanted to share it immediately with you. I note the date of this document is August, I think the 13th, though I can't make out the date clearly. That's approximately how long SCO has known that they had nothing, according to Dr. Randall Davis. SCO's 3rd quarter teleconference was August 31.

Frank Sorenson did the HTML on the tables -- thank you, Frank, and I note the following from the conclusion of Table I:

  • The 8 AIX files are listed in SCO's Revised Supplemental Responses to IBM's First and Second Set of Interrogatories, dated 12 January 2004; SCO identified a total of 468 lines.
  • The 10 Dynix files are listed in SCO's Revised Supplemental Response and Exhibit D of the letter from B. Hatch to T. Shaughnessy of 19 April 2004; SCO identified a total of 2,162 lines.
  • The 17 Linux 2.6.5 files are listed in Exhibit C of the letter of 19 April 2004; SCO identified 2,437 lines. SCO's letter identifies lines 794-726 of mm/page_alloc.c, which appears to be a typographical error.
  • The 63 JFS files are listed (with some repetition) in Tables H and I of SCO's Revised Supplemental Response, and Exhibit B of the letter of 19 April 2004; SCO identified 21,692 lines.

  • Grand total: 26,759 lines identified by SCO in 98 files.

    To which, we can now add, Grand total of lines and files that infringe, according to this declaration: zero.
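A note on reading Table I below. SCO's line references use a shorthand in which an abbreviated upper bound borrows the leading digits of the lower bound, so "161-66" means lines 161 through 166. The minimal C sketch below counts lines under that reading. The function names are illustrative only and are not taken from any tool used in the case. Applied to the AIX and Dynix entries exactly as they appear in Table I, it yields 468 and 2,162 lines. Those match the totals quoted above, which suggests the shorthand is being read the way Dr. Davis read it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: widen an abbreviated upper bound such as the "66"
 * in "161-66" by borrowing the leading digits of the lower bound. */
static long expand_upper(long lo, long hi, int hi_digits) {
    long place = 1;
    for (int d = 0; d < hi_digits; d++) place *= 10;
    if (hi >= lo) return hi;                 /* written in full, e.g. "64-170" */
    long full = lo - lo % place + hi;        /* "161-66" -> 100 + 66 = 166 */
    if (full < lo) full += place;            /* roll over, e.g. "98-02" -> 102 */
    return full;
}

/* Count the lines named by a spec such as "37, 56-73"; "" counts as zero. */
long count_lines(const char *spec) {
    long total = 0;
    const char *p = spec;
    while (*p) {
        if (!isdigit((unsigned char)*p)) { p++; continue; }   /* skip ", " */
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;
        p = end;
        if (*p == '-' && isdigit((unsigned char)p[1])) {
            const char *digits = p + 1;
            hi = strtol(digits, &end, 10);
            hi = expand_upper(lo, hi, (int)(end - digits));   /* digit count matters: "08" */
            p = end;
        }
        assert(hi >= lo);
        total += hi - lo + 1;
    }
    return total;
}

/* Sum count_lines over the entries for several files. */
long total_lines(const char *const *specs, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) total += count_lines(specs[i]);
    return total;
}

/* The AIX and Dynix entries of Table I, copied as written. */
static const char *const AIX_SPECS[] = {
    "64-170", "32",
    "16-37, 39-40, 62-66, 72-76, 83-158, 161-66, 172-80, 199-205",
    "234-250, 252-72, 289-307, 316-63",
    "26-35", "24-92", "109-33", "37, 56-73",
};

static const char *const DYNIX_SPECS[] = {
    "2028-59", "191-353, 370-427, 550-582, 603-703",
    "46-52, 95-119, 129-32, 140", "1487-97", "1517-37",
    "303-17, 383-613, 616-1825", "175-228, 238-41, 243-423",
    "2054", "1554-63", "" /* vfs_dio.c: no lines identified */,
};
```

The Linux 2.6.5 entries check out the same way under this reading (2,437 lines). The only adjustment is that the bracketed "[724]" in the mm/page_alloc.c entry is read as Dr. Davis's correction.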

    ******************************

    SNELL & WILMER, L.L.P.
    Alan L. Sullivan, Esq.
    Todd M. Shaughnessy, Esq.
    [address, phone, fax]

    CRAVATH, SWAINE & MOORE LLP
    Evan R. Chesler, Esq.
    David R. Marriott (7572)
    [address, phone, fax]

    Donald J. Rosenberg, Esq.
    [address]

    Attorneys for Defendant/Counterclaim-Plaintiff
    International Business Machines Corporation



    IN THE UNITED STATES DISTRICT COURT
    FOR THE DISTRICT OF UTAH

    THE SCO GROUP, INC.

         Plaintiff/Counterclaim-Defendant,

    -against-

    INTERNATIONAL BUSINESS
    MACHINES CORPORATION,

         Defendant/Counterclaim-Plaintiff
    DECLARATION OF
    RANDALL DAVIS


    Civil No. 2:03CV-0294 DAK

    Honorable Dale A. Kimball

    Magistrate Judge Brooke C. Wells

    I. INTRODUCTION

    1. My name is Randall Davis. I am a Professor of Computer Science at the Massachusetts Institute of Technology. Exhibit I contains a resume providing details of my technical background and experience. I received my undergraduate degree from Dartmouth, graduating summa cum laude, Phi Beta Kappa in 1970, and a Ph.D. from Stanford in artificial intelligence in 1976. I came to MIT in 1978, served for five years as Associate Director of the MIT Artificial Intelligence Laboratory, and currently serve as a Research Director in the newly formed MIT Computer Science and Artificial Intelligence Laboratory.

    2. I have published some 50 articles on issues related to artificial intelligence and have served on several editorial boards, including Artificial Intelligence, AI in Engineering, and the MIT Press series in AI. I am a co-author of Knowledge-Based Systems in AI.

    3. In recognition of my research in artificial intelligence, I was selected in 1984 as one of America's top 100 scientists under the age of 40 by Science Digest. In 1986 I received the AI Award from the Boston Computer Society for contributions to the field. In 1990 I was named a Founding Fellow of the American Association for AI and in 1995 was elected to a two-year term as President of the Association. From 1995-1998 I served on the Scientific Advisory Board of the U.S. Air Force.

    4. In addition to my work with artificial intelligence, I have also been active in the area of intellectual property and software. Among other things, I have served as a member of the Advisory Board to the US Congressional Office of Technology Assessment study on software and intellectual property, published in 1992 as Finding a Balance: Computer Software, Intellectual Property, and the Challenge of Technological Change. I have published a number of articles on the topic, including co-authoring an article in the Columbia Law Review in 1994 entitled "A Manifesto Concerning Legal Protection of Computer Programs" and an article in the Software Law Journal in 1992 entitled "The Nature of Software and its Consequences for Establishing and Evaluating Similarity."

    5. In 1990 I served as expert to the Court (Eastern District of NY) in Computer Associates v. Altai, a software copyright infringement case that articulated the abstraction, filtration, comparison test for software. I have also been retained by the Department of Justice on its investigation of the INSLAW matter. In 1992 (and later in 1995) my task in that engagement was to investigate alleged copyright theft and subsequent cover-up by the Federal Bureau of Investigation, the National Security Agency, the Drug Enforcement Agency, the United States Customs Service, and the Defense Intelligence Agency.

    6. From 1998-2001 I served as the chairman of the National Academy of Sciences study on intellectual property rights and the emerging information infrastructure entitled The Digital Dilemma: Intellectual Property in the Information Age, published by the National Academy Press in February, 2000.

    7. I have been retained as an expert in over thirty cases dealing with alleged misappropriation of intellectual property, such as the allegations raised in this case, and have done numerous comparisons of code. I have been retained by plaintiffs who have asked me to investigate violations of intellectual property, by defendants who have asked me to investigate allegations made against them, and by both sides to serve as the sole arbiter of a binding arbitration. A list of cases in which I have been involved is attached as Exhibit II.

    8. I have been retained by counsel for IBM in this lawsuit and am being compensated at a rate of $550 per hour.

    II. THE TASK

    9. I have been asked to examine the question of whether the lines of source code in the 98 files in Table I (the "IBM Code") are modifications of, or derivative works based on, any source code in any of the 21 versions of Unix System V listed in Table II (the "Unix System V Code").

    10. I have been instructed by counsel that one work is a "derivative work" of another under federal copyright law if it incorporates in some form a portion of the preexisting work and is substantially similar to the preexisting work. In my understanding, and as I use the term in my analysis, a "modification" based on a preexisting work must also incorporate in some form a portion of the preexisting work, else there would be no basis for calling it a modification.

    11. In performing my analysis, I have therefore undertaken to determine whether the IBM Code incorporates any portion of source code contained in the Unix System V Code or is in any other manner similar to such Unix System V Code.

    Table I: Files and Lines of Code Identified by SCO

    AIX 9922A_43NIA Files
    File Name | Lines Identified By SCO
    kernel/sys/IA64/bootrecord.h | 64-170
    kernel/sys/hd_psn.h | 32
    usr/include/jfs/inode.h | 16-37, 39-40, 62-66, 72-76, 83-158, 161-66, 172-80, 199-205
    usr/include/liblvm.h | 234-250, 252-72, 289-307, 316-63
    usr/include/lvm.h | 26-35
    usr/include/lvmrec.h | 24-92
    kernel/sys/vnode.h | 109-33
    kernel/sys/vgsa.h | 37, 56-73

    Dynix 4.6.1 Files
    File Name | Lines Identified By SCO
    kernel/os/kern_clock.c | 2028-59
    kernel/os/kma_defer.c | 191-353, 370-427, 550-582, 603-703
    kernel/sys/kma_defer.h | 46-52, 95-119, 129-32, 140
    kernel/i386/locore.s | 1487-97
    kernel/i386/plocal.h | 1517-37
    kernel/os/rclock.c | 303-17, 383-613, 616-1825
    kernel/sys/rclock.h | 175-228, 238-41, 243-423
    kernel/i386/startup.c | 2054
    kernel/i386/trap.c | 1554-63
    kernel/os/vfs_dio.c | No lines identified

    JFS Files
    File Name | Lines Identified By SCO
    include/linux/jfs/ref/jfs_aixisms.h | 26-27, 32, 62, 193, 227, 248
    include/linux/jfs/ref/jfs_dirent.h | 55
    include/linux/jfs/ref/jfs_inode.h | 76-77, 81, 95, 97, 192-233, 343-425
    include/linux/jfs/ref/jfs_os2.h | 33-34
    include/linux/jfs/ref/jfs_dasdlim.h | No lines identified
    include/linux/jfs/ref/jfs_dinode.h | 35-49, 53-200
    include/linux/jfs/ref/jfs_lock.h | 72-119, 338-391, 395-406
    include/linux/jfs/ref/jfs_superblock.h | 19-105
    include/linux/jfs/ref/jfs_btree.h | 19-113, 115-143
    include/linux/jfs/ref/jfs_bufmgr.h | 30-33, 37-49, 123-141, 274-279
    include/linux/jfs/ref/jfs_cachemgr.h | 71-108, 371-388
    include/linux/jfs/ref/jfs_chkdsk.h | No lines identified
    include/linux/jfs/ref/jfs_clrbblks.h | 24-48, 52-60
    include/linux/jfs/ref/jfs_debug.h | 28-30, 81-93, 96-106, 117-134, 137-142, 146-168
    include/linux/jfs/ref/jfs_defragfs.h | 20-56
    include/linux/jfs/ref/jfs_dmap.h | 22-272, 276-324
    include/linux/jfs/ref/jfs_dtree.h | 25-79, 88-210, 233-287, 312-323
    include/linux/jfs/ref/jfs_extendfs.h | 19-29, 32-39
    include/linux/jfs/ref/jfs_filsys.h | 76-103, 167-172, 230-256, 266-277, 279-321
    include/linux/jfs/ref/jfs_imap.h | 19-169
    include/linux/jfs/ref/jfs_io.h | No lines identified
    include/linux/jfs/ref/jfs_logmgr.h | 34-506, 540-577
    include/linux/jfs/ref/jfs_proto.h | 58-62, 117-128
    include/linux/jfs/ref/jfs_txnmgr.h | 25-251, 255-345
    include/linux/jfs/ref/jfs_types.h | 100-223, 299-582
    include/linux/jfs/ref/jfs_util.h | 38-62
    include/linux/jfs/ref/jfs_xtree | 24-131, 139-212
    fs/jfs/ref/jfs_dio.c | 333
    fs/jfs/ref/jfs_logmgr.c | 27-67, 113-132, 165-781, 1052-1607, 1623-3211
    fs/jfs/ref/jfs_bufmgr.c | 289-311, 364-441, 557-649, 682-917, 1270-1468, 1691-2016, 2102-2194
    fs/jfs/ref/jfs_cachemgr.c | No lines identified
    fs/jfs/ref/jfs_dnlc.c | 55-89, 140-200, 212-224, 251-322, 325-338, 402-451, 485-573, 685-713
    fs/jfs/ref/jfs_dtree.c | No lines identified
    fs/jfs/ref/jfs_ifs.c | No lines identified
    fs/jfs/ref/jfs_initl.c | No lines identified
    fs/jfs/ref/jfs_inode.c | 312-350, 390-463, 483-510
    fs/jfs/ref/jfs_link.c | 33-152
    fs/jfs/ref/jfs_mknod.c | No lines identified
    fs/jfs/ref/jfs_readdir.c | 38-113
    fs/jfs/ref/jfs_readlink.c | 26-110
    fs/jfs/ref/jfs_statfs.c | 23-139
    fs/jfs/ref/jfs_symlink.c | 23-204
    fs/jfs/ref/jfs_txnmgr.c | 26-89, 122-132, 155-351, 380-414, 463-482, 531-661, 677-682, 710-767, 806-1153, 1162-1182, 1194-1246, 1293-1298, 1318-1539, 1577-1761, 1796-1856, 1883-1910, 1922-2097, 2115-2151, 2219-2321, 2350-2674, 2822-2845, 2983-3003
    fs/jfs/ref/selector.c | No lines identified
    fs/jfs/ref/jfs_create.c | 41-121, 127-135, 153-169, 193-223, 233-239, 241-264
    fs/jfs/ref/jfs_defragfs.c | 33-75, 84-89, 108-111, 119-264
    fs/jfs/ref/jfs_dmap.c | 43-4475
    fs/jfs/ref/jfs_extendfs.c | 43-153, 185-249, 293-579
    fs/jfs/ref/jfs_fsync.c | 32-84
    fs/jfs/ref/jfs_ftruncate.c | 37-129, 143, 156-170, 230-341
    fs/jfs/ref/jfs_getattr.c | 33-124
    fs/jfs/ref/jfs_hold.c | 33-63
    fs/jfs/ref/jfs_imap.c | 27-665, 680-2855, 2876-2893, 2904-2990
    fs/jfs/ref/jfs_lookup.c | 37-179
    fs/jfs/ref/jfs_mkdir.c | 37-111, 130-213, 222-264, 322-345
    fs/jfs/ref/jfs_mount.c | 31-188, 198-215, 229-785
    fs/jfs/ref/jfs_open.c | 37-98, 117-126, 218-277, 292-312
    fs/jfs/ref/jfs_rele.c | 31-64
    fs/jfs/ref/jfs_remove.c | 36-145, 157-464
    fs/jfs/ref/jfs_rename.c | 36-222, 246-313, 390-526, 577-651, 760-791
    fs/jfs/ref/jfs_rmdir.c | 36-125, 137-156, 188-193
    fs/jfs/ref/jfs_umount.c | 45-182, 198-307, 318-322
    fs/jfs/ref/jfs_util.c | 49-120, 133-163, 175-230, 300-425

    Linux 2.6.5 Files
    File Name | Lines Identified By SCO
    arch/i386/kernel/srat.c | 1-450
    arch/i386/kernel/numaq.c | 1-112
    arch/i386/mach-es7000/topology.c | 35-49
    arch/i386/mach-default/topology.c | 35-49
    arch/i386/mm/discontig.c | 1-434
    arch/i386/pci/numa.c | 1-129
    arch/ppc64/kernel/smp.c | 733-754, 783
    arch/ppc64/mm/numa.c | 1-374
    include/asm-i386/topology.h | 1-85
    include/asm-i386/mmzone.h | 1-154
    include/asm-i386/numaq.h | 1-164
    include/asm-ppc64/mmzone.h | 1-95
    include/asm-ppc64/topology.h | 1-49
    include/linux/mmzone.h | 350-62
    include/linux/numa.h | 1-16
    kernel/sched.c | 44, 212-13, 239-72, 1002-1126, 1390-1401, 1407, 1421-22, 1432-33
    mm/page_alloc.c | [724]-726, 737-738, 827-35, 889-92, 983-92, 1137-1238

    The 8 AIX files are listed in SCO's Revised Supplemental Responses to IBM's First and Second Set of Interrogatories, dated 12 January 2004; SCO identified a total of 468 lines.

    The 10 Dynix files are listed in SCO's Revised Supplemental Response and Exhibit D of the letter from B. Hatch to T. Shaughnessy of 19 April 2004; SCO identified a total of 2,162 lines.

    The 17 Linux 2.6.5 files are listed in Exhibit C of the letter of 19 April 2004; SCO identified 2,437 lines. SCO's letter identifies lines 794-726 of mm/page_alloc.c, which appears to be a typographical error.

    The 63 JFS files are listed (with some repetition) in Tables H and I of SCO's Revised Supplemental Response, and Exhibit B of the letter of 19 April 2004; SCO identified 21,692 lines.

    Grand total: 26,759 lines identified by SCO in 98 files.

    Table II: Unix System V Code

    Version of Unix System V | Number of Files | Total Lines of Source Material
    System V version 1.0 | 1,400 | 347,099
    System V version 1.1 | 1,253 | 208,086
    System V version 2.0 | 4,372 | 896,148
    System V version 2.0_3B20 | 3,256 | 577,484
    System V version 2.2.0_3B15 | 4,530 | 985,196
    System V version 2.1.0V1_VAX | 2,401 | 477,251
    System V version 2.1_3 | 1,280 | 360,281
    System V 3.0 | 4,781 | 818,403
    System V 3.1 | 3,849 | 631,382
    System V 3.2 | 4,369 | 702,328
    System V 3.2 for 386 | 4,810 | 991,212
    System V 4.0 for 386 | 9,472 | 1,853,434
    System V 4.0v2 for 386 | 11,771 | 2,367,995
    System V 4.0v3 for 386 | 9,466 | 1,957,328
    System V 4.0 MP | 12,649 | 2,876,245
    System V 4.1 | 21,798 | 3,567,560
    System V 4.1 ES | 11,902 | 2,595,549
    System V 4.2 ES-MP | 21,577 | 5,148,564
    UnixWare 1.1 | 28,869 | 6,493,708
    UnixWare 2.1 | 44,340 | 10,182,665
    UnixWare 7.1.3 | 70,397 | 23,759,651

    TOTALS | 278,542 | 67,797,569

    12. The conclusions set out here are not intended as, and do not represent, legal conclusions. My conclusions are instead based upon my understanding of the law with respect to the appropriate process and procedures for making a judgment of substantial similarity.

    13. I understand the accepted process for determining substantial similarity to call for abstraction, filtration, and comparison, although when modest amounts of code are involved, the abstraction step may not be required. I understand filtration to involve the removal of at least the following elements: ideas, purposes, functions, procedures, processes, systems, methods of operation, facts, unoriginal elements (e.g., those in the public domain), expression that is inseparable from or merged with ideas or processes, and expressions that are standard, stock, or common to a particular topic, or that necessarily follow from a common theme or setting.

    14. I understand further that with respect to computer programs in particular, the scenes a faire doctrine:

    excludes from protection those elements of a program that have been dictated by external factors. In the area of computer programs these external factors may include: hardware standards and mechanical specifications, software standards and compatibility requirements, computer manufacturer design standards, target industry practices and demands, and computer industry programming practices.

    Gates Rubber v. Bando, all citations omitted

    15. The opinions I report here are based on the documents I have reviewed (a list is given in Exhibit III), and on my knowledge, background, and experience in the field of computer science. I am continuing work on this and reserve the right to augment my findings as additional information becomes available to me.

    III. SUMMARY OF FINDINGS

    16. Despite an extensive review, I could find no source code in any of the IBM Code that incorporates any portion of the source code contained in the Unix System V Code or is in any other manner similar to such source code. Accordingly, the IBM Code cannot be said, in my opinion, to be a modification or a derivative work based on Unix System V Code.

    17. As explained in detail below, I used two programs, called COMPARATOR and SIM, to help automate the process. COMPARATOR looks for lines of text that are literally or nearly literally identical, while SIM looks for code that is syntactically the same.

    18. I used both programs to compare all 26,759 lines of the IBM Code identified by SCO against all 67,797,569 lines in the Unix System V Code.

    19. I believe that the comparisons I performed using these tools are conservative and hence resulted in more potential matches than might otherwise be found using a less conservative approach.

    20. These comparisons required on the order of 10 hours of computation time on a dual 3 GHz Xeon processor system with 2 GB of RAM. This is a high-end workstation routinely and easily available off the shelf from commercial vendors such as Dell.

    21. COMPARATOR reported 15 potential hits. I reviewed each of these potential hits in detail and determined them not to be true matches of copied code, but rather coincidental matches of common terms in the C programming language. (Paragraphs 29 and 30 below discuss coincidental matches in COMPARATOR.)

    22. SIM did not report any potential hits.

    IV. METHODOLOGY

    23. I was asked to analyze the specific AIX and Dynix files and lines of code cited by SCO in their filings (and listed in Table I). In instances where SCO failed to identify any specific AIX and Dynix code upon which code in Linux is allegedly based, I was asked to analyze the Linux files and lines of code cited by SCO (and listed in Table I). Finally, I was asked to analyze the JFS files and lines of code cited by SCO (and listed in Table I), even though SCO did not identify any corresponding AIX, Dynix, or Linux code for such files. All of this IBM Code in Table I was compared to all of the Unix System V Code in Table II to determine if the IBM Code contains any portion of the Unix System V Code or is in any other manner similar to any portion of the Unix System V Code. 1

    24. For purposes of my review, I did not first apply the "abstraction" and "filtration" analyses to the Unix System V Code. Instead, to be conservative, I assumed that all of the Unix System V code was in fact protectable (although I do not believe all of such code in fact to be protectable) and proceeded to compare all of the Unix System V Code with all of the IBM Code to see if there were any true matches of copied code in the first place. To the extent necessary, I then applied the "filtration" analysis to the reportedly matching code to determine if such code was in fact protectable.

    25. In doing my analysis I used two programs, employing two different algorithms, to detect material in the IBM Code that might contain, or be similar to, material in the Unix System V Code. The first, called COMPARATOR [1], is designed to find sequences of lines in two different files that are literally, or nearly-literally the same. The second program, SIM [2], is designed to detect non-literal similarities at the level of syntactic structure.

    26. Both programs take two lists of files and compare every line in the first set of files against every line in the second set, and report every match they find. Each match consists of a file name and line numbers indicating places in each file that the program believes to be similar.

    27. The first step of my methodology was to compare all the IBM Code against all the Unix System V Code. At my direction, one of my assistants ran the IBM Code and the Unix System V Code through the COMPARATOR and SIM programs to generate a set of initial matches.

    28. Next, I manually reviewed all of the matches reported by the comparison tools. All of the matches that I reviewed were not true matches of copied code. As a result, I did not have to perform any "filtration" analysis on the code.

    29. The matches reported by COMPARATOR between the IBM Code and the Unix System V Code consisted of coincidental matches of terminology in the C programming language, and thus not true matches. These coincidental matches arise in much the same way that, if we compared the entire text of two novels (e.g., War and Peace and A Tale of Two Cities), we would surely find that they both contain the phrase "and then they" somewhere within them. Such coincidences of common language are no more indicative of copying in English than the corresponding matches of programming text are in the large bodies of code examined here.

    30. The box below shows one of the reported matches from the lines of code cited by SCO. COMPARATOR reported a match between lines 588-591 in rclock.c and lines 1665-1667 from System V UW1.1 /src/i386at/uts/io/target/sdi.c:

    Lines 588-591 from rclock.c
    #endif /* RCLOCK_PROF */

    return;
    }

    Lines 1665-1667 from sdi.c
    #endif

    return;
    }

    The two "words" -- endif and return -- that appear in the two files are so common in code written in the C language that finding them together like this is purely an accident, of no significance in detecting copying. In particular, the code from each file above simply signifies the ending of a routine; it is as if we had found two bodies of unrelated English text that each happened to conclude with the words "the end".

    31. Note that there are 4 lines cited from the IBM file but only three from the Unix file. This is because COMPARATOR ignores blank lines (the second line in the IBM code excerpt is blank), which keeps it from being misled by this sort of immaterial variation. COMPARATOR also ignores single line comments (i.e., a line of text that starts with "/*"), hence its finding that the first line of each of these excerpts is similar. 2 This is another way in which it is not misled by immaterial variation. These are two of the reasons COMPARATOR is described above as a program that "looks for lines of text that are literally or nearly literally identical."
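To make paragraph 31 concrete, here is a minimal C sketch of the kind of line normalization described there and in paragraph 37. It is not COMPARATOR's source. It assumes three rules:

- blank lines are dropped;
- a comment that opens and closes on the same line is removed;
- case and runs of white space are ignored.

An unclosed "/*" is kept as ordinary text, which fits footnote 2's point that multi-line comments are still compared. The names normalize_line, normalize_excerpt, and LINE_CAP are hypothetical. Run on the two excerpts from paragraph 30, both reduce to the same three lines: "#endif", "return;", "}".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define LINE_CAP 80   /* hypothetical limit on a normalized line's length */

/* Normalize one source line into out (capacity cap). Returns the normalized
 * length; 0 means the line is ignored (blank, or only a one-line comment). */
size_t normalize_line(const char *in, char *out, size_t cap) {
    size_t len = 0;
    int pending_space = 0;
    assert(cap > 0);
    for (const char *p = in; *p != '\0'; p++) {
        if (p[0] == '/' && p[1] == '*') {
            const char *close = strstr(p + 2, "*/");
            if (close != NULL) {          /* comment closes on this line: drop it */
                p = close + 1;            /* the loop increment steps past the '/' */
                pending_space = 1;
                continue;
            }
            /* unclosed: a multi-line comment starts here, so keep its text */
        }
        if (isspace((unsigned char)*p)) { pending_space = 1; continue; }
        if (pending_space && len > 0 && len + 1 < cap) out[len++] = ' ';
        pending_space = 0;
        if (len + 1 < cap) out[len++] = (char)tolower((unsigned char)*p);
    }
    out[len] = '\0';
    return len;
}

/* Normalize an excerpt, keeping only the lines that survive normalization. */
size_t normalize_excerpt(const char *const *lines, size_t n,
                         char out[][LINE_CAP], size_t max) {
    size_t kept = 0;
    for (size_t i = 0; i < n && kept < max; i++)
        if (normalize_line(lines[i], out[kept], LINE_CAP) > 0) kept++;
    return kept;
}
```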

    32. All of the potential hits reported by COMPARATOR were of the type discussed in paragraphs 29 and 30; i.e., they consisted entirely of coincidental matches of common terms in the C programming language. Even two programs known to have no code copied from one to the other will show these sorts of coincidental matches. Given the volume of code in question here (e.g., 68,000,000 lines of Unix code), the presence of this type of match is both to be expected, and evidence that the tool was in fact performing successfully in finding potential matches.

    33. In this instance, then, I did not need to perform a "filtration" analysis with respect to these matches, because they were not true matches of code at all. In any case, these matches would not be protectable under the filtration analysis. At best, they could be thought of as cliches or stock phrases, the sorts of things that are routinely "said" in source code by any author, and that cannot therefore be considered significant when looking for copying.

    34. The SIM program did not report any matches between the IBM Code and the Unix System V code. As a result, I did not have to manually review any such code for false positives.

    35. The remainder of this section describes the algorithms used by the comparison programs and the local modifications that were made to enhance them.

    IV.1. COMPARATOR

    36. The COMPARATOR program considers each file 3 lines at a time, and identifies all files that share the same 3 (or more) lines of code.

    37. COMPARATOR "normalizes" its input, so that differences resulting from comments, case, and white space are ignored. This prevents immaterial changes that may arise from code copying from fooling the program. Then, all input is "shredded" into overlapping 3 line segments and identical segments from different files are gathered together. 3 Adjacent identical sections (e.g., lines 3-5 and lines 4-6) are then combined into a single section (e.g., lines 3-6).
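The shred-and-merge step in paragraph 37 can be sketched in a few lines of C. This is an illustration under assumptions, not COMPARATOR's implementation. It takes lines that are already normalized and compares every pair of 3-line shreds directly. A tool meant for 68 million lines would instead gather identical shreds together, for example by hashing, rather than compare every pair. The sketch then extends each match along the diagonal, so overlapping shreds such as lines 3-5 and 4-6 are reported as one run, 3-6. The names find_runs, same_shred, and Run are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHRED 3   /* lines per segment, as in paragraph 36 */

typedef struct { size_t a_first, a_last, b_first, b_last; } Run;  /* 1-based, inclusive */

/* Do the SHRED-line segments starting at a[i] and b[j] match exactly? */
static int same_shred(const char *const *a, size_t i, const char *const *b, size_t j) {
    for (size_t k = 0; k < SHRED; k++)
        if (strcmp(a[i + k], b[j + k]) != 0) return 0;
    return 1;
}

/* Report maximal runs of matching shreds between two normalized files. */
size_t find_runs(const char *const *a, size_t na, const char *const *b, size_t nb,
                 Run *out, size_t max) {
    size_t n = 0;
    assert(out != NULL || max == 0);
    if (na < SHRED || nb < SHRED) return 0;
    for (size_t i = 0; i + SHRED <= na; i++)
        for (size_t j = 0; j + SHRED <= nb; j++) {
            if (!same_shred(a, i, b, j)) continue;
            /* this shred continues a run that started one line earlier */
            if (i > 0 && j > 0 && same_shred(a, i - 1, b, j - 1)) continue;
            size_t len = 1;   /* number of consecutive matching shreds */
            while (i + len + SHRED <= na && j + len + SHRED <= nb &&
                   same_shred(a, i + len, b, j + len))
                len++;
            if (n < max)   /* merged run covers the first through last shred */
                out[n++] = (Run){ i + 1, i + len + SHRED - 1, j + 1, j + len + SHRED - 1 };
        }
    return n;
}
```

In the test data, the shreds at lines 3-5 and 4-6 of the first file match lines 2-4 and 3-5 of the second file. The sketch reports them as a single run: lines 3-6 against lines 2-5.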

    IV.2. SIM

    38. The SIM program works by breaking source files into tokens (e.g., language keywords, punctuation, variables, constants, and the like) and comparing sequences of tokens for commonality. This conversion of source code into tokens allows the program to focus solely on the structure of the code.

    39. For example, a statement like

    if (a > b) return a; else return b;
    is structurally the same as

    if (c > b) return c; else return b;

    40. Both statements have the same syntactic structure, namely:

    If (Var > Var) Return Var; Else Return Var;

    which SIM would identify as a match. 4
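A minimal C sketch of the tokenizing idea in paragraphs 38-40 follows. It is an assumption-laden stand-in for SIM's real lexer:

- a few keywords are kept as themselves;
- every other identifier becomes Var;
- numeric literals become Num;
- every other character is treated as one-character punctuation, so operators such as >= are split in two.

SIM then searches for long common runs of such tokens across files. The sketch stops at producing the token stream, which is enough to show why the two statements in paragraph 39 look identical to it. The names structure, emit, and KEYWORDS are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* A handful of C keywords kept verbatim; every other identifier becomes Var. */
static const char *const KEYWORDS[] = {
    "if", "else", "return", "for", "while", "switch", "case", "default", NULL
};

static int is_keyword(const char *s, size_t n) {
    for (size_t k = 0; KEYWORDS[k] != NULL; k++)
        if (strlen(KEYWORDS[k]) == n && strncmp(KEYWORDS[k], s, n) == 0) return 1;
    return 0;
}

/* Append one token, space-separated, without overflowing out. */
static void emit(char *out, size_t cap, size_t *len, const char *tok, size_t n) {
    int w = snprintf(out + *len, cap - *len, "%s%.*s", *len ? " " : "", (int)n, tok);
    if (w < 0) return;
    size_t room = cap - *len - 1;            /* characters that actually fit */
    *len += ((size_t)w < room) ? (size_t)w : room;
}

/* Write the token-kind sequence of src into out: keywords and punctuation
 * verbatim, identifiers as "Var", numeric literals as "Num". */
void structure(const char *src, char *out, size_t cap) {
    size_t len = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (const char *p = src; *p != '\0';) {
        if (isspace((unsigned char)*p)) { p++; continue; }
        if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            size_t n = (size_t)(p - start);
            if (is_keyword(start, n)) emit(out, cap, &len, start, n);
            else emit(out, cap, &len, "Var", 3);
        } else if (isdigit((unsigned char)*p)) {
            while (isalnum((unsigned char)*p) || *p == '.') p++;
            emit(out, cap, &len, "Num", 3);
        } else {
            emit(out, cap, &len, p, 1);       /* single-character punctuation */
            p++;
        }
    }
}
```

Both statements from paragraph 39 reduce to "if ( Var > Var ) return Var ; else return Var ;". A statement with a different shape, such as a while loop, does not.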

    IV.3. Modifications to the Programs

    41. Slight modifications were made to both of these programs to make them faster and more efficient, so that they could handle the large amount of source code under consideration in this case.

    42. As publicly distributed, COMPARATOR and its associated scripts have several major performance bottlenecks, which were identified and removed by my assistant. These fixes improved the speed at which the program operated; they did not alter the methodology used by the program to find matches.

    43. SIM was modified by my assistant to reduce the number of false matches it produced. It was determined that many matches reported by SIM arise because the program treats all numbers, strings and variable identifiers identically. For example, to SIM, a list of integers such as 1, 2, 3, 4 looks just the same as a list of very different numbers, such as 73234, 1592, 7182, 31415, because syntactically they are both simply a list of four numbers. This occurs in the current context because operating systems code commonly includes long arrays of numbers that encode instructions for hardware. This also arises in structure initializations where there may be long sequences of identifiers. Arrays of character strings are also common as means of associating strings with certain numeric values (e.g., error codes and messages).

    44. These false matches in SIM were avoided by first making tokenizing stricter -- strings and numbers are considered to be the same only if they have the same value. 5 Next, a step within SIM itself removes matches that consist of a sequence where over 70% of the tokens are commas, identifiers, numbers, strings and tokens that are part of C's "switch" statements.
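Paragraph 44 and footnote 5 describe two changes. First, literal tokens now carry their value, compared through a 256-value hash. Second, matches dominated by low-information tokens are discarded. The minimal C sketch below shows both under stated assumptions:

- FNV-1a folded to its low byte stands in for the unnamed hash;
- the TokKind categories are hypothetical, including which tokens count as "part of" a switch statement;
- "over 70%" is implemented as a strict inequality in integer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token categories; a real lexer would assign them. */
typedef enum {
    TK_COMMA, TK_IDENT, TK_NUMBER, TK_STRING,
    TK_SWITCH_PART,   /* assumed: switch, case, default and their ':' */
    TK_OTHER          /* keywords, operators, and everything else */
} TokKind;

/* Footnote 5: literals compare through a 256-value key, so a few unequal
 * literals collide. FNV-1a folded to its low byte is an assumed stand-in. */
uint8_t value_key(const char *text, size_t n) {
    uint32_t h = 2166136261u;            /* FNV-1a 32-bit offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)text[i];
        h *= 16777619u;                  /* FNV-1a 32-bit prime */
    }
    return (uint8_t)(h & 0xFFu);
}

/* Two literal tokens match only if their keys match (a noisy equality). */
int literals_match(const char *x, size_t nx, const char *y, size_t ny) {
    return value_key(x, nx) == value_key(y, ny);
}

/* Reject a candidate match in which more than 70% of the tokens are commas,
 * identifiers, numbers, strings, or parts of switch statements. */
int is_low_information(const TokKind *kinds, size_t n) {
    size_t low = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (kinds[i] != TK_OTHER) low++;
    return 10 * low > 7 * n;             /* integer form of low / n > 0.70 */
}
```

Consider the two lists from paragraph 43, 1, 2, 3, 4 and 73234, 1592, 7182, 31415. They now have to agree on each literal's key, not merely on the shape "number, comma, number". A run made only of numbers and commas is dropped by the 70% rule in any case.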

    IV.4. Alternative Tools

    45. Most other tools available to assist in organizing code for expert inspection operate in a similar manner. Tools like JPlag [3] and MOSS [4] operate similarly to SIM, tokenizing the input stream in order to compare code structure, but differing in the way they optimize the algorithms for performance. MOSS in particular uses a statistical sampling technique which results in a very small probability that a duplication may be missed.
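Reference [4] describes winnowing, the fingerprinting method MOSS is built on. Every k-gram of the token stream is hashed, and the smallest hash in each window of w consecutive hashes is kept as part of the document's fingerprint. A shared run at least w + k - 1 tokens long is guaranteed to produce a shared fingerprint; shorter runs may not, which is the sense in which some duplication can go unnoticed. Below is a minimal C sketch of the selection step only, with k-gram hashing omitted; it is not MOSS's code. The names winnow and Fingerprint are hypothetical. The test uses a hash sequence like the worked example in [4], with w = 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t hash;  /* selected k-gram hash */
    size_t pos;     /* index of that k-gram in the document */
} Fingerprint;

/* Slide a window of w consecutive k-gram hashes over h[0..n-1], take the
 * minimum in each window (rightmost on ties), and record each choice once. */
size_t winnow(const uint32_t *h, size_t n, size_t w, Fingerprint *out, size_t max_out) {
    size_t count = 0;
    size_t last = SIZE_MAX;              /* position recorded most recently */
    assert(out != NULL || max_out == 0);
    if (w == 0 || n < w) return 0;
    for (size_t start = 0; start + w <= n; start++) {
        size_t min_pos = start;
        for (size_t i = start + 1; i < start + w; i++)
            if (h[i] <= h[min_pos]) min_pos = i;   /* "<=" keeps the rightmost minimum */
        if (min_pos != last && count < max_out) {
            out[count++] = (Fingerprint){ h[min_pos], min_pos };
            last = min_pos;
        }
    }
    return count;
}
```

Because the rightmost minimum can only move forward as the window slides, comparing against the last recorded position is enough to avoid duplicates.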

    46. The combination of line matching and syntactic analysis used in this comparison is similar to the technique used by CodeMatch [5], a commercial program for detecting code copying. CodeMatch uses the same algorithms as COMPARATOR and SIM and adds three smaller tests: comparing the number of identical words in two files, comparing the number of words in one file that appear as sub-words in another file, and checking comment lines.

    47. SIM and COMPARATOR were chosen both because they provide the capabilities needed, and because they offered full access to their source code, making it possible to understand exactly how they worked and to customize them to the needs of this case. The comparisons I performed using SIM and COMPARATOR were intended to be as conservative as possible and to produce the most potential matches for me to review individually.

    V. SUMMARY

    48. After a detailed review that exhaustively compared almost 27,000 lines of IBM Code against almost 68,000,000 lines of Unix System V Code, I could find no evidence that any of the IBM Code incorporates a portion of, or is similar to, any of the Unix System V Code.

    49. I therefore conclude that the IBM Code is not a modification or a derivative work based on the Unix System V Code.

    50. I declare under penalty of perjury that the foregoing is true and correct.

    ____[signature]____
    Randall Davis

    13 August 2004

    VI. REFERENCES

    1. Raymond, Eric, COMPARATOR, http://www.catb.org/~esr/comparator

    2. Grune, Dick, The software and text similarity tester SIM, http://www.cs.vu.nl/~dick/sim.html, Version 2.12.

    3. Prechelt, Lutz; Malpohl, Guido; Philippsen, Michael, Finding Plagiarisms among a Set of Programs with JPlag, Journal of Universal Computer Science, 2002, 8:11, pp. 1016-1038.

    4. Schleimer, Saul; Wilkerson, Daniel; Aiken, Alex, Winnowing: Local Algorithms for Document Fingerprinting, Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, 2003, pp. 76-85.

    5. Zeidman, Bob, CodeMatch detects plagiarism.


    1 In addition to the analysis reviewed herein, I also manually reviewed the following Linux code cited in the 7 July 2004 Declaration of Sandeep Gupta: ipc/util.c (lines 119-52) and kernel/futex.c (159, 178, 187, 188-91, 456, 489, 495, 298-300, 302-08). This review could be carried out manually because Mr. Gupta had identified specific lines that were alleged to be similar. There was thus no need to run the comparison tools, which are designed to find matches. I compared the lines of Linux code identified by Mr. Gupta with the specific lines of System V 4.2 ES-MP code that Mr. Gupta claims matches the Linux code. As is obvious upon review (and may be obvious even to a non-technical reviewer), the Linux code cited by Mr. Gupta does not contain any of, and is not in any way similar to, the Unix code that he cites. The code is entirely different. In my opinion, therefore, the code cited by Mr. Gupta for ipc/util.c and kernel/futex.c cannot be considered modifications or derivative works of Unix System V.

    2 While COMPARATOR ignores a single-line comment, i.e., a line of text that starts with "/*", it does compare the English text that appears in multi-line comments, allowing it to find identical or nearly identical multi-line comments in code. This is useful because overlaps in English comments can be an effective indicator that we ought to search for both literal and non-literal similarity in the source code that follows the comment.
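    The filtering rule in footnote 2 can be sketched in a few lines of Python. This is a hypothetical illustration, not COMPARATOR's actual code: it assumes a "single-line comment" means a line that opens with "/*" and is closed by "*/" at the end of the same line, and the function name `filter_comment_lines` is invented for this sketch.

```python
# Hypothetical sketch of footnote 2's rule; not COMPARATOR's actual source.
def filter_comment_lines(lines):
    """Drop lines that consist solely of a one-line /* ... */ comment.

    Lines belonging to a comment that spans several lines are kept, so their
    English text still takes part in the comparison.
    Assumption: a one-line comment starts with "/*" and its first "*/" ends the line.
    """
    kept = []
    for line in lines:
        s = line.strip()
        # Search for "*/" from index 2 so the opening "/*" cannot close itself.
        if s.startswith("/*") and s.find("*/", 2) == len(s) - 2:
            continue
        kept.append(line)
    return kept


source = [
    "/* one-line comment: ignored */",
    "int x;",
    "/* this comment spans",
    "   two lines: kept */",
    "x = 1;",
]
for line in filter_comment_lines(source):
    print(line)
```

    Under this assumption a line such as `/* a */ int y; /* b */` is kept, because code follows the first closing `*/`.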

    3 If we look for 3-word sequences in common (e.g., "used by the"), we would find far fewer of them, and could use those more reliably to build up evidence for matches.
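    Footnote 3's point, that shared 3-word sequences are much rarer than shared single words and therefore stronger evidence, can be checked with a short Python sketch. The two sample sentences and the function names are invented for illustration; the real tools work on program text, not English.

```python
import re


# Hypothetical helpers for illustration only.
def word_ngrams(text, n):
    """Return the set of n-word sequences (as tuples) in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def common_ngrams(a, b, n):
    """n-word sequences that occur in both texts."""
    return word_ngrams(a, n) & word_ngrams(b, n)


a = "This buffer is used by the scheduler to hold the run queue."
b = "The lock is used by the driver when the device is busy."

print(sorted(common_ngrams(a, b, 1)))  # single words shared by both sentences
print(sorted(common_ngrams(a, b, 3)))  # 3-word sequences shared by both sentences
```

    With these two unrelated sentences, four single words are shared but only two 3-word sequences, one of them "used by the".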

    4 This is analogous to finding that the following two English sentences have exactly the same syntactic structure, yet are clearly not copied from one another: (a) "The tall boy threw the ball to the dog," and (b) "The coded message divulged the secret to the spy."

    5 More precisely, strings and numbers are considered the same only if they have the same hash value when hashed in a 256-value key. This is, in effect, a slightly "noisy" equality test; a few strings and numbers that are not in fact equal will be reported as equal. Note that this, too, makes our search more conservative, i.e., it will report a few more false positives.
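    The "noisy" equality test of footnote 5 can be illustrated by reducing each string or number to one of 256 values and comparing those. The sketch assumes the low 8 bits of CRC-32 as the hash; the actual hash function is not described here, and the helper names are hypothetical. Because there are only 256 possible values, any 257 distinct literals must contain at least one pair that compares as "equal", which is the kind of extra false positive the footnote mentions.

```python
import zlib


# Hypothetical helpers; the 8-bit CRC-32 hash is an assumption for illustration.
def bucket(token):
    """Map a string or number literal to one of 256 values."""
    return zlib.crc32(str(token).encode("utf-8")) & 0xFF


def noisy_equal(a, b):
    """True whenever the literals share a bucket: it never misses a real match,
    but occasionally reports two different literals as equal."""
    return bucket(a) == bucket(b)


def find_collision(tokens):
    """Return the first pair of distinct tokens that land in the same bucket, or None."""
    seen = {}
    for t in tokens:
        h = bucket(t)
        if h in seen and seen[h] != t:
            return seen[h], t
        seen[h] = t
    return None


# 257 distinct literals: by the pigeonhole principle at least two must collide.
literals = [f'"msg{i}"' for i in range(257)]
print(find_collision(literals))
```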


      


    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code | 772 comments
    Comments belong to whoever posts them. Please notify us of inappropriate comments.
    Corrections (if any) here please
    Authored by: MadScientist on Friday, September 17 2004 @ 09:36 AM EDT

    The Official Troll of Groklaw (Biff) post here please
    Authored by: MadScientist on Friday, September 17 2004 @ 09:38 AM EDT

    OT and Lynx
    Authored by: WhiteFang on Friday, September 17 2004 @ 09:41 AM EDT
    :-D

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: Anonymous on Friday, September 17 2004 @ 09:46 AM EDT
    Hee,

    This is good news.

    Shane.

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: Anonymous on Friday, September 17 2004 @ 09:46 AM EDT
    Another great job PJ. I love IBM's documents and the documents of their experts. Clear, concise and a joy to read for a person who is not a legal expert.

    This is a great example to everyone on how a legal document should be structured and SCO would be well advised to look at this as their template for their 'experts' when they do their declarations.

    Thanks for the great work and keep it up.

    COMPARATOR
    Authored by: WhiteFang on Friday, September 17 2004 @ 09:54 AM EDT
    I hope Dr. Davis' assistant sent the performance improvements back to Eric
    Raymond. It'd be nice to have the actual code that was used to prove
    SCOX was wrong.

    DELL?
    Authored by: Anonymous on Friday, September 17 2004 @ 10:01 AM EDT
    "20. These comparisons required on the order of 10 hours of computation
    time on a dual 3 GHz Xeon processor system with 2 GB of RAM. This is a high-end
    workstation routinely and easily available off the shelf from commercial vendors
    such as Dell."

    I wonder if there is some specific reason he didn't say "IBM" there...

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: plungermonkey on Friday, September 17 2004 @ 10:01 AM EDT
    That swooshing sound you just heard was the sound of SCO's case being flushed
    down the toilet...

    ---
    An ignorant person knows no better, a stupid person knows better and still does
    what is wrong. Which one are you?

    10 Hours
    Authored by: Anonymous on Friday, September 17 2004 @ 10:02 AM EDT

    I wonder how SCOG is figuring on their 27,000 man years, especially considering Dr. Davis probably started up his search and then went to see a movie while the computer did the work. Gotta love automation.

    Apparently SCOG doesn't like automation very much if it's truly going to take them 27,000 man years.

    RS

    Quibbles
    Authored by: MadScientist on Friday, September 17 2004 @ 10:02 AM EDT
    Prof D continues to impress. This testimony is how experts should present
    materials to the court.

    Having said that, I can find three tiny quibbles with the testimony.

    +++++++++++++

    Paragraphs 27 and 43 have a similar problem

    27. ... "At my direction, one of my assistants ran the IBM Code and the
    Unix System V Code through the COMPARATOR and SIM programs to generate a set of
    initial matches."

    43. "SIM was modified by my assistant to reduce the number of false matches
    it produced."

    The assistant(s) probably should be named in case SCO want to depose them. Why
    they would need to I have no idea - but this is SCO.

    +++++++++++++++

    Paragraph 29.

    "These coincidental matches arise in much the same way that, if we compared
    the entire text of two novels (e.g., War and Peace and A Tale of Two Cities), we
    would surely find that they both contain the phrase "and then they"
    somewhere within them. Such coincidences of common language are no more
    indicative of copying in English than the corresponding matches of programming
    text are in the large bodies of code examined here."

    As I'm sure Prof D knows, 'War and Peace' was written in Russian. 'A Tale of Two
    Cities' was written in English. I would be very surprised if there were matches
    in these two novels given the different alphabets and languages. Both do have
    small amounts of French text which might match if the French was written in
    Roman script in both. (This I don't know.) Other matches in these texts would
    probably be very good evidence of copying.

    +++++++++++++++++

    Paragraph 43

    "SIM was modified by my assistant to reduce the number of false matches it
    produced."

    The modifications have not been spelled out. This is very unlikely to be important, but
    it is something SCO could dispute. They may have been listed elsewhere, in which
    case this quibble is incorrect.

    +++++++++++++++

    Having said that, I would have no problem accepting this testimony were I the
    Judge. It is like all of IBM's work: first class.

    Doesn't add up
    Authored by: Anonymous on Friday, September 17 2004 @ 10:04 AM EDT
    If IBM has this declaration behind them, then why are they dragging their feet
    on turning over all AIX code to SCO?

    Something here doesn't add up. Why the big brouhaha if IBM has nothing to hide?
    Just turn over everything SCO is requesting and we can all move on. Unless, of
    course, you DO have something to hide.

    I'm no SCO fan, but they might have more of a case than we think.

    Cecil

    This is the end, my only friend...
    Authored by: Anonymous on Friday, September 17 2004 @ 10:32 AM EDT
    The most delicious irony in this buffet of irony has to be paragraph 30, where
    he gives an example of a comparison "hit", then explains why it is not
    relevant:

    it is as if we had found two bodies of unrelated English text that each happened
    to conclude with the words "the end".

    So he read through the evidence, and in the #endif, there was none.

    SCO has made NO depositions?
    Authored by: Anonymous on Friday, September 17 2004 @ 10:39 AM EDT

    Somebody gotta help me with this one. Biff?

    IBM has said that SCO hasn't deposed ANYONE yet. In the whole year
    and a half this has been going on. No one. Zilch. Zip. Nada.

    Why? What effect does this have on the judge, when s/he sees that one
    of the parties has done virtually nothing to advance its case?

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: dscribner on Friday, September 17 2004 @ 10:41 AM EDT
    Now that's an expert!

    ---
    Yes, it *will* work!

    A question about the methodology
    Authored by: insensitive clod on Friday, September 17 2004 @ 10:55 AM EDT
    I would have expected that Dr. Davis would also have taken some Unix source, modified it slightly to create a derived work, and then shown that his method actually finds all these artificially created derived works. From his declaration alone, how would a judge conclude that the programmes he used work at all?

    ---
    Lemmings vs Penguins
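    The validation asked for above, planting a deliberately modified copy and checking that the method flags it, can be sketched in Python. Everything here is hypothetical: the tokenizer, the small keyword list, the renaming helper, and the use of `difflib.SequenceMatcher` as the similarity score are stand-ins chosen for illustration, not what COMPARATOR or SIM actually compute.

```python
import difflib
import re

# Hypothetical stand-ins; not the tools used in the declaration.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")  # identifiers, numbers, single symbols
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
C_KEYWORDS = {"int", "char", "void", "return", "if", "else", "for", "while", "struct"}


def normalize(src):
    """Tokenize C-like source, replacing identifiers with ID and numbers with NUM."""
    out = []
    for tok in TOKEN_RE.findall(src):
        if tok in C_KEYWORDS:
            out.append(tok)
        elif IDENT_RE.fullmatch(tok):
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def similarity(a, b):
    """Score in [0, 1] for how much of the two normalized token streams line up."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def rename_identifiers(src, mapping):
    """Make a planted 'derived work' by renaming identifiers (keywords are left alone)."""
    return IDENT_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), src)


original = "int sum(int *v, int n) { int s = 0; for (int i = 0; i < n; i++) s += v[i]; return s; }"
planted = rename_identifiers(original, {"sum": "total", "v": "arr", "n": "len", "s": "acc", "i": "k"})
unrelated = "void blink(void) { while (1) { led_on(); delay(500); led_off(); } }"

print(f"planted copy:   {similarity(original, planted):.2f}")
print(f"unrelated code: {similarity(original, unrelated):.2f}")
```

    A method that failed to score the planted copy near 1.0 would fail this check; the same harness could be rerun with other edits (reordered statements, changed literals) to probe where a given method stops detecting copies.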

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: belzecue on Friday, September 17 2004 @ 11:12 AM EDT
    Don't forget he reserves the right to augment his findings later. After all, he
    still has 24999.981 man-years to go. :-)

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: hardcode57 on Friday, September 17 2004 @ 11:20 AM EDT
    Wow, I would feel sorry for SCO, except I'm not that nice.:-)

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: Anonymous on Friday, September 17 2004 @ 11:20 AM EDT
    Don't get too excited. SCO is about to be saved. Dan Rather has received faxed
    copies of the infringing code from an unimpeachable source.

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: Anonymous on Friday, September 17 2004 @ 11:29 AM EDT
    Something disturbs me here. The exhibits SCO has held up in court to demonstrate
    identical code would not have been flagged as identical by the two comparators
    quoted in R. Davis's declaration (int a, b; is different from int a; int b; in
    both tests as I understand the description). That would be a point where SCO
    could wipe the floor with Mr Davis.

    However, those files didn't actually contain code that does anything; they were
    interface declarations. The structures cannot be semantically different without
    specifying different interfaces. According to the abstraction filtration
    whatever, this means that the replication is not a case of copying. That point
    doesn't seem to get made anywhere.

    There are already too many analogies used here, but what will have to be made
    clear to the court is that just because Ford made the first car in the US with a
    gas, brake, and clutch pedal in that order doesn't mean they can sue GM for
    infringement of copyright. That's just interface that's needed so people don't
    accelerate when they mean to brake and so on. They could sue, however, if the
    cable from the gas pedal is connected to the same kind of doohickey inside the
    engine, because that's not relevant for the interface and is purely
    implementation.

    Just my two cents, but I'm not so heavily into hero worship as some folks on
    this thread. I just think somebody should tell the judge that the whole issue of
    identical-looking *.h files is misleading.
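    The commenter's example, `int a, b;` versus `int a; int b;`, can be illustrated by reducing both declarations to token classes the way a token-based comparer might. This Python sketch assumes identifiers other than type keywords become the placeholder `ID`; it models the commenter's reading of the tools, not a documented behavior of COMPARATOR or SIM, and the function name is hypothetical.

```python
import re

# Hypothetical tokenizer illustrating the commenter's reading of the tools.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
C_TYPE_KEYWORDS = {"int", "char", "long", "short", "unsigned", "struct"}


def token_classes(src):
    """Replace every identifier that is not a type keyword with the placeholder ID."""
    return [
        "ID" if re.fullmatch(r"[A-Za-z_]\w*", tok) and tok not in C_TYPE_KEYWORDS else tok
        for tok in TOKEN_RE.findall(src)
    ]


combined = token_classes("int a, b;")
separate = token_classes("int a; int b;")
print(combined)              # ['int', 'ID', ',', 'ID', ';']
print(separate)              # ['int', 'ID', ';', 'int', 'ID', ';']
print(combined == separate)  # False: same meaning, different token streams
```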

    Punitive Damages?
    Authored by: Anonymous on Friday, September 17 2004 @ 12:19 PM EDT
    Is it possible that the court may hit SCO with punitive damages, and is there
    any benchmark for doing so in a case with similar issues?

    Would the court possibly award punitive damages of something along the lines of
    $10,000,000?

    ipc/util.c
    Authored by: Manfred Spraul on Friday, September 17 2004 @ 12:38 PM EDT
    The Davis declaration quotes the Gupta declaration: Gupta claims that ipc/util.c
    (lines 119-52) is derived from System V.
    That's a very interesting claim:
    In all 2.6 kernels that would be the grow_ary() function. This function allows
    the number of System V interprocess communication objects to be changed at runtime.
    This capability doesn't even exist in System V!
    At that time, such parameters were set at compile time. Just search
    Google Groups for SEMMNI in posts from before 1995: several hits mention
    having to reconfigure and rebuild.

    It's sad that Gupta won't have to repeat that claim on the witness stand under
    cross-examination. I'd love to see him explain the legal theory under which a
    Linux function that implements a feature that doesn't exist in SysV is derived
    from SysV.
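    The difference described above, a limit fixed when the system is built versus a table that can be enlarged while running, can be modelled in a few lines of Python. This is a hypothetical sketch, not the kernel's grow_ary(): the class name, the constant, and the copy-into-a-larger-table approach are illustrative assumptions, and real kernel C code also has to deal with memory allocation and locking.

```python
# Hypothetical model; not the Linux kernel's grow_ary().
SEMMNI_BUILD_TIME = 128  # hypothetical build-time limit, changed only by reconfiguring and rebuilding


class IpcIdTable:
    """Hypothetical table of IPC object slots whose capacity can be raised at runtime."""

    def __init__(self, size):
        self.entries = [None] * size

    def grow(self, new_size):
        """Enlarge the table to new_size slots, keeping existing entries; never shrink."""
        old_size = len(self.entries)
        if new_size <= old_size:
            return old_size
        bigger = [None] * new_size          # allocate the larger table
        bigger[:old_size] = self.entries    # copy the existing slots across
        self.entries = bigger               # switch over to the new table
        return new_size


table = IpcIdTable(SEMMNI_BUILD_TIME)
table.entries[0] = "semaphore set 0"
print(table.grow(256), table.entries[0])
```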

    So this is the big Kaputski ?
    Authored by: waltish on Friday, September 17 2004 @ 01:15 PM EDT
    This is like checkmate by the queen: the only ways out are to move the King (drop the case)
    or take the Queen (get experts to prove IBM's expert wrong and SCO right), so
    is it Finito La Musica?

    Am I right that the only way out is to prove it with equal or better experts, and
    that lay opinions won't be enough?

    w

    ---
    To speak the truth plainly and without fear is powerful.

    PS: Beware the Gestank of SCO.
    PPS: SCO's argument does not withstand analysis.

    Who opened the Troll pens...?
    Authored by: Groklaw Lurker on Friday, September 17 2004 @ 01:24 PM EDT
    Someone must have inadvertently left the gates to the troll pens open today.
    Could you moderators (PJ's assistants) please round up these trolls and put them
    back in their pens (in other words... Delete their posts before they get fed).

    The Lurker

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: drh on Friday, September 17 2004 @ 01:33 PM EDT
    This will probably end up being the most fought over
    document in this case. It does away with almost everything
    SCO has accused IBM of doing.

    First Dr. Davis is, in this particular instance, the best
    qualified expert either side could use. IBM chose him not
    only because of his qualifications, but because he works
    for MIT. SCO said to the press (and IBM has reported these
    statements in court for the record) that they had experts
    from MIT find instances of copying. I don't believe there
    is anyone more qualified at MIT to answer this question of
    copying than Dr. Davis, so even if SCO could show there
    was a comparison made (doubtful), Dr. Davis trumps that.

    Second, it took Dr. Davis and his assistant 10 hours to
    make the comparison against almost 68 million lines of
    code. If they both worked at it for all 10 hours each
    (doubtful), that's 20 man-hours. This shows that SCO's claim
    of thousands of man-years is provably false, and can be
    dismissed. It shows that it takes a fairly short amount of
    time to perform such a comparison, so SCO has no excuse
    for not doing it up to now. It shows that it only takes
    two people to find code similarities, not armies of
    experts (it will take SOME experts to determine if the
    positive hits are actually infringing). Basically it shows
    that SCO or its lawyers have been negligent in their
    duties prosecuting this case.

    Third, and a very important point, is that the 27000 lines
    IBM provided for comparison were already known not to
    exist in System V because I believe they were never put
    there. This is most likely the code IBM developed
    separately for inclusion into AIX, Dynix, OS/2 or Sequent
    but was not included in SCO System V. I would have been
    VERY surprised to have seen ANY positive hits, because they
    would have meant that SCO had illegally copied IBM code
    into System V thereby giving IBM further grounds to
    prosecute SCO.

    Fourth, this comparison was made against all 68 million
    lines of System V; Dr. Davis admitted that he did not
    filter out anything that was public domain, etc., until a
    positive hit was found. This means that the IBM code was
    compared not only to SCO code, but also to Berkeley, AT&T,
    DEC, and all the other contributors to Unix over the
    years. And there still were no positive hits. That's
    telling.

    Fifth, Dr. Davis used tools in which the source code is
    available. This means that there can be no questioning the
    method, because the entire operation of the tool is
    available for all to see. SCO cannot say that there was a
    possibility of error; they have to PROVE the method is
    faulty by examining the code and showing where the error
    occurred.

    Finally, in general, (and this applies to Linux as well)
    if your program contains almost 68 million lines of code,
    it is broken. Go back to the drawing board and do not
    return until you reduce that number to a reasonable
    amount, say 2 million tops.


    ---
    Just another day...
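    drh's second point, that such a comparison takes hours rather than man-years, follows from how the work scales. A naive tool that compared every IBM line with every System V line would face the product of the two sizes, while one that first builds an index of the larger body can look each line up once. The Python sketch below shows the idea under simple assumptions (exact match after whitespace is collapsed, line counts taken from the declaration's approximate figures); it is not how COMPARATOR or SIM are implemented, and the function names are hypothetical.

```python
# Hypothetical sketch of index-based matching; not COMPARATOR's or SIM's algorithm.
IBM_LINES = 27_000          # approximate figures from the declaration
SYSV_LINES = 68_000_000


def normalize(line):
    """Collapse runs of whitespace so layout differences do not hide a match."""
    return " ".join(line.split())


def matching_lines(small, large):
    """Index the large body once, then look up each line of the small body."""
    index = {normalize(line) for line in large if normalize(line)}
    return [line for line in small if normalize(line) in index]


print(f"pairwise line comparisons: {IBM_LINES * SYSV_LINES:,}")
print(f"index inserts + lookups:   {IBM_LINES + SYSV_LINES:,}")
print(matching_lines(["int x;", "  int   y ;"], ["int   x;", "return 0;"]))
```

    Real comparison tools match runs of several lines or tokens rather than single lines, but the same indexing idea is what keeps the cost close to the sum of the two sizes rather than their product.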

    A question...
    Authored by: Anonymous on Friday, September 17 2004 @ 01:36 PM EDT
    Is this a troll, or do you actually mean what you say? I'm just wondering because
    I found something you said interesting, but I'm not sure if it was a true
    statement. It's just the way you came off, like you are trying to incite
    something, that's preventing me from taking you seriously. Anyway, just
    wondering.

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: Anonymous on Friday, September 17 2004 @ 01:57 PM EDT
    WAIT! There is a flaw! (NOT A TROLL)

    Didn't IBM provide all 232 "versions" in a *PRINTED* form? (remember
    all of those BOXES... didn't they contain PAPER not cds nor dvds nor tapes???)

    You cannot really scan/ocr (and/or manually review) like Dr. Davis did. (You
    *would* have to have 25000 man-years... ok... not really...)

    Let's see... not sure how many files/pages make up the 27K of IBM files, but
    let's say it takes a week to locate all pages, then you have someone type it in
    (or scan/ocr/proof).

    You could hire out scanning/typing to temp agencies (or a firm that does that
    specifically). So... let's assume it's 27K lines at 60 lines per page, or 450 pages.
    *That* is not too hard to accomplish in under a month. (probably 2-3 weeks if
    you put some resources on it) Then in 10-20 hours of review you could have your
    electronic version of the files to compare.

    So, I would have to say that it could take a month or so longer for SCO to do
    the same analysis. (If it were so inclined to actually search. I really do not
    think they are *or even have tried* to find any infringements.)

    BTW, Quartermass, I really enjoy reading your analysis.

    And, I, too, would do a pre-order for your book PJ. (as long as it had the back
    cover with a picture of you in your red dress)

    Thank you so much for such an entertaining website. I cannot help feeling like
    I'm just hanging on the edge waiting for the next installment of
    news/tidbits/theory/fud/fud-rebuking...

    Keep up the good work!

    Wouldn't it be funny...
    Authored by: Anonymous on Friday, September 17 2004 @ 02:16 PM EDT
    Wouldn't it be funny if when IBM wins against scog, that the court awarded
    damages to IBM in the amount of the present value of scog (effectively giving
    scog to IBM).

    Then IBM could release all of scog's source code under the dreaded GPL! ;-)

    A correction for Dr. Randall!
    Authored by: Anonymous on Friday, September 17 2004 @ 02:33 PM EDT
    Twice in the declaration he states that single-line comments in C begin with
    "/*". In fact, this is how a multi-line comment starts (it ends with
    "*/"). A single-line comment begins with "//".

    It's an understandable and relatively unimportant mistake, but it does make me
    question his skills just a little :/ I mean, I could have written the
    comparison software myself and not confused "/*" with "//";
    how come I don't make $550 an hour?

    Why do they want more code?
    Authored by: cyxs on Friday, September 17 2004 @ 03:13 PM EDT
    One thing that has me confused is why they want every version of AIX. Nothing
    more can be learned from 10-year-old code. They're trying their modified-works
    thing, I know. But they already have recent versions of AIX and what IBM
    submitted to the Linux kernel. So why not compare what IBM submitted to Linux
    with what they have in AIX? That would show any copying of AIX into Linux, and then
    they would have a reason to ask for more of AIX, since they would need that code's
    history if it's related to kernel functions and not something IBM made. I'm
    surprised that IBM hasn't pointed this out. I know this sounds like a troll,
    but I'm posting it because I really haven't read anywhere that IBM has said SCO
    already has the code IBM submitted to Linux. That's all that's involved in the PSJ:
    IBM's Linux-related activities, i.e., their code submissions, not the entire
    kernels....

    just my 2 cents..

    Oh so telling....
    Authored by: kberrien on Friday, September 17 2004 @ 04:43 PM EDT
    >even though SCO did not identify any corresponding AIX,
    >Dynix, or Linux code for such files

    Even if one were to wonder whether IBM's lawyers exaggerate their claims of SCO not
    complying with the "what code, with specifics" order, this is good
    proof. If you want your expert to refute evidence, you don't hold it back.

    The expert is declaring under oath to the court, and has no reason to
    "exaggerate" anything. It's his butt on the line.

    It's really hard to think you have a case when the opposing expert doesn't know
    what to compare your evidence to!

    • Oh so telling.... - Authored by: Anonymous on Friday, September 17 2004 @ 08:11 PM EDT
    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: Anonymous on Friday, September 17 2004 @ 04:43 PM EDT
    More likely he is talking about discarding comments that fit on a single line:

    /* this is a single line comment */

    but keep those that span multiple lines:

    /* this one spans
    on multiple
    lines
    */

    Ouch!
    Authored by: AHGrayLensman on Friday, September 17 2004 @ 04:53 PM EDT
    24. For purposes of my review, I did not first apply the "abstraction" and "filtration" analyses to the Unix System V Code. Instead, to be conservative, I assumed that all of the Unix System V code was in fact protectable (although I do not believe all of such code in fact to be protectable)...

    Ow, that's gonna leave a mark. Not only does one of the world's foremost experts not find any matches, but he implies that even if there were, he doesn't think that all the SysV code would be protected anyway.

    --Troy

    ---
    "You are finite, Zathras is finite, this... is wrong tool. No, not good, never use this!" --Zathras, "War Without End (pt. 2)", Babylon 5

    Quicksand Gupta
    Authored by: hal9000 on Friday, September 17 2004 @ 04:55 PM EDT
    Is it true that the supplemental declarations by
    Gupta and Co. were changed to opinion testimony?

    To defeat the CC10 PSJ they must submit expert
    testimony, which would introduce material facts
    to counter IBM.

    This means that Judge Kimball just has to review all submitted
    documents and ignore the opinion and hearsay exhibits submitted
    by SCO.

    SCO's cry for additional code discovery would fall on deaf
    ears with Magistrate Wells because IBM can argue that SCO has:

    1. Done no code analysis with the code already provided.
    2. Provided no documentary evidence for any similarity between Unix SysV and
    Linux.
    3. Started a 5 billion dollar lawsuit on a hunch.
    4. Provided opinion testimony and hearsay evidence.
    5. Listed code fragments which are not similar in syntax or structure.
    6. Performed one deposition.
    7. Fallen asleep in court hearings.

    I am convinced I could have done a better job for SCO with little or no
    training.

    My initial claims would have been different.



    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: webster on Friday, September 17 2004 @ 04:59 PM EDT
    1. Remember, IBM had their own in house experts do this years ago. So did
    every expert involved with Linux. This guy is independent and his credentials
    pack immense clout.

    2. Immense clout because he was in on the seminal case that prescribed the
    "abstraction, filtration, comparison" test. He devised the test.

    3. Ten hours to do the comparison. That is $5,500. Not bad for a day's
    work. He then had to write the declaration and talk to the IBM people, four
    hours more, so add another $2,200. Deduct pay to his assistant, $350.

    4. It is nice to glimpse over the shoulder of a master. He makes
    specialized, complicated stuff seem so clear.

    5. With these tools out there, no wonder SCO hit a stone wall. If they were
    saying the things they have been saying without an AFC study telling them there
    was copied code, then their statements have been at least grossly negligent, if
    not fraudulent.

    6. If the SCO attorneys opined that there was copyright infringement without
    a reliable AFC study, that is malpractice.

    7. If the SCO attorneys filed suit without a reasonably professional AFC
    study, they have committed malpractice and filed a frivolous suit. So where is
    it?

    8. Wouldn't that be infuriatingly ironic if the SCOnks could shift the blame
    for all this to their attorneys who didn't have an AFC study done! If the
    SCOundrels told their attorneys they had one, that is, of course, a different
    story.

    9. Did RBC and BayStar invest millions in this scheme without an AFC study?

    10. SCO or SCO attorneys at least have had these results since before they
    started stonewalling. I'm sure IBM sent them some tests results and warned them
    not to throw any frivolous code out, and they haven't directly.

    11. If you wanted a firm foundation on which to base a Summary Judgment, this
    is it.

    12. So did SCO file suit on an inadequate copyright code study that didn't
    filter? If so, the basis for their suit, the millions of lines of code, does
    not exist. So now they change course to a 'derivative' code claim, not because
    they are aware of any code, but because it is the only claim possible at this
    point. They obviously have not found anything in the AIX versions disclosed so
    far. This is a strong indication that the odds of there being any derivative
    code in the discarded files are much lower. They are hoping for a miracle,
    praying for a settlement. How about legal fees, 'fessin' up, and cooperation
    against the monopoly? They are doing everything but admit that as things now
    stand: NO CODE, their lawsuit is frivolous.

    13. Notice that today in the comments below this article, as well as in court
    filings and in their statements, they confuse their need for code as a defense
    to their copyright inadequacy. Their discovery need for all AIX code only
    refers to their derivative code claim. They have abandoned their copyright
    claim. This declaration has nothing to do with their AIX claim. The PSJ on
    copyright has nothing to do with AIX derivative Code claim. They present no
    code nor ask for any to prove copyright infringement. The PSJ should be a
    walkover.


    ---
    webster

    so strange
    Authored by: Anonymous on Friday, September 17 2004 @ 06:16 PM EDT
    This court case is so strange. Invariably, when a technical matter is in
    dispute, each side will hire experts that will present an opinion that supports
    its side, and then the jury gets to decide which set of experts to believe.

    In this case, IBM has Davis and Kerrigan (sp?), but SCO has no expert witnesses,
    at least so far as I have heard. That means no facts are at dispute, and the
    judge has to just accept IBM's side, and so can go ahead with a PSJ. As I said,
    very strange.

    • so strange - Authored by: Anonymous on Friday, September 17 2004 @ 06:25 PM EDT
    • Kernighan - Authored by: Anonymous on Saturday, September 18 2004 @ 03:25 PM EDT
    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: Anonymous on Friday, September 17 2004 @ 06:19 PM EDT
    10. I have been instructed by counsel that one work is a "derivative work" of another under federal copyright law if it incorporates in some form a portion of the preexisting work and is substantially similar to the preexisting work.

    He's being so modest, given the fact that he played a key role in developing the accepted interpretation of that law, and probably knows as much about it as IBM's attorneys.

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: Anonymous on Friday, September 17 2004 @ 06:23 PM EDT
    I understand filtration to involve the removal of at least the following elements: ideas, purposes, functions, procedures, processes, systems,

    Well, there goes all those Unix ideas, methods and procedures that SCO is always talking about. End of case.

    The real deal
    Authored by: Anonymous on Friday, September 17 2004 @ 06:24 PM EDT
    I think we should keep these findings in perspective. Not only could Prof. Davis
    not find anything when running the comparisons, but there is not an iota of
    SCO's millions of lines of code or their already prepared proof. I mean, if I
    were in SCO's shoes and if some schmuck threatened me with a PSJ on copyright
    infringement (where they don't actually have to prove the negative, but I (SCO)
    have to prove the positive), I would simply show them all those millions of
    lines of code that are infringing. Simple, no?

    But where is it? Where's the damn code Darl? This whole thing goes away if you
    show just one page of it. So, time to talk is over - let's see it! Darl?

    • The real deal - Authored by: Anonymous on Saturday, September 18 2004 @ 12:40 PM EDT
    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: Anonymous on Friday, September 17 2004 @ 06:31 PM EDT
    There are 67,000,000 lines of code in SVR4? And, from what I recall, IBM said
    only 78,000 lines of SVR4 code were incorporated in AIX. Quite a bit different
    from what SCO says about how AIX is basically SVR4 with some modifications.

    OT: Whatever happened to SCO owns *all* Dynix
    Authored by: Anonymous on Friday, September 17 2004 @ 07:12 PM EDT
    If you check back to SCO's rule 56f motion, they claimed that whether SCO owned
    the whole of Dynix was a question for the jury. This was based on their
    incredible stretching interpretation of Liu v PriceWaterhouse

    If you check back to the story covering IBM's reply memo in support of PSJ on
    Counterclaim 10, IBM didn't mention this claim in their memo.

    Now, given the thoroughness with which IBM have demolished every last detail of SCO's
    arguments, this omission seems rather odd. It was even remarked upon by some
    groklaw posters towards the end of the IBM reply memo coverage on groklaw.

    So Wednesday, they turn up for oral arguments, and this argument doesn't get
    mentioned at all. SCO didn't even mention it.

    What happened?

    Did SCO withdraw the claim? (Note there are some gaps in the numbering of the
    filing.)

    I have no information if they did or didn't, or why they would.

    But one thought: it is my understanding of the rules that if a party presents a
    frivolous claim, the other party can serve it with 21 days' notice that it
    intends to file a Rule 11 motion for sanctions. The party that filed the frivolous
    claim then has the option either to withdraw the claim or to persist with it,
    in full knowledge that it may face sanctions for persisting if the claim is
    found frivolous.

    [ Reply to This | # ]

    So what was it that DiDio saw?
    Authored by: Anonymous on Friday, September 17 2004 @ 07:27 PM EDT
    Remember when dear Laura was shown some blocks of allegedly copied code, and she
    said publicly that Linux did indeed appear to contain copied code? Did SCO show
    her and others "doctored" code or was she just hallucinating?

    [ Reply to This | # ]

    Little niggle - no matches is always suspicious
    Authored by: Anonymous on Friday, September 17 2004 @ 08:29 PM EDT
    The only thing that struck me as strange was SIM was reporting too many
    (presumably false) matches, so it was modified and, afterwards, found no matches
    at all.

    It seems that this casts a poor light on the results from the (modified) SIM. I
    would have thought that, finding no hits at all, the programmer modifying SIM
    would have been very suspicious, and perhaps would have tried a different set of
    changes that didn't filter everything out.

    At the least some preemptive justification would seem prudent.

    As an expert for the defence, zero possible matches is very convenient.

    .esq.

    [ Reply to This | # ]

    Laura has the Dan Rather Effect!!!!
    Authored by: kberrien on Friday, September 17 2004 @ 09:04 PM EDT
    Laura probably saw whatever code SCO wanted to pull out - illegitimate (or more
    likely, un-researched). She isn't qualified as a journalist (in my opinion)
    much less a code specialist.

    I've been in IT for a long time now, but not being a kernel hacker, not having
    exposure to Sys V code, and not having my own copy of Sys V source to verify
    against....

    But I wouldn't walk into SCO headquarters, look at two code printouts side by
    side, and walk out and write about how the evidence was compelling, just based
    on SCO's word, and two printouts. I'd ask for copies, to verify with MY OWN
    experts.

    Hell, look at what's happening to CBS & Dan Rather because it would appear
    they didn't check the evidence enough. Unfortunately, Laura isn't held to the
    same standard in the "trade press".

    [ Reply to This | # ]

    Smoking Gun
    Authored by: Anonymous on Friday, September 17 2004 @ 09:04 PM EDT
    Gupta--Here's the Smoking Gun.
    Davis--Take the cigarette out of the muzzle and look again.

    --Bill P

    [ Reply to This | # ]

    OT - Contempt of Court
    Authored by: hal9000 on Friday, September 17 2004 @ 10:33 PM EDT
    Is it possible that the lawyers for SCO were baiting
    Judge Kimball at the hearing?

    There was mention of Judge Wells's assistant in the audience
    taking notes. They may need an officer's report of the proceedings
    to counter an attempt by SCO to appeal.

    Especially if Judge Kimball lashed out at any time.

    Comments have been made that he is a very amiable and
    funny man.

    But the initial court reports said that he
    may not have been in a very good mood that day.

    If the SCO lawyers try to provoke an outburst from either
    the Magistrate or the judge, this may provide evidence
    for an appeal. So the court officers will be watching
    their conduct very closely.



    [ Reply to This | # ]

    SCO'S FraudSource
    Authored by: Anonymous on Friday, September 17 2004 @ 10:56 PM EDT
    Does anybody find it interesting that SCO based their licensing scheme on the
    claim that Linux included SCO's Unix IP?

    If I had bought an SCO license a while ago and now found that SCO had not even
    checked Linux vs Unix and an outside expert says that there is no infringing
    code, I would be calling the SEC, FTC, and the FBI.

    Lock 'em up, throw away the key...

    [ Reply to This | # ]

    origin of arch/i386/kernel/srat.c
    Authored by: Anonymous on Friday, September 17 2004 @ 11:43 PM EDT
    One of the files in the Linux 2.6.5 section of table 1 is srat.c. The comment
    at the top says that "some of the code in this file has been gleaned from
    the 64 bit discontigmem support code base". You can trace the early
    versions of this at https://sourceforge.net/projects/discontig. IBM did not
    write this code, they merely ported it from ia64 to i386. Any good code
    comparator will show many items of literal copying (which is just fine because
    the source of the copy is GPL code).

    [ Reply to This | # ]

    Dr. Randall Davis's 2nd Declaration - I Found No Identical or Similar Code
    Authored by: CyberCFO on Saturday, September 18 2004 @ 12:38 AM EDT
    I just ran linux 2.4.27 against 2.6.8.1 with the comparator and got no matches. Does this mean that 2.6.8.1 is not even a derivative of 2.4.27? Am I doing something wrong?

    Here are my results:
    $ ./comparator -v -N line-oriented,remove-whitespace,remove-braces,remove-comments ../linux-2.4.27 ../linux-2.6.8.1
    % Scanning tree ../linux-2.4.27...reading 12608 files...100%...done, 12608 files, 3885576 shreds.
    % Scanning tree ../linux-2.6.8.1...reading 15946 files...100%...done, 15946 files, 4486776 shreds.
    #SCF-B 2.0
    Filtering: language
    Hash-Method: RXOR
    % Hash merge done, 8372352 shreds: 0h 11m 18s
    % Sort done: 0h 0m 16s
    % Compaction reduced 8372352 shreds to 5927366: 0h 0m 0s
    % Extracting duplicates...100% done.
    % 0 range groups after removing unique hashes: 0h 0m 2s
    % 0 range groups after merging: 0h 0m 0s
    Matches: 0
    Merge-Program: comparator 2.5
    Normalization: line-oriented, remove-whitespace, remove-comments, remove-braces
    Shred-Size: 3
    %%
    ../linux-2.6.8.1: matches=0, matchlines=0, totallines=6333562
    ../linux-2.4.27: matches=0, matchlines=0, totallines=5508749
    %%

    [ Reply to This | # ]
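    For readers wondering what the "shreds" in the comparator output above are: the Shred-Size: 3 and Normalization fields suggest that the tool normalizes each line, hashes overlapping windows of consecutive lines, and reports windows whose hashes appear in both trees. The Python sketch below illustrates only that general idea, under those assumptions. It is not comparator's code: the normalization is a crude stand-in (it strips only // comments, not /* */ blocks), it hashes with SHA-1 rather than the RXOR method the output reports, and the function names are hypothetical.

```python
import hashlib
import re

SHRED_SIZE = 3  # mirrors the "Shred-Size: 3" field in the output above


def normalize(lines):
    """Crude, hypothetical stand-in for remove-comments/remove-whitespace/remove-braces."""
    out = []
    for line in lines:
        line = re.sub(r"//.*", "", line)                # drop // line comments only
        line = re.sub(r"\s+", "", line)                 # remove all whitespace
        line = line.replace("{", "").replace("}", "")   # remove braces
        if line:                                        # skip lines that became empty
            out.append(line)
    return out


def shreds(lines, size=SHRED_SIZE):
    """Hash every window of `size` consecutive normalized lines (a 'shred')."""
    norm = normalize(lines)
    return {
        hashlib.sha1("\n".join(norm[i:i + size]).encode()).hexdigest()
        for i in range(len(norm) - size + 1)
    }


def common_shreds(a_lines, b_lines, size=SHRED_SIZE):
    """Return the shred hashes that occur in both inputs (candidate matches)."""
    return shreds(a_lines, size) & shreds(b_lines, size)


if __name__ == "__main__":
    a = ["int x = 1;", "if (x) {", "  return x;", "}", "return 0;"]
    b = ["int  x = 1;  // init", "if (x)", "{ return x; }", "return 0;"]
    # Both normalize to the same four lines, so both 3-line windows match.
    print(len(common_shreds(a, b)))  # prints 2
```

    Because whitespace, braces, and comments are stripped before hashing, the two differently formatted snippets above still produce identical shreds; conversely, any edit inside a 3-line window changes that window's hash.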

    But it is impossible!
    Authored by: Vaino Vaher on Saturday, September 18 2004 @ 02:04 AM EDT
    The topic at hand is comparing Gupta's exhibits to Linux, so this may be a
    little bit OT, but...

    It is literally impossible that there would be no matching code at all in
    Linux!
    The code may not have been copied; it may be purely coincidental. Or the code may be
    identical, but not copyrightable. It may even be code submitted by Caldera. But
    one thing I am sure of is that it is there, at least if all I was looking for was
    one or a few lines of matching code.

    So why on earth did Gupta not submit such examples? He may have been ridiculed
    by Groklaw etc., but at least he would have shown that there is a conflict that
    needs to be decided by a jury.

    And I am quite convinced that I would be able to find that coincidental
    similarity by using a script that runs 'diff'.

    [ Reply to This | # ]

    Can the community commission a full code analysis
    Authored by: hoopyfrood on Saturday, September 18 2004 @ 02:06 AM EDT
    Is there any value in the Linux community commissioning its own paternity
    analysis of the kernel? If I understand the story correctly, the sealed BSD/AT&T
    agreement implies that BSD code is in Sys V. It seems like a cross-analysis of the
    BSD and Linux code bases would identify sections that are clear of infringement.

    At first glance it seems difficult, if not impossible, to make use of the results
    directly. Just trying to think outside the box.

    [ Reply to This | # ]

    Accident or "to be expected" ?
    Authored by: Anonymous on Saturday, September 18 2004 @ 04:24 AM EDT
    "The two "words" -- endif and return -- that appear in the two files are so common in code written in the C language that finding them together like this is purely an accident, of no significance in detecting copying.

    .....

    32. All of the potential hits reported by COMPARATOR were of the type discussed in paragraphs 29 and 30; i.e., they consisted entirely of coincidental matches of common terms in the C programming language. Even two programs known to have no code copied from one to the other will show these sorts of coincidental matches. Given the volume of code in question here (e.g., 68,000,000 lines of Unix code), the presence of these type of matches is both to be expected, and evidence that the tool was in fact performing successfully in finding potential matches."


    Did the good professor make a mistake, or am I misreading this?

    [ Reply to This | # ]

    What would make this stronger
    Authored by: Anonymous on Saturday, September 18 2004 @ 04:55 AM EDT
    Would be if he were to demonstrate finding some copied code in a similar operating system example.

    For example, take some of the historic Berkeley CSRG source archives and see if you can find copied code in current *BSDs. (I'm guessing that DragonflyBSD is the most heavily rewritten.)

    If he could find similar code across 10 years of intensive development (4.4 lite was released in March 1994), that would be excellent evidence that SCO does not need every intermediate version to find copying.

    Various ancient Unixes are also available, and he could see if he can confirm the USL copying claims in 4.3 net.2 and that it was removed in 4.4 lite.

    [ Reply to This | # ]

    OT: Question for Sep 15 Hearing Attendees
    Authored by: NastyGuns on Saturday, September 18 2004 @ 05:55 AM EDT

    Could someone that attended the hearing please answer a question for me. Did Judge Kimball discuss the merits of #212 - IBMs Motion to Strike Decl. of Chris Sontag?

    In PJ's article she says:

    What is fascinating about these documents is that SCO seems to be playing one judge against another, trying to get Kimball to overrule Wells. In that, they were partially successful, in that he sped up the process. Now I understand why Judge Wells's assistant was attending the hearing. I must say, Judge Wells impresses me very much.

    After reading that and something from the docket entries, something kept bugging me, but I couldn't place it until now. It has to do with which motions were being heard, and it ties back to this article: nobody mentioned #212 - IBM Motion to Strike Decl. of Chris Sontag being heard that day.

    However, when you read the minute entry #302 from the docket, you'll note that the clerk makes mention of #212 having been heard and taken under advisement.

    My take is that when PJ mentioned speeding up the process, she was referring to the responses and replies due on SCO's Motion to Enforce the Scheduling Order, #281 - Motion and #291 - Memo. That makes me think PJ hit on something more substantial about SCOG playing the judges against each other.

    The fact that #212 is (was?) scheduled to be heard by Judge Wells on Oct 19 according to #285, should also be mentioned. And it's been part of her schedule for the past three or so hearing entries. Also to be heard that day is #190 - SCOs Renewed Motion to Compel Discovery.

    It might not be anything, or it could be something. Just thought I'd mention it in light of PJ's previous comment about playing the judges against each other.

    ---
    NastyGuns,
    "If I'm not here, I've gone out to find myself. If I return before I get back, please keep me here." Unknown.

    [ Reply to This | # ]

    I have a question about "mis-appropriated code"
    Authored by: Anonymous on Saturday, September 18 2004 @ 06:41 PM EDT
    This will be a bit difficult to explain, so please bear with me: Let's say there is some copyrighted code that is found in the Linux kernel. It wasn't supposed to be there; it was put there by someone who wrote the code at their place of work, which would mean that the company holds the copyright. This programmer submitted it to the Linux kernel saying it was his own work.

    This company would obviously want to have the offending code removed immediately, and I'm sure it would be. But one of my managers at work is of the opinion that if another company had been using the Linux kernel containing the offending code, and they weren't using a distribution that they paid for (Red Hat, Novell/Suse), but rather downloaded it themselves (Debian, say) for free, the company using the offending code would be liable for damages from the company that holds the copyright. His contention is basically "ignorance is no excuse." Since they didn't have the right to use the code, and were using it to run their business, they would be liable for "back license fees" or somesuch.

    Comments?

    [ Reply to This | # ]

    Groklaw © Copyright 2003-2013 Pamela Jones.
    All trademarks and copyrights on this page are owned by their respective owners.
    Comments are owned by the individual posters.

    PJ's articles are licensed under a Creative Commons License. ( Details )