Did SCO Really Reveal the Code to IBM, as Darl Claims?
Thursday, November 20 2003 @ 01:17 AM EST

You may have noticed that in the teleconference on Tuesday, SCO CEO Darl McBride made the claim that they have shown the code to IBM in discovery and that IBM knows exactly what code is in dispute. Specifically, he said this:

" . . .by the way, we have shared the code in question there with IBM under the litigation event. They know what we're talking about there."

There is room for skepticism. While it is impossible to rule out that code not yet in the public record may have been shown privately, if Darl was referring to the lists of files SCO has presented to IBM in discovery so far, as reflected in the record, I think we need to look at those lists of "infringing files" more carefully.

I noticed that, as soon as their discovery list of files was released, coders everywhere were falling over laughing or snorting in contempt. I'm not a coder, so I asked some of our readers to explain why the lists strike them as so pitiful. There were many replies, including some fine comments, but they included and were based on code that went over my head and would not be accessible to some of Groklaw's readers either.

Most people in the world are not programmers, and it's a language we don't know. So my request was for a translation into English, so we too could grasp what they were noticing, exactly what SCO has on their lists, how they likely arrived at the lists, and what it indicates as to how much SCO actually provided during discovery, so we can understand why IBM filed a motion to compel discovery after receiving the lists from SCO.

The final result is mostly Frank Sorenson's work, but it incorporates helpful input from other Groklaw readers, so it represents the work of a group. I hope you enjoy looking at it from this fresh perspective. This case is, after all, about code, so the rest of us can only gain insight by trying to comprehend that part of the story.

Because I am not a programmer, I appreciated Justin Rowles' explanation about the utilities find and grep that Frank talks about in his article:

"Unix provides highly flexible tools for searching directory trees and the files they contain.  The two most common ones are called find and grep.  Use of these tools is taught in 'Unix 101' type classes.  For example, if I wanted to find all the files on my hard disk that started with 'apple' and ended in 'pie', I could use the find tool to do so.  It would find files called 'apple pie', 'apple and blackberry pie' and so on.

"grep is a similar tool for looking at the contents of files.  It would be used to look at files and find, for example, which ones contained the word 'custard'.  Usually it searches files in a single specified directory, but it can also be used to search a list of files generated by another command, like find.

"Both of these tools are highly flexible, and can be used together by a competent Unix person to search their disks for highly specific things.  I could use find to find files that are called 'apple something pie', but not 'apple and redcurrant pie' and then check all of those files with grep to leave only those which also contain 'custard'.  I can do all this in one instruction to the computer.

"In fact, in GNU/Linux, grep has been improved.  GNU grep contains the ability to search directory  structures, so I can dispense with step one above.  In SCO Unix, you can't do that, so you need to use find."
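For readers who would like to see what that single instruction might look like, here is a minimal sketch of Justin's pie search. It is only an illustration: the scratch directory and the file names and contents are made up for the example, and the variable names demo and pies are hypothetical.

```shell
# Build a scratch directory with a few made-up "pie" files.
demo=$(mktemp -d)
echo "serve with custard" > "$demo/apple pie"
echo "serve with cream"   > "$demo/apple and blackberry pie"
echo "serve with custard" > "$demo/apple and redcurrant pie"
echo "serve with custard" > "$demo/cherry pie"

# find picks files named apple...pie, skips the redcurrant one, and
# hands the survivors to grep, which lists those containing "custard".
pies=$(find "$demo" -type f -name 'apple*pie' ! -name 'apple and redcurrant pie' \
         -exec grep -l 'custard' {} +)
echo "$pies"
```

Only the file called 'apple pie' survives both tests: the blackberry pie has no custard, the redcurrant pie is excluded by name, and the cherry pie never matched the name pattern in the first place.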

Keep this explanation in mind, all you nonprogrammers, as we take a look now at the file lists with Frank. The other thing you need to know to understand what Frank describes is that a Caldera employee, Christoph Hellwig, was a key Linux contributor, and he wasn't the only one. The evidence strongly indicates that Caldera knew at the time that the contributions were being made. Old SCO also contributed code to Linux. I think you will conclude, as I did, that when Darl says that they "deep dived" and looked at the code every which way, as he again claimed yesterday, he couldn't have been describing the process used to come up with the lists they have provided to IBM in the court case. They definitely didn't need spectral analysis, the missing MIT mathematicians, or physicists to come up with such lists as those they provided IBM and the court in their Supplemental Responses. Google and a couple of simple utilities are sufficient. With that introduction, here is Frank's article.

*****************************************************

The SCO Group's List of "Infringing" Files -- How Might They Have Come Up With This List?

~by Frank Sorenson

In IBM's Reply Memorandum in Support of their (First) Motion to Compel Discovery (text here), IBM includes SCO's Supplemental Responses to IBM's First Set of Interrogatories (text here) and tells the Judge that SCO is still not answering their questions. One of the responses SCO provided was a list of files that may or may not be infringing, according to SCO. Why might IBM view the list as inadequate? To someone without a programming background, it might be hard to know.

A closer look by a computer programmer, with English translation for nonprogrammers, may give a clearer picture of why SCO's responses were neither "responsive nor identified with meaningful particularity", according to IBM. It also reveals the likely method SCO used to draw up the list, which bears on SCO's earlier claims that it had three groups of analysts, including the MIT mathematicians, analyzing the code.

SCO's response includes five lists from several categories:

  1. A list of "source code files identified by SCO thus far ... part of which include information (including methods) that IBM was required to maintain as confidential or proprietary...and/or which constitute trade secrets misused by IBM..." It's a list of 115 files.
  2. A list of "source code files identified by SCO thus far...which may...include information (including methods) that IBM was required to maintain as confidential or proprietary...and/or which constitute trade secrets misused by IBM..." It's a list of 591 files.
  3. A list of people at IBM that SCO claims to be aware of "in which part of the confidential or proprietary and/or trade secrets [were] known or [have] been disclosed." There are 5 lists of people whose names appear in the Linux code base, adding up to about 74 people.
  4. A list of IBM copyrights. This is a list of 22 names.
  5. A list of people who "likely have knowledge, although their names do not appear in the Linux code base." It's a list of 62 names.

First, a little background on Linux/Unix utilities and tools, then we will examine each of these lists, how they may have been created, and what (if anything) they mean. We conclude with some general comments.

Background

There are a number of useful utilities in Linux/Unix. Because we will be using some of them in our discussion, we'll briefly mention a few before moving on:

One utility is called grep, and it is a utility designed to search inside a file (or files) for lines containing a certain pattern. In its simplest form, it is usually used like this: 'grep string filename', but it also accepts numerous flags (options) to allow it to perform various functions. When calling grep as egrep, extended pattern matches are enabled. Here, we will use grep to quickly find files containing strings that we are interested in.

Another commonly used utility is find, which is used to search a directory for files having certain properties, such as a specific name or pattern. Here, it will be used to locate files that we are interested in searching the contents of.

sort does just what it says; it sorts a list of strings. It can also be used with the -u option (unique) to remove duplicate references.

cat is used to type out the contents of files, and is very similar to type under DOS/Windows.

xargs is used to execute commands on the output of a previous command. We will be using it to reprocess the output of find commands and the output of other utilities.
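To show how these utilities fit together before we turn to SCO's lists, here is a small sketch run against a scratch directory. The directory layout, file names, and file contents are all made up for illustration, as are the variable names work and matches.

```shell
# Scratch tree with made-up files; only the names and contents matter.
work=$(mktemp -d)
mkdir -p "$work/src/sub"
printf 'int smp_flag;\n'      > "$work/src/a.c"
printf 'int other;\n'         > "$work/src/b.c"
printf '/* SMP note */\n'     > "$work/src/sub/c.h"
printf 'smp in a text file\n' > "$work/src/notes.txt"

# find lists the .c and .h files, xargs passes them to grep,
# grep -il prints the names of files containing "smp" in any case,
# and sort -u orders the names and drops duplicates.
matches=$(cd "$work" && find src -type f -name '*.[ch]' -print0 \
            | xargs -0 grep -il 'smp' \
            | sort -u)
echo "$matches"
```

The text file is never searched because find only passes along names ending in .c or .h, and b.c is dropped by grep because it does not contain the string.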


SCO's Lists of Files

Let's start with List 2: The list of "source code files identified by SCO thus far...which may...include information (including methods) that IBM was required to maintain as confidential or proprietary...and/or which constitute trade secrets misused by IBM..." This is a list of 591 files.

While this list contains 591 files from Linux, SCO fails to mention which kernel version they come from, saying only that they're from 2.4 and/or 2.5 kernels. As IBM correctly points out, "This is no small problem since there are 75 different releases of the Linux kernel 2.5 alone." SCO also says that they do not claim the entire source code found in those files, but that this information is interspersed in those 330,000 lines of code.

IBM also points out that since it is Unix code (SVRx) that SCO claims was misappropriated, pointing to the Linux source code does not really answer their question, which was: from where were the trade secrets misappropriated? SCO passes this argument off by saying that they have not completed discovery, and that since IBM hasn't given them everything they've asked for, they don't know exactly where it came from.

Because SCO is claiming that it is IBM's trade secrets that were misappropriated, they don't have the trade secrets yet themselves. In other words, they need IBM to reveal more information. The question becomes "Why does SCO believe that this list contains their trade secrets if they don't know the trade secrets and need IBM to point them out?"

In attempts to answer this, a number of discussions have occurred, here on Groklaw, on the Linux Kernel Mailing List, and elsewhere. Here on Groklaw, Lev managed to narrow the Linux kernel version down to either 2.5.68 or 2.5.69. Many people were quick to point out that most files on the list contained one or more strings that SCO likes to claim as theirs: SMP, JFS, RCU, and NUMA.

By using the appropriate utilities, it is possible to reproduce SCO's list (number 2) without any manual investigation of the contents of any of those files. A sorted (and cleaned up) copy of SCO's list number 2 is located here for reference. While this solution is certainly not the only one, and is probably not optimal, it is the one that the author managed to construct:

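# Names of .c/.h files mentioning smp, rcu, or numa (whole word, any case)
find . -type f -name "*.[ch]" -print0 \
   | xargs -0 egrep -wil 'smp|rcu|numa' \
   | cut -c 3- > /tmp/output1

# Add each fs/jfs source file that has no @sco or @caldera address in it
find fs/jfs -type f -path "*.[ch]" -print0 \
   | xargs -0 egrep -Li "@sco|@caldera" >> /tmp/output1

# Drop the unwanted paths, then sort and remove duplicates
egrep -v 'alpha|parisc|sparc|sound|drivers' /tmp/output1 \
   | sort -u > /tmp/SCOFiles-list2.output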

This may look like quite a mess, but it can be deconstructed into manageable pieces. All three lines really consist of several commands strung together using the |, or pipe. This means that the results of one command are used as input to the next command.

Picking apart these lines, first I found all files with a filename ending in .c or .h (C source code and header files). I searched the contents of these files for any of the strings 'smp', 'rcu', or 'numa' (without caring about upper- or lower-case). I placed these matching files into the file /tmp/output1. Next, I searched the JFS filesystem code for .c or .h filenames, removing any files that mention someone at SCO or Caldera working on them. The results were appended to /tmp/output1. Finally, I searched the /tmp/output1 file and removed all file names referring to alpha, parisc, or sparc (essentially Sun and HP). References to driver files and sound were then also removed.

When applying this process to the kernel versions identified by Lev, we get 3 false positives and 3 false negatives with the 2.5.68 kernel and just one false positive with the 2.5.69 kernel. As the list is otherwise identical to SCO's, I believe that SCO used the Linux 2.5.69 kernel to generate these lists.

The false positive was include/asm-h8300/smplock.h. There may be a number of explanations for this, one of the most likely being that someone at SCO messed up, and missed a line when sending the list to the lawyers. This is, of course, presuming that the person preparing the list used a similar process, which I believe is likely.
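For anyone who wants to repeat the comparison, the false positives and false negatives can be counted mechanically with comm. The sketch below uses toy data rather than the real lists: demo/a.c and demo/b.c are hypothetical placeholder paths, and the temporary file names generated and sco are made up for the example.

```shell
# Toy stand-ins for the two lists, one path per line. In practice these
# would be the generated list and a cleaned-up copy of SCO's list.
tmp=$(mktemp -d)
printf '%s\n' demo/a.c include/asm-h8300/smplock.h demo/b.c > "$tmp/generated"
printf '%s\n' demo/b.c demo/a.c > "$tmp/sco"

# comm needs sorted input. LC_ALL=C keeps sort and comm in agreement.
LC_ALL=C sort -u -o "$tmp/generated" "$tmp/generated"
LC_ALL=C sort -u -o "$tmp/sco" "$tmp/sco"

# False positives: in the generated list but not in SCO's.
false_pos=$(LC_ALL=C comm -23 "$tmp/generated" "$tmp/sco")
# False negatives: in SCO's list but missed by the commands.
false_neg=$(LC_ALL=C comm -13 "$tmp/generated" "$tmp/sco")
echo "false positives: $false_pos"
echo "false negatives: $false_neg"
```

With the toy data, the only line unique to the generated list is the h8300 header, and nothing is unique to the stand-in for SCO's list.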

What does this mean? Essentially, that SCO searched for any reference in the Linux kernel source for SMP, JFS, RCU, and NUMA, and claimed all of those files as possibly infringing. They included the entire JFS source code, but, perhaps realizing that it would look really bad to claim a file that implicated SCO or Caldera by showing the names of their employees, removed those files.

A number of people have pointed out that some of the files are so trivial that they could not contain trade secrets. For example, include/asm-arm/spinlock.h contains only 6 lines, but is included in the list because it contains the string SMP (as in "we don't do SMP"):

#ifndef __ASM_SPINLOCK_H
#define __ASM_SPINLOCK_H

#error ARM architecture does not support SMP spin locks

#endif /* __ASM_SPINLOCK_H */

In providing this list to IBM, it appears that all SCO has done is to make vague claims over all of SMP, JFS, RCU, and NUMA, which is hardly news, but they have given no explanation of how they created their list of possibly infringing files. They haven't answered IBM's question at all (which relates to original SVRx code), and they look silly in the process, at least to those who understand the code and the list.

It is obvious that SCO did not spend a great deal of time or effort at answering IBM's question with valuable information. If they actually did spend time and effort to produce this list, their technical person is not extremely skilled.


List 1: A list of "source code files identified by SCO thus far ... part of which include information (including methods) that IBM was required to maintain as confidential or proprietary...and/or which constitute trade secrets misused by IBM...", the list of 115 files.

The first thing to note is that the files in this list are actually a subset of the files in List 2. For reference, a copy of SCO's list number 2 can be found here. Using our trusty Linux utilities, we can again construct a sequence of commands that produces SCO's list automatically. The following commands will produce all of SCO's files (again, 100%) with just 2 false positives:

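# Files from List 2 that carry an obvious IBM marker
cat /tmp/SCOFiles-list2.output \
  | xargs egrep -l 'International Business Machines|ibm.|IBM Corp' > /tmp/output1

# Files mentioning IBM or RCU as a whole word that never mention "sco"
cat /tmp/SCOFiles-list2.output \
  | xargs egrep -wl 'IBM|RCU' \
  | xargs egrep -L 'sco' >> /tmp/output1

# Sort and remove duplicates
sort -u /tmp/output1 > /tmp/SCOFiles-list1.output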

These commands first search (List 2) for anything that would be easily identifiable as coming from IBM, files containing "International Business Machines", "IBM Corp", or "ibm." (as could be contained in an email address like username@ibm.com). Next, any mention whatsoever of "IBM" or "RCU" is included, as long as the file does not also contain "sco".
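One detail of the pattern is worth a quick demonstration: in a regular expression, an unescaped period matches any single character, so 'ibm.' matches more than email-style strings. The sketch below runs both forms against three made-up lines. The address and the identifier ibmfoo_adapter are invented for illustration, and nothing here establishes whether the difference affected the real list.

```shell
# Three made-up sample lines to run the patterns against.
sample=$(printf '%s\n' 'mail: someone@us.ibm.com' 'struct ibmfoo_adapter' 'no match here')

# Unescaped dot: "ibm." matches "ibm" followed by any one character.
loose=$(printf '%s\n' "$sample" | grep -c 'ibm.')
# Escaped dot: only a literal "ibm." counts.
strict=$(printf '%s\n' "$sample" | grep -c 'ibm\.')
echo "loose=$loose strict=$strict"
```

The loose pattern counts both the email line and the identifier line, while the strict pattern counts only the email line.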

Again, while we do not know for certain that this is the method that SCO used to produce this list, it is easy to demonstrate that even though our commands do not produce an identical list, SCO spent little more time to create this list than List 2.

We are unable to determine whether someone messed up and omitted the two false positives, arch/ppc/kernel/setup.c and include/linux/list.h, or whether our search string is not sufficiently developed to produce the same list. What we do know is that this list of "definitely infringing files" is little more than files with IBM mentioned, minus files referring to SCO. IBM is asking for specifics because SCO has given no explanation of how they built their list. Also, they've avoided the question of where in SVRx these trade secrets came from, and why SCO believes they are trade secrets.


List 3: A list of people at IBM that SCO claims to be aware of "in which part of the confidential or proprietary and/or trade secrets [were] known or [have] been disclosed." This consists of 5 lists of authors, for a total of about 74 people.

In SCO's Supplemental Response, they identify a number of people as having disclosed proprietary information and/or trade secrets. They break down these names into "US Authors" (30), "German Authors" (24), "Australian Authors" (2), "Other" (15), and "Austin Office (JFS)" (3). We won't be going into the same detail in analyzing this section because it involves the names and email addresses of people and we have redacted this information from the text version of the document. Those curious should view SCO's filing to see examples.

Suffice it to say that these lists can be regenerated by searching the kernel source for all files containing an email address at IBM. It contains actual lines from the copyright notices contained in the Linux kernel. On more than one, the line also contained references to other email addresses that the person used, and at least one just ends like this: "username@vnet.ibm.com or". The next line in the kernel source file contains the alternate address.
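A search of that kind might look something like the sketch below. We do not know what command, if any, SCO used, so treat this as one possible approach. The scratch files, the names, and the email addresses are all invented, and the -r and -o options assume GNU grep.

```shell
# Scratch source tree with made-up headers; the addresses are invented.
src=$(mktemp -d)
printf ' * Author: A. Coder <acoder@us.ibm.com>\n' > "$src/one.c"
printf ' * Copyright (C) B. Hacker (bhacker@de.ibm.com)\n * Also acoder@us.ibm.com\n' > "$src/two.c"
printf ' * Author: C. Other <cother@example.org>\n' > "$src/three.c"

# -r recurses, -h drops file names, -o prints only the matching text,
# -i ignores case. sort -u collapses repeated addresses.
addresses=$(grep -rhoiE '[a-z0-9._-]+@([a-z0-9-]+\.)*ibm\.com' "$src" | sort -u)
echo "$addresses"
```

The output is one line per distinct address, which is why merging people who used several names or addresses would still take the manual step described next.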

This list is fairly easy to generate, but does require a bit more manual intervention than most of the others. Since some people have contributed using multiple names (such as Pat and Patrick), someone has manually merged these names together. It was done sloppily, though, since there are other IBM-related email addresses in the source code which are not mentioned.

Here, SCO is apparently telling IBM that they believe that every contribution from IBM is tainted, but they'll need all the source code ever written from IBM in order to prove it. I have serious doubts that everyone that ever contributed to Linux from IBM has done so under such suspicious circumstances (I actually have serious doubts that _any_ contributions are tainted in this way).


List 4: A list of IBM copyrights (a list of 22 names)

This list is as easy to generate as List 3. It is merely a list of all the various copyright notices involving IBM in the kernel source. It's actually a pretty boring list, and doesn't seem to tell anyone much, including IBM. It can be regenerated merely by searching for "Copyright" or "(C)" in the same line as "IBM Corporation". They're all just lines like:
Fred So-and-So, IBM Corporation
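
As a rough sketch of that search, the command below looks for "Copyright" or "(C)" followed by "IBM Corporation" on the same line. The scratch files and the notices in them are made up for illustration, the -r option assumes GNU grep, and this is only one way the search could be written.

```shell
# Scratch files with made-up notices.
src=$(mktemp -d)
printf ' * Copyright 2002 IBM Corporation\n' > "$src/a.c"
printf ' * (C) Fred So-and-So, IBM Corporation\n' > "$src/b.c"
printf ' * Copyright 2001 Someone Else\n * Thanks to IBM Corporation\n' > "$src/c.c"

# Keep lines where "Copyright" or "(C)" is followed by "IBM Corporation".
notices=$(grep -rhE '(Copyright|\(C\)).*IBM Corporation' "$src" | sort -u)
echo "$notices"
```

The two lines in c.c are skipped because neither one has both the copyright marker and the company name.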


List 5: A list of people who "likely have knowledge, although their names do not appear in the Linux code base." (a list of 62 names).

We've left the best for last. Here, we've left the kernel source, but where has SCO gotten this list? Ready? Okay... Here goes. They got it from a Google search.

Well, at least that is how it appears. The fact is that you can find the names on this list by searching on Google for email addresses from IBM that posted to the Linux Kernel Mailing List (LKML). Like I said, I don't actually know that this is how SCO did it, but if you're really curious, look at SCO's filing, then check out Google Groups for messages that hit the Linux Kernel Mailing List: '"ibm.com" group:fa.linux.kernel' (for example).

Without doing an extensive study, it is difficult to know exactly how much (or little) work was done to actually build the list, but it is clear that SCO believes that these individuals "likely have knowledge" because their email addresses can be found on the Linux Kernel Mailing List. To test this theory (in a highly unscientific manner), we chose 5-10 email addresses from the LKML (compliments of Google) and all were located on SCO's list. We then tested things the other way around, and had similar results. The addresses we chose were easy to find on the LKML. One brief example: SCO's list includes the email address fubar@us.ibm.com, which is easy to find here.

So SCO produced a list that they believe holds the names of people with knowledge of Linux. They may have actually searched the Changelogs, as well. A list of names you can find on Google hardly qualifies as a response to IBM's interrogatory.


Some General Comments

In SCO's list, in the legal document, SCO has replaced all the slashes (/) in the file names with periods (.). There are several theories in the Linux community as to why. One possibility is that the lawyers may have written it up using a program that doesn't like slashes, instead of using Unix or Linux. While I used GNU utilities such as grep, the person preparing the list may have used a different platform.

Regular file/path names can be converted to the dotted format with the following command (if you so desire): 'cat /tmp/SCOFiles | sed s:/:.:g' At any rate, they could be converted back easily enough. Interestingly, the path /arch/ppc64/kernel was also changed to .arch.ppc.64.kernel for some yet unknown reason.
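Here is a small sketch of the conversion in both directions, using two paths mentioned in this article. The forward step is the sed command quoted above. The reverse step is only a heuristic of ours: it assumes every file name has exactly one period (its .c or .h extension) and that no directory name contains one, which will not hold for every path.

```shell
# Two paths from the article, one per line.
paths=$(printf '%s\n' include/asm-arm/spinlock.h arch/ppc/kernel/setup.c)

# Forward: every slash becomes a period.
dotted=$(printf '%s\n' "$paths" | sed 's:/:.:g')

# Reverse, as a heuristic: turn every period into a slash, then
# put back the last one so the file keeps its extension.
restored=$(printf '%s\n' "$dotted" | sed -e 's:\.:/:g' -e 's:/\([^/]*\)$:.\1:')
echo "$dotted"
echo "$restored"
```

The dotted form loses the distinction between a directory separator and a period in a name, which is why the reverse step needs an assumption at all.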

Whoever prepared these lists was rather sloppy. They didn't pay attention to detail, missed obvious files and email addresses, and didn't edit very well. Obvious references to SCO or Caldera have been removed, but some of the less-obvious ones remain. For example, some contributions to JFS by Christoph Hellwig (once an employee of SCO) remain. Presumably, at least some of those contributions occurred while he was working for SCO.

Some of the files included are trivial and obviously contain no relevant information. The 6-line files that just say "we don't do SMP" come to mind.

It is easy for coders to understand IBM's contention that SCO has not been answering their questions, regardless of the amount of data that they have produced. They don't explain how anything they have reported is a trade secret. And the fact that their lists can be recreated over a weekend using simple scripts indicates to us that their answers are too broad to qualify as answers to the questions they were asked.

Maybe SCO hasn't heard the old saying: "Never tangle with a geek when source code is on the line."


Prepared by Frank Sorenson
With numerous helpful comments from other Groklaw Regulars


  


Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 01:36 AM EST
The good news is, it seems SCO really has nothing to show. I am really
looking forward to December 6th. How could they possibly delay this any
further? What happens if they refuse to show code? Can they go for an
appeal if the judge drops their case?


Also in list #2 ...
Authored by: AllanKim on Thursday, November 20 2003 @ 02:20 AM EST
List #2 contains arch/i386/smpboot.c, which contains the famous comment:
*      Original development of Linux SMP code supported by Caldera.


Good Sleep
Authored by: Anonymous on Thursday, November 20 2003 @ 02:21 AM EST
Thank You Frank, that was a great chuckle just before
sleep. I should be well rested tonight.


Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 02:22 AM EST
Maybe that has been the strategy all along. Pump stock to $20 and sell
sell sell. Then lose in the first trial (don't show code, get kicked and
fined). Stock price falls to 0.01c. Buy back stock cheap. Then file appeal.
Sell stock at $50. Repeat, until supreme court has been reached. Then
appeal to government (Linux hackers are all terrorists, remember?).


Not sure that "include/asm-arm/spinlock.h" is the best example of poor evidence
Authored by: Jack Hughes on Thursday, November 20 2003 @ 02:23 AM EST
This example does actually deal with a _method_ for dealing with SMP issues...
obviously, the method is somewhat trivial...

Is the use of pre-compiler conditional compilation a SCO method?

I think a better example is the reiserfs header file (we've had a link to the
discussion about this on the reiser mailing list) where the comment, from Hans
Reiser, says something along the lines of "this will be hard to do in
SMP".... Not really a method, not really anything to do with IBM, clear
who the author of the comment was - therefore they can be asked questions etc.


Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 02:23 AM EST
In "IBM'S MEMORANDUM IN SUPPORT OF SECOND MOTION TO COMPEL
DISCOVERY"

They said that -


"Second, SCO has declined to provide meaningful responses to Interrogatory
Nos. 1 and 2. As explained in IBM's motion to compel responses [to] these
interrogatories, SCO merely provides the names of 591 files (consisting of
approximately 335,000 lines of source code) in unidentified versions of the
Linux 2.4 and/or 2.5 kernels which may or may not contain information to which
SCO asserts rights. Nowhere does SCO detail the nature of its alleged rights."

So according to IBM, all SCO "showed" them was this list of files.

At this point, SCO's whole case seems to be based on speculation - they assume
that because IBM added certain functionality to AIX and helped add similar
functionality to Linux, that they MUST have violated SCO's IP rights in the
process. But, up to now, they have provided no evidence to support this
idea....



Why is SCO so sloppy?
Authored by: Anonymous on Thursday, November 20 2003 @ 02:26 AM EST
I really don't understand this. The whole company is at stake and they
include crap like spinlock.h into their filings. This is really a no-brainer
and any $20/h programmer would have caught it (and excluded it). How
can they be so careless? Do they even care if they lose?


Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: sjohnson on Thursday, November 20 2003 @ 02:36 AM EST
Maybe SCO hasn't heard the old saying: "Never tangle with a geek when source code is on the line."

This is the best line in the article. That line has great .sig potential.

Frank, thank you for a great chuckle.


Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: ZeusLegion on Thursday, November 20 2003 @ 02:39 AM EST
I (and probably most people here) pretty much figured it would turn out to be
little more than a search for various keywords. I'm glad someone was able to
recreate how they generated their lists and explain it in English so the public
(and hopefully the judge) can understand SCO's game.

It all boils down to SCO claiming that IBM was not allowed to share any of its
knowledge or code from AIX and System V due to some clause in its contracts with
them and thus anything and everything pertaining to IBM is "tainted"
and thus the world must pay for SCO's authorization (the license) in order to
use Linux.

The entire reason SCO doesn't have any evidence is because there is none. There
is only SCO's belief that anything IBM donates to Linux is a violation of its
contracts with IBM.

So please remind me which clause it is that they think gives them the right to
prevent IBM from donating its own work and what's the evidence that proves SCO
is on crack as Linus suggested (and which we all suspected anyway)?

My guess is that it's eventually going to boil down to whether the contract
allows SCO to outlaw IBM from donating its own code to Linux. If not, SCO loses.
Period.

Z



---
Z


Entertainment for the evening
Authored by: Anonymous on Thursday, November 20 2003 @ 02:41 AM EST
I've been lurking here for weeks - read it every night just to get my dose of entertainment. Thanks to everyone who does work like Frank's, and provides thoughtful analysis tearing down SCO's case.

I wonder if IBM figured out how to generate SCO's list... If not, I'm sure they know now :-) Methinks the stuff SCO is shoveling is going to start bogging them down RSN.


Question about JFS
Authored by: error27 on Thursday, November 20 2003 @ 02:45 AM EST
Thanks. That was an excellent article.

You mention some files in fs/jfs/ were omitted because they mention SCO or
Caldera but I didn't see any omitted files.

The fubar email address cracked me up... :)




Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: bobh on Thursday, November 20 2003 @ 02:54 AM EST
This pretty well demonstrates that Darl McBride's statements about his "MIT mathematicians" were fabrications. Pressed to state the claims in court, the best the company could do was provide a quick hack done with grep.

What this tells us is that there is no code analysis. No one analyzed code looking for trade secrets. In all the months leading up to the lawsuit, and in the seven months since filing it, SCO has not determined itself what its charges are. Not at the level of detail that they knew would be required in a court case.

Did McBride think they would just wave their arms and dance, the way they have been doing for the trade journalists and financial analysts? Did they think that would play in court?

What, really, is going on here? These are supposed to be serious adults, playing for real money. Yet they are behaving like children, totally irresponsibly. They have spent tens of millions of dollars themselves, and caused IBM to spend tens of millions more, and yet they have not prepared -- not seriously -- for the day they must have known was coming when they would have to lay out specifically what it was that IBM supposedly misappropriated.

This isn't a game. Yet they come to court with this hack that anyone could duplicate in a weekend with grep, and top it off with a Google search, and hand that in as their 'homework' in a court case where three billion dollars is on the line.

This defies rational explanation. These people have wasted an enormous amount of other people's time and money on something that they aren't even taking seriously themselves.

I hope the judge can be made to understand the depth of depravity he is dealing with here, and that he treats it accordingly.


Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 03:06 AM EST
Absolutely excellent article, certainly the most enlightening I've seen since
this whole scandal began. Congratulations. Moreover, this is a turning point:
no more vague notions of "somehow" infringing code floating over
one's head. Thanks to your efforts, the entire discussion just became very
elementary. Hopefully this will lead to some SCO trouncing in more of the
mainstream press. Cheers.


One shouldn't sniff at spinlock.h
Authored by: Anonymous on Thursday, November 20 2003 @ 03:31 AM EST
Assuming SCO is claiming all of the 330,000 lines of code as their own (which
they aren't), then the 6 lines (counting empty lines) of spinlock.h comprise
.00182% of the $3 billion they're asking, or a very affordable $54,545.
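For anyone who wants to check the arithmetic, here is a minimal sketch using only the figures quoted in the comment (330,000 lines, 6 lines, $3 billion). The variable names are made up for illustration.

```shell
# Figures taken from the comment above; variable names are illustrative only.
lines_total=330000
lines_spinlock=6
damages=3000000000

# Percentage of the claimed lines that spinlock.h represents.
percent=$(LC_ALL=C awk -v a="$lines_spinlock" -v b="$lines_total" \
    'BEGIN { printf "%.5f", 100 * a / b }')

# The same fraction applied to the damages figure, truncated to whole dollars.
share=$(LC_ALL=C awk -v a="$lines_spinlock" -v b="$lines_total" -v d="$damages" \
    'BEGIN { printf "%d", d * a / b }')

echo "spinlock.h is ${percent}% of the claimed lines, or about \$${share}"
```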

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 04:10 AM EST
I JUST FOUND MORE INFRINGING CODE, someone should post these about 1,210,000
entries to sco lawyers so they could sue more people!

http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=SMP&btnG=Google+Search

Maybe not so stupid...
Authored by: hanzie on Thursday, November 20 2003 @ 04:25 AM EST
I admit that claiming this code in discovery might not be stupid is a longshot,
but standard practice in discovery is to make the list of discoverable evidence
as large as you can possibly get away with. Consequently, if SCO have any claim
of even the flimsiest substance, it would stand to reason that they'd include
everything a grep would find.

You can't use it later if it isn't in discovery, and the more shovelware you
produce in discovery:

1. The better you look to the gullible stock buying public.

2. The more the competition's legal staff has to wade through.

3. It's sanction proof. They're claiming all SMP related code in their case,
so this is all discoverable.

4. Every tiny bit of obfuscation will help, a one day delay could literally be
worth millions as stock is dumped.

5. If there is a surprise, non-laughable claim, this is the best way to hide it.
A non-laughable claim might actually get dismissed out of hand, with the rest of
the trash.

"Don't get cocky!" -- Han Solo

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: jmc on Thursday, November 20 2003 @ 04:32 AM EST

Sorry to nitpick but you're using Linux-isms in your find commands which as you say you don't really need on Linux as the directory tree searching can be done in egrep.

  1. SCO's find doesn't support -print0
  2. SCO's xargs doesn't support the -0 switch.
  3. SCO's egrep doesn't support the -w switch and there isn't an easy substitute like \b.
  4. SCO's egrep doesn't support the -L switch and there isn't a quick and easy substitute.

I should think they probably used GNU egrep anyhow as find is a bit advanced for them.

That might explain some discrepancies.

Also, I downloaded the SCO Linux kernels yesterday from their FTP site and they seem to have done some random editing (with GNU emacs no less) as the original files with .c~ on the end still appear which might explain some of the other differences.
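As a rough sketch of how the same searches could be written without the four GNU extensions listed above, the following sticks to plain find and grep features. It has not been tried on a SCO system, so treat that as an assumption. The sample tree, file names and the word_re pattern are invented for illustration, and on an older system "grep -E" would be spelled "egrep".

```shell
# Build a tiny sample tree so the sketch can be run anywhere (hypothetical files).
tree=$(mktemp -d)
mkdir -p "$tree/kernel" "$tree/fs/jfs"
printf 'int smp = 1;\n'            > "$tree/kernel/a.c"
printf 'int smp_lock;\n'           > "$tree/kernel/b.c"
printf '/* Copyright Caldera */\n' > "$tree/fs/jfs/c.c"
printf 'int jfs;\n'                > "$tree/fs/jfs/d.c"

# Stand-in for "egrep -w": require a non-word character (or a line edge) on both sides.
word_re='(^|[^A-Za-z0-9_])(smp|rcu|numa)([^A-Za-z0-9_]|$)'

# Stand-in for "find -print0 | xargs -0 egrep -wil": run grep once per file.
with_keyword=$(cd "$tree" && find . -type f -name '*.[ch]' \
    -exec grep -E -i -l "$word_re" {} \; | sort)

# Stand-in for "egrep -L": print the names of files in which grep finds nothing.
# (Reading names line by line breaks on file names that contain newlines.)
without_vendor=$(cd "$tree" && find fs/jfs -type f -name '*.[ch]' -print |
    while IFS= read -r f; do
        grep -E -i 'caldera|sco' "$f" >/dev/null || printf '%s\n' "$f"
    done | sort)

printf 'keyword files:\n%s\n' "$with_keyword"
printf 'files without vendor names:\n%s\n' "$without_vendor"
rm -rf "$tree"
```

Running grep once per file is slower than the xargs version, but it needs nothing beyond find's -exec.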

The ppc.64 thing (minor idea)
Authored by: pyrite on Thursday, November 20 2003 @ 04:39 AM EST
In UNIX, the file directory separator is the forward slash: "/".

On Windows, it's the backslash: "\". On Windows it would be like:
\arch\i386\etc... not the http://something.com/something/ UNIX style.

There are a number of different "greps", each with its own
peculiarities. It can be frustrating and confusing trying to learn the minor
differences between them all. Some of them run on Windows. The standard
"escape character" is the backslash "\". For instance:
the pipe "|" pipes the output of one command into another when used
on the command line or in a shell script. Used inside an egrep pattern, such
as 'numa|smp', it means "either" - find either "numa" or
"smp". (Inside the [ ] characters it is just an ordinary character.) To find
an actual pipe character in the text of some file,
you escape the pipe character: " \| ". In Windows (and I believe
VMS), the directory separator is the same as the escape character (in grep). So
instead of an MIT scientist on a Linux machine, perhaps this search was done by
an executive on a Windows box. I am wondering if there might not have been some
kind of (perhaps perceived) conflict because the escape character is the same as
the directory separator? Combined with someone who didn't really know what they
were doing, maybe something along these lines...
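A small sketch of the three cases, assuming a grep that accepts -E (the same thing as egrep). The sample lines are invented for illustration.

```shell
sample=$(printf '%s\n' 'numa node' 'smp kernel' 'a | b' 'plain text')

# Alternation: an unescaped | in an extended regex means "either side".
either=$(printf '%s\n' "$sample" | grep -E 'numa|smp')

# Escaped: \| in an extended regex matches a literal pipe character.
literal=$(printf '%s\n' "$sample" | grep -E '\|')

# Bracket expression: [numa|smp] matches ONE character from the set
# n, u, m, a, |, s, p, so almost any line of text will match it.
bracket=$(printf '%s\n' "$sample" | grep -c -E '[numa|smp]')

printf 'either:\n%s\n' "$either"
printf 'literal pipe:\n%s\n' "$literal"
printf 'lines matched by the bracket form: %s\n' "$bracket"
```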

This kind of pattern matching search could be written and performed by an
experienced programmer probably in under 15 minutes, maybe 30 minutes at the
most. And, like mentioned by PJ, most people who have never used these commands
before could figure it out in a relatively short period of time. It's not that
hard at all. What is described here by Groklaw is not any kind of sophisticated
search at all, although it may have seemed difficult to an executive used to
meetings and presentations and interviews; recursive searching, that
"diving deep" search - probably not the easiest thing for a
first-timer on a Windows box.

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 05:23 AM EST
Is it actually possible to claim that software represents a trade secret?

After all, the program is there to examine. Anybody can look at it, decompile it
and work out the paths through the code and hence the software design. The fact
that this is difficult (or in some countries illegal) doesn't protect the
"secret" in any manner at all. It's called "security through
obscurity" and widely derided as the security method chosen by stupid
people.

Trade secrets have another serious flaw. If someone else independently works
out the secret you have no legal protection.

Just a thought
Authored by: Anonymous on Thursday, November 20 2003 @ 05:57 AM EST
Reading the SCO document where they referred to the file listing as flattened
made me wonder if they have copied all the files into a single directory to be
able to search it with whatever tool they're using for the job. :-) Maybe they
like flat structures or maybe recursive searches are beyond them.

Andreas

Code possibly immaterial due to Harry Potter...
Authored by: John Douglas on Thursday, November 20 2003 @ 07:10 AM EST
www.linuxworld.com/story/35007.htm

'SCO apparently sets some store by the recent finding of a Dutch court that a
Russian knockoff of the Harry Potter books that changed Harry into Henrietta,
the scar on Harry forehead into a scar on Henrietta's cheek and Harry's broom
into a magical fiddle broke JK Rowling's copyright. Sonntag claims it supports
SCO's contention that Linux has appropriated Unix' concepts, methods, and
structure.'





---
As a Safety Critical/Firmware Engineer, everything I do is automatically
incorrect until proven otherwise. (The one aspect of my work that my wife
understands).

Pardon my presumption ...
Authored by: glarepate on Thursday, November 20 2003 @ 09:35 AM EST
I found Frank's article fascinating and accessible, but I realized that even
peeling away the top layers of geek-speak may still leave the techniques he used
to find the list of files impenetrable to non-technical readers. As a result I
have undertaken to try to make a summary of the three find/xargs/egrep/cut
command pipelines that are descriptive of the process he is performing to
generate list 2 but closer to actual English for readers that wish to understand
the steps in the problem solving method (algorithm) without having to take Unix
101 first.

PJ: Please feel free to delete this if my presumption of a need for it runs too
deep or if more technical readers find it too flawed to be allowed to represent
Frank's work.

Feel free to edit it as you see fit as well since my attempts at humor may be
somewhat off the mark.

First off, I have used the term pipeline. It is, as Frank describes, a number
of commands that run one after the other that are all fed to the computer on a
single line (which may wrap around past the edge of the screen but are treated
as a single entity) that work together, each successive one performing some
processing task on the output of the previous one.

This concept was hard for me to grasp when I first started learning about Unix
(on an SCO Xenix machine coincidentally, back when SCO was a good thing) because I
was used to using one program to do any work I wanted done and that would be the
end of the processing unless I needed to invoke another program to, for
instance, print my work, or in the case of code, compile it into a module that
makes up part of a program or into a finished program that could then be run to
do other work.

Starting with:

find . -type f -name "*.[ch]" -print0 \
| xargs -0 egrep -wil 'smp|rcu|numa' \
| cut -c 3- > /tmp/output1

This is displayed this way for perceptual convenience but could appear on your
computer in several other forms which would be perfectly understandable to the
machine but would be, if you can imagine, even harder for us humans to grasp.

Let's start with the 'find' command. As Frank points out it looks for files
on your system by examining the names of files to see if they fit the search
pattern that you specify. If no pattern is given it just makes a list of every
file it comes to, starting at whatever directory you are in and going down into
every subdirectory. Sometimes you will want to do this, but mostly you are
looking for something in particular, not just a list of EVERYTHING. In this
example the find command has 4 things it pays attention to in order to figure
out what you want to search for. These are '.', '-type f', '-name
"*.[ch]"' and '-print0'. These so called parameters to the find
program are followed by '|', which is a signal to the system that it needs to
feed the information that find comes up with to the next program in the pipeline
so that it can work its magic on the stuff that find generated.

So, after 'find' is '.'. That means start here where I am now. The GNU
find program doesn't necessarily really need that, it will just start searching
where you are, but most any other version of find that I have used will refuse
to run unless you give it a 'path descriptor' or starting point from which to
do the search and will simply puke up an error message and stop dead without
proceeding on to the next program in the pipeline. So, although not strictly
required, it is good practice to use something there for reference so that you
avoid any confusion. In the second pipeline he uses fs/jfs instead of '.'.
This just means that there is a directory named fs in the directory where the
command is being run and a subdirectory named jfs that is in the fs directory and
that is where the finding should begin.

Next we come to '-type f'. I combined these in the single quotes to point out
that they work together as a unit to tell the find program to only look for
things that are 'regular files'. Although not strictly correct, I used the
single quotes to help distinguish functional units that contain double-quotes
which help the system figure out that you are referring to some literal search
pattern for the program to use and to not try to match that pattern in the
current directory as that won't give the results that you want. Hazards of
information engineering, I guess. There may be non-regular files in the search
path. Without going into what those are let's just say that they aren't
things we need to look into to see if they contain any of the text that we will
be looking for. These 'special files' as they are called are excluded by
telling find to only look for the regular ones.

Next is '-name "*.[ch]"' (see why I used the single quotes now?).
It tells the find program to further narrow the search to files that have names
that end in .c or .h, but may have ANYTHING else at the beginning of the name.
The next parameter will show you why I emphasize _anything_.

Lastly, for find, anyway, is '-print0'. What this does is cause find to emit
the names of the files it has picked out with a NUL character (a byte whose
value is zero, not the digit 0) tacked onto the end of each one. The reason for
this is that Unix filenames can, accidentally or on purpose,
contain newline characters and as a result look like two different files listed
on separate lines when they are in fact only a single filename that has been
split by the normal process of displaying text.

So, to summarize:

Start looking in the current directory (and go on down forever if necessary) for
any normal text or program files that have any name whatsoever as long as the
name ends in the letter c or h and has a period in front of those letters. List
it in a way that will allow us to know if it has a newline character embedded in
the name so we don't think we've found two files instead of one. That's all
you need to do, the system will take your output and feed it to some other
program whose name you don't really need to know, thank you and goodbye.
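To make the find stage concrete, here is a minimal sketch on a throwaway directory. It assumes a find that supports -print0 (GNU and BSD find do), and the file names are invented for illustration. One file deliberately has a newline in its name to show why the zero byte matters.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/include"
: > "$tree/include/config.c"
: > "$tree/include/types.h"
: > "$tree/README"                      # wrong suffix: skipped by -name
mkdir "$tree/fake.c"                    # a directory, not a regular file: skipped by -type f
: > "$tree/$(printf 'odd\nname.c')"     # ONE file whose name contains a newline

# Newline-separated output: the odd file looks like two separate names.
by_line=$(cd "$tree" && find . -type f -name '*.[ch]' -print | wc -l)

# NUL-separated output: count the terminators instead, one per real file.
by_nul=$(cd "$tree" && find . -type f -name '*.[ch]' -print0 | tr -cd '\0' | wc -c)

echo "lines printed: $by_line, files actually found: $by_nul"
rm -rf "$tree"
```

Three files match, but the plain listing shows four lines, which is exactly the confusion -print0 avoids.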

Ready for xargs? Xargs takes whatever is fed to it and makes a command from it
and then runs it. The pipe character '|' feeds the output of find to xargs as
mentioned above, so the first thing xargs needs to look at to figure out what to
do with what's being fed to it is '-0'. That means that the list it is
getting has elements that are terminated with a literal zero. Clever those
eunuchs, eh? Must be because they don't get distracted by worldly things and
can just concentrate on the data.

Next it sees 'egrep' which it automatically recognizes as being the command
it's going to run because there are no other parameters telling it about what
to expect or how to act. At this point we will jump to egrep since xargs has
effectively finished its work and we want to look at egrep in the same level
of detail we did with find and not lump it in with xargs even though xargs is
going to run it.

egrep looks for patterns in whatever is fed to it, in this case through a pipe,
from find, via xargs. Whew! We will look at the instructions that egrep
follows to do its work in a different order than that in which they appear.
You'll see why by the end of the description.

The subunit 'smp|rcu|numa' is already in single quotes, so I saved two
keystrokes there! What that parameter says is: Look for any instance of smp or
rcu or numa. Normally egrep will print the line that contains those instances
so that you can see what context they are in to help figure out if that is what
you are actually looking for. In this case we want something different;
that's where '-wil' comes in. What it does is give the _name_ of the file
instead of the line that has the word we are looking for (that's the 'l'
part), ignores capitalization (the 'i' part, I realize I'm repeating Frank
here, but did you actually remember that detail?) and looks for 'whole words'
(the 'w' part) which means that it selects only examples that stand alone as a
word, with no letter, digit or underscore directly before or after them. So
'smp' in 'smp.h' counts, but 'smp' in 'smp_lock' does not.
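Here is a small sketch of what the three letters in '-wil' do, assuming GNU grep (where "grep -E" is the same as egrep). The four sample files are invented for illustration.

```shell
dir=$(mktemp -d)
printf '#include <linux/smp.h>\n' > "$dir/whole.c"     # "smp" bounded by / and .
printf 'int smp_processor_id;\n'  > "$dir/partial.c"   # "smp" glued to "_processor_id"
printf '/* NUMA support */\n'     > "$dir/upper.c"     # upper case, found because of -i
printf 'int x;\n'                 > "$dir/none.c"      # no keyword at all

# -w whole words only, -i ignore case, -l print file names instead of lines.
matches=$(cd "$dir" && grep -E -wil 'smp|rcu|numa' *.c | sort)

printf 'files listed:\n%s\n' "$matches"
rm -rf "$dir"
```

Only upper.c and whole.c are listed. partial.c is left out because the underscore counts as part of a word.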

So our egrep+xargs does:

Watch for filenames ending in the _number_ zero (not the 0 character,
which is just another text character) and look through each one of those files
for units of text that say smp or numa or rcu, no matter what case they are and
print out the names of the files that you find these pieces of text in in a way
that won't be confusing.

Lastly is the 'cut' command. Cut picks apart what is fed to it based on the
description you give it and then emits what you want as its output. In this
case we have to look at both parameters separately first and then together. The
'-c' tells cut to 'only emit the following characters'. The '3-' defines
those characters as 'everything from the third character on to the end of the
line'. If I had done better research on this I could tell you why the first
two characters are throwaways. I'm guessing it's because they consist of
'./', meaning "starting in the directory that we are already in".
As such they would not be useful. Not only because we would normally assume
that, but because it would tend to be confusing to have everything listed as,
for instance, './include/config.c', when we are referring to include/config.c,
meaning config.c in the directory named include, which is a subdirectory of the
directory we are in now.
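The guess about the first two characters can be checked directly: a find started at '.' prefixes every path with './'. A minimal sketch with two made-up path names:

```shell
# Two sample lines shaped like the output of "find . ..." (names are illustrative).
trimmed=$(printf '%s\n' './include/config.c' './kernel/sched.c' | cut -c 3-)

# cut -c 3- keeps everything from the third character to the end of each line.
printf '%s\n' "$trimmed"
```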

We're not done yet: The last element of the pipeline is '>/tmp/output1',
which means take all the data that we have gathered, searched, massaged and
weeded and send it to a file in the temporary directory (/tmp) and call it
output1. And there is our list of filenames that are used by programmers to
build a Linux kernel that have any reference to the things that SCO claims that
IBM misappropriated. Aren't you glad we have machines to do this for us?

Ready for the second pipeline? I'm not going to cover it in the same
excruciating detail because it's very much like the first one. I will leave it
to you to step through it, if you wish, to more fully understand the details of
how it works after pointing out certain distinctions.

What it does differently is to look through, as I mentioned above, not the
current directory, but the subdirectory jfs which is a subdirectory of the fs
directory which _is_ in the current directory (fs/jfs). This directory
contains the files used to include Journaling Filesystem functionality in the
kernel. These are kept in the fs subdirectory as a matter of
"housekeeping" to separate general drivers from filesystem drivers.
It _leaves_out_ the names of all files that contain units of text that say
caldera or sco. That is accomplished by giving egrep the '-L' parameter
instead of '-l'. I have to admit to not being able to explain why sco and
caldera are prefixed with the @ sign, but since his listing duplicates the list
by Darl & Co. I will have to ask you to take this technical detail on faith,
absurd as that may sound.

The last element in this pipeline '>>/tmp/output1' adds the output of
this pipeline onto the end of the file created by the first pipeline increasing
the size of it by that amount.
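A rough sketch of the two new ingredients, '-L' and '>>', assuming GNU grep. The pattern here is a plain 'caldera|sco' without the @ signs, which is a simplification of mine, and the file names are invented for illustration.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/fs/jfs"
printf '/* Copyright (c) Caldera */\n' > "$dir/fs/jfs/vendor.c"
printf 'int jfs_mount;\n'              > "$dir/fs/jfs/clean.c"
out="$dir/output1"

# Pretend this line is what the first pipeline already wrote.
printf 'kernel/sched.c\n' > "$out"

# -L lists the files that do NOT contain the pattern; >> appends to the
# existing file instead of overwriting it the way > would.
(cd "$dir" && grep -E -iL 'caldera|sco' fs/jfs/*.c) >> "$out"

appended=$(cat "$out")
printf '%s\n' "$appended"
rm -rf "$dir"
```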

The third pipeline starts by using egrep again.

In this case the '-v' parameter is used with egrep. This serves to tell egrep
to select only lines that DON'T contain any of the instances of text in
the group that follows. i.e. 'alpha|parisc|sparc|sound|drivers' from the file
/tmp/output1 that was created and added to by the first two pipelines.

Then we see the pipe character which takes the output from the new run of egrep
and forwards it to the sort program. Sort takes its input and sorts it
alphanumerically with punctuation and numbers first followed by normal
alphabetic sorting. The -u parameter means to only include one instance of each
filename; if a name has been found twice, only include it once in the output.
The last element '> /tmp/SCOFiles-list2.output' sends the sorted and
uniqueified list to the file specified.
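The filtering and sorting step can be tried on a handful of names. This is only a sketch: the list below is invented for illustration, and LC_ALL=C is set so that the sort order does not depend on the machine's language settings.

```shell
# A made-up stand-in for the contents of /tmp/output1, including one duplicate.
list=$(printf '%s\n' \
    'kernel/sched.c' \
    'arch/alpha/kernel/smp.c' \
    'drivers/net/tg3.c' \
    'fs/jfs/inode.c' \
    'kernel/sched.c')

# -v drops every line that mentions one of the excluded names;
# sort -u orders what is left and removes the duplicate.
final=$(printf '%s\n' "$list" |
    grep -E -v 'alpha|parisc|sparc|sound|drivers' |
    LC_ALL=C sort -u)

printf '%s\n' "$final"
```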

That should give us a list of files that SCO says "may or may not"
contain the information that IBM has wrongfully incorporated into Linux. Only
problem is IBM asked them for, as the law specifies, specifics in regard to
this, not a huge list of maybe-coulda-but-we're-not-sure files from some
version of Linux that they failed to specify.

The other pipelines that Frank lists use the cat program which he describes as
"cat is used to type out the contents of files, and is very similar to
type under DOS/Windows." Although this is a very good description I might
describe it more as "read in and then spew out again" rather than
type, but that is the way we understand the type program, so I am really just
splitting hairs here in hope that it will make better sense to non-technical
readers. Since the pipelines use commands and parameters that have been covered
above I trust that you will be able to work through them conceptually at this
point even if you need to refer back to references in the above text to unify
the concepts into a complete process.

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Beyonder on Thursday, November 20 2003 @ 09:58 AM EST
I just had a thought about this, and maybe a way to help describe this to the
casual reader so they can better understand this situation-

it's a bit like SCO saying to IBM: "All of our property is listed in these
specific volumes of the Encyclopedia Britannica (tm) and it's up to you to find
it, well, actually, only some of the information in these is ours, and maybe not
all of it, and perhaps not at all"

literally it's quite a useful analogy. Throw a bunch of paper on someone and say,
"well, it's in there somewhere" and leave it at that. and stuff
like "oh, well, it's not our problem to find it, you violated our rights,
you should be able to identify it more quickly than we can"

and silly nonsense like: "how are we supposed to know exactly what you
stole from us, or used improperly or derived or submitted improperly ?!? ...
Only you would know that..."

"oh yes, and btw there may not be any violations in what we gave you at
all, we changed our minds, it's not about copyright or usage infringements or
even trade secrets, it's about contract violations..."

"no wait, we changed our minds again, it's not about contract violations,
it has been and always will be about trade secrets filed with the copyright
office..."

"no wait, what we really meant to say is that it is and always was about
copyright violations, no really, we mean it this time, specifically ones filed
with the US patent office... yes, really, we mean it this time..."

oy vey!

I wonder?
Authored by: Anonymous on Thursday, November 20 2003 @ 10:17 AM EST
Is it possible for IBM to use this stuff to backdoor an
attack on SCO?
SCO and McBride are lying and slandering horribly. I have
been hoping that IBM can use all this stuff to go after
the true backers of this (MS, Sun, and Canopy group).

SEC; Pump -n- dump
Authored by: Anonymous on Thursday, November 20 2003 @ 10:33 AM EST
I am curious why the SEC has not stepped in yet? SCO shows
that they have nothing of value here. The only thing this
can be doing is stock and market manipulation. Yet, the
SEC is staying out of this? I wonder why?

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: kberrien on Thursday, November 20 2003 @ 10:50 AM EST
PJ, et al...

I like the thrust of all this. I.e., let's look at code (the meat of the matter)
and determine what we can. For a non-coder this is great for me, and it's the
kind of stuff that will influence the REAL WORLD.

Has anyone gone beyond this? For example, take the entire kernel, its component
technologies, and list out any that could possibly be in contention. For
someone with overall knowledge this might not be too hard, at a general level of
examination. Basically, play an MIT Mathematician and see what you find. For
instance....

2.5/2.6 subtract all portions in 2.2 (assuming as said, SCO does not claim
anything in 2.2 right?).

-take the remainder.

Ok, stuff not in Sys V, USB right? remove that, etc. .. etc...

At the end, there should be various sections that COULD be in contention. Here
is where you do your detailed research. I'd be interested in, how many lines
of code is left, what % was provided by which parties. Remove all comment
lines, etc...
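A very coarse sketch of that subtraction, working at the level of whole files only. Everything here is hypothetical: the two toy trees merely stand in for real 2.2 and 2.6 sources, and a real study would also have to look inside files that exist in both versions.

```shell
work=$(mktemp -d)
mkdir -p "$work/linux-2.2/kernel" "$work/linux-2.6/kernel" "$work/linux-2.6/drivers/usb"
printf 'a\nb\n'    > "$work/linux-2.2/kernel/sched.c"
printf 'a\nb\nc\n' > "$work/linux-2.6/kernel/sched.c"
printf 'x\ny\n'    > "$work/linux-2.6/kernel/rcupdate.c"
printf 'u\n'       > "$work/linux-2.6/drivers/usb/hub.c"

# One sorted file list per tree.
(cd "$work/linux-2.2" && find . -type f -name '*.[ch]' | LC_ALL=C sort) > "$work/old.list"
(cd "$work/linux-2.6" && find . -type f -name '*.[ch]' | LC_ALL=C sort) > "$work/new.list"

# comm -13 keeps only the lines unique to the second list: files new since 2.2.
# Then drop subsystems that cannot be in dispute (here just USB, as suggested).
candidates=$(LC_ALL=C comm -13 "$work/old.list" "$work/new.list" | grep -v '/usb/')

# Count the lines left in the candidate files.
remaining=$(cd "$work/linux-2.6" && printf '%s\n' "$candidates" | xargs cat | wc -l)

echo "candidate files: $candidates"
echo "lines remaining: $remaining"
rm -rf "$work"
```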

At that point, you could make some general or certain declarations which could
have some influence. You could then dispel the myth of how hard it would be to
re-code, etc.

Imagine a detailed study like this written up. Imagine if you could say that
97% CAN'T be SCO code. WOW. Or the total lines possibly at stake should only
total 200,000 lines of code, etc... Or, IBM only donated 0.98 % of linux.
People can understand this. They understood proof that the SCOForum code was
bogus.

Headline: Open Source Study Shows 97% of Linux not Sys V.

Of course, this does not take into account the derivative works nonsense... but
I don't think anyone takes that too seriously.

Next community project.
Authored by: RabidChipmunk on Thursday, November 20 2003 @ 11:03 AM EST
So now we need to compile a list of the obviously spurious files in the list.
Things like "I rewrote this part and deleted the section written by
john@ibm."

Improvements
Authored by: Newsome on Thursday, November 20 2003 @ 12:05 PM EST

Based on some of the comments here, I've been able to improve the list 2 script further. The process I described earlier works, but (as I said) isn't minimal or the best. It was mainly the result of a number of iterations trying to get something that produced the right output.

Here is an improved version of the list 2 stuff:
egrep -wilr --include "*.[ch]" 'smp|rcu|numa' * > /tmp/output1

find fs/jfs -type f -path "*.[ch]" >> /tmp/output1

egrep -v 'alpha|parisc|sparc|sound|drivers' /tmp/output1 | sort -u > /tmp/SCOFiles-list2.output
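To see what the recursive form does without touching a real kernel tree, here is a minimal sketch on a throwaway directory. It assumes GNU grep, since -r and --include are GNU extensions, and the two sample files are invented for illustration.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/kernel" "$tree/Documentation"
printf 'int numa;\n' > "$tree/kernel/node.c"
printf 'smp notes\n' > "$tree/Documentation/smp.txt"   # matches, but is not *.c or *.h

# -r walks the directories itself; --include limits the search to *.c and *.h.
recursive=$(cd "$tree" && grep -E -wilr --include='*.[ch]' 'smp|rcu|numa' * | sort)

printf '%s\n' "$recursive"
rm -rf "$tree"
```

Because the search starts from * rather than from '.', the names come out without a leading './', so the cut step is no longer needed.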

I'm sure this still isn't perfect, but it should be an improvement on the previous version.

In addition to "Never tangle with a geek when source code is on the line," one also wonders how SCO forgot "Never get involved in an IP war with IBM."

---
Frank Sorenson

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 12:05 PM EST
I don't think the ROUSs really exist. They are a figment, like innovation in the Open Source world. Everyone knows that it takes a corporation to buy innovate.

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 12:19 PM EST
You know, there is a tie-in between pump and dump, and never having done the
code analysis.

Darl said to the world "we did a detailed code analysis and found all this
code, and that means we are going to get rich from suing IBM and selling
licenses to users," and that sent SCO's stock price soaring.

But if they never actually did the code analysis, then Darl lied to boost the
stock price, and that is exactly the sort of thing over which the SEC can nail
you to the wall.

Better ways to do this exist
Authored by: Anonymous on Thursday, November 20 2003 @ 01:48 PM EST
The find/grep/xargs pipelines are an obvious first step in this sort of
analysis, but once you've done that and found damning evidence, better ways to
analyse similarities in large code bases exist. Brenda Baker's
"dup" and "pdiff" would be very helpful.

See:
http://cm.bell-labs.com/who/bsb/research.htm
http://cm.bell-labs.com/who/bsb/papers/wcre95.ps

Unfortunately, I'm not aware of an open source version of these two things.
SCO, in possession of the System V copyrights, might be able to weasel them out
of Bell Labs, maybe.

This would be a better way to find similarities than ESR's
"comparator" I'd think.

Does IBM read Groklaw??
Authored by: Anonymous on Thursday, November 20 2003 @ 02:35 PM EST
This seems to be a very powerful piece, especially when combined with the
reader's comment contributions...

Show that the discovery list produced by SCO was a grep hack and you pretty much
confirm IBM's claims that SCO is blowing them off in discovery. SCO might claim
that they did a detailed discovery with "MIT mathematicians," but if
the list can be reproduced simply, then Occam's razor pretty much nukes SCO's
claim, especially with the laughable examples like spinlock and the reiserfs
header.

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 02:38 PM EST
This article really helps to illuminate the type and sophistication of analysis
that was likely done by SCO. Basically, there wasn't much and it was very
simple.


Non-programmers need to understand that Unix/Linux contains an impressive
collection of tools for dealing with program source. Further, all Unix/Linux
programmers learn to use these tools at the same time they are learning to
program. It's like learning the alphabet and grammar when learning English,
French, ...


None of the analysis in this article, or the work done by SCO, is out of the
ordinary. We do this kind of analysis and more on a regular basis as part of
developing programs.

PLANNED
Authored by: brenda banks on Thursday, November 20 2003 @ 03:48 PM EST
it is planned down to the smallest detail
i would not be surprised to see that sco delivered some papers to IBM today
not everything but enough so they can say they are trying
in fact they will make sure to point out how many pages vs the amount IBM has
delivered
tomorrow will be interesting
is anyone going to be there that can fill us in or will this be private?


---
br3n

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 04:56 PM EST
Probably not. But I don't think that Darl, Blake, and Chris are too concerned
with their investors or their legal team. Why do you think they shelled out a
bunch of inflated stock to Boies in lieu of cash. Or maybe, with the ace that
they (probably) have up their sleeve that they will play during the trial, the
hope is the stock will jump up again and Baystar and Boies will make some money
then.

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 06:41 PM EST
It's interesting to note that all of this analysis was done with extremely
basic UNIX tools - find, grep and so forth. Under the covers, though, is the
real power - regular expressions.

Regular expressions are formulas used to describe text, mainly for purposes of
searching, replacing, pattern-matching and transformation. The syntax is precise
to the point of being cryptic, and far beyond the scope of this website (check
out http://www.oreilly.com/catalog/regex2/ for an excellent book on the subject
- you can find a good online reference for Perl regular expressions at
http://www.perldoc.com/perl5.6/pod/perlre.html)

Just to illustrate its basic syntax, a regular expression uses periods to mean
"match any character". So for example, the regular expression
".oo.e" would match the words "goose",
"loose", the "boose" in caboose, the names
"boone" and "moore" and so forth.

Similarly, the special characters ^ and $ are used to anchor text to the
beginning and the end of a line, respectively. So "^Linus" will
match the word "Linus" ONLY IF that word is at the beginning of a
line. Similarly, "Torvalds$" will match instances of
"Torvalds" ONLY IF they are at the end of a line.
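These rules are easy to try with plain grep. The word list and sentences below are invented for illustration; "geese" is included as a word that the ".oo.e" pattern should skip.

```shell
words=$(printf '%s\n' goose loose caboose boone moore geese)

# A period matches any single character, so .oo.e needs "oo" in the middle.
dot=$(printf '%s\n' "$words" | grep -c '.oo.e')

text=$(printf '%s\n' 'Linus wrote it' 'thanks, Linus' 'by Torvalds' 'Torvalds said')

# ^ anchors the match to the start of the line, $ to the end.
at_start=$(printf '%s\n' "$text" | grep '^Linus')
at_end=$(printf '%s\n' "$text" | grep 'Torvalds$')

echo "words matching .oo.e: $dot"
echo "starts with Linus: $at_start"
echo "ends with Torvalds: $at_end"
```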

grep and egrep have basic-to-intermediate regular expression capabilities, and
you can see how effective they are. But there are languages, such as Perl, whose
regular expression capabilities are far more powerful and sophisticated than
those found in basic UNIX utilities.

To get an idea how regular expressions can be exploited to their fullest extent,
check out Eric Raymond's "Comparator" program at
http://catb.org/~esr/comparator/. It can search two source code trees for code
matches down to an arbitrary granularity (by default, three identical lines). He
authored it with assistance from Ron Rivest, who is a REAL MIT mathematician
(and co-inventor of the RSA public-key cryptosystem, which most of you probably use
every day when using a "secured" web site via HTTPS and SSL).

Regards,

Z

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 07:15 PM EST
What is scary is that this whole thing seems to be very carefully planned. All the smoke and mirrors in the public statements are used to confuse the careful descriptions in the court filings. Maybe Darl isn't blurting things out, but intentionally following the script to obfuscate the issues at hand.

What do we know? IBM is being sued over 'misappropriation of trade secrets'. IBM made press statements that they would help improve Linux support in some areas which would make Linux more valuable for their customers on their high-end mainframe hardware. SCO doesn't like this and claims that these are trade-secret technologies which according to the contract should be held secret, and they add the even wider claim that IBM should not be allowed to contribute anything to Linux. So anything that mentions SMP/NUMA/RCU/JFS may contain proprietary information. Anything that mentions IBM must contain proprietary information. So really there is no copyright issue. And a more useful point here would be that none of these 'trade secret technologies' are really all that confidential or proprietary to begin with.

SMP 'technology' is not any different from user space threading. Just look at 'Threads Primer: A Guide to Multithreaded Programming' and you'll find a useful book containing anything you need to know about how to use multi-threading. The issues at stake (concurrency, locking, race conditions) are similar. It actually even has a chapter about 'Operating System Issues', with a section about 'Solaris Symmetric Multiprocessing' [Solaris SMP].

RCU: a quick search for 'RCU paper' tells me that there already was a published paper about it in '98. Clearly Linux developers didn't really need IBM to implement this trade secret technology. It very well could have been implemented earlier and, as I remember, it was discussed until someone pointed out IBM's RCU patents and the idea was dropped.

NUMA: interesting, but one of those really expensive technologies that (until recently?) never had much of a market outside of research labs; see this Computerworld article in '98. There has been a tremendous amount published on the subject ('91 paper). Check those citations: well-known university textbooks such as 'Operating Systems Design and Implementation' from Tanenbaum ('87) deal with the subject of NUMA in enough detail that it is referenced by a paper (incidentally, this is the same book that contains the complete source code of Minix). So 2nd-year computer science students since at least '87, and probably earlier, have been discussing this subject in class. Combine that with the research interest of people working in distributed systems, and not much is left of a 'proprietary trade secret' 16 years down the line.

JFS: if IBM announces that it is a port of the OS/2 Warp version of JFS in their initial announcement, they probably were aware of the need for a clean-room implementation. Otherwise they would have mentioned AIX at least somewhere to make the announcement look more impressive.

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Anonymous on Thursday, November 20 2003 @ 08:49 PM EST
A better simile would be,

This is like SCO searching the 28 volumes of an encyclopedia set for
occurrences of the words "computer" and "technology", then saying that these
pages infringe our copyright because they contain the words "computer" and
"technology", but we won't say why or what infringes our copyright, even
though some of the pages are just editorial notes.

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: mac586 on Thursday, November 20 2003 @ 10:57 PM EST
Frank's analysis of the file listing has always intrigued me. While I can no longer refer to myself as a programmer (completely lost here without a stack of O'Reilly books and an entire uninterrupted weekend to hammer out a nifty bash script), I decided to grab a small section of the SCO List and take a closer look at who holds the copyrights on the NUMA files.

First, I tweaked Frank's stuff to isolate the NUMA files found in the list prepared by SCO. I set up the 2.5.69 Linux source in my home directory and ran the following script:

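# List .c and .h files that mention "numa" as a whole word, ignoring case
find . -type f -name "*.[ch]" -print0 |
  xargs -0 egrep -wil 'numa' |
  cut -c 3- > output1

# Exclude paths containing alpha, parisc, sparc, sound or drivers
egrep -v 'alpha|parisc|sparc|sound|drivers' output1 |
  sort -u > IBM-NUMA-list2.output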

You can see that I stripped out the smp, rcu, and jfs stuff, and my output file is in a different location. Other than that, the logic is still the same. The result was 39 files, a simple subset of the SCO list focused on NUMA. Since the list was so short, I figured I would just peruse the code and document any copyright or maintenance data I found in the files.

Now if Darl McBride or Chris Sontag were reading this, they would expect to find all 39 files showing copyright ownership by IBM, since they generated this listing for the IBM Litigation. Being an educated Groklaw reader, you would have your doubts.

How about 2 IBM copyrights out of 39.

Then there is a third shared file by SGI and IBM. SGI provided the initial 64 bit code, and then IBM modified the file for the PPC64. (At first glance you could draw the conclusion that most of the code in the list was created by HW vendors and CPU manufacturers working with 64 bit chips... Intel, HP, SGI, NEC, ARM, and IBM.)

I need the Groklaw community to comb through this to verify my results. This should be treated as a draft, and I have yet to validate the results of my query based upon Frank's code. I think it is worthwhile, since investigating the origins of the Linux code is the logical step after a simple grep of the code base, and it should come before claiming any infringement by IBM in a lawsuit.

Here is the copyright data I stripped out of the files. Please note that some of the files did not capture copyright notices, so I need to check out who has been submitting changes via CVS (yet another exercise). Also, the last item in the list actually captures a book publication and a public presentation as the original source:

1 arch/i386/mm/discontig.c -- IBM
2 arch/i386/pci/common.c --
3 arch/i386/pci/numa.c --
4 arch/ia64/kernel/acpi.c -- "VaLinux, Intel, HP, NEC"
5 arch/ia64/kernel/ia64_ksyms.c --
6 arch/ia64/kernel/smpboot.c -- "HP & Intel"
7 arch/ia64/mm/numa.c -- NEC
8 arch/ia64/sn/io/alenlist.c -- SGI
9 arch/mips64/sgi-ip27/ip27-memory.c -- SGI
10 arch/ppc64/kernel/prom.c -- "Paul Mackerras 1996, mod by IBM for 64 bits"
11 arch/ppc64/mm/numa.c -- IBM
12 arch/x86_64/kernel/e820.c --
13 arch/x86_64/kernel/head64.c -- Suse
14 arch/x86_64/mm/k8topology.c -- Suse
15 arch/x86_64/mm/numa.c -- Suse
16 arch/x86_64/pci/common.c -- "Martin Mares "
17 include/asm-arm/arch-clps711x/memory.h -- ARM
18 include/asm-arm/arch-sa1100/memory.h -- "Nicolas Pitre "
19 include/asm-i386/mach-numaq/mach_apic.h --
20 include/asm-i386/mach-numaq/mach_mpparse.h --
21 include/asm-i386/mmzone.h -- IBM
22 include/asm-i386/mpspec.h --
23 include/asm-ia64/acpi.h -- "VA Linux & Intel"
24 include/asm-ia64/mmzone.h -- "SGI & NEC"
25 include/asm-ia64/nodedata.h -- "SGI & NEC"
26 include/asm-ia64/numa.h -- NEC
27 include/asm-ia64/sn/nodepda.h -- SGI
28 include/asm-ia64/sn/pda.h -- SGI
29 include/asm-ia64/sn/types.h -- SGI
30 include/asm-ia64/topology.h -- NEC
31 include/asm-mips64/mmzone.h -- SGI
32 include/asm-mips64/processor.h -- "Waldorf GMBH, SGI, Paul M. Antoine"
33 include/asm-mips64/sn/types.h -- SGI
34 include/asm-ppc64/mmzone.h -- "SGI, IBM PPC64 port"
35 include/asm-x86_64/e820.h --
36 include/asm-x86_64/mmzone.h -- Suse
37 include/linux/mmzone.h --
38 kernel/sched.c -- "Linus Torvalds"
39 mm/slab.c -- "Mark Hemment, Manfred Spraul; an implementation of the Slab Allocator as described in outline in UNIX Internals: The New Frontiers by Uresh Vahalia, Pub: Prentice Hall, ISBN 0-13-101908-2"
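
For anyone who wants to repeat the exercise without opening each file by hand, here is a rough sketch of one way to pull the first copyright line out of every file named in a list. It is not the method used above (that was done by reading the files). The tiny tree, the file names and "Example Corp" are all made up for illustration, and grep's -m option is a GNU/BSD extension.

```shell
# Build a tiny stand-in source tree (hypothetical files, illustration only)
tree=$(mktemp -d)
mkdir -p "$tree/mm" "$tree/kernel"
printf '/*\n * Copyright (C) 2002 Example Corp\n */\n' > "$tree/mm/numa.c"
printf '/* no notice in this file */\n' > "$tree/kernel/misc.c"
printf '%s\n' mm/numa.c kernel/misc.c > "$tree/list.output"

# For each listed file, print its first copyright line (blank if none),
# with leading comment characters stripped
report=$(
  cd "$tree" &&
  while IFS= read -r f; do
    notice=$(grep -i -m 1 'copyright' "$f" | sed 's|^[ */]*||')
    printf '%s -- %s\n' "$f" "$notice"
  done < list.output
)
printf '%s\n' "$report"
rm -rf "$tree"
```

Files with no notice come out with nothing after the "--", like the blank entries in the list above. Those are the ones that still need checking against the change history.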

Did SCO Really Reveal the Code to IBM, as Darl Claims?
Authored by: Steve Martin on Tuesday, November 25 2003 @ 09:18 PM EST
I don't know if this has been mentioned before, but just in case it hasn't,
one more bit of analysis can be done from List 2: a complete count of all the
lines in all those files (including blank lines) comes to a grand total of
338,337 lines in all 591 files (based on the 2.5.69 kernel). Whither
"millions of lines"??
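
A count like this can be reproduced along the following lines. This is only a sketch: the two stand-in files and the list name are hypothetical, and xargs -0 is a GNU/BSD extension (the same one used in the script earlier in this thread).

```shell
# Stand-in tree with two small files (hypothetical; the real list has 591)
tree=$(mktemp -d)
printf 'int a;\n\nint b;\n' > "$tree/one.c"   # 3 lines, one of them blank
printf '#define X 1\n' > "$tree/two.h"        # 1 line
printf '%s\n' one.c two.h > "$tree/list.output"

# Concatenate every listed file and count lines, blank ones included
total=$(cd "$tree" && tr '\n' '\0' < list.output | xargs -0 cat | wc -l)
total=$((total))   # strip any padding some versions of wc add
echo "total lines: $total"
rm -rf "$tree"
```

Piping everything through a single wc avoids having to add up per-file subtotals when xargs splits a long list into several batches.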

SCO's "lists" show contempt for judges
Authored by: Anonymous on Friday, March 19 2004 @ 08:57 AM EST
It is obvious that SCO hasn't spent much time or talent in generating lists of
'infringing' code. That they would tender such weak offerings indicates that
they have little regard for the technical abilities of the judges or for the
judges' ability to seek competent help in determining if the files fulfill the
court's demands.

The lists do support the contention that SCO's suit is merely an attempt to
retard the growth of the adoption of Linux.

Groklaw © Copyright 2003-2013 Pamela Jones.
All trademarks and copyrights on this page are owned by their respective owners.
Comments are owned by the individual posters.

PJ's articles are licensed under a Creative Commons License.