GROKLAW
When you want to know more...
Data Mining, Spectral Analysis, and All that Jazz
Friday, August 29 2003 @ 01:45 AM EDT

We're not born knowing what spectral analysis is. So when SCO said that spectral analysis is one of the methods they used to find "infringing" code, I had no idea what they were talking about. When Sontag compared it to finding a needle in a pile of needles, I figured it wasn't much use. And it turns out, on further investigation, that my intuitive conclusion may be about right, at least when it comes to mining software code for infringement in this case.

An alert reader noticed something interesting. One of Canopy Group's companies is called DataCrystal. Could that be at least one of the three groups SCO hired to try to sort through the code of UNIX System V and the Linux kernel? DataCrystal does "advanced pattern recognition" and "AI systems". They actually claim to do a great deal more besides. One of the things listed on their what-we-do page is data mining. Presumably, that's what SCO wanted to do. And a look at their About page indicates that if you are the RIAA, you probably would want to have a company like DataCrystal to hunt down pirates for you. Another reader noticed this page about a DataCrystal, and he wondered if it might be the same company. It isn't, because this DataCrystal is the name of a project at USC, not a company. While I don't know if the Canopy Group company DataCrystal was hired by SCO, or whether there is any connection between the company and the project, it did make me start to wonder about the field in general. If you really wanted to know if two piles of code had identical or similar code, can data mining find out? And would matches be reliable for use in the way SCO apparently is using them? Judging by the SCOForum demo, we might think no. And we might be right.

I asked a Groklaw resource person, a man who worked for over a decade doing basic and exploratory research for the US DoD and the Canadian Ministry of Defence on topics related to secure communications and signals intelligence, including cryptology, statistical processing of natural language, signal processing, and computational learning, if he'd be willing to explain it in general and understandable terms, so we can follow along. Very likely this subject is going to be a very significant part of the case when it goes to trial.

Here is what he explained to me:

Data mining is looking for patterns or similarities in large quantities of information. Google is a good example of data mining-on-demand where the pattern is supplied by the user and the large quantity of information is the entire set of webpages on the internet. But data mining in general is potentially much broader. For example, a typical data-mining task might be to take a sample document and look for other documents in a database that might be similar to it. But even beyond that, data mining can be applied to other kinds of data -- pictures, for example, or sound recordings.

There are lots of different ways to approach problems like this. Beyond the most elementary, what all the techniques have in common is that they rely on mathematical models and transformations of the data. Part of the reason is efficiency, since turning the problem into math usually means there's a computationally clever way to do it. Another part of the reason is that, by transforming the problem into math, you make it possible to find and grade a continuum of approximate matches -- in short, to find ranges of similarities rather than just identities. Note very well that 'similarity' here is completely dependent on the particular flavor of math you've chosen as your technique. This is extremely important.
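To make that last point concrete, here is a minimal sketch in Python. It is only an illustration, not anything SCO or DataCrystal is known to use; the two measures and the code snippets are assumptions picked for the example. It scores the same pair of fragments with two different "flavors of math" and gets two different answers.

```python
import math
from collections import Counter


def token_jaccard(a: str, b: str) -> float:
    """Overlap of whitespace-separated tokens, ignoring order and repeats."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def trigram_cosine(a: str, b: str) -> float:
    """Cosine similarity of character 3-gram counts, which is sensitive to order."""
    def grams(s: str) -> Counter:
        return Counter(s[i:i + 3] for i in range(len(s) - 2))

    ga, gb = grams(a), grams(b)
    dot = sum(count * gb[g] for g, count in ga.items())
    norm = (math.sqrt(sum(v * v for v in ga.values()))
            * math.sqrt(sum(v * v for v in gb.values())))
    return dot / norm if norm else 0.0


# Hypothetical snippets: the second is the first with its two halves swapped.
original = "for (i = 0; i < n; i++) total += a[i];"
reordered = "total += a[i]; for (i = 0; i < n; i++)"

print(token_jaccard(original, reordered))   # 1.0: same set of tokens
print(trigram_cosine(original, reordered))  # below 1.0: the character sequences differ
```

The token measure calls the two fragments identical, while the character measure does not. Neither is wrong; each simply defines "similar" in its own way.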

OK, so you've taken your document or picture or whatever, and you've mined your database for similar items. Those items will be graded for similarity to your original, just as some search engines will rate their returned items in terms of probable pertinence. The most sophisticated and respectable data-mining systems will be using grades based on probabilities. This is because the underlying math will be using probability models. Many times the grade will reflect not merely the strength of a match in terms of probability, but also the likelihood that such a match would be found at random searching any old data at all. This also is extremely important, since 'any old data at all' can be subject to a wide range of interpretations. (This could be pertinent in the SCO case, since, if data-mining techniques are used, it's a reasonable question whether any contamination discovered this way is real, or whether it's spurious, i.e., capable of being found to the same degree in other, unrelated data.)

Now the DataCrystal webpage consists mostly of a laundry list of any and all of the subjects ever associated with data mining, artificial intelligence, knowledge discovery, or machine learning. But the .pdf white papers all focus on using data-mining techniques for indexing and retrieving digital video and audio. What's more, they're offering not just indexing and retrieval services, but also housing, protecting, and distributing the data itself.

It outlines an enhanced technique for expanding data-mining coverage. It's a technique for building patterns out of patterns and data mining on the derived metapatterns in turn.

Not being a rocket scientist, I wanted to be sure I'd understood, so I wrote back and asked these followup questions, and got this reply:
Q: I have two questions to follow up:
1. . .the results would depend on how you programmed the software? In other words...it can look for similarities, but it can't evaluate them?

ANS: Absolutely correct.

Q:..there might be in actuality no common code at all?

ANS: You know how Google sometimes matches all the words in your query, but not necessarily conjointly or in the same order?

In the case of computer code, especially code written in C expressing similar or common algorithms, it would be astounding if there weren't pattern similarities at some level. If nothing else such things are enforced by the design of the language and commonly-held notions about good coding style.

Q: ...it simply would have to be the case that some of the code is close enough that they might have a case?

ANS: Just the contrary. As with the 1st slide example, the ancestry of that memory-management code is known to virtually anybody who's studied C from Kernighan and Ritchie's book. A similarity like that would stand out like Devil's Tower, but what it indicates is exactly the opposite of what they contend: it shows that everybody knows the pattern.

Q: And can they program the math to increase "matches"? Pls. explain a bit more this part.

ANS: Here's an example. Suppose you came up with a hitherto-unknown page of blank verse. The question is, was it written by Shakespeare or not?

Data mining your way through that problem, you'd get one level of certainty if your database contained the Bible, Goethe, Racine, Pushkin, and the New York Times. You'd get a different level of certainty if your database were confined to Elizabethan dramatists. The scores for putative Shakespeare against the mixed database would probably be huge just for matching any English. The scores against Elizabethan dramatists would probably be quite a bit weaker, but clearly more conclusive. The mixed-database test -- the one with the Bible, Goethe, etc. -- will probably say 'Shakespeare indeed!' but it's expressing the idea that 'if it's English it's Shakespeare.'

On the other hand, the Elizabethan dramatist test might say yes, might say no, but the answer will be based on such things as a small number of very subtle differences between, say, Shakespeare's and Marlowe's vocabulary. It expresses perhaps the idea that 'in any 1000-line chunk of Shakespeare and any 1000-line chunk of Marlowe, Shakespeare is likely to use the word "ope" once and Marlowe not at all. This example doesn't use "ope" at all, therefore it's probably Marlowe.'

You can see it's still a matter of interpretation and probability, but the second test is simply more credible on grounds that are external to the data-mining method itself.
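The dependence on the comparison set can be sketched in a few lines of Python. This is a toy under loud assumptions: the three "corpora" are invented one-liners, not real Shakespeare or Marlowe, and the model is a simple word-frequency count with add-one smoothing rather than any method named in this article. The score is a log-odds figure: how much better one body of text explains the sample than another does.

```python
import math
from collections import Counter


def log_likelihood(words: list[str], counts: Counter, vocab_size: int) -> float:
    """Log-probability of the words under a word-frequency model (add-one smoothing)."""
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)


def log_odds(sample: str, corpus_a: str, corpus_b: str) -> float:
    """Positive when corpus_a explains the sample better than corpus_b does."""
    a, b = Counter(corpus_a.split()), Counter(corpus_b.split())
    words = sample.split()
    vocab_size = len(set(a) | set(b) | set(words))
    return log_likelihood(words, a, vocab_size) - log_likelihood(words, b, vocab_size)


# Invented stand-ins for the example, not real texts.
author_a = "ope the door ope the gate"      # an author who writes 'ope'
author_b = "open the door open the gate"    # a close contemporary who writes 'open'
unrelated = "der die das und der die"       # a body of text in another language
sample = "ope the window"

print(log_odds(sample, author_a, unrelated))  # large score, mostly for being English
print(log_odds(sample, author_a, author_b))   # smaller score, resting on one word
```

The score against the unrelated text is the bigger of the two, yet it says little about authorship; the smaller score against the close contemporary is the one that actually bears on the question.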

Here's another point of view. How does a data-mining search for SVr4 code look if you run it against all C programs? In all likelihood you're going to find some matches. Are the matches against Linux actually any stronger than matches against an arbitrary body of C code? Against other Unix-like kernels? etc.

These are interpretive issues, but there are statistical grounds for deciding them, and speaking strictly for myself, I seriously doubt they've been fielded satisfactorily. For my money you couldn't even start taking the matter seriously unless exactly the same tests were run against every body of other kernel code like all the BSDs, and a chunk of the SVr4 kernel against the rest of that same kernel. And even then, you've only generated the raw information to start the business of verifying and refining the procedure.
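One simple statistical check of the kind described here can be sketched as follows. The function is a standard empirical p-value with add-one smoothing; the scores are made-up numbers for illustration only, not results from any real comparison of SVr4, Linux, or BSD code.

```python
def empirical_p_value(observed: float, baseline_scores: list[float]) -> float:
    """Fraction of baseline scores at least as high as the observed one.

    Add-one smoothing keeps the estimate from ever being exactly zero.
    """
    hits = sum(1 for score in baseline_scores if score >= observed)
    return (hits + 1) / (len(baseline_scores) + 1)


# Hypothetical similarity scores, invented for this sketch.
score_against_linux = 0.62
scores_against_other_c_code = [0.41, 0.58, 0.66, 0.49, 0.71, 0.55, 0.63]

print(empirical_p_value(score_against_linux, scores_against_other_c_code))  # 0.5
```

With these invented numbers, half of the unrelated comparisons score as high as the "suspicious" one, so the match would tell you nothing. Only when the baseline scores fall well below the observed score does the match start to mean something, and even then it only says where to look.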

Q: Also, what is spectral analysis? Is that what this is?

ANS: No. In general, spectral analysis refers to breaking things down into component frequencies -- sort of like how a prism breaks white light into colors, and so on.

In this case it refers to using the periodicities of the individual characters of program text as frequencies to look for a very specific set of 'colors' associated with a particular swatch of program code. It's not determinative either. It may also refer to a kind of computational trick using spectral-based techniques to look for certain kinds of approximate matches very quickly.
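As an illustration of the "computational trick" idea, here is one well-known technique of that general kind, sketched in Python. There is no indication that this is what SCO's consultants actually did; the function names and sample strings are assumptions made for the example. It uses a Fourier transform to count, for every alignment of a short pattern against a longer text, how many characters agree, so near-matches show up as high counts.

```python
import cmath


def fft(x: list, invert: bool = False) -> list:
    """Recursive radix-2 FFT; len(x) must be a power of two. Inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], invert), fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def match_counts(text: str, pattern: str) -> list[int]:
    """For each alignment of pattern within text, count the agreeing characters."""
    n, m = len(text), len(pattern)
    size = 1
    while size < n + m:
        size *= 2
    counts = [0.0] * (n - m + 1)
    for ch in set(pattern):
        # 0/1 indicator signals for this character; the pattern is reversed so
        # that convolution computes a correlation.
        t = [1.0 if c == ch else 0.0 for c in text] + [0.0] * (size - n)
        p = [1.0 if c == ch else 0.0 for c in reversed(pattern)] + [0.0] * (size - m)
        product = [a * b for a, b in zip(fft(t), fft(p))]
        conv = fft(product, invert=True)
        for i in range(n - m + 1):
            counts[i] += conv[i + m - 1].real / size
    return [round(c) for c in counts]


# Hypothetical snippet and pattern, chosen only for illustration.
counts = match_counts("int i = 0; int j = 1;", "int k")
print(counts)  # alignments 0 and 11 each agree on 4 of the 5 characters
```

Note what the output is: a list of scores. Deciding whether a score of 4 out of 5 means anything is still left to whoever reads the results.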

So, there you have it. At least now we know in general what they are talking about. As the case goes forward, and more is revealed, no doubt it'll be interesting to meaningfully follow along.

His analogy to Google made it all come clearer to me. On top of all that he wrote, I know with Google, input affects output. And input means humans, imperfect humans. I certainly know that I get different results from Google if I plug in the identical search terms, but in a different order, for example. So I totally get how results could be skewed by what you tell the software to do. For example, I get different results if I search for "Dave Farber" and IP than I do if I search for IP and "Dave Farber", and it's different still if I search for just IP or just "Dave Farber" or just Farber or just Farber and IP. And that's using the same pile of data. Input affects output.

Obviously they would argue that their methods are so refined, blah blah. But that human element can't be removed, because humans write the software, no matter how sophisticated. So how reliable are the matches? You use Google. What do you think? Doesn't a human at some point have to interpret the value of the results?

"A continuum of approximate matches" does not infringement prove, on its face. As he says, it's an interpretive issue. And data mining seems to be a better match with something like matching amino acid strings than figuring out if someone stole somebody's code, which requires knowing who has or doesn't have a valid copyright, which way matching code travelled, who had the code first, etc.

If I've understood what my friend has written, it means that if SCO swapped out Linux and searched Windows 2000 code instead, it'd likely find instances that looked like "infringing code" also. That's the same as saying that so far, they are holding maybe nothing. It all reinforces in my mind that, once again, nothing has been proven to date by their claims of similarity, derivative or obfuscated code matches, and nothing can be proven using data mining techniques, until this case goes to trial and the experts speak, followed by a decision by a judge.

If you are interested, here is a white paper, "Text Mining -- Automated Analysis of Natural Language Texts" that explains the process of searching just for simple text, and while it does the explaining, it also shows just how much human input goes into structuring your search before you begin the search and why the results still may not be what you want. It is hard to see how such techniques could answer the question: "Is this infringing code?" At best, it could show you where to begin to investigate. And here are the DataCrystal project's white papers.

Oh, one other thing I found out in my investigation. Guess where most of the cutting-edge brains working on such data-mining techniques work? . . . No, really. Guess. . .

That's right: at IBM.


  


Data Mining, Spectral Analysis, and All that Jazz | 95 comments
Authored by: Anonymous on Friday, August 29 2003 @ 11:17 AM EDT
SCO have been unusually quiet today, indeed most of the week

In the last comment's section, I can't see comments after Wesley Parish • 8/29/03; 2:26:37 AM. This appeared when there were about 145 or so comments (I think), so we're missing about 15. Recommend people post new comments in this topic instead!

Recommended reading for the day

http://www.threenorth.com/sco/cohen.html

http://www.pclinuxonline.com/modules.php?name=Forums&file=viewtopic&topic=2033&forum=46

http://www.pclinuxonline.com/modules.php?name=Forums&file=viewtopic&topic=2027&forum=46

On the was it or wasn't it a DoS? Somebody posted a link to a weekly view of uptime at biz.yahoo.com (can't find the link right now) - it really was regular as clockwork.

Incidentally SCO migrated off SCO Unix to Linux in 2002. Check it out! Maybe customers should follow their example.

http://uptime.netcraft.com/up/graph/?site=sco.com&mode_u=on&mode_w=on&avg_days=30&submit=Redisplay+Graph


quatermass - SCO delenda est

Authored by: Anonymous on Friday, August 29 2003 @ 11:22 AM EDT
More reading

http://action.eff.org/action/index.asp?step=2&item=2775

Oh, yes here is the pattern for the alleged DoS. Look how regular. Surely looks to me like they're just turning it off during non-business hours?

http://uptime.netcraft.com/up/performance?explain=0&mode_p=on&mode_u=off&mode_w=off&by=collector&errors=0&site=www.sco.com&site1=&sample=2&submit=Examine&range=5d&maxy=0


quatermass - SCO delenda est

Authored by: Anonymous on Friday, August 29 2003 @ 11:25 AM EDT
If the glove don't fit you must _______? I see how the logic above fits with the above information and I see how it all connects. Yep, I really do... however, how many people from an eligible jury pool will begin to understand this? Hmmm, or will they go for the raw power brought to a court room by an appearance of the celebrity lawyer (Boies). Hmmm, if a jury is involved then a simple phrase that catches the ears of the jury just right might win the case over all of this expert IP data! If the glove don't fit, then Unix is owned by SCO. Hmmm, the last I heard OJ is free and still playing golf in Florida.

So as an insurance policy against a SCO court room victory in Utah (Microsoft, and maybe SUN, will make sure that they have enough money to pay for the years of lawyers' bills), think seriously about making a complaint/filing, concerning SCO, by advising your state attorney general about SCO's FUD and/or actual abuse of you as an innocent 3rd party consumer by attempting, as posted on their web site and public interviews, to threaten you, as a paid up user of LINUX, into paying again and again for your use of your already paid for LINUX. And while filing your concerns with the state AG don't forget to document the date(s) of your previous and current LINUX acquisitions and/or downloads, as evidence of your date(s) of possession and use, today.

Why should I maybe elect to contact my AG and document my LINUX today?

Because THIS IS A DO or DIE period of time for SCO UNIX and for SCO/Canopy group! Any bet that they will not sue users?

Here is one point that most are not considering when planning their LINUX roadmap into the future!

1. In order for SCO to prevent the state AGs from protecting users (via the laws of agency), SCO first has to go thru a legal procedure where they notify absolutely everyone in the world that the current LINUX agents are not acting with SCO's IP authority - and that these agents can not sell LINUX under the terms of the perpetual use GNU GPL anymore (this perpetual use understanding overflows to mean that all future upgrades are perpetual as well)! SCO has not made moves to make this notification legal yet! They will make this notification - they have to (if they don't they will not be able to collect any money from any LINUX user = their goal)! So, after SCO does make legal notification to the public, any LINUX acquisition or download that happens after that date could be seen by the courts as being after the fact (after SCO has made the notification to users legal, as seen by the court). Being seen as getting your LINUX after this SCO notification date may put you in the way of SCO's harm.

The attorney general's office can document your complaint about SCO and also document a date(s) when, previous to SCO legal notification to stop actions of agents, YOU had legal possession of the LINUX IP product (where you are then protected, as the terms and conditions of your acquisition of LINUX then predate any SCO action... So, your rights are truly perpetual... and however the IBM suit, the Red Hat suit or any SCO vs LINUX user suits go... the laws of agency and your attorney general should protect you)!

Contacting and documenting your LINUX possession status at the AG's office is not rabid dog crazy, but maybe crazy like a fox (a simple prudent move that one can make to cover the bases now rather than later). Again, please remember, the last I knew... a "jury of our IP knowledgeable peers" will be hearing the case in UTAH and a certain someone is still living freely in Florida!


annon

Authored by: Anonymous on Friday, August 29 2003 @ 11:29 AM EDT
Final corrected version of the article. Like quatermass, I can't read end of comments on previous article.

HEY, LINUX! WORK HARD AND YOU WILL BE SCO'D

The "new SCO" (formerly Caldera) is not the only member of the Canopy group (http://www.canopy.com) still distributing the Linux kernel. Canopy portfolio member Linux Networx, despite the name incorporating alleged software pirate Linus Torvalds's notorious trademark, has distributed the latest Linux kernel but one, even after SCO made its allegations, aggregates the kernel with the work of others, and continues distributing the source and binaries for that derived work today.

Now you might think that as a member of the Canopy portfolio, Linux Networx would respect the claimed intellectual property of another Canopy Group member. However, not even the Canopy group itself takes SCO/Caldera's claims seriously. As recently as May 1st, 2003, Linux Networx was uploading Linux source and binaries to the FTP site, ftp://ftp.lnxi.com. Linux is still distributed from that venue today. Their customers are referred to this site in the white paper for their Linux BIOS product,

http://www.linuxnetworx.com/products/linuxbios_white_paper.pdf

There, under the linuxbios directory (ftp://ftp.lnxi.com/pub/linuxbios/kernel) we find

linux-2.4.20.tgz 5/1/2003 3:24:00 PM

Surprisingly, however, this file is not actually the Linux 2.4.20 kernel.

4ef3a43d8fa4d8166a8bdcadd4285f80 *linux-2.4.20.tgz

It turns out to be based on linux-2.4.20.tar.gz, a pristine kernel as downloaded from the kernel.org distribution site, with two patches applied. Both patches are included in the toplevel directory of the new aggregate distribution being distributed by Linux Networx. They are:

patch-2.4.21-pre4 and

linux-2.4.21-pre4.mtd-thayne_rc1.patch

Both patches seem to be commonly available on the net. In addition, it contains the vmlinux binary and many build artifacts, mostly ".o" files. Linux 2.4.21 pre4 puts this kernel on the development branch immediately preceding today's stable Linux kernel, 2.4.22. In addition, by aggregating this allegedly infringing kernel with two other derivative works, created by others, Linux Networx is itself creating and distributing a derivative work, both binary and source. They are doing this, however, without any notation of the changes they have made in so doing as required by the GPL--though it is easy enough to infer from the included patch files. They are (1) calling their aggregate distribution Linux, and (2) distributing it under the same version as a commonly available Linux kernel.

Now, naming and distributing a Linux 2.4.21pre4 kernel with the title "2.4.20" is a bit sloppy. It also violates the GPL provision that your changes must be noted and clearly labelled. So, in addition to using Linus Torvalds's trademark, violating the GPL, *and* trespassing egregiously on SCO's alleged copyright claims, all at the same time, Canopy group members are distributing falsely labelled kernels.

As a respected and active member of the Linux community, the Canopy group should disavow all association with SCO's actions. Or, if they prefer not to be respected, they should unlink alleged software pirates like TrollTech and Linux Networx from their own homepage. Or maybe they should just go and f^Hsue themselves.


John Goodwin

Authored by: Anonymous on Friday, August 29 2003 @ 11:34 AM EDT
The threat is back, or is it? So many double negatives.

I can't imagine this being anything other than an attempt to parry Red Hat

http://www.theinquirer.net/?article=11273

Blake Stowell, director of public relations at SCO, told the INQUIRER late today: "Just because we aren’t “planning” to sue Linux companies doesn’t mean we won’t. We tried to avoid suing Red Hat, but they seemed to bring the litigation upon us, not us upon them. Also, just because we are saying that we won’t sue Linux companies doesn’t mean that we won’t sue Linux customers".


quatermass - SCO delenda est

Authored by: Anonymous on Friday, August 29 2003 @ 11:38 AM EDT
Oh, and has the SCO warning to Linux customers disappeared from their site or not? I can't find it now.
quatermass - SCO delenda est

Authored by: Anonymous on Friday, August 29 2003 @ 11:44 AM EDT
Everybody in the open source movement is angry at SCO, and SCO seems to rely heavily on GPL'd code in their commercial Unix products (the SCO FAQ specifically suggests using gcc as a compiler, for example, and Samba is a very necessary part of their new releases). Just had me wondering, and BTW, this is just pure conjecture- what would happen if the various open source projects decided to modify the GPL for new releases- make it illegal for open source software to be used in any commercial Unix release? I know- sort of against the spirit of open source, but SCO is begging for something like this. Could something like this ever happen?
wild bill

Authored by: Anonymous on Friday, August 29 2003 @ 12:02 PM EDT
More sexy than Lawrence Livermore, LinuxNetworx render farm is behind movie "The Core". I wonder if SCO 0wnz Hollywood now.

http://www.linuxnetworx.com/news/4.2.2003.32-Linux_Networx_R.html


John Goodwin

Authored by: Anonymous on Friday, August 29 2003 @ 12:13 PM EDT
It doesn't really matter. Let's see, I could build a computer like HAL, with terabytes of memory, employ teams of specialists, scientists, engineers, to scour the Linux kernel for similarities to anything in existence. And, you know what? There probably would be similarities -to a degree.

No number of fancy algorithms, or MIT experts, can prove beyond a doubt that code was stolen specifically from SCO and put into Linux. It used to be that they claimed this had been done by IBM, but when that fizzled out, they expanded it to "Linux" and the dark forces surrounding it. Similarities or not, they have to PROVE their claims -and that would be extremely hard to do.


Stephen Henry

Authored by: Anonymous on Friday, August 29 2003 @ 12:23 PM EDT
PJ, again another great article, please keep up the good work. I completely understand and agree with the sections referring to common algorithms and similar coding structures. It can't be avoided, you're limited by education and hardware. Comments are a different story, especially lengthy ones, but I'd even expect to see some similar, and possibly identical, one-liners come from multiple independent programmers.
Tazer

Authored by: Anonymous on Friday, August 29 2003 @ 12:26 PM EDT
Wild Bill,

"what would happen if the various open source projects decided to modify the GPL for new releases- make it illegal for open source software to be used in any commercial Unix release"

As I understand it, the FSF is the only one that can modify the GPL. And, as you note, restrictive clauses are very much avoided by the GPL authors.


Tom Cranbrook

Authored by: Anonymous on Friday, August 29 2003 @ 12:30 PM EDT
I think the picture is pretty clear - you can build a fantastic method for detecting similarities between SCO-owned code and Linux but

a) your machine's sensitivity to similarity is human-controlled

b) even if you find similarity it doesn't imply tainted code in Linux.

Perhaps PJ could comment on the proof required for copyright infringement to be upheld. I remember hearing a radio programme about it (UK law this would be) some time ago - they said that basically there were 3 defenses to copyright infringement cases

1) The similarity is not substantial (in this setting I guess "substantial" would mean not passing the abstraction-filtration-whateveritwas test)

2) I had never seen his (work of art) when I made mine, so mine was independent

3) We both got this idea from a common source (in a non-infringing way).

Is this right? It seems that both 1) and 3) are viable defences for whichever Unix licensee is supposed to have contributed the tainted code.


Dr Drake

Authored by: Anonymous on Friday, August 29 2003 @ 12:33 PM EDT
I'd like to see something like, "License to use GPL software can be removed if the FSF finds that the user(s) in question is damaging the reputation of GPLed software, or attacking the GPL via their publicity efforts or the court system."

And could everyone PLEASE learn to give short URLs? The comments system does allow the use of HTML tags.


Alex Roston

Authored by: Anonymous on Friday, August 29 2003 @ 12:37 PM EDT
To people asking about withdrawal of usage rights to GPL software from SCO - it's a matter of principle. Free software is free, even for asshole companies to use (although they'd better be a bit careful about redistribution if they don't want to stick to the GPL). Even though we strongly disagree with SCO's actions, we still grant them the right to use GPL software. We show ourselves to be better than them by not lowering ourselves to their level.
Dr Drake

Authored by: Anonymous on Friday, August 29 2003 @ 12:39 PM EDT
Great job, pj. Not only does IBM have the data mining talent, they have the SVR4 code. If they have the data mining talent, do they have data mining patents? How will data mining and spectral analysis play to a jury? About like DNA in the OJ trial.

I wonder if IBM has any good decompilers? Might be useful for determining if there is Linux code in SCO's LKP.


Greg T Hill

Authored by: Anonymous on Friday, August 29 2003 @ 12:44 PM EDT
I would doubt that anything like data mining or 'spectral analysis' would play a meaningful part in a trial. Data mining is certainly a way to find similarities in a large corpus of code. But the courts have long-standing tests and precedents to guide them in arriving at a determination of infringement. Such tests involve _reading_ the code. I think the case (if it ever happens) will hinge on the provenance of such code as may be considered infringing.

td


Thomas Downing

Authored by: Anonymous on Friday, August 29 2003 @ 12:51 PM EDT
Data Mining is all fine and good, but it's the genealogy of the matches that helps make the proof...

When I did code reviews years ago, the key piece of information that was always required was the revision history. With it, our teams were able to resolve problems that could not be solved by other means - because we had the history on how all the moving parts were created, who did it, and how they did it.

SCO can employ all the gee-whiz tech they want, but at the end of the day, someone has to look at it and make a decision whether the match is worth anything. Sounds like SCO has not done their homework again.


Paul Penrod

Authored by: Anonymous on Friday, August 29 2003 @ 12:53 PM EDT
I agree with td. SCO may claim to be using data mining and spectral analysis to find the so-called infringing code, but what you would hear in a court room is a description of the code itself. How they found the code is irrelevant. What would matter is if they could find copied code that violated a contract. If so, what difference would it make if they found that code by stumbling upon it, from a tip, through spectral analysis, or the back of a cereal box. Once found, its existence is the key, not how it was found.
Nick

Authored by: Anonymous on Friday, August 29 2003 @ 01:07 PM EDT
I can see why they call it spectral analysis. Having worked with RF and SP off and on, the general idea is how close a match you get to something big. In radio that's like something transmits at 800MHz. Well it doesn't always transmit at exactly 800MHz so when you search for frequencies +/- around 800MHz you get the classic frequency matching curve which represents the amount of energy around 800MHz.

The data mining technique must be similar to this. Ever google with a long string? Works very poorly. But if you do a spectral analysis approach on a chunk of code you will get code matches that are close. Like looking at a diff of two versions of code with just some changes. There could be a percentage score associated with such a match based simply on the amount not changed.

OK for getting started, hardly proof. Even if you had a 100% match there are 3 questions you would have to ask before you find the smoking gun. 1. where did the code come from FIRST? Looks like SCO just said AH HA a match from SysV is from us! 2. Is it general knowledge? An example is the malloc code. 3. Is it big enough to matter? A small chunk here or there is not enough.

Obviously this highlights a few things here.

1. SCO has been at this for YEARS. They didn't just have a falling out with IBM and go and sue them. 2. The BIG CONCERN at SCO is NOT Linux in general, but how Linux is moving into SCO's traditional market. Linux is ok as long as it's at universities, with hobbyists, even in the server room. Once it shows up in multiprocessor machines, clusters, 64 bit machines, etc., there is no market for SCO. At least that's what SCO thinks.

This explains the weird comments about "look how fast Linux has come" and the like. That's why they are after 2.4 not 2.2 or 2.0.

BUT THEY CAN'T SAY THIS!! Why? Because that market for high end multiprocessors does not = $3 Billion. SOOO sue everyone.

The old business model for this kind of stuff was to charge the customer a LARGE up-front fee, plus some 4 or 5 digit fee for yearly support. The customer pays because he can't develop it for cheaper and can't find the stuff anywhere else.

Now along comes the same thing for FREE. (oh shit). SCO didn't adapt. They still think it works like VAX/VMS or IBM AS400. Those days are very LONG gone.

This suit is a desperate attempt by SCO.


BubbaCode

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:14 PM EDT
Here's a URL that proves another Canopy company is using MySQL for its "culturegrams" product. I drilled in to look at what they say about Finland, then broke my query to get the error message.

http://onlineedition.culturegrams.com/world/world_country.php?contid=5&wmn=Europe&cn=Finland


John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:14 PM EDT
Wow, pj. This site is stunning proof that investigative journalism is no longer done at newspapers, but by people like you.

My own personal take on data mining and pattern analysis is this: while it may be a useful technique for _discovering_ copyright violations (especially when you don't really know what you're looking for), at this point I think it's fair to say that it's absolutely useless for _proving_ them. If such a tool spits out a "similarity score" or some such, what does that mean? Not much, in my opinion.

Perhaps the best analogy would be DNA sequence comparisons, which are now, after an extensive period of debate and scientific work, being admitted as evidence in trials. But the key here is that an expert can give a pretty good explanation about what DNA results mean. A judge or jury can use evidence such as "statistically, there is less than a one in a billion chance that the blood under the victim's fingernails came from a different individual than the suspect". That expert testimony would be backed up by real science, involving serious critical debate about what these probability estimates mean. I'm sure someone of pj's research skills will have no trouble digging up info on all this, but here's a quick URL I found anyway:

http://www.nap.edu/books/0309053951/html/194.html

By contrast, the science of data mining is still in its infancy. Based on the way software is produced, you'd _expect_ to find statistical similarities in entirely separate codebases. For one, the basic algorithms are taught in textbooks that everybody reads (or should read). If you do find a statistical similarity, how can you separate out the truly creative contribution from the standard application of cookbook recipes? What's the noise level, in other words the chance that two random excerpts of code will trigger a "statistical similarity" check?

Even if pattern analysis were able to perfectly identify whether copying took place, there are all the other relevant questions. Who copied from whom? Was there a legal right to do so? I think the Berkeley Packet Filter component of the SCOForum presentation is particularly telling here. Sontag's assertion that it was primarily a demo of the pattern analysis that they're doing is probably right on target.

I might be willing to accept SCO's pattern analysis evidence in one scenario: that they allow it to be used to analyze their proprietary code base to determine how much "copying" there has been from free software projects, and agree to make good any damages found in proportionate terms to the damages they're seeking from IBM. If their confidence in the tool is so high, as well as their confidence in their own lily-white processes for making sure there is no improper copying, then they should have no problem agreeing to this. I'll leave it to the groklaw community to estimate the likeliness of this happening :)

In a trial, I'm sure any half-competent lawyer with access to half-competent expert advice would be able to demolish evidence from data mining. I think IBM is a worst-case adversary for SCOX in this sense. But, again, all this seems to fit into the pattern of trying this case in the media rather than the courts. Gullible journalists, analysts, and so on, are quite likely to be taken in by a snow-job with fancy-schmancy scientific lingo. The best way to counter this is probably to continue insisting that SCOX makes clear, verifiable claims.


Raph Levien

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:15 PM EDT
I agree with Steve above. Basically you have to get the offending
Engineer/coder and get him to admit "I had SCO code when I wrote ______. I
copied the SCO code to make______. I used SCO code as a reference to write
______." If they don't have this their case is weak. Bet you lots of money
they have a team working this issue now.
BubbaCode

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:15 PM EDT
Dr Drake

IANAL, but defense 4) You are using under a license from the copyright holder.

If there is any SCO code in Linux, it's certainly going to be argued they licensed its use themselves under the GPL.


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:27 PM EDT
Sorry if this was posted before. Article on a third party company, too close to SCO in the same data center, getting blasted by the DOS attack on SCO.

http://eletters.eweek.com/zd/cts?d=79-180-2-3-22359-23149-1


BubbaCode

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:33 PM EDT
> Just had me wondering, and BTW, this is just pure conjecture- what would happen if the various open source projects decided to modify the GPL for new releases- make it illegal for open source software to be used in any commercial Unix release?

If all the copyright holders of a project agreed, they could relicense it with a GPL-like license that excluded, say, OpenServer, UnixWare, etc. I don't think they could use the GPL itself.

Of course, SCO could continue to use the older GPL releases.

Alternatively, the software authors could simply gradually remove SCO specific work-rounds, and write code that just happens to break on SCO's platforms.

Yes, SCO could fix it, because it's open source, but (a) it'd cost them money, and (b) over time SCO will get on a more and more deserted private side fork, missing all the critical security enhancements and bug fixes. If the software author doesn't mention these changes in their documentation, SCO could be in a real pickle, just trying to figure out what is going on.

Check comments section on recent SCO stories at www.linux.com, and you'll see this idea is being somewhat discussed... I bet there are more quietly thinking about it, or discussing it on private forums or IRC.

Personally, I have a lot of sympathy with the idea. Why supply free code, free support, and free products to sell, to somebody who is gunning for you?


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:37 PM EDT
I would like to return for a moment to the subject of the company "DataCrystal." Anyone armed with even a cursory knowledge of the data mining field who visits the company web site -- as opposed to the web site of the academic research project with the Coincidentally I'm Sure Similar Name -- would dismiss this company out of hand. It's a non-serious entity. The list of claimed expertise is a dead give-away. There is no business reason why anyone would assemble a collection of people with such disparate skill sets. Not even a huge DOD contractor would lay claim to all those things. And certainly no one human being possesses expertise in more than one or two of the listed fields. When we find out that this business operates out of somebody's house, the claim becomes preposterous.

Then there is the matter of their "white papers" on the technology page. This is beam-me-up-Scotty stuff. What they are talking about is so far beyond the current state of the art that Somebody Big would have to spend many years and hundreds of millions of dollars to create the technology they are describing.

Not to put too fine a point on it, but absolutely nothing on this company's web site is consistent with an actual, real-world, data-mining or AI house. I recently spent three years in such a company; it was full of Ph.D. text-mining and neural-net gurus. I know where the state of the art is, and Data Crystal is not on the same planet.

This company has all the earmarks of being a Canopy Group shell... something kept around to be one of the things that the peas hide under as Canopy shuffles its assets around in some kind of game. I'm not sure it would have to be disclosed in the 10-Q, but as soon as SCO files it I'll be checking it to see if SCO has been making large payments to DataCrystal for Darl's "pattern matching" work.

Many people on the Yahoo SCOX board are speculating that the plan here is to liquidate SCO, but to do so in as noisy a manner as possible so as to have one last opportunity to extract cash from the investing public. Thus the insider selling and the Vultus acquisition using newly-minted SCO shares. The next step would be getting as much of SCO's cash as possible out of the company before scuttling it... this would mean SCO "buying" services from fellow Canopy Group entities -- and paying in cash -- to the maximum extent practical. They just acquired another three months of delay before they have to produce any more expensive legal work in the Red Hat or IBM lawsuits, and that's probably it for them. By the time Canopy actually has to pay Boies & Co. for any serious legal work, they could have moved most of SCO's cash out of the company. The last step is to file a Chapter 7 bankruptcy, in which Canopy Group -- as the largest creditor -- inherits whatever assets remain, such as the UNIX IP.

To prove this, we would have to somehow document the "purchases" made by SCO during its noisy and expensive swan song. Presumably these could be obtained by subpoena, but probably not until after SCO has declared bankruptcy, leaving aggrieved shareholders with some standing to sue (by alleging fraud). I checked, and DataCrystal itself appears to be a privately-held company with no reports on file anywhere. So there is no data to be had that way.


Bob

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:38 PM EDT
Bubba, did you look at the uptime graph?

This alleged hacker seems to keep very very very regular hours?

http://uptime.netcraft.com/up/performance?explain=0&mode_p=on&mode_u=off&mode_w=off&by=collector&errors=0&site=www.sco.com&site1=&sample=2&submit=Examine&range=5d&maxy=0

Maybe somebody should tell DiDio, that it can't be a crunchie at fault. It's hard to keep precise time when you're stoned out of your mind at the ashram.

P.S. Does SCO's ISP indemnify them?


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:38 PM EDT
Another Canopy company that trades in Linux:

http://www.center7.com/us/products/vm/

[[Vintela Manager is a secure, web-based, systems management solution that reduces the cost of deploying and managing established versions of Linux and SCO UNIX. ]]

Not anymore, methinks.


John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:40 PM EDT
By the way, I wonder if that MySQL is the one
IBM used to say it owned.
John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:40 PM EDT
When I read your Canadian expert's explanation of data mining, several ideas began to come together. Please bear with me while I go through some rather speculative reasoning.

1. Obfuscated Code

SCO claims that there is lots of obfuscated code. This implies that the search algorithm makes approximate matches.

2. Spectral Analysis

This is the basis for a field called automatic target recognition (ATR). In its simplest form an image of the desired object and the unknown image that may contain the object are transformed using the Fourier transform. The transformed images are multiplied together on a point by point basis and the result is retransformed back to the spatial domain. All of you EEs recognize this as the convolution theorem. Convolution in the spatial domain is equivalent to multiplication in the frequency domain.

Convolution is like doing many correlations between the desired object and the unknown image. When the desired object is aligned with its occurrence in the unknown image we get a bright spot. The brightness of the spot tells us about the degree of match.

I speculate that SCO did something like this to find what they call matching code. This would be done by applying some kind of spectral transform, perhaps a wavelet transform, to the Linux code (the unknown) and then matching that up with the transform of a piece of SCO code (desired object). One way to do this would be to use the ASCII value for characters. This gives a one dimensional data set of numbers that can be transformed.

This has got to be an intensive computation and that is why they are not through working on it.
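To make the frequency-domain matching idea concrete, here is a small sketch in pure Python. It is only a guess at the general technique, not a description of anything SCO did. Two things are assumptions of the sketch: instead of raw ASCII values (which correlate poorly, since all letters have similar codes) it uses one 0/1 indicator signal per character, and it multiplies one spectrum by the complex conjugate of the other, which gives correlation rather than convolution. The names dft and match_counts are made up for the example.

```python
import cmath

def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform; fine for short demo inputs."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for f in range(n):
        acc = sum(v * cmath.exp(sign * 2j * cmath.pi * f * t / n)
                  for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out

def match_counts(haystack, needle):
    """For each offset k, count positions where haystack[k+i] == needle[i].

    Assumes len(needle) <= len(haystack).
    """
    n, m = len(haystack), len(needle)
    spectrum = [0j] * n
    for ch in set(needle):
        # Indicator signals: 1 where this character occurs, 0 elsewhere.
        h = dft([1.0 if c == ch else 0.0 for c in haystack])
        y = dft([1.0 if c == ch else 0.0 for c in needle] + [0.0] * (n - m))
        # Correlation theorem: H[f] * conj(Y[f]) is the spectrum of the correlation.
        spectrum = [s + a * b.conjugate() for s, a, b in zip(spectrum, h, y)]
    corr = dft(spectrum, inverse=True)
    # Only offsets without wrap-around are meaningful.
    return [round(corr[k].real) for k in range(n - m + 1)]

haystack = "he took a pinch of sniff from the box"
counts = match_counts(haystack, "snuff")
best = max(range(len(counts)), key=counts.__getitem__)
print(best, haystack[best:best + 5], counts[best])  # 19 sniff 4
```

Searching for "snuff" lights up the misspelled "sniff" with 4 of 5 characters agreeing, which is the kind of approximate hit a plain text search would miss. A real tool would use a fast FFT rather than this naive transform.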

3. Google Search

Google works great if you want to find snuff boxes and you type in "snuff". If you type in "sniff" you may find one. If ATR were looking for snuff and saw sniff, you would get a nice hit.

I do not think SCO is using text search methods.

4. Mathematicians

While I am an engineer who does algorithmic development, I have been referred to as a mathematician because some see the development and application of computational methods as "mathematics." (IANAM)

5. Convergence

Enter "automatic target recognition" into Google; the first hit is MIT AI Laboratory. I speculate that someone with past connections with MIT AI Laboratory may be who is referred to as "MIT Mathematicians." MIT Mathematics Department is a different place.

Ron


Ron Michaels

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:42 PM EDT
quatermass wrote: Check comments section on recent SCO stories at www.linux.com, and you'll see this idea is being somewhat discussed... I bet there are more quietly thinking about it, or discussing it on private forums or IRC.

Personally, I have a lot of sympathy with the idea. Why supply free code, free support, and free products to sell, to somebody who is gunning for you?

Thanks for the info- I will check out that discussion. An analogy to this situation would be getting hit by the local bully in the schoolyard. Be holier-than-thou and turn the other cheek and the S.O.B. will probably smack the other cheek for you! I just don't see why programmers should continue to turn out great programs under the GPL and be abused by a greedy commercial entity over their work. Open Source is intellectual property too- just like SCO's Unix code.


wild bill

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:42 PM EDT
Bob, maybe you could get yourself to be a creditor somehow (or maybe shareholder??).

The idea is to get in on any possible future bankruptcy hearing.


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 01:45 PM EDT
wild bill - most practical approach would be for lots of projects to simply skip security updates for SCO platform. Then they have to go fix everything themselves. No need to cripple the platform -- just make it insecure and let others do that for you.


John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 02:04 PM EDT
> Another Canopy company that trades in Linux:

Some or all of Vintela products used to be SCO's Volution products.

SCO sold them to Center 7 in some complicated stock deal... and got the right to continue to sell the products to its customers in a complicated royalty deal.

Check old news stories for how Volution was going to be the next big thing according to SCO - which makes it even more surprising that they gave them away for a relative pittance.

Personally, as SCO wrote them, and still sells them, I think IBM should check if Vintela/Volution infringes any patents.

I think DiDio should ask SCO if Center 7 indemnifies SCO against patent infringement.


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 02:13 PM EDT
Lesser Known Canopy group specialties:

http://www.clearstonehealth.com/index.php?gettopic=Products Services&getsubtopic=OtherServices

# eLearning in Healthcare

# Bloodborne Pathogens

# Peripheral IV Therapy

# PICC

# Wound Care

# Pain Assessment

# Pain Management

# Needlestick Prevention

# Improving the Sales Process through eLearning


John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 02:18 PM EDT
Yarro is also on the board of Canopy's Altiris, who also do some Linux kind of thing (never clearly understood what).

Those board meetings must be interesting!


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 02:26 PM EDT
Why is Darl C. McBride listed as CEO here?

Their webpage says the Chairman is also CEO. Note the deal with IBM for web services too.

http://www.asia-links.com/matrix/b2b/b2bcompdetail.asp?companyid=1435


John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 02:28 PM EDT
On the topic of somehow poisoning open-source software for the SCOX platform.

Many actual free software developers (including myself) are angry at SCOX and are definitely thinking and talking about ways to fight back at their aggression. That said, the idea that there are no restrictions on _who_ can use the software is very central to the free software philosophy. In fact, the idea of excluding particular wrong-doers has come up many times in the past, and the consensus has always been that it's important to keep the principle. Indeed, any such restriction would fail to meet the standards of the Open Source Definition and the Debian Free Software Guidelines (from which the OSD was derived).

The most common class of use-restricted but "almost free" licenses are those that permit non-commercial use. It used to be fairly common for software to be released to the academic community under such terms, but the practice is fading, supplanted by real open source licenses.

I think there may be other effective strategies that don't raise these kinds of issues. I like the idea of not accepting patches specific to the SCOX platform. This puts the burden for applying such patches and distributing the patched versions firmly on SCOX, which feels just to me. I'm not a big fan of putting in explicit anti-SCOX "logic bombs", because I think that unfairly affects users. On the other hand, I am in favor of adding text to platform-detection messages. I'd go for something like this:

Platform detected: SCO UnixWare/OpenServer/whatever

NOTICE: While you have a legal right to use this software on this platform under the terms of the GNU General Public License, the authors of this software deplore the tactics of the SCO Group, and do not support this use. Patches specific to SCO Group platforms will be rejected. Thus, running on this platform may be less robust than other platforms. Please consider changing to a system less hostile to the interests of the free software community. Thanks, <project name> team.

The cool thing is that even if SCOX systematically tried to remove all such notices from the versions they ship, it's likely that some would slip through (they're not that smart, you see). In addition, software compiled from source, or adapted from, say, Red Hat binaries (running on the Linux Kernel Personality module) would see such notices unchanged. Real users would probably get a good chance to see these messages frequently.

I'm interested to see what kind of responses develop. This almost certainly won't be the last time a company lashes out against the free software community. It would be good to have a response that is ethical, morally justified, and highly effective.


Raph Levien

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 02:38 PM EDT
Alex, a horizontal-friendly version of this comment thread can be viewed here.
CSS2

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:04 PM EDT
All this talk about "Spectral Analysis" and now the #1 buzzword, "Wavelets"... The correct way, in my largely unprofessional opinion, to compare (parts of) two codebases would be to first tokenize them, including compressing all kinds of whitespace to a token WHITESPACE and so on, then build parse-trees (basically it's like a compiler up till this point) and then compare these trees against each other, both topology and content, using some suitable scoring function (there the magic lies).

When scoring you'd give positive points for such things as "same variable name", "same order of non-dominating statements", etc and maybe negative for others. Some idioms might be so common as to be useless in this type of comparison and should be pruned from the tree altogether or not scored, or simply tokenized into some low-score token.

Computing the edit distance for subsets of the parse-tree(s) is probably a useful scoring function in itself. For an introduction, see for instance http://citeseer.nj.nec.com/navarro00indexing.html
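A minimal sketch of the first and last steps of that pipeline, assuming a crude regex tokenizer for C-like text and a plain Levenshtein distance over the token lists. It skips the parse-tree stage entirely, so it is a simplification of the approach described above, and the function names are invented for the example.

```python
import re

# Identifiers, integer literals, whitespace runs, or any single other character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\s+|.")

def tokenize(source):
    """Split C-like source into tokens, collapsing each whitespace run to WHITESPACE."""
    return ["WHITESPACE" if tok.isspace() else tok
            for tok in TOKEN_RE.findall(source)]

def edit_distance(a, b):
    """Levenshtein distance between two token sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        cur = [i]
        for j, tok_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                      # delete tok_a
                           cur[j - 1] + 1,                   # insert tok_b
                           prev[j - 1] + (tok_a != tok_b)))  # substitute or keep
        prev = cur
    return prev[-1]

def similarity(src_a, src_b):
    """Score in [0, 1]: 1.0 means identical token streams."""
    a, b = tokenize(src_a), tokenize(src_b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

# Reformatting alone does not change the score; renaming a variable does.
print(similarity("x  =\n1;", "x = 1;"))
print(similarity("a = b;", "a = c;"))
```

Weighting tokens differently (low scores for common idioms, higher for matching variable names) would slot into the substitution cost, which is where the "magic" of the scoring function would live.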


eloj

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:09 PM EDT
Bob,

I really think that SCO are going to liquidate. What we know is: SCO initiated extremely dubious legal action against IBM; not only is the case extremely flimsy, but it would be exceedingly hard to prove in the circumstances even if they were correct. Furthermore, any such legal action would undoubtedly draw massive and ultimately fatal retaliatory action, should IBM not settle. Instead, they have mysteriously broadened their claims to all aspects of Linux by making ridiculous, even borderline slanderous, statements, completely unsupported and unproven. Their legal counsel, especially one as respected as Boies, would demand that such egregious statements cease, since they would seriously undermine their position, and even their case, in a court of law. Conversely, the statements and press releases have increased and show a strong correlation between prices and timing (the "fortune 500 company" statement was made when their shares were down $2 off their opening value, a massive 20% of the total value). Insiders have been insidiously selling their shares in the company, making upwards of $50k, sometimes as high as $200k, at a time; in some cases, the shares were sold at $15, a massive 15 times the value of their pre-IBM price. Furthermore, there was a dubious purchase of a fellow Canopy company at inflated value, despite the company's incompatibility with SCO's current product line - lacking in compatibility for that matter. More importantly, however, early in this debacle a filing was made to the SEC (it's on the Yahoo board) which states that SCO offers total indemnification for the actions of its executives, whereas previously it hadn't (whether this would be retroactive, I don't know).

With the recent talk of SCO relaunching UnixWare, one has to wonder: if they -by their own tongue- own the rights to Linux, why upgrade an inferior technology, while they could already license Linux to a far greater profit? Even if this was the case, SCO does not have the personnel, having sacked the majority of their R&D staff, to compete with "legal" UNICES such as Solaris. None of the pieces fit together.

I would agree that the liquidation of SCO is most likely the plan. After all, how would they profit most from this debacle? The question is, what would happen?

Clearly, IBM would not accept such an outcome, and would no doubt proceed with the legal action (IANAL) regardless (if possible). If SCO was found guilty of gross misrepresentation, would the executives then become liable for their actions? Though this may not be the case, it's a pretty sure bet that SCO cannot indemnify its employees' actions, if the said actions were illegal. Furthermore, wouldn't such a mysterious liquidation be EXTREMELY illegal? With the eyes of the world on them, and the hearts of thousands of developers against them, they would most likely fail to slide silently out of the limelight.

Or, it may be that Darl McBride _really_ is as dumb as he seems and the true intention was to get bought out by IBM all along!


Stephen Henry

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:22 PM EDT
Speaking as a mathematician & computer scientist (I teach the former, have a PhD in the latter) I'm extremely skeptical about "spectral analysis". Spectral analysis refers to analysis performed in the frequency domain. It's hard to see what sort of frequency domain can apply to code, except for (as pj's expert mentioned) literally the frequencies of fragments. Spectral methods work well in signal processing usually because important signal properties show up in the frequency domain, whereas the time domain is often littered with insignificant garbage. That can't be the case for code - the important thing is the functionality of the code. I would be surprised but not overwhelmed if frequency analysis of programs could determine whether two fragments of code were *written by the same person* (by picking up coding styles well in the frequency domain) but that has no bearing here.

With my computer science hat on, I know very well how hard it is to build a mathematical model of the functionality of a program. Indeed, for C programs it's damn near impossible to do with accuracy.

Spectral analysis seems to me to be a red herring - one of those buzzwords thrown around by people trying to portray themselves as experts. In this case it's being used by SCO to hype the calibre of their "mathematicians", whom they can't name (?!). On the other hand, there is plenty of literature on catching plagiarised code (some universities have automated systems which screen for prima facie cases of copying, later scrutinised by a human) which is not too bad at seeing through obfuscation. If I were one of the Universities or businesses who has a licensed copy of the SysVr4 code, I would be sure to run that against the linux kernel and leak the results.

PS: Wavelets are another red herring. They are just a way of getting into a somewhat different frequency domain and are suitable for signals where low frequency does not demand good location. They can't be relevant to code.

PPS: A very important part of what pj's expert was saying hasn't been properly publicised. A VITAL part of data mining is not only to find matches but to answer the question "how likely was it for this match to occur by chance". It's a difficult question to answer, but without an answer you can't quantify the significance of a match. (With my statistician's hat on I'd point out that "significance" is the technically correct word - the data mining program ought to be able to approximate the significance level for the particular observation, or something like it).
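One crude way to put a number on "how likely was this match by chance" is a shuffle test, sketched below. The match statistic (longest run of identical consecutive tokens) and the null model (shuffling one side's tokens) are both assumptions chosen for brevity; real code is far from randomly ordered, so a serious tool would need a much better null model. The function names are invented for the example.

```python
import random

def longest_common_run(a, b):
    """Length of the longest run of consecutive tokens that appears in both sequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        cur = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def chance_match_rate(a, b, trials=1000, seed=0):
    """Estimate how often a shuffled copy of b matches a at least as well as b itself."""
    rng = random.Random(seed)
    observed = longest_common_run(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if longest_common_run(a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

# Hypothetical token streams, invented for this example.
original = "for ( i = 0 ; i < n ; i ++ ) total += buf [ i ] ;".split()
suspect = "for ( i = 0 ; i < n ; i ++ ) sum += buf [ i ] ;".split()
print(chance_match_rate(original, suspect))
```

A small rate says the observed run is unlikely under this particular null model and nothing more; it does not say who copied from whom, or whether both sides drew on a common textbook.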


Dr Drake

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:24 PM EDT
Raph Levien -- I don't think a lot of effort should go into poisoning SCO platform--or supporting it anymore. Failure to provide security updates has nice Fear value, because "package foo no longer works on SCO" is one minor problem, but "package foo is insecure on SCO" is a big problem. Simply stopping security updates (why bother?) should be more effective. Also (speaking as a QA guy)--don't worry about poisoning the code, just don't *test* it on that platform, or do bugfixes that are SCO-specific. Trust me, it *will* break one day.

If you must poison SCO, setting the LANG=C variable somewhere in your installation procedure and exporting to the compilation shell should break that install and most downstream ./configure, make, make install's. Lots of existing software works around the LANG variable for SCO, and will compile wrong if it's set. In this day and age, packages install other packages.... LANG=C should be like m4sugar in the gas tank.


John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:26 PM EDT
I would like to hear from a lawyer on whether evidence produced by spectral analysis and data mining would actually be allowed to be introduced in court. There are many ways to check for authorship of text documents that have been found to be very reliable. How about plagiarism? What kinds of criteria are used for that? As has been pointed out here and elsewhere, pattern matching algorithms are bound to find many proximate matches in a structured language such as C and C++. And when a program or algorithm is written to a published specification, then the pattern matches are bound to show up even more strikingly, such as Jay Schulist's clean room implementation of the Berkeley Packet Filter. There are also bound to be pattern matches in totally unrelated areas. I think that SCO has another huge hurdle to jump here just getting this type of stuff admitted as evidence.

Glenn


Glenn Thigpen

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:32 PM EDT
Hey, what's up with our favorite FUD generator? I wonder if the legal team has muzzled them. Perhaps whilst the stock is >14 they don't have anything to "announce".

Morbo


Morbo

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:34 PM EDT
SCO busted tracking SCOX Yahoo! messages

Yahoo! Message


MajorLeePissed

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:39 PM EDT
http://www.macobserver.com/article/2003/08/29.11.shtml

Tinfoil hat theory. Not necessarily my opinion, but an opinion, pure conjecture. If you think this is a silly idea - tell me why - please.

It started off about some libraries. Check the early SCOsource stuff. This is probably where they spent the legal-advice money initially - and they haven't spent much since.

Nobody was interested.

So in other words, they (SCO) blew it.

Boies and Heise don't know the difference between shared libraries, UNIX, Linux, JFS, RCU, etc. They certainly don't know the meaning of whatever it was SCO registered the copyrights for, or UNIX history. They don't know the meaning of those slides, and they are trivial pieces of code in any case.

Their client (SCO) tells them IBM and others are ripping off these libraries and putting them in Linux. Their client (SCO) tells them this is how Linux got JFS, RCU, NUMA, etc.

SCO's complaint is basically a recitation of the "facts" according to Sontag and McBride. Boies and Heise don't know what they're writing. This would explain the curious section about shared libraries even in the revised complaint, which seems to have little or nothing to do with the rest of the complaint.

Heise maybe even thinks the shared libraries are what they registered copyright for. If these are being ripped off, he thinks he can sue.

Check out Heise and Boies comments - do they ever say anything about SCO's general "case" against Linux??

Boies & Heise probably don't do email or web browser. They don't read computer magazines. They are probably unaware of much of the press SCO is getting. If they read the computer magazines, they might not understand the story being told in any case.

The Linux IP license is probably McBride and Sontag's own work. Input from legal counsel is minimal.

The 3 teams reviewing Linux code probably don't exist. After all, the other 3 teams supposedly finding Linux customers to sue can't exist, if SCO's statement in The Age/SMH is true.

In summary, it started as a legitimate, or semi-legit, attempt to extract revenue from some libraries SCO wrote. It didn't work. At that point, they went on a different course. Boies & Heise never realized (perhaps until they read the Red Hat complaint) that SCO was pursuing in the press an entirely different set of issues to the shared library thing.


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:49 PM EDT
quartermass: > Personally, I have a lot of sympathy with the idea. Why supply free code, free support, and free products to sell, to > somebody who is gunning for you?

Standard IANAL bit, first off, I endear the same sympathy as do you, with regards to Linux SCO support; however, it's a petty thing to do and might even be illegal. Although the code is free, as it becomes a foundation that businesses are built upon, I think it would be illegal to intentionally discriminate against SCO.

I don't agree with any of SCO's actions, in the least. But I am not the American legal system and it is within SCO's rights to defend its contracts and IP. Of course, they'll have to prove their claims or otherwise face stiff penalties, but it's their right to do so, as a business in the United States. I think petty recriminations, by any software developer, are sad to say the least. Even the SAMBA folks realize this; they provide software under the GPL, for end users, regardless of their actions. If end users want SCO support, SCO support will exist. If end users ignore SCO, then the SCO code will become legacy and eventually be removed. Just imagine the SAMBA folks getting pissed at Microsoft and altering their source to make it incompatible with Windows just because they don't like Microsoft's business ethics. Sounds silly if you ask me.

This is the power of America: we as end users can speak a language that businesses, for profit and not for profit alike, can understand, and that's demand. All businesses want to at least survive, and you can't survive if no one wants your product. The developers of SAMBA don't release SAMBA just for fun, they ultimately do it for us. If I wanted software that was periodically rewritten to break compatibility with another software package, then I'd install some version of Windows on our company's domain controller, web server, 2 database servers, 2 routers, etc...

The OSI is about more than just writing cool software, it's about the ideals that ESR laid out for us all. He wanted the DDoS to stop, because it was a childish maneuver that accomplished absolutely nothing. If someone calls you a bad name, does that give you the right to smash the windshield on their car? It doesn't, and that's where the legal system comes into play. You can take the issue to court and have it resolved in front of a judge and/or jury. Just imagine if all of the energy put into DDoS attacks and source manipulation went into creating a coherent response to SCO's actions. I'd take the coherent response over a DDoS and source manipulation any day.


Tazer

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:52 PM EDT
Some good keywords to search for might be "Program Differencing" and "Tree Differencing". I found a TR on the latter here: http://www.qucis.queensu.ca/TechReports/Reports/95-372.ps

It's about structured documents, but the same principles can be applied to parse trees.
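As a rough illustration of comparing parse trees rather than raw text, here is a minimal sketch using Python's standard `ast` module. It is not the algorithm from the report linked above: it only checks whether two pieces of code have the same tree shape once identifiers are anonymized, which is much cruder than a real tree-differencing algorithm. The `_Anonymize` class, the `shape` helper, and the three sample functions are all made up for this example.

```python
import ast


class _Anonymize(ast.NodeTransformer):
    """Replace identifiers with a placeholder so only the tree shape remains."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)  # also anonymize arguments and body
        return node


def shape(source: str) -> str:
    """Dump the parse tree of `source` with all names anonymized."""
    tree = _Anonymize().visit(ast.parse(source))
    return ast.dump(tree)


ORIGINAL = "def total(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
RENAMED = "def sum_all(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
DIFFERENT = "def total(values):\n    return sum(values)\n"

print(shape(ORIGINAL) == shape(RENAMED))    # same structure, different names
print(shape(ORIGINAL) == shape(DIFFERENT))  # same name, different structure
```

A check like this catches renamed variables that a line-by-line text comparison would miss, but it also matches any two routines that happen to share a common idiom, so a match by itself says nothing about copying.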


eloj

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 03:57 PM EDT
Plz forgive me for being off topic, but Bob Toxen, writing for Net-security.org, wrote an article entitled SCO v. IBM in which he reassures his readers that no Linux user has anything to worry about when using Linux. If someone else linked to this article earlier, my apologies, I shoulda read all the posts first.
PhilTR

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 04:15 PM EDT
> Standard IANAL bit, first off, I endear the same sympathy as do you, with regards to Linux SCO support; however, it's a petty thing to do and might even be illegal

IANAL, but I guess it depends on what they do.

But I'm curious what you think might be illegal?

Most GPL software authors don't provide versions for Windows/Mac/Plan9/BeOS/QNX/zillion-obscure-UNIX versions

1. Are they under any obligation to do so?

2. Are they required to not depend on functionality that happens not to be present in whatever operating systems?

3. Are they required to provide workarounds for bugs that appear on certain platforms?

IANAL, but I am not aware of any legal reason why they would be required to do any of the above. If you are, I'd be interested to know.

BTW, in case you are thinking of MS-DOS vs DR-DOS under Windows 3.1: I don't think this is quite the same thing, as in that case we were talking about anti-trust issues, which presumably don't apply to a typical GPL package.


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 04:19 PM EDT
PJ: If you're out there, these two links tell just about all there is to tell in the 10th Circuit concerning program infringement.

http://www.digital-law-online.com/lpdi1.0/treatise22.html

http://digital-law-online.info/misc/ogilvie.htm


gumout

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 04:33 PM EDT
> I really think that SCO are going to liquidate.

I once had the, umm, pleasure of being an observer during the last days of a company that was being strip-mined by its owner, who was a professional at it. On the day they filed Chapter 11 bankruptcy, the creditors got a radioactive husk that had nothing left. (In this case the 'radioactivity' was an EPA Superfund lawsuit involving the remains of decades-old metalworking facilities). They were $40 million into the banks, losing $25 million a month, with sales dropping like a rock. The creditors took that over and the previous owner walked.

What were some of the final steps, so that we might watch for them in the SCO case? Getting the owner's people out. They brought in a board member as a new Chairman/CEO, paying the previous occupant a hefty separation package. Next to go was the CFO... he too was "resigned" but given a generous package on his way out. Then the president... ostensibly "let go" but he in fact walked out with about a half-million of the final remaining cash as his "separation agreement." About two weeks later they cratered the thing. All the "resigned" officers surfaced later in exec positions in other companies owned by the same guys. It was all a game... they extracted maximum cash from the public, the suppliers, and they even burned the banks (big ones... Chemical took about a $30 million write-down over this deal).

If Darl and his friends start disappearing, either not replaced at all or replaced by gullible underlings who don't realize what is being done to them, we can assume the end is near. If there are more quarterly payments coming in from MSFT or Sun, they will probably wait for those to arrive, and pass the cash out to other Canopy properties as fast as they can. When they're down to the last million or so, they'll pass that out as separation money for departing officers... and then crater it, leaving IBM and Red Hat with no one and nothing to sue.

A good question for the legal brains among us is whether Canopy can in fact rely on a bankruptcy court to award it the UNIX IP (Canopy will be the only serious creditor at death), or can IBM and Red Hat somehow pursue their lawsuits and perhaps acquire the IP from the bankruptcy court as their damage award?


Bob

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 05:09 PM EDT
"How about plagarism? What kind of criteria is used for that? "

It's whatever you can convince a jury it is. Generally you look for intact or lightly edited passages, sudden changes of style, and vocabulary differences. Usually you know what the probable source is, so you do a comparison of the texts, and consider whether the author of "B" was in a position to steal from "A". Where it gets tricky is when both authors are using a common source ... you have to decide how much of the similarity is because of the ancestral texts and how much was lifted from "A". That same problem holds true in technical works: there are only a few ways that a USB port can be described, and if both authors were working from the published specification, and both are skilled writers, the text is going to be very similar because of the constraints of the subject matter, and the language, and the expectations of the readers.
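The common-source problem can be sketched in a few lines. This is only an illustration under stated assumptions: the `shingles` and `overlap` helpers are hypothetical, the three texts are invented, and measuring similarity as the Jaccard overlap of three-word sequences is one simple choice among many, not an established forensic standard.

```python
def shingles(text: str, n: int = 3) -> set:
    """Return the set of n-word sequences (shingles) in `text`, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a: str, b: str, common_source: str = "", n: int = 3) -> float:
    """Jaccard similarity of the shingles of a and b, ignoring any shingle
    that also appears in the common source both authors worked from."""
    ignored = shingles(common_source, n)
    sa = shingles(a, n) - ignored
    sb = shingles(b, n) - ignored
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Invented texts: a "specification" and two descriptions written from it.
SPEC = "the host polls the device and the device returns a status byte"
TEXT_A = "in practice the host polls the device and the device returns a status byte every frame"
TEXT_B = "remember that the host polls the device and the device returns a status byte on request"

print(overlap(TEXT_A, TEXT_B))        # raw similarity looks high
print(overlap(TEXT_A, TEXT_B, SPEC))  # similarity left after removing the spec
```

In this toy case all of the apparent similarity between "A" and "B" comes from the shared specification; once its wording is filtered out, nothing is left that would suggest one author lifted from the other.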


Tsu Dho Nimh

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 05:11 PM EDT
Quatermass,

re: Tinfoil hat theory

I dunno... that theory ascribes an incredible amount of stupidity to Boies & company.

If I were a lawyer, I don't think I'd take the word of two executive types on either software or IP issues; I'd get the opinion of an independent expert.

On the other hand, even though Boies has a reputation to protect (such as it is), he's been strangely silent, letting his sidekick Heise make such legendary statements as "... copyright law allows only one copy to be made ...". With nitwits like McBride and Heise on his side, Boies probably decided to retire to Aruba.


Dick Gingras - SCO caro mortuum erit!

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 05:12 PM EDT
Note "Altiris" logo on building in this picture...

http://www.smilereminder.com/index.html


John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 05:14 PM EDT
Sorry. You have to click on "About Us" to see the picture.
John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 05:26 PM EDT
willows.com used to be Canopy. Are they still around?

"Software Tools and Services Enabling your Windows® Applications to Run on UNIX®, Macintosh® and Other Systems."

http://216.239.37.104/search?q=cache:waaUS0kVAh4J:www.willows.com/+%22willows+software%22+canopy&hl=en&ie=UTF-8


John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 05:31 PM EDT
Name: willows.com Address: 216.250.129.62

Name: sco.com Address: 216.250.140.112


John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 05:49 PM EDT
John,

You wrote: "Why is Darl C. McBride listed as CEO here?"

PointSource was D. McB's employer before he started working at Caldera last year.

The page you found seems to be a tad out of date, like SCOG's unices...


D.

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 05:50 PM EDT
Noorda Family Trust (redirects to Canopy)

Name: nft.com Address: 216.250.129.2
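For anyone who wants to check how close these addresses actually are, here is a small sketch using Python's standard `ipaddress` module. It takes the three addresses posted in this thread and computes the longest network prefix they share; the `common_prefix_len` helper is a made-up name. A shared prefix only suggests a common address allocation or hosting provider. It does not by itself show who owns or runs the machines.

```python
import ipaddress


def common_prefix_len(addresses):
    """Number of leading bits shared by all IPv4 addresses in the list."""
    ints = [int(ipaddress.IPv4Address(a)) for a in addresses]
    diff = 0
    for value in ints[1:]:
        diff |= ints[0] ^ value  # collect every bit position that differs
    return 32 - diff.bit_length()


# Addresses as posted in the comments above.
HOSTS = {
    "willows.com": "216.250.129.62",
    "sco.com": "216.250.140.112",
    "nft.com": "216.250.129.2",
}

bits = common_prefix_len(list(HOSTS.values()))
block = ipaddress.ip_network(f"{HOSTS['sco.com']}/{bits}", strict=False)
print(bits, block)
```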


John Goodwin

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 05:56 PM EDT
> I dunno... that theory ascribes an incredible amount of stupidity to Boies & company.

Like I said it's just an opinion, 100% pure conjecture, and not necessarily even my opinion.

I don't think it ascribes stupidity, just technical ignorance. You know, even some people who might be more technically astute than Boies got fooled by the slide show. Furthermore even if Boies eventually caught on, could he back out at that point?

The easiest way to show it's a totally bogus theory, would be to find any specific reference from Boies or Heise to the general Linux case - or - to find a plausible logical explanation of why there's a big section about shared libraries in both the original SCO complaint and the amended complaint.

Seeing as we are in tin-foil hat territory today, there is one other theory that is worth mentioning. Again 100% pure opinion and conjecture, and not necessarily even my opinion. I do not believe this theory is incompatible with the bankruptcy or Boies/Heise=dupes theory.

Tinfoil Hat Theory 3:

SCOX = BRE-X

BRE-X if you remember was a struggling small town Canadian mining company.

Midland Walsh, one of the principals (founder?), was famous for suing a former employer and getting a settlement for an undisclosed sum.

BRE-X suddenly said they found these incredibly huge gold deposits in a mine in Indonesia.

BRE-X said they had their own secret teams of experts, whose identities they couldn't reveal, supporting their claims (assaying of core samples for gold).

Industry experts criticized the techniques for assaying which were unorthodox, didn't follow industry standard practises.

The company's reports (with incredible claims) were criticized by industry experts for the same reasons. The industry experts were ignored.

Despite this media and stock analysts preferred the company's version to that of the industry experts. Some analysts really pushed the stock hard.

As more and more discrepancies in the company's story came to light, the company produced a series of increasingly unsatisfactory explanations, which were debunked by industry experts too.

The stock price rose and rose on the Toronto Stock Exchange. Massive relatively uncritical media coverage.

Insiders cashed out millions of stock. I think it was a tiny fraction of the total company, but still a lot of money to them.

Eventually it turned out the samples from the mine had been faked. All was revealed. The stock price crashed so badly in a single day that it broke the software for the Toronto Stock Exchange.

Links to BRE-X story:

Short summary: http://www.goodreports.net/bregoo.htm

Long version of story: http://www.sbaer.uca.edu/Research/1999/SRIBR/99sri091.htm

The tech stuff: http://minerals.state.nv.us/programs/min_fraudami.htm#bre-x

Could this tin-foil hat theory be true?

For #1: Lots of people report difficulty (impossibility) of buying SCO Linux IP licenses. They don't seem to be actively trying to actually sell their new product - or actively pursue their riches by litigation strategy.

For #2: So many secrets - the code gold, the code analysts, the Linux IP customer, etc

How to disprove: SCO or some enterprising reporter would need to find and properly verify any of the SCO secrets


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 06:00 PM EDT
Just a comment to "Tazer"

Not sure where you are coming from with the "might even be illegal.."; if such were actually the case, M$S would have had their pants sued off over not supplying Office for all the other platforms, discontinuing support for software & such. As for Samba, the only reason such exists is so that Linux can talk to/be used instead of M$S Servers. Dropping "support" for M$S in this case would pretty much mean there was no Samba. As far as the Linux world, or for that matter all other operating systems I've ever worked on (starting with a Wang mainframe in grade school), there are much better ways for network shares and such to be done (an example would be NFS). The SMB block is not actually very good, but it's what M$S decided to use, so everyone else had to try to figure out how to supply compatibility with it - no easy task as M$S keeps modifying how it works (they extend & extinguish their own stuff too - the "forced" upgrade...).

While I would agree that the programmers perhaps "ought" (in an ethical sense) to leave the previous work in support of SCO's stuff in, continuing to fix & update such is certainly not "required" of them in any sense - legal, ethical or what have you. Changing a line in the compile instructions that ignores issues with compiling on SCO's stuff is, IMO, also not an issue - it is a "no longer supported platform", something everyone in the computer world has, for lots of reasons, gotten somewhat used to.

Thomas


Thomas LePage

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 06:38 PM EDT
quartermass, I think I am admittedly naive on legal matters, but let me elaborate. Please correct me where I'm wrong. I would think that any organization, for profit or not for profit, would have to distribute its products and/or services in a non-discriminatory manner. Otherwise, if an entire business was based upon Linux, not that SCO is, then it would have to pander to the developers of the Linux kernel, or otherwise face possible retaliatory measures, which is similar to extortion(?).

Ultimately, some person or organization has to be accountable for Linux (http://www.osdl.org/about_osdl/): "OSDL Mission To be the recognized center of gravity for Linux; the central body dedicated to accelerating the use of Linux for enterprise computing through: Enterprise-class testing and other technical support for the Linux development community. Marshalling of Linux-industry resources to focus investment on areas of greatest need thereby eliminating inhibitors to growth. Practical guidance to our members - vendors and end users alike - on working effectively with the Linux development community."

I am not clear on laws regarding non-profit organizations, but by allowing blatant code manipulation against a specific company, wouldn't OSDL and/or Linus Torvalds be held to the same ethical business standards that other companies are? Wouldn't SCO be able to make some sort of legal argument that since OSDL is a non-profit(charitable) organization, that by excluding a certain group, intentionally, that they should lose their tax-exempt status? Wouldn't they be creating an unfair advantage?

For instance, Oracle has spent a probably large sum of money on their Linux port. If OSDL decided that they didn't like Oracle anymore and intentionally created, or permitted to be included into the kernel, incompatibilities that prevented Oracle from running on Linux, couldn't Oracle do anything about that? If this is the case, I can't fathom why any business would build products for Linux, especially if it's not possible to enforce certain restraints.

Sure, they could start their own distribution based on the last compatible version of Linux, but that would require a significant investment and diversion from their current market strategy. Every software manufacturer would effectively have to be prepared to be an operating system manufacturer as well.

These types of legal maneuvers may not be a huge blow to Linux, but if I'm a beginner on law, and these are valid arguments, it would be reasonable that a well equipped law firm could find many more issues than I.

Don't get me wrong, I'm a huge Linux advocate and have been known to Microsoft bash on occasion (daily at 4:30PM in the IT manager's office), but do any of my questions or points have any merit?


Tazer

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 06:49 PM EDT
Assuming that SCO were liquidated, and assuming that Canopy as the big creditor were to receive the UNIX SystemV rights (such as they are), I should think that Canopy would not be out of the woods as they would then have property which IBM has already indicated infringes their patents. What would prevent IBM from then pursuing Canopy over those same patents? The transfer of the rights could quite possibly involve the recipient(s) in court, fighting the biggest patent team in the USA. It would be entirely consistent for IBM to let the really big dogs loose on Canopy, since Canopy is financially benefitting from the SCO price run-up due to the trial-by-press-release.

It is much like the scene from "The Wizard of Oz," you cannot ignore the man behind the curtain, even when he demands you do so.

In a way, this whole episode would seem to make toxic waste out of the SystemV rights.

Marty


Marty

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 06:50 PM EDT
Thomas, I guess I mean there is a significant difference between eliminating compatibility and introducing compatibility. Granted, OSDL is a non-profit organization, but as the *nix industry moves forward, I would expect to see Linux proliferate in the market, essentially becoming the dominant standard platform on which applications are built. Maybe I'm saying that Linux might become a monopoly, and will have to follow a specific, legal, business ethic (antitrust?). Isn't there a precedent that would force the compatibility to be kept in? Maybe like a public service?

I truly am not a lawyer and am posting so that I might learn the answers to some of these questions I have.


Tazer

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 06:57 PM EDT
Tazer, first off, I think we're talking about applications rather than operating systems.

Second, I think there's a big difference between sabotage, and simply not supporting something.

I am not aware of any legal reason that obligates any developer (except maybe Microsoft who are bound by anti-trust issues, and even then there are limits) to support everything.

And as a practical matter, they simply couldn't, even if they wanted to. Time and cost limit all software development. What about QA - is it practical to test on a million platforms?

If SCO's UNIX platform has bugs or peculiarities in it, and that happens to mean some new piece of software doesn't work on it - that's SCO's problem, not the developers'.

To give an analogy... there are bugs and limitations in Windows 95 that a developer can work round. There are also bugs and limitations which are not easily worked around. Similarly there are bugs/limitations in say WINE.

If a developer (say Adobe or Macromedia or Westwood or whoever) produces a new or updated application, are you really saying that they are legally obligated to ensure that their software works on all of Windows 95, 98, NT4, 2000, Me, XP, 2003, WINE, etc. Mac too? QNX too? Plan9 too? HPUX too? And so on?

As far as I am aware, they are only obligated to support those platforms that they want to?

Can I successfully sue AOL, because their latest clients are no longer compatible with the Commodore 64 or Apple ][, even though years ago they used to offer that?

If you really think so, please point to a law allowing me to do so, and a case where such a suit succeeded.


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 07:03 PM EDT
Tazer:

One thing about Open source is that anything you add to introduce incompatibility can be removed, often with less effort than it took to add it.

On the other hand, just neglecting platform-specific updates and bug-fixes puts a significant burden on the people who use and maintain software on that platform. That sounds reasonable and legal to me. IANAL.


r.a.

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 07:12 PM EDT
http://uptime.netcraft.com/perf/graph?site=www.sco.com

SCO is down again. Taking a web site doesn't sound like a usual ddos and it also doesn't sound like usual server maintenance/upgrade.

Anyone with an idea what's going on? Has anyone ever seen anything like this?


r.a.

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 07:16 PM EDT
r.a. I have no explanation

Peter Williams of VNUNET seems to think it's a DOS

http://www.vnunet.com/News/1143283

Maybe somebody ought to tell him that the downtime is regular, real regular, like a clock.


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 07:18 PM EDT
It seems to me that someone with access to the relevant sources should be able to tell if there has been major pilfering from Unix System V into Linux by:

1) use some kind of pattern matching to find suspicious lines (should be trigger happy: false positives OK, false negatives not OK). This is the `suspicious population'.

2) take a random sample of the suspicious lines, perhaps 50 in the first instance, and do the code philology on them carefully to work out what percentage of them were illegitimately copied.

3) you can then use basic stats to estimate what percentage of lines in the suspicious population were illegitimately copied. The accuracy of the estimate depends on the size of the sample and percentage of lines found to be copied, *not* on the size of the original population, and is more accurate the closer the percentage is to 50%. So if half of the sample come up bodgy, then it would be clear that there is a tremendous problem, but if only one or two did, then you'd need to extend the sample to get a respectable estimate.

Well the guys at IBM and OSF know more about this kind of thing than I do so maybe they're doing it like this, or better.
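The estimate in step 3 can be sketched with a standard confidence interval for a proportion. This is a minimal sketch, assuming a simple random sample and a suspicious population much larger than the sample; the `wilson_interval` helper name and the choice of the Wilson score interval at roughly 95% confidence are assumptions made for this example, not something from the thread.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion hits/n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# 50 sampled lines: first with half found copied, then with only two.
for hits in (25, 2):
    low, high = wilson_interval(hits, 50)
    print(f"{hits}/50 copied: between {low:.0%} and {high:.0%}")
```

Note that the population size never appears in the formula. One refinement to the wording above: in absolute terms the interval is actually widest near 50%. What gets worse with only one or two hits is the relative uncertainty, since the plausible range then spans roughly an order of magnitude, which is why extending the sample helps in that case.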


Avery

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 07:24 PM EDT
http://www.connect-utah.com /article.asp?r=139

"If IBM drags the case out into several years, we will consider seeking damages from Linux customers," says McBride.


quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 07:53 PM EDT
http://www.connect-utah.com /article.asp?r=139 "If IBM drags the case out into several years, we will consider seeking damages from Linux customers," says McBride.

Unfortunately, there is no law that says they can seek damages in a trade secret leak from anyone except the leak source. And copyright law is the same way - the infringer, the person who actually knowingly ripped off the code, is the only person who they can take action against.

Suing users isn't mentioned in the remedies section.


Tsu Dho Nimh

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 07:58 PM EDT
Tsu, my post pertains to the idea that SCO is not suing anybody & never had any plans to, as a possible SCO defense to Red Hat's suit, as discussed in the comments section of the last article (at least in the bits I can see)
quatermass - SCO delenda est

[ Reply to This | # ]

radiocomment
Authored by: Anonymous on Friday, August 29 2003 @ 08:02 PM EDT
Dr Drake, gumout, and Glenn: I think Computer Associates v Altai is spot on, but for all the wrong reasons. It was both a

copyright case and a trade secret case. So far SCO has failed to make any copyright claims. Gumout is correct to mention it

though, it's full of good stuff. It was cited by BSDI and Berkeley in the USL v BSDI case.

ht tp://www.kentlaw.edu/e-Ukraine/copyright/cases/computer_v_altai.html I'll refer to Computer Associates v Altai as "CA" from here

on out.

Before you even begin to look for "substantial similarites" you first have to prove that the author had access to the Unix

System V source code. As I understood the press reports, the author of the Berkeley Packet Filter derivative shown in the Las

Vegas slide show was not a SCO licensee. If you can't prove that an author had access, you don't bother with any analysis

That should hold true for any of the other Linux kernel copyright holders.

On appeal,there is some discussion in CA about how the lower court should have first excluded the functional code required by

external specifications (POSIX, iBCS2, & etc.) and also any code taken from the public domain from the similarities test.i.e.

abstraction, filtration, comparison. It's important to remember that this high tech search method is just an alternative

way of raising the question of substantial similarity for the judge or jury. "Since the test for illicit copying is based upon

the response of ordinary lay observers, expert testimony is thus "irrelevant" and not permitted. Id. at 468, 473. We have sub-

sequently described this method of inquiry as "merely an alternative way of formulating the issue of substantial similarity." Ideal Toy Corp. v. Fab-Lu Ltd. (Inc.), 360 F.2d 1021, 1023 n. 2 (2d Cir.1966)." That's a citation from CA.

There is another major difference. SCO readily admits that IBM and Sequent own the copyrights and patents. They simply claim

that they control their release or distribution. This theory looks doomed on two accounts. AT&T sent letters to all of it's

licensees explaining almost the opposite. There were even sworn depositions to that effect in USL v BSDI. CA has a nice

discussion of Title 17 section 301 (copyright) pre-empting state trade secret laws. In CA the court simply wanted to prevent

"double dipping", i.e. charging copyright infringment and trade secret misappropriation for the same act. That complicated

things for Computer Associates, but probably makes quite a few things very uncomplicated for IBM and Sequent - they

are being charged for distributing their own code in accordance with their rights under a statute (17 USC section 301) that

pre-empts Utah's trade secret laws. ;-)

It's also a fact that AT&T's lawyers felt that copyright is or was incompatible with trade secret protection. That is one of the

reasons I've been given by insiders to explain why AT&T started removing copyright notices from their Unix 32V code. It's also

why they sent out the letters disowning any derivative that didn't contain at least some of their code. Under the procedures

of abstraction, filtration, and comparison outlined in CA, it's possible that you might have to remove any overlapping BSD or

32V code before doing the analysis.
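
As a rough illustration only: filtration is a legal judgment made by the court, not something a script decides, but the mechanical idea of removing excluded material before comparing can be sketched in Python. Everything here is hypothetical: the function names, the sample lines, and the use of exact line matching as a stand-in for "comparison".

```python
def normalize(line: str) -> str:
    """Collapse whitespace so trivial formatting differences don't matter."""
    return " ".join(line.split())


def filtered_overlap(claimed, accused, excluded):
    """Return lines common to both code bases after filtering excluded lines.

    `excluded` stands for material that would be filtered out first, e.g.
    code dictated by external specifications or taken from the public domain.
    """
    excluded_set = {normalize(line) for line in excluded}
    claimed_set = {normalize(line) for line in claimed} - excluded_set
    accused_set = {normalize(line) for line in accused} - excluded_set
    claimed_set.discard("")  # blank lines prove nothing
    return claimed_set & accused_set


# Hypothetical sample data, not taken from any real code base.
claimed = ["int  fd = open(path, O_RDONLY);", "secret_tune(buf);", "}"]
accused = ["int fd = open(path, O_RDONLY);", "secret_tune(buf);", "}"]
excluded = ["int fd = open(path, O_RDONLY);", "}"]

overlap = filtered_overlap(claimed, accused, excluded)
print(sorted(overlap))
```

The point of the sketch is the ordering: a raw comparison would report three matching lines, while filtering first leaves only the one line that is not explained by the excluded material.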


Harlan

Authored by: Anonymous on Friday, August 29 2003 @ 08:17 PM EDT
quatermass,

Well, in retrospect, it seems you're more specifically referring to application maintainers rather than kernel/OS maintainers. That was my misread. I agree that it's not feasible to maintain an application on every platform/OS configuration and it's up to a specific company/organization to determine which platforms it will decide to support. My argument was primarily focused on the actual kernel itself and is therefore deprecated.

I guess misreads are the price you pay for setting up Gentoo VPNs all night. =) Seriously, I was trying to make a good argument, not pick a fight.


Tazer

Authored by: Anonymous on Friday, August 29 2003 @ 08:17 PM EDT
gumout,

Those links you provided are an excellent source of the current legal thinking on copyright infringement as applied to programs.

Should be required reading for everyone here!


Dick Gingras - SCO caro mortuum erit!

Authored by: Anonymous on Friday, August 29 2003 @ 08:20 PM EDT
I'm out here, gumout. Take a look in SCO Archives for an article on How the 10th Circuit Defines Derivative Code and for the articles on copyright, which I think are also on the Legal Links page. In the first, the 10th circuit article, there is a link to a paper Dan Ravicher wrote that you will likely get a lot out of. There's another on Patents and Copyright, showing what lawyers and cases have said are the differences.

CSS2, you are too much! Thanks.

quatermass, I agree with you about Boies et al. It's not unusual for an attorney to just go with what the client tells him in the beginning.


pj

Authored by: Anonymous on Friday, August 29 2003 @ 08:55 PM EDT
pj, I'm not sure whether I agree with me about Boies.

However, it does seem plausible (which ain't necessarily the same as true) that he doesn't or didn't know the whole SCO story.

Another reason I think this plausible is the number of simple factual errors, omissions, and lack of precision in the complaint. If Boies knew the whole story, I thought that he would:

1. have described UNIX differently, right at the start. And justified SCO's complaint in terms of SCO's rights to a particular (the AT&T original) implementation of UNIX.

2. and also mentioned the difference between Old SCO and New SCO - and then given a legal justification of why he thinks New SCO has the successor interest in project Monterey.

Before getting behind any theory, we should be looking for testable predictions, the scientific method if you like. This has not yet been applied, to my knowledge. I do not believe any of the theories need be mutually exclusive.

Anyway, both the theories I posted, are just theories, conjecture and possible opinions. I know they have been discussed elsewhere in other forums. And yes they are pretty extreme, some people might even say wacko.

I do not currently endorse any of these theories.


quatermass - SCO delenda est

Authored by: Anonymous on Friday, August 29 2003 @ 09:12 PM EDT
q'mass,

Bingo! I'll buy SCO == BRE-X, a classic pump 'n' dump scheme; the parallels with SCO are notable.

The longer article's concluding sentence is apt: "Bre-X is a story about pure human greed."


Dick Gingras - SCO caro mortuum erit!

Authored by: Anonymous on Friday, August 29 2003 @ 09:16 PM EDT
"SCO is down again. Taking a web site doesn't sound like a usual ddos and it also doesn't sound like usual server maintenance/upgrade.

Anyone with an idea what's going on? Has anyone ever seen anything like this? "

Actually, yes. It's an exceedingly funny case of basic stupidity. This business (which shall remain nameless) had a server in their store. Unfortunately for the suits, some idiot plugged the server into a switched outlet. When the last person went home and switched off the lights as they left, the server was suddenly without power. When the first person arrived in the morning and turned on the lights, the server booted up and was running just peachy by the time the first tech-head arrived. No problem with our server... must be someone else's fault! :)


J.F.

Authored by: Anonymous on Friday, August 29 2003 @ 10:31 PM EDT
If SCO really has code by line number that is copied, they can see who copied what. SCO has to know names from the kernel logs and changes (if they really have any code). SCO is protecting the very people they say "ripped off their IP". Can Red Hat file a claim under the DMCA to force SCO to reveal the names of the "code stealers"? I do not believe SCO has any real code they could show if the court case were today. Just my .02 worth. Am I dead wrong?

Satan Claims Opensource


nm

Authored by: Anonymous on Saturday, August 30 2003 @ 01:11 AM EDT
I repeat my suggestion that the terms of the GPL be changed so that if a GPL licensee deliberately violates the GPL on one product, then he is barred from using any GPL product. IMO, this change does not conflict with the non-discrimination clause of the GPL. Frankly, I don't care about cosmetic concerns such as being seen as better than the other guy or worry about sinking to his level when he needs to be taken out. On the other hand, this is what I care about: I believe that OSS has an obligation to protect the IP of the thousands of developers who contribute their time and effort, and I believe that the change I am proposing is a necessary step toward meeting that obligation.


blacklight

Authored by: Anonymous on Saturday, August 30 2003 @ 01:22 AM EDT
nm:

This isn't really about code and barely about the court case. The code in question has always been out in the open, and if at any time, past, present or future, any party feels code has been donated improperly, they have been and will be able to show that they are the true owners, and the code will be removed immediately. (Try doing that with Microsoft.)

SCO is taking advantage of the fact that as a "traditional" software company, they get a presumption of reasonableness right now when they make accusations against a decentralized community of programmers. Every week they lose a little more credibility. There are many examples already of mainstream press that doesn't take them seriously. A month ago that was not true.

Red Hat, IBM and others have *very* good legal talent trying to resolve this issue as quickly as possible. It is frustrating that SCO gets to keep talking. In Germany, companies are not allowed to make unsubstantiated accusations, and SCO will be fined if they make this kind of statement there. In the US, it seems companies get more latitude.

SCO has been fully aware of IBM's, Sequent's and others' contributions to Linux from the beginning. They've seen the code contributions, added to them, and distributed them. Their accusations today are just bizarre. They probably have succeeded in somewhat slowing the acceptance of Linux, but as they lose their presumption of reasonableness they matter less and less.

Here is a link from Slashdot in June 2002 where some IBM developers talk about the legal requirements for adding code to Linux.

"As Linux developers inside IBM, do you get to see the AIX source code? If you do, are you allowed to "steal" some ideas from AIX and implement them in Linux? If not, why not, and what's the IBM official line?

"IBM Kernel Hackers:

"First of all, before any of us were allowed to contribute to Linux, we were required to take an "Open Source Developers" class. This class gives us the guidelines we need to participate effectively in the open source community - both IBM guidelines and lessons learned about open source from others in IBM.

"We are definitely not allowed to cut and paste proprietary code into any open source projects (or vice versa!). There is an IBM committee who can and do approve the release of IBM proprietary or patented technology, like RCU.

"That covers "stealing" code, but what about ideas? We might talk to an AIX programmer and comment we're seeing performance issues in Linux in this area or that area and she tells us they discovered that they really needed to profile the network routines when they saw that. Having solved the problem once, our non-Linux peers can help steer us without spelling it out for us, allowing us to still develop solutions that can then be open sourced.

"It's a fine line to walk, especially as an engineer who just wants the answer :) "


r.a.

Authored by: Anonymous on Saturday, August 30 2003 @ 01:27 AM EDT
J.F., I once came across a server that was going down every night. It was placed in a cellar, and what had happened was that someone had connected it to a shared outlet that had a breaker on a timer that for security reasons switched off every night.
eloj

Authored by: Anonymous on Saturday, August 30 2003 @ 01:42 AM EDT
http://www.interesting-people.org/archives/interesting-people/200308/msg00243.html

Interesting. A flaw in a router caused what looked like an attack but wasn't.


pj

Authored by: Anonymous on Saturday, August 30 2003 @ 06:34 AM EDT
pj, in my opinion it was an attack, caused by an incompetent programmer without(?) malicious intent.
The full story is at http://www.cs.wisc.edu/~plonka/netgear-sntp/

It is impolite to query an NTP server more than once every minute. What happened to the Wisconsin NTP server is a kind of Slashdot effect, magnified by a bug in the software that retransmitted every second when a request failed. Multiply that by 200,000 (domestic) routers sold and you get a DDoS.
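
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the worst case, where every one of the routers is failing and retrying at once; the one-hour cap on the backoff is an arbitrary illustrative choice, not anything taken from the vendor's firmware or the NTP specifications.

```python
from typing import List


def fixed_retry_rate(clients: int, retry_interval_s: float) -> float:
    """Aggregate requests per second when every client retries at a fixed interval."""
    return clients / retry_interval_s


def backoff_intervals(base_s: float, cap_s: float, attempts: int) -> List[float]:
    """Wait times for successive retries, doubling each time up to a cap."""
    return [min(base_s * 2 ** n, cap_s) for n in range(attempts)]


ROUTERS = 200_000         # figure quoted in the comment above
BUGGY_RETRY_S = 1.0       # the bug: retransmit every second on failure
POLITE_INTERVAL_S = 60.0  # "not more than once every minute"
BACKOFF_CAP_S = 3600.0    # assumption: stop doubling at one hour

worst_case = fixed_retry_rate(ROUTERS, BUGGY_RETRY_S)
polite = fixed_retry_rate(ROUTERS, POLITE_INTERVAL_S)
schedule = backoff_intervals(POLITE_INTERVAL_S, BACKOFF_CAP_S, 8)

print(f"buggy firmware: {worst_case:,.0f} requests/s")
print(f"polite polling: {polite:,.0f} requests/s")
print("backoff schedule (s):", schedule)
```

A client that doubles its wait after each failure quickly drops out of the flood, which is why a fixed one-second retry is the part that turns a popular server into an accidental target.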


MathFox

Authored by: Anonymous on Saturday, August 30 2003 @ 08:17 AM EDT
Interview with McBride in Wired: "Are you afraid of being remembered as the man who killed open source? -- People ask why we haven't sued Red Hat. We haven't sued Red Hat because then the GPL [general public license] grinds to a screeching halt, and all shipping distributions of Linux must stop. This whole process is going to make Linux and open source stronger with respect to intellectual property. Today, there's no vetting process to make sure the code that goes into open source is clear. We're trying to work through issues in such a way that we get justice without putting a hole in the head of the penguin." http://www.wired.com/wired/archive/11.09/view.html?pg=3
pj

Authored by: Anonymous on Saturday, August 30 2003 @ 08:51 AM EDT
MathFox, I know you know a great deal more than I do on this subject, but when I went to read the article you linked to, I found this:

"Currently, based on our analysis we believe that the NETGEAR "Platinum" products such as the RP614 and MR814 are the primary source of this flood of traffic. They likely will need to have their code changed to mitigate what is essentially an accidental Denial-of-Service flood against our NTP infrastructure. "

There is also a link there to another instance in Australia of a misconfigured router causing problems.

So, what am I missing?


pj

Authored by: Anonymous on Saturday, August 30 2003 @ 09:06 AM EDT
You're missing that we call it an attack even though it wasn't done intentionally. This is typical in security lingo. Compare with the use of "break" in cryptography.
eloj

Authored by: Anonymous on Saturday, August 30 2003 @ 06:47 PM EDT
Thank you, eloj. That makes sense in your lingo. In mine, it would more likely mean somebody broke the law, a big difference. Thanks for the explanation.


pj

Authored by: Anonymous on Monday, September 01 2003 @ 03:17 AM EDT
As I understand it, "spectral analysis" as applied to texts is a technique used to establish the likelihood of authorship, i.e. given documents A, B and C known to be written by X (and some control documents known not to be written by X), what is the chance that a disputed document D was written by X? It's used, for example, to detect material inserted into a statement/confession by someone other than the author.

The significance is that it generates "hits" based on likely common authorship. However, even discounting the fact that two writers of C code can very easily have near-identical styles (much more so than prose), showing that two documents A (owned by X) and B (a disputed document) are likely to have the same author does not show that B infringes on A's copyright. It might show that there is possibly an older version of B which was indeed written by X, but without proof of this older version's provenance there is no case for copyright infringement.
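
To make the idea of a stylistic "hit" concrete, here is a toy sketch in Python. It is emphatically not SCO's method, which has not been disclosed; it swaps in one of the simplest stylometric tricks, comparing character trigram frequencies with cosine similarity. The sample strings and function names are made up for illustration.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams, a crude stylistic fingerprint."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two trigram count vectors (0.0 to 1.0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical samples: two independently written C loops and some prose.
known_by_x = "for (i = 0; i < n; i++) { total += values[i]; }"
disputed = "for (j = 0; j < len; j++) { sum += items[j]; }"
unrelated = "It was the best of times, it was the worst of times."

profile_x = trigram_profile(known_by_x)
score_disputed = cosine_similarity(profile_x, trigram_profile(disputed))
score_unrelated = cosine_similarity(profile_x, trigram_profile(unrelated))

print(f"disputed vs X:  {score_disputed:.2f}")
print(f"unrelated vs X: {score_unrelated:.2f}")
```

The two loops score as far more alike than the loop and the prose, even though neither loop was copied from the other. That is the weakness described above: idiomatic C written by different people looks alike to this kind of measure, so a high score says something about style and nothing about provenance.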

An analogy would be if I wrote a book which (without actually using the same story and characters) deliberately aped Salman Rushdie's prose style, but published under my own name (i.e. not an attempt to fraudulently pass off a new Rushdie work.) Even if I did such a good job that a casual reader who didn't look at the book jacket thought it was by Rushdie, I would not be infringing on his copyright.

However (and this is a nice point) a book reviewer would probably say (rightly) that my book had "ripped off" his style and was very "derivative". In a similar fashion, SCO's press statements talk of "derivative code" in this loose everyday fashion, hoping that it will be confused with its legal meaning in the context of copyright. (Their use of "IP" as if this had a distinct legal meaning is the same gambit.)


Dr Stupid


Groklaw © Copyright 2003-2013 Pamela Jones.
All trademarks and copyrights on this page are owned by their respective owners.
Comments are owned by the individual posters.

PJ's articles are licensed under a Creative Commons License. ( Details )